Ensimag Rubrique Formation 2022

Distributed Systems for Data Management - 5MMSDTD7

  • Number of hours

    • Lectures -
    • Projects 27.0
    • Tutorials -
    • Internship -
    • Laboratory works -
    • Written tests -

    ECTS 2.0

Goal(s)

The goal of this project is to design and automatically deploy a distributed data processing application. The application will be built on the main frameworks used in the Big Data community and deployed automatically on a public cloud infrastructure.

Students will work in teams of five.

Responsible(s)

Thomas ROPARS

Content(s)

The students will build a distributed data processing system. Such systems are widely used today in many domains (stock market analysis, sensor data analysis, analysis of data from tracking systems, etc.). The students are free to choose the domain their application targets.

A data processing system includes several components, each distributed over several machines:

  • A data ingestion component
  • A data storage component
  • One or several data processing components
  • A visualization component
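The four components listed above can be pictured as stages of a single pipeline. The following is a minimal single-process sketch, not part of the course material: the sensor readings, the `Store` class, and all function names are hypothetical, and each stage is an in-memory stand-in for what would be a distributed service in the actual project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor readings (sensor_id, temperature), invented for illustration.
RAW_EVENTS = [("s1", 20.0), ("s2", 30.0), ("s1", 22.0), ("s2", 34.0)]


def ingest(events):
    """Ingestion: hand events over one at a time, as a message queue would."""
    for event in events:
        yield event


class Store:
    """Storage: an in-memory dict standing in for a distributed database."""

    def __init__(self):
        self.rows = defaultdict(list)

    def write(self, key, value):
        self.rows[key].append(value)


def process(store):
    """Processing: compute the mean reading per sensor."""
    return {key: mean(values) for key, values in store.rows.items()}


def visualize(results):
    """Visualization: render one text line per sensor, sorted by sensor id."""
    return [f"{key}: {value:.1f}" for key, value in sorted(results.items())]


def run_pipeline(events):
    """Chain the four stages: ingestion -> storage -> processing -> visualization."""
    store = Store()
    for sensor_id, temperature in ingest(events):
        store.write(sensor_id, temperature)
    return visualize(process(store))


if __name__ == "__main__":
    for line in run_pipeline(RAW_EVENTS):
        print(line)
```

Keeping each stage behind its own function or class mirrors the separation the project asks for: any stage could later be swapped for a real distributed service without touching the others.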

For this project, the students will use the standard technologies adopted by the major companies in the field (Google, Facebook, LinkedIn, etc.). For example, they could use:

  • Kafka or Samza for data ingestion
  • Spark or Flink for data processing
  • Cassandra, MongoDB or InfluxDB for storing data

Furthermore, the students will have to set up the software infrastructure needed to configure, deploy, and automatically reconfigure their application so that it can run on a cloud computing platform (e.g., AWS, Azure). The tools used for this stage could include:

  • Resource provisioning and configuration tools (e.g., Ansible)
  • Software configuration and deployment tools (e.g., Docker)
  • Orchestration tools (e.g., Kubernetes)
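Automatic reconfiguration is commonly built around a declarative idea: describe the desired state, compare it with the observed state, and derive the actions that close the gap. The sketch below illustrates that idea only. It does not call any real orchestration API, and the component names, replica counts, and the `reconcile` and `apply_actions` helpers are all hypothetical.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` replica counts toward `desired`.

    Both arguments map a component name to a replica count (hypothetical model).
    """
    actions = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(("start", name, want - have))
        elif want < have:
            actions.append(("stop", name, have - want))
    return actions


def apply_actions(actions, actual):
    """Simulate applying the actions and return the resulting state."""
    new_state = dict(actual)
    for verb, name, count in actions:
        delta = count if verb == "start" else -count
        new_state[name] = new_state.get(name, 0) + delta
        if new_state[name] == 0:
            del new_state[name]  # Component fully stopped: drop it from the state.
    return new_state


if __name__ == "__main__":
    desired = {"ingestion": 3, "processing": 2}
    actual = {"ingestion": 1, "legacy": 1}
    plan = reconcile(desired, actual)
    print(plan)
    print(apply_actions(plan, actual))
```

Running the comparison again after the actions have been applied yields an empty plan, which is what makes this kind of loop safe to repeat periodically.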

Prerequisites

Networks, distributed systems, databases.

Test

Demo of the running application. Report and documentation.

    • Assessment rules (MCC), on-site and remote **
      N1=P
      no resit

Calendar

see the course schedule for 2023-2024

Additional Information

Course ID: 5MMSDTD7
Course language(s): FR