- Lectures (CM): 18.0 hours
ECTS credits: 1.75
In the era of Big Data, petabytes of information accumulate at an accelerating rate, which calls for the massive use of techniques to manage (store, index, shard, duplicate), query, and analyse them. Processing billions of web pages, photos, and log entries requires new tools and new programming paradigms.
Cloud computing is emerging as a relatively new approach for providing and facilitating unlimited access to computing and storage resources for building applications. Its basic principle is that applications, accessible through a network, are built upon a service-oriented infrastructure dedicated to providing them with exactly the computing, storage and network resources they need (no more, no less). Instead of relying on a one-size-fits-all computer or server, the computing context is configured according to the characteristics of the application. Instead of buying a computer or server up front, resources are provided (and paid for) on demand.
This course focuses on data integration and management on cloud service-oriented architectures. It briefly introduces the fundamental concepts of cloud computing, then addresses data and service management on the cloud using practical examples based on existing cloud environments and execution models such as (i) ETL and federation tools; (ii) MapReduce and its implementation Hadoop, the most prominent open-source ecosystem of tools for working with large-scale datasets; and (iii) NoSQL databases.
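To give a flavour of the MapReduce programming model mentioned above, here is a minimal single-machine sketch of the classic word-count example. It is not Hadoop code and not taken from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially (hypothetical local driver)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data on the cloud", "data services on the cloud"]
    print(word_count(docs))
```

The point of the split is that `map_phase` and `reduce_phase` only ever see one document or one key at a time, which is what lets a framework such as Hadoop distribute them over a cluster.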
- Understand the emerging area of "cloud computing" and how it relates to traditional models of computing.
- Gain competence in:
o ETL approaches for collecting large data collections
o MapReduce as a programming model for distributed processing of datasets
o NoSQL database systems for managing and querying data collections according to application requirements
- Gain experience in building, accessing, and using cloud services by adopting a service-oriented approach.
Contact Christine COLLET
The course is built on the idea of getting students to understand the various aspects of data and service management through a problem-solution approach. We will therefore propose problems that might be encountered when developing and deploying data-centric applications (with large collections of data and services) in a cloud, and we will guide students in designing and programming solutions using specific tools.
- I. Data, Services and Cloud Computing
a. Introduction on massive data, services and Cloud
b. A first hands-on experience developing and deploying a service that extracts and loads large collections of data into a SQL Server on the Cloud
- II. Data Modelling and processing
a. Aggregate-oriented data models
b. MapReduce programming model and the Hadoop framework
- III. Data distribution
a. Replication and sharding
b. Consistency issues
- IV. Choosing the right data store
a. Polyglot Persistence
b. Survey of NoSQL databases: main actors, application types
- V. Data service-oriented computing
a. Building data services for cloud
b. Observing and managing data services: integration and coordination
- VI. More on Cloud
a. Wrap up and open issues
b. Academic and industrial perspectives
Courses "Principes des SGBD", "Clefs pour l'administration des SGBD relationnels et Objet" and "Distributed Databases"
Exam or presentation and/or practical work (personal project)
Bibliography / textbooks:
[ 1 ] http://deoracle.org/online-pedagogy/teaching-strategies/applying-cloud-computing.html
[ 2 ] http://blogs.msdn.com/b/brunoterkaly/archive/2010/10/05/how-to-teach-cloud-computing-the-windows-azure-platform-step-1.aspx
[ 3 ] http://sites.google.com/site/freeonlineteachingtools/cloud
[ 4 ] http://aws.amazon.com/education/
[ 5 ] http://www.umiacs.umd.edu/~jimmylin/cloud-2008-Fall/index.html
[ 6 ] www.windowsazure.com/fr
[ 7 ] NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence by Pramod J. Sadalage, Martin Fowler
[ 8 ] Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim R. Wilson
[ 9 ] Thanks to J. Ullmann, P. Valduriez, Cl. Roncancio, Ch. Bobineau, JL. Zechinelli, R. Lozano for the slides provided
Various articles and course notes