Ensimag Course Catalogue 2022

Data mining and multivariate statistical analysis - 4MMFDAS6

  • Number of hours

    • Lectures 13.0
    • Tutorials 4.5
    • Laboratory work 15.5

  • ECTS 3.0

Goal(s)

The aim of this course is to present statistical approaches for analysing multivariate data. The information age has produced masses of multivariate data in many different fields: finance, marketing, economics, biology, environmental sciences, and others. The theoretical and practical aspects of multivariate data analysis are given equal importance. This balance is achieved through practicals involving actual data analysis using the R software.

Contact: Jean-Baptiste DURAND

Content(s)

1. Multiple linear regression. Least squares, Gaussian linear model, tests of linear hypotheses.
2. One-way and two-way analysis of variance.
3. Principal Components Analysis (PCA).
4. Classification. Supervised classification: linear discriminant analysis. Unsupervised classification: K-means.
5. Document and pattern mining.



Prerequisites

Applied Probability 2 (1st year), Statistical Principles and Methods (Semester 2)

Assessment

Practical exam with R (2 h) and three reports on supervised practicals.



N1 = 1/2 E1 + 1/2 P
N2 = E2

Additional Information

Curriculum->Information Systems Engineering->Semester 4
Curriculum->Financial Engineering->Semester 4
Curriculum->Math. Modelling, Image & Simulation->Semester 4

Bibliography

C. M. BISHOP (2006). Pattern Recognition and Machine Learning. Springer.
http://research.microsoft.com/en-us/um/people/cmbishop/prml/

C. CHATFIELD and A. J. COLLINS (1980). Introduction to Multivariate Analysis. Science Paperbacks.

T. HASTIE, R. TIBSHIRANI, and J. FRIEDMAN (2009). The Elements of Statistical Learning, 2nd ed. Springer. http://www-stat.stanford.edu/~tibs/ElemStatLearn/

G. SAPORTA (2006). Probabilités, statistique et analyse des données. Technip.