# Data mining and multivariate statistical analysis - 4MMFDAS6

### Number of hours

- Lectures: 13.0
- Tutorials: 4.5
- Laboratory work: 15.5

ECTS credits: 3.0

### Goals

The aim of this course is to present statistical approaches for analysing multivariate data. The information age has produced masses of multivariate data in many different fields: finance, marketing, economics, biology, environmental sciences, and more. The theoretical and practical aspects of multivariate data analysis are given equal importance. This balance is achieved through practical sessions in which real data sets are analysed with the R software.

Contact: Jean-Baptiste DURAND

### Content

1. Multiple linear regression: least squares, the Gaussian linear model, tests of linear hypotheses.
2. One-way and two-way analysis of variance (ANOVA).
3. Principal Components Analysis (PCA).
4. Classification: supervised classification (linear discriminant analysis) and unsupervised classification (K-means).
5. Document and pattern mining.

### Prerequisites

Applied Probability 2 (1st year), Statistical Principles and Methods (Semester 2)

### Tests

Practical exam using R (2 h) and three reports on supervised practical sessions.

- N1 = 1/2 E1 + 1/2 P
- N2 = E2

### Curricula

- Information Systems Engineering, Semester 4
- Financial Engineering, Semester 4
- Math. Modelling, Image & Simulation, Semester 4

### Bibliography

C. M. BISHOP (2006). *Pattern Recognition and Machine Learning*. Springer. http://research.microsoft.com/en-us/um/people/cmbishop/prml/

C. CHATFIELD and A. J. COLLINS (1980). *Introduction to Multivariate Analysis*. Science Paperbacks.

T. HASTIE, R. TIBSHIRANI and J. FRIEDMAN (2009). *The Elements of Statistical Learning*, 2nd ed. Springer. http://www-stat.stanford.edu/~tibs/ElemStatLearn/

G. SAPORTA (2006). *Probabilités, statistique et analyse des données*. Technip.

Last updated: January 15, 2017

