Scientific Methodology, Regulatory and ethical data usage - WMM9MO61

Number of hours
- Lectures 36.0
- Projects -
- Tutorials -
- Internship -
- Laboratory works -
- Written tests -
ECTS
ECTS 6.0

Goal(s)

The course aims to provide the fundamental basis for a sound scientific methodology of experimental evaluation in computer science. It emphasizes the methodological aspects of measurement and the statistics needed to analyze computer systems, human-computer interaction systems, and machine learning systems. We first raise the audience's awareness of reproducibility issues in empirical computer science research, as well as of ethical and scientific integrity aspects. We then present tools that help address these issues and give the audience the foundations of probability and statistics required to develop sound experiment designs. The content of the lecture is therefore both theoretical and practical, illustrated by many case studies and practical sessions. The goal is not to provide analysis recipes or techniques that researchers can blindly apply, but to help students develop critical thinking and understand some simple (and possibly not-so-simple) tools that they can readily use now and explore further later on.

Responsible(s)

Jean-Marc VINCENT

Content(s)

Here are the topics that will be covered during the lecture. The exact order and division are still under discussion.

Epistemology, publications, ethics/integrity/deontology
Computer Science is an Experimental Science: Randomness is unavoidable whenever human beings are involved, and it can no longer be ignored given the complexity of modern computer systems (network, CPUs, hardware/software stack) or when working in a machine learning context, which relies on observational data and remains empirical.
Science is defined by its method, not by its results: Claude Bernard, Karl Popper, Kuhn, Lakatos, ...
Credibility crisis, Ethics, scientific integrity, deontology
Open Science and Reproducible Research
Laboratory notebook
Version control and archiving
Data management
Computational documents (Jupyter, RStudio, Org mode)
Software environment control (containers, package management systems)
Ethical and legal data usage (data management plan, consent form, ...)
Exploratory Data Analysis
Data curation (missing data, outliers, typing issues)
Data visualisation and hypothesis checking
Data processing pipelines
Communicating results
Introduction to statistics
Random variables, central limit theorem, confidence interval, statistical test
Bayesian framework: Bayes' rule, Maximum likelihood vs. Posterior sampling, Credible interval, Hierarchical modeling principles (example with clustering)
ANOVA, Linear regression and extensions (mostly logistic)
Gaussian Process
Observation vs. Experiment
Correlation, Causation: mostly "don'ts"
Notions of bias (statistical, experimental, observational/sampling, etc.)
Metrology: measurement and tracing, precision, practical computer science issues and tools
Counter-factual/causal analysis
Experimental Design
Methodology (fishbone, experiment structure)
Difference between quantitative/qualitative observational/experimental data/analysis
Sequential vs. incremental approach
2-level factorial designs, screening designs, LHS/MaxiMin designs
Active/online learning with bandits (ε-Greedy, UCB, Thompson) and extensions (surrogates: GP-UCB, EI)

Prerequisites

The lecture is self-contained and targets second-year master's students in computer science. We will mostly use the R language during the lecture, but most programs will be scripts of only a few lines, and we will provide references for learning the basics.

Test

Several homework assignments and practical evaluations counting for 50%,
and a 3-hour final written exam counting for 50%.

The exam is given in English only.

Calendar

The course exists in the following branches:

Curriculum - Master in Computer Science - Semester 9 (this course is given in English only)

see the course schedule for 2022-2023

Additional Information

Course ID : WMM9MO61
Course language(s): English


Update - 21/09/2022
