Multimodal speech synthesis - 5MMPLTVS

Number of hours
- Lectures: 18.0
ECTS
- Credits: 1.75

Goal(s)

This course introduces speech technologies (speech coding, synthesis and recognition) that process both the audible consequences (the acoustic signal) and the visible consequences (lip movements, etc.) of the underlying articulatory movements produced by the jaw, tongue, larynx, velum and other articulators. We first present the basic knowledge of physiology, phonetics, phonology and linguistics needed to understand the mechanisms of speech production, perception and comprehension. We then cover the fundamentals of speech signal processing, representation and modeling. Finally, we review current systems that support spoken interaction with humanoid robots or virtual conversational agents.

Contact: Pascal PERRIER

Content(s)

• Multimodal speech production and perception
• Phonological structures of the world’s languages: the example of French
• Phonetic representations and speech processing
• Text-to-speech systems and facial animation
• Audiovisual speech recognition
• Systems for situated verbal interaction

Prerequisites

None

Test

Written exam

N1=E1
N2=E2

Additional Information

Curriculum: MMIS, Semester 5

Team: Image Vision Interaction Multimedia Bioinfo

Bibliography

Dutoit, T. (1997) An Introduction to Text-to-Speech Synthesis. Dordrecht/Boston/London: Kluwer Academic.
Parke, F.I. and Waters, K. (1996) Computer Facial Animation. Wellesley, MA, USA: A.K. Peters.
O'Shaughnessy, D. (2000) Speech Communication: Human and Machine, 2nd edition. New York: IEEE Press.
