Ensimag Rubrique Formation 2022

Multimodal speech synthesis - 5MMPLTVS

  • Number of hours

    • Lectures: 18.0

  • ECTS

    • ECTS: 1.75

Goal(s)

This course introduces speech technologies (speech coding, synthesis and recognition) that process the audible (acoustic signal) and visible (lip movements, etc.) consequences of the underlying articulatory movements (produced by the jaw, the tongue, the larynx, the velum, etc.). We first introduce the basic knowledge of physiology, phonetics, phonology and linguistics needed to understand the mechanisms underlying speech production, perception and comprehension. We then present the fundamentals of signal processing, representation and modeling. We conclude with a review of current systems that enable spoken interaction embodied by humanoid robots or virtual conversational agents.

Contact: Pascal PERRIER

Content(s)

• Multimodal speech production and perception
• Phonological structures of the world's languages, with French as an example
• Phonetic representations and speech processing
• Text-to-speech systems and facial animation
• Audiovisual speech recognition
• Systems for situated verbal interaction



Prerequisites

None

Test

Written exam



N1=E1
N2=E2

Additional Information

Curriculum -> MMIS -> Semester 5

Bibliography

Dutoit, T. (1997) An Introduction to Text-to-Speech Synthesis. Dordrecht/Boston/London: Kluwer Academic.
Parke, F.I. and Waters, K. (1996) Computer Facial Animation. Wellesley, MA, USA: A.K. Peters.
O'Shaughnessy, D. (2000) Speech Communication - Human and Machine, 2nd edition. New York: IEEE Press.