COmpression et REprésentation des Signaux Audiovisuels

Keynote talks

 


Marie Tahon is currently a full professor at Le Mans University and conducts her research at the Laboratoire d’Informatique de l’Université du Mans (LIUM). She received her engineering degree from École Centrale de Lyon in 2007, together with a Master’s degree in acoustics awarded jointly by École Centrale de Lyon and INSA Lyon. She obtained her Ph.D. in Computer Science from Paris-Sud University in 2012.

She worked at LIMSI–CNRS (Orsay) on automatic emotion recognition in speech during both her Ph.D. and postdoctoral research. She also held a teaching and research position (ATER) at LMSSC / CNAM (Paris) in acoustics, and later conducted postdoctoral research at IRISA (Lannion) with the “Expression” research team.

Her research focuses on expressive speech processing, particularly in the areas of speech synthesis, emotion recognition, and speaker identification. She has also conducted research in musical acoustics, including the automatic analysis of traditional singing and studies in instrumental acoustics and organology.

Title: Automatic processing of spontaneous speech: Applications to media data and telephone conversations.
 
Abstract: Spontaneous speech is what we use to communicate in everyday life. Both the linguistic content and the manner of expression (prosody) are modulated according to the context of the interaction. These modulations occur at very different levels of speech. First, floor taking must occur at moments that are relevant to the conversation; one may also interrupt the other person's speech. Second, speech production can be affected by an emotional state, leading to disfluencies and phonetic or prosodic variations. An important challenge is to characterize such spontaneous speech with data-driven automatic processing models.
This presentation will address the automatic processing of spontaneous speech in two application cases: the processing of media speech, and in particular interruptions; and the estimation of the degree of frustration in telephone calls from call centers. These two application frameworks provide a context for presenting a methodology for collecting and annotating subjective data in order to train neural networks. In the case of media speech, these models will be used to identify when an interruption occurs in a speech signal. In the case of telephone speech, models optimized to predict the degree of satisfaction will be discussed.
 

Ghassan AlRegib is currently the John and Marilu McCarty Chair Professor at the Georgia Institute of Technology. In the Omni Lab for Intelligent Visual Engineering and Science (OLIVES), he and his group work on robust and interpretable machine learning algorithms, uncertainty and trust, multi-modal learning, and human-in-the-loop algorithms. The group has demonstrated its work on a wide range of applications, including autonomous systems, medical imaging, and subsurface imaging, and is interested in advancing both the fundamentals and the deployment of such systems in real-world scenarios.

Prof. AlRegib holds several U.S. patents and invention disclosures. He is an IEEE Fellow. He has received several best paper awards and recognitions, most recently the College of Engineering Outstanding Teacher (Midcareer and Senior) Award in Spring 2025, and he is a 2026 Distinguished Lecturer of the IEEE Signal Processing Society. He has served on the editorial boards of several IEEE transactions, served as TPC Chair for ICIP 2020, ICIP 2024, and GlobalSIP 2014, and was an area editor for the IEEE Signal Processing Magazine.

