Challenges & Workshops

MuSe 2021

The Multimodal Sentiment Analysis Challenge (MuSe 2021) focuses on multimodal sentiment recognition of user-generated content and of stress-induced situations. The competition is a satellite event of the 29th ACM International Conference on Multimedia (Chengdu, China) and aims to compare multimedia processing and deep learning methods for automatic audio-visual, biological, and text-based sentiment and emotion sensing under a common set of experimental conditions.

 

The goal of MuSe is to provide a common benchmark test set for multimodal information processing and to bring together the Affective Computing, Sentiment Analysis, and Health Informatics research communities, comparing the merits of multimodal fusion across a large number of modalities under well-defined conditions.

 

MuSe 2021 featured four sub-challenges:

  • Based on last year's MuSe-CaR dataset, extended by a novel gold standard fusion method:
    • Multimodal Continuous Emotions in-the-Wild Sub-challenge (MuSe-Wilder): Predicting the level of emotional dimensions (arousal, valence) in a time-continuous manner from audio-visual recordings.
    • Multimodal Sentiment Sub-challenge (MuSe-Sent): Predicting advanced intensity classes of emotions based on valence and arousal for segments of audio-visual recordings.
  • Based on the novel audio-visual-text Ulm-TSST dataset, covering people in stressed dispositions:
    • Multimodal Emotional Stress Sub-challenge (MuSe-Stress): Predicting the level of emotional arousal and valence in a time-continuous manner from audio-visual recordings.
    • Multimodal Physiological-Arousal Sub-challenge (MuSe-Physio): Predicting the level of psycho-physiological arousal from a) human annotations fused with b) galvanic skin response (also known as electrodermal activity, EDA) signals of the stressed participants, as a regression task. Audio-visual recordings as well as further biological signals (heart rate and respiration) are provided for modelling.
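The idea of fusing subjective ratings with a physiological signal into one regression target can be pictured with a toy sketch. This is an illustration only: the challenge's actual gold-standard fusion method differs, and the z-normalisation and equal weighting below are assumptions made for the example.

```python
import numpy as np

def fuse_arousal_gold_standard(annotations, eda, weight=0.5):
    """Toy fusion of a mean human arousal trace with an EDA signal.

    `annotations` has shape (raters, timesteps); `eda` has shape (timesteps,).
    Both streams are z-normalised before a weighted average, so neither
    dominates due to scale. The weighting scheme is an assumption of this
    sketch, not the official MuSe-Physio procedure.
    """
    mean_annotation = annotations.mean(axis=0)  # average over raters -> (timesteps,)
    anno_z = (mean_annotation - mean_annotation.mean()) / (mean_annotation.std() + 1e-8)
    eda_z = (eda - eda.mean()) / (eda.std() + 1e-8)
    return weight * anno_z + (1.0 - weight) * eda_z

# toy data: 3 raters, 100 time steps
rng = np.random.default_rng(0)
annotations = rng.normal(size=(3, 100))
eda = rng.normal(size=100)
gold = fuse_arousal_gold_standard(annotations, eda)
print(gold.shape)  # (100,): one fused arousal value per time step
```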

INTERSPEECH 2021 Computational Paralinguistics Challenge (ComParE)

The INTERSPEECH 2021 Computational Paralinguistics ChallengE (ComParE) is an open challenge dealing with states and traits of speakers as manifested in their speech signal’s properties. There have so far been twelve consecutive challenges at INTERSPEECH since 2009 (cf. the repository), but a multiplicity of highly relevant paralinguistic phenomena remains uncovered. We therefore introduce four new tasks: the COVID-19 Cough Sub-Challenge, the COVID-19 Speech Sub-Challenge, the Escalation Sub-Challenge, and the Primates Sub-Challenge (see the data descriptions). For all tasks, the data are provided by the organisers. The COVID-19-related sub-challenges are open to academic research participation only; the Escalation and Primates Sub-Challenges are generally open for participation.

 

The INTERSPEECH 2021 Computational Paralinguistics Challenge (ComParE) shall help bridge the gap between excellent research on paralinguistic information in spoken language and the low compatibility of results. The results of the challenge were presented online at INTERSPEECH 2021. Prizes were awarded to the Sub-Challenge winners.

MuSe 2020

The Multimodal Sentiment Analysis Challenge and Workshop (MuSe 2020), focusing on the tasks of sentiment recognition as well as topic engagement and trustworthiness detection, is a satellite event of ACM MM 2020 (Seattle, US, October 2020), and the first competition aimed at comparing multimedia processing and deep learning methods for automatic, integrated audio-visual and text-based sentiment and emotion sensing under a common set of experimental conditions.

 

The goal of the Challenge is to provide a common benchmark for multimodal information processing and to bring together the Affective Computing and Sentiment Analysis communities, to compare the merits of multimodal fusion for the three core modalities under well-defined conditions. Another motivation is the need to advance sentiment and emotion recognition systems to deal with fully naturalistic, previously unexplored behaviour in large volumes of in-the-wild data, as this is exactly the type of data that both multimedia and human-machine/human-robot communication interfaces have to face in the real world.

 

We are calling for teams to participate in three Sub-Challenges:

 

  • Multimodal Sentiment in-the-Wild Sub-challenge (MuSe-Wild): Predicting the level of emotional dimensions (arousal, valence) in a time-continuous manner from audio-visual recordings.
  • Multimodal Emotion-Target Sub-challenge (MuSe-Topic): Predicting 10-class domain-specific topics as the target of 3-class (low, medium, high) emotions of valence and arousal.
  • Multimodal Trustworthiness Sub-challenge (MuSe-Trust): Predicting the level of trustworthiness of user-generated audio-visual content in a sequential manner utilising a diverse range of features and (optional) emotional (arousal and valence) predictions.

 

INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE)

 

Elderly Emotion, Breathing & Masks

 

The INTERSPEECH 2020 Computational Paralinguistics ChallengE (ComParE) is an open challenge dealing with states and traits of speakers as manifested in their speech signal’s properties. There have so far been eleven consecutive challenges at INTERSPEECH since 2009, but a multiplicity of highly relevant paralinguistic phenomena remains uncovered. We therefore introduce three new tasks: the Elderly Emotion Sub-Challenge, the Breathing Sub-Challenge, and the Mask Sub-Challenge (see the data descriptions). For all tasks, the data are provided by the organisers.

 

This year’s challenge will help bridge the gap between excellent research on paralinguistic information in spoken language and low compatibility of results. The results of the challenge will be presented at INTERSPEECH 2020 in Shanghai, China. Prizes will be awarded to the Sub-Challenge winners.

 

ICMI 2018 Eating Analysis & Tracking Challenge


 

The ICMI 2018 Eating Analysis & Tracking Challenge is an open research competition on machine learning for the audio/visual tracking of human subjects recorded while eating different types of food and speaking at the same time.

 

The challenge features three sub-challenges:

  • Food-type Sub-Challenge, Recognition of one of 6 food types (or no food)
  • Food-likability Sub-Challenge, Recognition of the subjects’ food likability rating
  • Chew and Speak Sub-Challenge, Recognition of the subjects’ eating difficulty

 


 

The Audio/Visual Emotion Challenge and Workshop (AVEC 2018)

The Audio/Visual Emotion Challenge and Workshop (AVEC 2018) “Bipolar Disorder and Cross-cultural Affect” is a satellite event of ACM MM 2018 and the eighth competition aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual, and audio-visual health and emotion sensing, with all participants competing under strictly the same conditions.

 

The goal of the challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, visual and audio-visual affect recognition communities, to compare the relative merits of the approaches to automatic health and emotion analysis under well-defined conditions. Another motivation is the need to advance health and emotion recognition systems to be able to deal with fully naturalistic behaviour in large volumes of un-segmented, non-prototypical and non-preselected data, as this is exactly the type of data that both multimedia and human-machine/human-robot communication interfaces have to face in the real world.

 

We are calling for teams to participate in three sub-challenges:

  • Bipolar Disorder Sub-Challenge, Patients suffering from bipolar disorder – as defined by the DSM-5 – need to be classified into remission, hypo-mania, and mania, from audio-visual recordings of structured interviews (BD corpus); performance is measured by the unweighted average recall (UAR) over the three classes.
  • Cross-cultural Emotion Sub-Challenge, Dimensions of emotion need to be predicted time-continuously in a cross-cultural setup (German => Hungarian) from audio-visual data of dyadic interactions (SEWA corpus); performance is the concordance correlation coefficient (CCC) averaged over the dimensions.
  • Gold-standard Emotion Sub-Challenge, Individual annotations of dimensional emotions need to be processed to create a single time series of emotion labels termed as “gold-standard”. Performance (CCC) is measured by a baseline system trained and evaluated from multimodal data with the generated gold-standard (RECOLA corpus).
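The two evaluation metrics named above, UAR and CCC, are standard and compact enough to sketch. The NumPy versions below are illustrative re-implementations, not the official AVEC evaluation scripts:

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989): agreement between two
    series, penalising both low correlation and shifts in mean or scale."""
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

def uar(y_true, y_pred, n_classes):
    """Unweighted average recall: the mean of the per-class recalls, so every
    class counts equally regardless of its frequency."""
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

# toy checks
t = np.array([0.1, 0.4, 0.35, 0.8])
print(round(ccc(t, t), 3))              # 1.0 for a perfect prediction
labels = np.array([0, 0, 1, 2, 2, 2])
preds = np.array([0, 1, 1, 2, 2, 0])
print(round(uar(labels, preds, 3), 3))  # (0.5 + 1.0 + 0.667) / 3 ≈ 0.722
```

A plain Pearson correlation would ignore a constant offset between prediction and gold standard; CCC does not, which is why it is preferred for time-continuous emotion tasks.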

 


INTERSPEECH 2018 Computational Paralinguistics Challenge (ComParE)

The INTERSPEECH 2018 Computational Paralinguistics Challenge (ComParE) is an open challenge dealing with states and traits of speakers as manifested in their speech signal’s acoustic properties. There have so far been nine consecutive Challenges at INTERSPEECH since 2009 (cf. the challenge series’ repository at http://www.compare.openaudio.eu), but there still exists a multiplicity of not yet covered, but highly relevant, paralinguistic phenomena. Thus, in this year’s 10th anniversary edition, we introduce four new tasks.

 

The following sub-challenges are addressed:

  • Atypical Affect Sub-Challenge, emotion of disabled speakers is to be recognised.
  • Self-Assessed Affect Sub-Challenge, self-assessed affect shall be determined.
  • Crying Sub-Challenge, mood-related types of infant vocalisation have to be classified.
  • Heart Beats Sub-Challenge, types of heart beat sounds need to be distinguished.
