2019

audEERING, Research & Product Development

 


Lecturer: Johannes Wagner

Date: Tuesday, 17 December 2019, 16:00

 

audEERING, Research & Product Development

Lecturer: Felix Burkhardt

Date: Tuesday, 17 December 2019, 15:30

Multi-instance learning for bipolar disorder diagnosis using weakly labelled speech data

Abstract

While deep learning is the predominant learning technique across speech processing, it is still not widely used in health-based applications. The health-based corpora available are often small, both in terms of the total amount of data and the number of individuals. The Bipolar Disorder corpus, used in the 2018 Audio/Visual Emotion Challenge, contains only 218 audio samples from 46 individuals. Herein, we present a multi-instance learning framework aimed at constructing more reliable deep learning-based models.
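
As a rough, hypothetical illustration of the multi-instance setup described above (not the authors' exact model), the following PyTorch sketch treats each recording as a bag of chunk-level feature vectors carrying a single recording-level label; the feature dimension, layer sizes, and three-class output are assumptions.

    # Minimal multi-instance learning sketch: only the bag (recording) is labelled.
    import torch
    import torch.nn as nn

    class MILClassifier(nn.Module):
        def __init__(self, feat_dim=128, n_classes=3):   # dimensions are placeholders
            super().__init__()
            self.instance_net = nn.Sequential(
                nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

        def forward(self, bag):                  # bag: (n_instances, feat_dim)
            instance_logits = self.instance_net(bag)
            return instance_logits.mean(dim=0)   # aggregate instances -> bag-level logits

    model = MILClassifier()
    bag = torch.randn(12, 128)                   # 12 speech chunks from one recording
    loss = nn.CrossEntropyLoss()(model(bag).unsqueeze(0), torch.tensor([2]))
    loss.backward()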

 

Lecturer: Zhao Ren

Date: Friday, 15 November 2019, 14:00

Building/Room: EIHW common room, 305

Technical writing tips - some examples and experiences


Lecturer: Jing Han

Date: Friday, 8 November 2019, 14:00

A Deep Learning Approach for Location Independent Throughput Prediction

Abstract

Mobile communication has become a part of everyday life and is considered to support reliability and safety in traffic use cases such as conditionally automated driving. Nevertheless, predicting Quality of Service parameters, particularly throughput, is still a challenging task while on the move. Whereas most approaches in this research field rely on historical measurements mapped to the corresponding coordinates in the area of interest, this paper proposes a throughput prediction method that follows a location-independent approach. To compensate for the missing positioning information, which is mainly used for spatial clustering, our model uses low-level mobile network parameters, enriched by additional feature engineering to retrieve abstracted location information, e.g., surrounding building size and street type. The major advantage of our method is thus its applicability to new regions without the prerequisite of conducting an extensive measurement campaign in advance. Furthermore, we embed analysis results on the underlying temporal relations into the design of different deep neural network types. Finally, model performance is evaluated and compared to traditional models, such as support vector and random forest regression, which were harnessed in previous investigations.
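
As a hedged sketch of the kind of temporal model discussed above (not the paper's exact architecture), the snippet below feeds a window of low-level network parameters and engineered context features into an LSTM regressor; the feature count, window length, and hidden size are assumptions.

    # Illustrative throughput regressor over a time window of network features.
    import torch
    import torch.nn as nn

    class ThroughputLSTM(nn.Module):
        def __init__(self, n_features=10, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)       # predicted throughput value

        def forward(self, x):                       # x: (batch, time, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])            # predict from the last time step

    model = ThroughputLSTM()
    window = torch.randn(4, 20, 10)                 # 4 windows of 20 time steps each
    print(model(window).shape)                      # torch.Size([4, 1])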

 

Lecturer: Josef Schmid, Ostbayerische Technische Hochschule

Date: Friday, 18 October 2019, 14:00

Building/Room: EIHW common room, 305

Autonomous Emotion Learning in Speech

Lecturer: Xinzhou Xu

Date: Friday, 26 July 2019, 14:00

Building/Room: EIHW common room, 305

Lesion Detection on CT Images: Convolutional Neural Networks and Dynamic Computation Graphs

Abstract

Detecting lesions in computed tomography (CT) images is a highly important but difficult task. A well-performing detection system renders essential assistance to doctors by providing objective support to aid diagnosis. In this thesis, a new technique for general object detection is introduced and applied as a universal lesion detector. Specifically, a static scaling mechanism, Region of Interest (RoI) pooling, of the existing Region Convolutional Neural Network (R-CNN) object detection framework family is addressed and replaced by a more dynamic approach. Both frameworks work similarly, by first generating a set of region proposals which are then processed by a regression and a classification network. Instead of statically warping all differently shaped region proposals to the same size for further processing by fully connected layers, the proposed approach employs a set of convolutional layers that can be dynamically parametrised to match each unique input shape. Experimental results suggest that the proposed approach has the potential of achieving state-of-the-art detection rates at a lower parameter count. Yet, future work is needed to reduce currently existing flaws, such as the increased hardware resource requirements.
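
For context, the snippet below sketches the static RoI pooling step that the thesis replaces: every region proposal, whatever its shape, is warped to one fixed grid before the fully connected layers. The feature-map size, channel count, and region coordinates are made up for illustration.

    # Static RoI pooling: warp an arbitrarily shaped region to a fixed 7x7 grid.
    import torch
    import torch.nn.functional as F

    def roi_pool_fixed(feature_map, roi, output_size=(7, 7)):
        # feature_map: (C, H, W); roi = (x1, y1, x2, y2) in feature-map coordinates
        x1, y1, x2, y2 = roi
        region = feature_map[:, y1:y2, x1:x2]
        return F.adaptive_max_pool2d(region, output_size)

    fm = torch.randn(256, 64, 64)
    print(roi_pool_fixed(fm, (5, 10, 30, 50)).shape)   # torch.Size([256, 7, 7])

The dynamic alternative proposed in the thesis would instead parametrise convolutional layers to match each region's shape rather than warping the region itself.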

 

Lecturer: Thomas Wiest

Date: Tuesday, 16 July 2019, 14:30

Building/Room: EIHW common room, 305

Research Practices in Machine Learning

Lecturer: Nicholas Cummins

Date: Friday, 12 July 2019, 14:00

Building/Room: EIHW common room, 305

Audio-based Recognition of Bipolar Disorder Utilising Capsule Networks

Abstract

Bipolar disorder (BD) is an acute mood condition, in which states can drastically shift from one extreme to another, considerably impacting an individual’s wellbeing. Automatic recognition of a BD diagnosis can help patients to obtain medical treatment at an earlier stage and therefore have a better overall prognosis. With this in mind, in this study, we utilise a Capsule Neural Network (CapsNet) for audio-based classification of patients suffering from BD after a manic episode into the three classes Remission, Hypomania, and Mania. The CapsNet attempts to address the limitations of Convolutional Neural Networks (CNNs) by considering vital spatial hierarchies within the images extracted from the audio files. We develop a framework around the CapsNet in order to analyse and classify audio signals. First, we create spectrograms from short segments of speech recordings from individuals with a bipolar diagnosis. We then train the CapsNet on the spectrograms with 32 low-level and three high-level capsules, one for each of the BD classes. These capsules attempt both to form a meaningful representation of the input data and to learn the correct BD class. The output of each capsule represents an activity vector. The length of this vector encodes the presence of the corresponding type of BD in the input, and its orientation represents the properties of this specific instance of BD. We show that using our CapsNet framework, it is possible to achieve competitive results for the aforementioned task by reaching a UAR of 46.2 % and 45.5 % on the development and test partitions, respectively. Furthermore, the efficacy of our approach is compared with a sequence-to-sequence autoencoder and a CNN-based network. [This paper was presented at the 32nd International Joint Conference on Neural Networks (IJCNN) in July 2019 in Budapest, Hungary]
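
As a minimal, hypothetical sketch of the capsule idea mentioned above (routing and the full architecture are omitted), each class capsule emits an activity vector whose length is read as the presence of that BD class; the 16-dimensional capsules are an assumption, not the paper's configuration.

    # Capsule "squash" non-linearity and length-based class decision.
    import torch

    def squash(v, dim=-1, eps=1e-8):
        norm_sq = (v ** 2).sum(dim=dim, keepdim=True)
        return (norm_sq / (1.0 + norm_sq)) * v / torch.sqrt(norm_sq + eps)

    # Three class capsules (Remission, Hypomania, Mania) for one spectrogram.
    class_capsules = squash(torch.randn(3, 16))
    class_scores = class_capsules.norm(dim=-1)     # vector length ~ class presence
    print(class_scores.argmax().item())            # predicted BD class index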

 

Lecturer: Shahin Amiriparian

Date: Friday, 5 July 2019, 14:00

Building/Room: EIHW common room, 305

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes

Abstract

The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an audio waveform has been recorded. Recently, deep neural networks have been applied to ASC and have achieved state-of-the-art performance. However, few works have investigated how to visualise and understand what a neural network has learnt from acoustic scenes. Previous work applied local pooling after each convolutional layer, thereby reducing the size of the feature maps. In this paper, we suggest that local pooling is not necessary, but that the size of the receptive field is important. We apply atrous Convolutional Neural Networks (CNNs) with global attention pooling as the classification model. The internal feature maps of the attention model can be visualised and explained. On the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 dataset, our proposed method achieves an accuracy of 72.7 %, significantly outperforming the CNNs without dilation at 60.4 %. Furthermore, our results demonstrate that the learnt feature maps contain rich information on acoustic scenes in the time-frequency domain. [This paper was presented at the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in May 2019 in Brighton, UK]
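
The sketch below illustrates the two ingredients named above, dilated convolutions without local pooling and global attention pooling, in a heavily simplified form; the channel counts, dilation rates, and ten-class output are assumptions rather than the paper's configuration.

    # Dilated (atrous) CNN with global attention pooling over the feature map.
    import torch
    import torch.nn as nn

    class AtrousAttentionCNN(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1, dilation=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=4, dilation=4), nn.ReLU())
            self.cla = nn.Conv2d(32, n_classes, 1)   # class maps
            self.att = nn.Conv2d(32, n_classes, 1)   # attention maps

        def forward(self, spec):                      # spec: (batch, 1, mel, time)
            h = self.conv(spec)                       # no local pooling: size preserved
            cla = self.cla(h).flatten(2)
            att = torch.softmax(self.att(h).flatten(2), dim=-1)
            return (cla * att).sum(-1)                # attention-weighted class scores

    model = AtrousAttentionCNN()
    print(model(torch.randn(2, 1, 64, 100)).shape)    # torch.Size([2, 10])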

 

Lecturer: Zhao Ren

Date: Friday, 28 June 2019, 14:00

Building/Room: EIHW common room, 305

Implicit Fusion by Joint Audiovisual Training for Emotion Recognition in Mono Modality

Abstract

Despite significant advances in emotion recognition from one individual modality, previous studies fail to take advantage of other modalities to train models in mono-modal scenarios. In this work, we propose a novel joint training model which implicitly fuses audio and visual information in the training procedure for either speech or facial emotion recognition. Specifically, the model consists of one modality-specific network per individual modality and one shared network to map both audio and visual cues into final predictions. In the training process, we additionally take into account the loss from an auxiliary modality besides that of the main modality. To evaluate the effectiveness of the implicit fusion model, we conduct extensive experiments for mono-modal emotion classification and regression, and find that the implicit fusion models outperform the standard mono-modal training process. [This paper was presented at the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) in May 2019 in Brighton, UK]
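
A minimal sketch of the joint-training idea, assuming toy feature dimensions and four emotion classes: one network per modality, a shared output network, and a training loss that adds a weighted auxiliary-modality term to the main-modality term. None of the sizes or the 0.5 weight come from the paper.

    # Implicit audiovisual fusion via a shared head and an auxiliary-modality loss.
    import torch
    import torch.nn as nn

    audio_enc = nn.Linear(40, 64)      # audio-specific network (toy acoustic features)
    video_enc = nn.Linear(136, 64)     # visual-specific network (toy facial features)
    shared = nn.Linear(64, 4)          # shared network mapping to 4 emotion classes
    criterion = nn.CrossEntropyLoss()

    audio, video = torch.randn(8, 40), torch.randn(8, 136)
    labels = torch.randint(0, 4, (8,))
    loss_main = criterion(shared(audio_enc(audio)), labels)   # main: speech emotion
    loss_aux = criterion(shared(video_enc(video)), labels)    # auxiliary: visual cues
    (loss_main + 0.5 * loss_aux).backward()                   # 0.5 is a placeholder weight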

 

Lecturer: Jing Han

Date: Friday, 21 June 2019, 14:00

Building/Room: EIHW common room, 305

End-to-end Audio Classification with Small Datasets

Lecturer: Maximilian Schmitt

Date: Friday, 14 June 2019, 14:00

Building/Room: EIHW common room, 305

A Comparison of AI-Based Throughput Prediction for Cellular Vehicle-To-Server Communication

Abstract

In order to reliably plan vehicle-to-server communication traffic on a fluctuating mobile radio channel, various approaches to throughput prediction have been pursued. On the one hand, there are models based on position-dependent aggregation, e.g., connectivity maps. On the other hand, there are traditional machine learning approaches such as support vector regression. This work presents an implementation of the latter, including OSM-based feature engineering, and performs a comprehensive comparison of the performance of these models using a unified data set. [Presented at the 15th International Wireless Communications & Mobile Computing Conference (IWCMC) in June 2019 in Tangier, Morocco]
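
As a small illustration of the traditional baseline referred to above, the snippet below fits a support vector regressor on synthetic stand-ins for the engineered features; the feature set and data are placeholders, not the study's measurements.

    # Support vector regression baseline for throughput prediction (synthetic data).
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))          # e.g., signal quality + OSM-derived features
    y = rng.normal(size=200)               # measured throughput (placeholder values)

    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
    model.fit(X[:150], y[:150])
    print(model.score(X[150:], y[150:]))   # R^2 on the held-out part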

 

Lecturer: Josef Schmid

Date: Friday, 7 June 2019, 14:00

Building/Room: EIHW common room, 305

Audiovisual Database Recording Plan - Chinese Teenagers' Spoken English Skills

Lecturer: Meishu Song

Date: Tuesday, 4 June 2019, 14:00

Building/Room: EIHW common room, 305

Performance Analysis of Unimodal and Multimodal Models in Valence-Based Empathy Recognition

The human ability to empathise is a core aspect of successful interpersonal relationships. In this regard, human-robot interaction can be improved through the automatic perception of empathy, among other human attributes, allowing robots to affectively adapt their actions to interactants' feelings in any given situation. This paper presents our contribution to the generalised track of the One-Minute Gradual (OMG) Empathy Prediction Challenge by describing our approach to predict a listener's valence during semi-scripted actor-listener interactions. We extract visual and acoustic features from the interactions and feed them into a bidirectional long short-term memory network to capture the time-dependencies of the valence-based empathy during the interactions. Generalised and personalised unimodal and multimodal valence-based empathy models are then trained to assess the impact of each modality on the system performance. Furthermore, we analyse whether intra-subject dependencies on empathy perception affect the system performance. We assess the models by computing the concordance correlation coefficient (CCC) between the predicted and self-annotated valence scores. The results support the suitability of employing multimodal data to recognise participants' valence-based empathy during the interactions, and highlight the subject-dependency of empathy. In particular, we obtained our best result with a personalised multimodal model, which achieved a CCC of 0.11 on the test set. [Presented at the 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019) in May 2019 in Lille, France]
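
For reference, a short sketch of the evaluation metric used above, the concordance correlation coefficient between predicted and self-annotated valence traces; the example values are arbitrary.

    # Concordance correlation coefficient (CCC) between two value sequences.
    import numpy as np

    def ccc(pred, gold):
        pred, gold = np.asarray(pred, float), np.asarray(gold, float)
        mp, mg = pred.mean(), gold.mean()
        vp, vg = pred.var(), gold.var()
        cov = ((pred - mp) * (gold - mg)).mean()
        return 2 * cov / (vp + vg + (mp - mg) ** 2)

    print(ccc([0.1, 0.3, 0.2, 0.5], [0.0, 0.4, 0.1, 0.6]))   # values near 1 = good agreement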

 

Lecturer: Adria Mallol-Ragolta

Date: Friday, 31 May 2019, 14:00

Building/Room: EIHW common room, 305

Diarization of Children Versus Adult Speech Incorporating Recent Deep Neural Network Architectures (Master's thesis presentation)

Lecturer: Lukas Rust

Date: Tuesday, 26 February 2019, 10:00

Building/Room: EIHW common room, 305

Development of a Rating System for Football Players (Bachelor's thesis presentation, in German)

Lecturer: Stephan Wolf

Date: Tuesday, 26 February 2019, 10:00

Building/Room: EIHW common room, 305

How can emotional AI help designers understand users' negative states?

Lecturer: Meishu Song

Date: Friday, 11 January 2019, 14:00

Building/Room: EIHW common room, 305
