Master's Theses

Here you will find an overview of open topics for Master's theses.

Empathetic Behaviour Analysis with Deep Learning

Description

Empathic behaviour analysis is one of the most overlooked mechanisms in intelligent systems today. Empathy can be defined as the complex process of "an observer reacting emotionally because he perceives that another is experiencing or about to experience an emotion". Future machines should be endowed with the ability to behave in an empathic manner, aiming to establish and maintain positive, long-term relationships with users.

 

Task

(a) Study the literature on automatic detection of empathetic behaviour in text, speech, and video; (b) carry out a preliminary hands-on evaluation of a deep learning approach for detecting empathetic behaviour from speech and facial expressions.
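
As a starting point for (b), a minimal late-fusion sketch in Keras is given below. The feature dimensions (an 88-dimensional acoustic vector and 17 facial-expression descriptors) and the binary empathy label are illustrative assumptions, not specifications of the task.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Hypothetical per-segment features: 88 acoustic descriptors
    # (e.g. an eGeMAPS-style set) and 17 facial-expression descriptors.
    audio_in = layers.Input(shape=(88,), name="audio_features")
    video_in = layers.Input(shape=(17,), name="facial_features")

    # One hidden layer per modality, then late fusion by concatenation.
    a = layers.Dense(32, activation="relu")(audio_in)
    v = layers.Dense(32, activation="relu")(video_in)
    fused = layers.Concatenate()([a, v])
    out = layers.Dense(1, activation="sigmoid", name="empathy")(fused)

    model = Model([audio_in, video_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])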

 

Utilises

Python, TensorFlow or PyTorch

 

Requirements

Programming skills in Python

 

Languages

English

 

Supervisor

Jing Han (jing.han@informatik.uni-augsburg.de) 

Cross-Culture Emotion Recognition in the Wild

Description

Automatically detecting and understanding the emotional states of humans is essential to improve the effectiveness of intelligent systems and devices by providing an affect-based, personalised user experience. Thanks to great advances in machine learning, innovative technologies and algorithms are emerging to handle affective information. Yet in affective computing there are few studies on emotion recognition in cross-cultural or multi-cultural scenarios that take the effect of culture into account.

 

Task

(a) Survey the current techniques and related work in this emerging research field; (b) evaluate an approach on an audiovisual emotion dataset covering six cultures.
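
One plausible protocol for (b) is leave-one-culture-out cross-validation: train on five cultures and test on the held-out sixth, rotating through all of them. A minimal sketch with scikit-learn follows; the feature matrix, labels, and culture identifiers are random placeholders for whatever the dataset provides.

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.svm import LinearSVC
    from sklearn.metrics import recall_score

    # Placeholder data: X = audiovisual features, y = emotion labels,
    # cultures = one of six culture identifiers per sample.
    X = np.random.randn(600, 128)
    y = np.random.randint(0, 4, size=600)
    cultures = np.repeat(np.arange(6), 100)

    # Train on five cultures, test on the held-out sixth, rotate.
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=cultures):
        clf = LinearSVC().fit(X[train_idx], y[train_idx])
        uar = recall_score(y[test_idx], clf.predict(X[test_idx]), average="macro")
        print(f"held-out culture {cultures[test_idx][0]}: UAR = {uar:.3f}")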

Utilises

Python, TensorFlow or PyTorch

 

Requirements

Preliminary knowledge of machine learning, good programming skills in Python

 

Languages

English

 

Supervisor

Jing Han (jing.han@informatik.uni-augsburg.de)

Audio-Based Depression Recognition App

Description

Recognition of depression from audio signals.

 

Task

Develop an Android application using available machine learning models for depression recognition
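
On Android, NNAPI is typically reached through the TensorFlow Lite NNAPI delegate rather than called directly, so a trained model first has to be converted. A minimal conversion sketch is shown below; the model file and its name are hypothetical placeholders.

    import tensorflow as tf

    # Hypothetical trained Keras model for audio-based depression recognition.
    depression_model = tf.keras.models.load_model("depression_model.h5")

    # Convert to TensorFlow Lite; the resulting .tflite file can be bundled
    # with the Android app and executed with the NNAPI delegate enabled.
    converter = tf.lite.TFLiteConverter.from_keras_model(depression_model)
    tflite_model = converter.convert()
    with open("depression_model.tflite", "wb") as f:
        f.write(tflite_model)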

 

Utilises

Android Neural Networks API (NNAPI)

 

Requirements

Basic programming knowledge

 

Languages

German, English

 

Supervisor

Shahin Amiriparian, M. Sc. (shahin.amiriparian@informatik.uni-augsburg.de)

Correlation Between Emotion and Deception

Description

Is deception emotional?

 

Task

An in-depth analysis of the correlation between emotion and deception

 

Utilises

-

 

Requirements

Basic programming knowledge

 

Languages

English

 

Supervisor

Shahin Amiriparian, M. Sc. (shahin.amiriparian@informatik.uni-augsburg.de)

Explainable AI for Health Sensing

Description

The success of machine learning research has led to an increase in potential applications, especially in the health domain. However, many contemporary systems are essentially black boxes; the internal operations determining their outputs are not transparent. Especially in the health domain, those developing machine-learning systems should be able to explain their rationale and characterise their strengths and weaknesses.

 

Task

Explore the efficacy of different explainable AI techniques with a focus on health
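
One simple technique that could serve as a baseline in such a comparison is a gradient saliency map; a minimal sketch for an arbitrary differentiable Keras classifier is given below, where the model and input are placeholders.

    import tensorflow as tf

    def saliency_map(model, x):
        """Gradient of the top predicted class score w.r.t. the input.

        model: any differentiable Keras classifier (placeholder here);
        x:     a single input sample with shape (1, ...).
        """
        x = tf.convert_to_tensor(x)
        with tf.GradientTape() as tape:
            tape.watch(x)
            scores = model(x)
            top_class = tf.argmax(scores[0])
            loss = scores[0, top_class]
        # Large absolute gradients mark the input regions to which the
        # prediction is most sensitive.
        return tf.abs(tape.gradient(loss, x))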

 

Utilises

Python, potentially deep learning toolkits

 

Requirements

Machine learning knowledge a plus

 

Languages

English

 

Supervisor

Dr. Nicholas Cummins (nicholas.cummins@informatik.uni-augsburg.de)

Deep Learning for Health Sensing

Description

Deep learning has undoubtedly led to improvements in what is possible concerning system accuracy and performance in a range of signal analysis tasks. However, the benefits that contemporary deep learning solutions can provide in the analysis of different health states based on audio, visual and/or biosignals are yet to be fully explored.

 

Task

Apply deep learning to a range of health detection tasks, such as detecting different health states, abnormal heartbeat detection, and medical image analysis.
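
For abnormal heartbeat detection, for instance, a compact 1D CNN over raw heart-sound segments is a reasonable first baseline. The sketch below assumes 5-second segments sampled at 2 kHz and a binary normal/abnormal label; both are illustrative assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Assumed input: 5 s heart-sound segments sampled at 2 kHz.
    model = models.Sequential([
        layers.Input(shape=(10000, 1)),
        layers.Conv1D(16, kernel_size=64, strides=4, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(32, kernel_size=32, strides=2, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1, activation="sigmoid"),  # normal vs. abnormal
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])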

 

Utilises

Python, TensorFlow/Keras

 

Requirements

Prior machine learning knowledge and related programming skills a plus

 

Languages

English

 

Supervisor

Dr. Nicholas Cummins (nicholas.cummins@informatik.uni-augsburg.de)

Denoising Audio Signals from In-the-Wild YouTube Videos Utilising Deep Learning

Description

In recent years, the use of deep learning has rapidly increased in many research areas and in industry, pushing the boundaries of automated data analysis. Large data companies (e.g. Google, Facebook) have huge amounts of data to train stable and versatile models and thus inspire many fields and architectures in deep learning. Academic research, in contrast, is tailored to very specific areas, such as emotion recognition, where models are trained under laboratory conditions on academic datasets to learn domain-specific, valuable features.

 

The use of large in-the-wild datasets is beneficial for both sides. On the one hand, from a purely research perspective, they enable specific and, at the same time, stable models. On the other hand, industry can transfer pre-trained models, architectures and feature extraction frameworks to new applications. In-the-wild data, however, have higher granularity and more noise than laboratory data. In order to facilitate their use in both sectors, noise and particularly deleterious training influences have to be automatically detected, extracted and removed.

 

The aim of this study is to adapt one or more deep learning architectures for audio denoising, enhance them for a specific domain, and tune them by identifying appropriate parameters. Recently, WaveNet [1] showed promising performance on a similar task [2] and will be analysed regarding its applicability. Audio examples [3] and a first implementation [4] are also available. The dataset used in this project comprises YouTube videos capturing emotional car reviews (EmoCaR). Further data, e.g. to add natural noise, are available from the Diverse Environments Multichannel Acoustic Noise Database (DEMAND). Typical noise patterns in the original videos are background music or car sounds.

 

[1] https://deepmind.com/blog/wavenet-generative-model-raw-audio/ 

[2] https://arxiv.org/pdf/1706.07162.pdf 

[3] http://www.jordipons.me/apps/speech-denoising-wavenet/25.html 

[4] https://github.com/drethage/speech-denoising-wavenet 
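
To build noisy training pairs from the clean audio, noise from DEMAND can be mixed in at a controlled signal-to-noise ratio. A minimal sketch follows; the file paths are placeholders.

    import numpy as np
    import librosa

    def mix_at_snr(clean, noise, snr_db):
        """Scale `noise` so the clean/noise power ratio matches `snr_db`, then add."""
        noise = np.resize(noise, clean.shape)  # loop/trim noise to length
        p_clean = np.mean(clean ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
        return clean + scale * noise

    clean, sr = librosa.load("emocar_review.wav", sr=16000)  # placeholder path
    noise, _ = librosa.load("demand_traffic.wav", sr=16000)  # placeholder path
    noisy = mix_at_snr(clean, noise, snr_db=5.0)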

 

Task 

In this thesis, the student(s) will develop a state-of-the-art deep learning audio denoising technique.
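
A minimal WaveNet-flavoured starting point is sketched below; it borrows only the dilated-convolution idea from [1] and the waveform-to-waveform denoising setup of [2]/[4], and all layer sizes are assumptions rather than the architectures used there.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    # Raw waveform in, denoised waveform out; a stack of dilated 1D
    # convolutions gives a large receptive field at low cost.
    inp = layers.Input(shape=(None, 1))
    x = inp
    for dilation in (1, 2, 4, 8, 16, 32):
        x = layers.Conv1D(32, kernel_size=3, padding="same",
                          dilation_rate=dilation, activation="relu")(x)
    out = layers.Conv1D(1, kernel_size=1, padding="same")(x)

    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mae")  # L1 loss on waveforms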

 

Utilises

Audio, deep neural networks, WaveNet, encoder-decoder and CNN-based architectures

 

Requirements

Preliminary knowledge in deep learning and audio processing, good programming skills (e.g. Python, C++).

 

Languages

German or English

 

Supervisor

Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)

Visual Representation Learning from Text Input

Description

In recent years, deep learning has become very popular in the research community, leading to a steady advancement in the development of new neural network architectures. Many architectures have emerged in the field of computer vision and have subsequently been adapted for further modalities such as text and audio. For example, convolutional neural networks, which were originally derived from human visual perception, also achieve state-of-the-art performance on many text classification tasks. Most approaches use 1D convolutions on pre-trained word embeddings, but do not utilise the full potential of visual layers. As an example, reference can be made to (visual) character quantization (https://arxiv.org/pdf/1502.01710.pdf). The aim of this study is to explore new ways to learn visual text representations complementary to word embeddings, utilising typical image properties (such as colour), and to compare them on a simple CNN.

 

Task

In this thesis, the student will design and implement a new text representation suitable for a visual input layer. In addition, this text representation will be compared to others such as word2vec visual embeddings and traditional word embeddings. For this purpose, a benchmark will be performed on popular NLP tasks, e.g. text sentiment classification on the Amazon Review 5-class polarity dataset.
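
To make the character-quantization idea concrete: each character becomes a one-hot column, so a sentence becomes a binary "image" that a 2D CNN can consume. In the sketch below the alphabet, sequence length, and class count are assumptions.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?"  # assumed alphabet
    MAX_LEN = 256

    def quantize(text):
        """Encode text as a (MAX_LEN, |ALPHABET|) one-hot matrix."""
        m = np.zeros((MAX_LEN, len(ALPHABET)), dtype=np.float32)
        for i, ch in enumerate(text.lower()[:MAX_LEN]):
            j = ALPHABET.find(ch)
            if j >= 0:
                m[i, j] = 1.0
        return m

    # Treat the matrix as a one-channel image; 5 output classes match
    # e.g. the Amazon Review 5-class polarity setting.
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN, len(ALPHABET), 1)),
        layers.Conv2D(32, (7, 7), activation="relu"),
        layers.MaxPooling2D((3, 3)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.GlobalMaxPooling2D(),
        layers.Dense(5, activation="softmax"),
    ])

    x = quantize("the best purchase i ever made")[..., None]  # add channel axis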

 

Utilises

TensorFlow/Keras, visual representation learning for text, CNNs

 

Requirements

Advanced knowledge in machine learning and natural language processing, good programming skills (e.g. Python, C++)

 

Languages

German or English

 

Supervisor

Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)

Text Representation Learning for Matrix CapsNets

Description

In recent years, deep learning has become very popular in the research community, both in fundamental and applied research. This has led to a steady advancement in the development of existing and new neural network architectures. Many architectures emerging in the field of computer vision are subsequently adapted for further modalities such as text and audio. For example, convolutional neural networks, which were originally derived from human visual perception, also achieve state-of-the-art performance on many text classification tasks.

 

A very new and unique innovation for image data are the so-called CapsNets [1] and their second generation, the Matrix Capsules [2], which improve the calculation of the network's forward pass. First attempts have already been made to adapt them to text, achieving some remarkable results on topic and sentiment classification tasks [3][4]. The aim of this study is to explore new ways to learn text representations, such as (visual) character quantization [5], with matrix capsules and thus to investigate more deeply the combination of the text modality and capsule networks.

 

Task

In this thesis, the student will design and implement a new text representation suitable for a visual input layer. In addition, this text representation will be compared to others such as word2vec visual embeddings and the N-gram convolutional layer previously used with CapsNets. For this purpose, a benchmark is performed on popular NLP tasks, e.g. text sentiment classification on the Amazon Review 5-class polarity dataset. The implementation of the Matrix Capsules in TensorFlow can be based on [6].
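
For orientation, the squash nonlinearity of the first-generation capsule networks [1], which the matrix-capsule variant [2] replaces with EM routing, fits in a few lines; a minimal TensorFlow sketch:

    import tensorflow as tf

    def squash(s, axis=-1, eps=1e-9):
        """Capsule squash: keep the vector's orientation, map its length
        into [0, 1):  v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||).
        """
        sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
        scale = sq_norm / (1.0 + sq_norm) / tf.sqrt(sq_norm + eps)
        return scale * s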

 

Utilises

TensorFlow, Matrix CapsNets, representation learning

 

Requirements

Advanced knowledge in machine learning and natural language processing, good programming skills (e.g. Python, C++).

 

Languages

German or English

 

Supervisor

Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)

Shahin Amiriparian, M. Sc. (shahin.amiriparian@informatik.uni-augsburg.de) 

Investigation and Optimisation of Annotations for Training Neural Networks for Emotion Recognition in Videos

Description

Supervised learning, the most common form of deep learning, requires labels and annotations of the data. These serve as the prediction target and training signal of the neural network, which is optimised to reduce the distance between the real target (the labels or annotations) and the predicted label. For example, in object recognition and localisation, boxes or polygons frame the objects to be recognised, and the network has to learn the most discriminative features to predict them.

A big challenge remains the high cost of annotation. In affective computing these costs are many times higher than for images, because a) the data are videos and b) emotions are perceived differently by different people; each video therefore has to be labelled by five different annotators for the same type of emotion, and these annotations subsequently have to be merged into a single gold-standard label. The continuous annotations are produced with a joystick while the video is playing.
For this reason, during the creation of our latest database EmCaR (Emotional Car Reviews) we collected discrete metadata per annotation in addition to the continuous affective-computing annotations.

 

Task

In this thesis, the student will design and implement a method suitable for analysing complex discrete and continuous emotional annotations. In addition, state-of-the-art neural networks (e.g. Transformers) will be trained and benchmarked on differently generated annotations (and gold-standard labels).
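
One established way to merge several continuous annotation traces into a single gold standard is the evaluator weighted estimator (EWE), which weights each annotator by their agreement with the others. A minimal sketch, assuming the traces are the rows of a NumPy matrix:

    import numpy as np

    def ewe(traces):
        """Evaluator Weighted Estimator over annotator traces.

        traces: array of shape (n_annotators, n_frames) with continuous
        emotion annotations. Each annotator is weighted by the correlation
        of their trace with the mean of the remaining annotators
        (negative weights are clipped to zero).
        """
        n = traces.shape[0]
        weights = np.empty(n)
        for k in range(n):
            others = np.delete(traces, k, axis=0).mean(axis=0)
            weights[k] = max(np.corrcoef(traces[k], others)[0, 1], 0.0)
        weights /= weights.sum()
        return weights @ traces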

 

Utilises

TensorFlow/PyTorch, neural networks, statistical correlation methods

 

Requirements

Fundamental knowledge in machine learning and statistics, good programming skills (e.g. Python).

 

Languages

German or English

 

Supervisor

Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)

Empirical Comparison of Context and Transformer Word Embeddings on Few-Shot Learning Tasks

Description

Google has recently demonstrated a new method to learn word embeddings through transformer networks (BERT – https://github.com/google-research/bert), which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. Word embeddings are the fundamental feature sets for any NLP task and are especially important to avoid elaborate representation learning. Furthermore, deep learning approaches suffer from poor sample efficiency in contrast to human perception. One- and few-shot learning tries to learn representations from only a few samples and is often used in tasks where only little data and few targets are available. Recently, researchers have also started to use these techniques on linguistic data.

 

Task

In this work, the student(s) will bring together two novel directions in NLP by empirically comparing different word embeddings (including BERT) in a few-shot learning setting.
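
A minimal comparison loop might embed each sentence with a pre-trained BERT (here via the Hugging Face transformers library, as one possible route) and classify few-shot queries by their nearest class centroid. The model name, support sentences, and query below are illustrative placeholders.

    import numpy as np
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    def embed(sentences):
        """Mean-pooled last-layer BERT token embeddings, one vector per sentence."""
        batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = bert(**batch).last_hidden_state        # (B, T, 768)
        mask = batch["attention_mask"].unsqueeze(-1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

    # Few-shot: one centroid per class from a handful of support examples;
    # queries are assigned to the nearest centroid.
    support = {"positive": ["great product", "works perfectly"],
               "negative": ["broke after a day", "total waste of money"]}
    centroids = {c: embed(s).mean(0) for c, s in support.items()}
    q = embed(["would buy again"])[0]
    pred = min(centroids, key=lambda c: np.linalg.norm(q - centroids[c]))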

 

Utilises

NLP, Transformer Word Embeddings, Few-shot learning

 

Requirements

Advanced knowledge in machine learning and natural language processing, good programming skills (e.g. Python, C++)

 

Languages

German or English

 

Supervisor

Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)

Banner and Advertisement Detection and Localisation in YouTube Videos Utilising Pseudo-Supervised Deep Learning

Description

The use of large in-the-wild datasets is beneficial for research and industry. In-the-wild data, however, have higher granularity and more noise than laboratory data. In order to simplify their joint use, noise and particularly disturbing training influences have to be automatically detected, extracted and removed. Platforms such as YouTube represent a very good data source due to their public availability and extensive content. These videos, however, often include banners highlighting additional information in textual form. Such video elements are disturbing training influences that can confuse feature-extraction frameworks trained with deep learning models. Removing these banners by hand would mean extra effort for the creators, reducing the chance of receiving their permission to use the videos for research purposes.

 

The aim of this study is to automatically detect and localise distracting elements in videos utilising state-of-the-art deep learning algorithms. For this purpose, a label generator has to be developed, which projects realistic boxes and texts at random positions and in different sizes into the video. These elements are used as pseudo labels in the subsequent training process. The developed neural network should learn to predict these elements and their position in a video sequence (see Pixel CNNs).
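
A minimal pseudo-label generator along these lines, using Pillow to paste a synthetic banner at a random position and returning the box coordinates as the training target; all sizes, colours, and the banner text are illustrative.

    import random
    from PIL import Image, ImageDraw

    def add_banner(frame):
        """Overlay a synthetic banner on an RGB PIL frame; return frame and box.

        The returned box (x0, y0, x1, y1) serves as the pseudo label for
        training a detector such as an R-CNN variant.
        """
        w, h = frame.size
        bw = random.randint(w // 4, w // 2)
        bh = random.randint(h // 10, h // 5)
        x0 = random.randint(0, w - bw)
        y0 = random.randint(0, h - bh)
        draw = ImageDraw.Draw(frame)
        draw.rectangle([x0, y0, x0 + bw, y0 + bh], fill=(255, 255, 255))
        draw.text((x0 + 5, y0 + 5), "Example banner text", fill=(0, 0, 0))
        return frame, (x0, y0, x0 + bw, y0 + bh)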

 

Task

In this thesis, the student(s) will develop a state-of-the-art data generator and deep learning method for banner detection. 

 

Utilises

Advanced Data Augmentation, Video/Image Segmentation/Masking, R-CNN

 

Requirements

Preliminary knowledge in deep learning and computer vision, good programming skills (e.g. Python, C++)

 

Languages

German or English

 

Supervisor

Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)

Conditional Generative Networks for Augmentation of Natural Soundscapes

Description

 

Our daily lives are surrounded by chaotic noise, and methods to alter our sonic environments are urgently needed. Computational approaches for audio generation are becoming more robust and offer the chance of emotion-based conditioning of high-fidelity audio.

Task

Explore methods to condition generative networks on emotion and generate paired audio experiences. Provided with a dataset of emotional soundscapes (e.g. urban, natural, mechanical), evaluate meaningful methods to extract musicality, rhythm, and genre from the natural data (e.g. chroma features, comb filters). Apply these features as input to known generative networks (incl. Generative Adversarial Networks, GANs) such as StarGAN.
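
Extracting a chroma representation as conditioning input is straightforward with librosa; a minimal sketch, where the file path is a placeholder:

    import librosa

    # Load a soundscape recording (placeholder path) and extract chroma
    # features, which summarise pitch-class energy over time and could
    # serve as conditioning input to a generative network.
    y, sr = librosa.load("soundscape.wav", sr=22050)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # shape (12, n_frames)
    print(chroma.shape)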

 

Utilises

Python (librosa/madmom)

 

Requirements

Knowledge of deep learning, good programming skills (e.g. Python)

 

Languages

German or English

 

Supervisor

Alice Baird (alice.baird@informatik.uni-augsburg.de)
