Topics for Theses

Current Topics

Human Pose Estimation (HPE) is the task of detecting human keypoints in images or videos. 2D Human Pose Estimation means localizing these keypoints in pixel coordinates in the image or video frame. 3D Human Pose Estimation is the task of estimating a three-dimensional pose of the people in the image or video. Mostly, this task is accomplished by uplifting estimated 2D poses to the third dimension, e.g., by leveraging the temporal context in videos.
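To make the uplifting idea concrete, the following minimal NumPy sketch shows the typical tensor shapes involved: a temporal window of 2D poses (here 243 frames with 17 keypoints, a common setup) is mapped to the 3D pose of the center frame. The random linear projection is a hypothetical placeholder standing in for a real uplifting network.

```python
import numpy as np

# Illustrative shapes for video-based 2D-to-3D uplifting:
T, J = 243, 17                        # temporal context length, number of keypoints
poses_2d = np.random.rand(T, J, 2)    # estimated 2D poses in pixel coordinates

# An uplifting model consumes the whole 2D sequence and predicts the 3D pose
# of the center frame; a random linear map stands in for the network here.
W = np.random.rand(J * 2, J * 3) * 0.01
pose_3d = (poses_2d[T // 2].reshape(-1) @ W).reshape(J, 3)
print(pose_3d.shape)  # (17, 3)
```

A real model (e.g. a Transformer) would of course use the full temporal window rather than just the center frame, which is exactly where the global view pays off.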

Transformer architectures are currently the most common choice for these tasks. They have the benefit of a global view, in contrast to the local view of convolution operations. Thesis topics in this field could include analyzing 3D HPE architectures, improving or adapting them, e.g., for different domains or target applications, analyzing different input or training modes like semi-supervised learning, etc.

Semi-Supervised Learning is an active research field in computer vision with the goal of training neural networks with only a small labeled dataset and a lot of unlabeled data. For human pose estimation, this means that a large dataset with images of people is available, but only a small subset has annotated keypoints. Semi-supervised human pose estimation uses different techniques to train jointly on labeled and unlabeled images in order to improve the detection performance of the network. Popular methods are pseudo labels - the usage of network predictions as annotations - and teacher-student approaches, where one network is enhanced by being trained by a second network.
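A minimal sketch of the pseudo-label idea: keypoint predictions on unlabeled images are only kept as training targets where the network is sufficiently confident. The threshold value below is purely illustrative.

```python
def filter_pseudo_labels(confidences, threshold=0.7):
    """Mark keypoint predictions confident enough to serve as pseudo labels.

    confidences: per-person lists of per-keypoint confidence scores.
    Returns a boolean mask of the same layout.
    """
    return [[c >= threshold for c in person] for person in confidences]

# Two people with two keypoints each; three predictions pass the threshold.
conf = [[0.9, 0.4], [0.8, 0.95]]
mask = filter_pseudo_labels(conf)
print(sum(sum(m) for m in mask))  # 3
```

In a teacher-student setup, such filtered predictions would typically come from the teacher network and supervise the student on the unlabeled portion of the data.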

 

If you are interested and want more information, please contact Katja Ludwig.

The computer vision task of Human Pose Estimation estimates keypoints of humans in either 2D or 3D. These keypoints can be connected such that a skeleton model of the human can be created. This skeleton model is sufficient for some tasks, but does not reflect the body shape of the person. Human Mesh Estimation overcomes this issue. It estimates not only keypoints, but a whole mesh representing the pose and the body shape of humans. This task is more challenging than pure 3D Human Pose Estimation, as many more parameters need to be estimated. In order to keep the number of parameters relatively small, body models like SMPL and its successors are common in this field. Thesis topics could include the analysis of Human Mesh architectures, slight adaptations to the models or training routines, analyses or conversion of body models, etc.
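To illustrate why body models keep the parameter count small, the following back-of-the-envelope comparison uses the figures from the SMPL model (24 kinematic joints, 10 shape coefficients, 6890 mesh vertices): regressing the model parameters is far cheaper than regressing every vertex directly.

```python
# Parameter budget of an SMPL-style body model vs. direct vertex regression.
NUM_JOINTS = 24                 # joints in the kinematic tree (incl. global orientation)
POSE_PARAMS = NUM_JOINTS * 3    # one axis-angle rotation per joint -> 72
SHAPE_PARAMS = 10               # PCA shape coefficients
NUM_VERTICES = 6890             # vertices of the output mesh

params_model = POSE_PARAMS + SHAPE_PARAMS   # what the network must estimate
params_direct = NUM_VERTICES * 3            # naive per-vertex regression
print(params_model, params_direct)  # 82 20670
```

The body model thus acts as a strong prior: 82 estimated parameters deterministically produce a full mesh with over 20,000 degrees of freedom.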

 

If you are interested and want more information, please contact

Convolutional Neural Networks (CNNs) have been widely used in Computer Vision applications for their ability to learn meaningful features from images. However, with the recent success of Transformer architectures in various Natural Language Processing tasks, there has also been growing interest in applying them to Computer Vision. Though Transformers offer improved performance over CNNs, they come at a much higher computational cost. This thesis aims to decrease the computational cost by implementing Token Matching. This technique shows promising results for image classification. However, it cannot be applied directly to semantic segmentation. In this thesis, we will explore multiple strategies to adapt the technique to semantic segmentation.
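To give an intuition for the cost reduction, here is a simplified NumPy sketch of one token-merging step (in the spirit of token-matching/merging approaches, not the exact technique of this thesis): the two most similar tokens under cosine similarity are averaged, shrinking the token count and hence the quadratic attention cost. For semantic segmentation, an additional un-merging step back to full pixel resolution would be required, which is precisely the difficulty the thesis addresses.

```python
import numpy as np

def merge_most_similar_tokens(tokens):
    """Merge the two most similar tokens (cosine similarity) by averaging.

    tokens: (N, D) array of token embeddings; returns an (N-1, D) array.
    """
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = unit @ unit.T                  # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)       # ignore self-similarity
    i, j = np.unravel_index(np.argmax(sim), sim.shape)
    merged = (tokens[i] + tokens[j]) / 2
    keep = [k for k in range(len(tokens)) if k not in (i, j)]
    return np.vstack([tokens[keep], merged[None]])

toks = np.random.rand(8, 16)             # 8 tokens of dimension 16
print(merge_most_similar_tokens(toks).shape)  # (7, 16)
```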

For this thesis, previous knowledge of Python and PyTorch is recommended. If you are interested, write an email to Daniel Kienzle.

This topic is suitable for a master thesis.

Scene graph generation models are trained to find interactions and relationships in images. A relationship is defined as a triplet of subject-predicate-object, e.g. "person-playing-piano".

However, current scene graph models are still limited to a fixed set of subject/object classes even though object detectors exist for open vocabulary classification and detection. Open vocabulary means that a model is not trained on a fixed set of classes but on arbitrary labels. Your task will be to integrate such an open vocabulary detection model into a scene graph generation pipeline.
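The triplet structure can be sketched as a small data class; with open vocabulary labels, subject and object become free-form strings rather than indices into a fixed class list. The class below is purely illustrative, not part of any existing pipeline.

```python
from dataclasses import dataclass

@dataclass
class Relationship:
    subject: str    # open vocabulary: arbitrary label, not a fixed class index
    predicate: str
    object: str

rel = Relationship("person", "playing", "piano")
print(f"{rel.subject}-{rel.predicate}-{rel.object}")  # person-playing-piano
```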

Previous knowledge in PyTorch is required. Additionally, you should have experience with object detection models and how to incorporate large foreign code bases into your own work.

If you are interested or want more information, feel free to contact Julian Lorenz.

Scene graph generation models are trained to find interactions and relationships in images. A relationship is defined as a triplet of subject-predicate-object, e.g. "person-playing-piano". Recently, the HiLo architecture achieved a drastic improvement on panoptic scene graph generation.

Your task will be to build a new scene graph generation architecture, based on the state of the art HiLo model. Analyse HiLo's different building blocks and their effectiveness to find out which parts can be improved or even removed.

Fundamental previous knowledge of PyTorch is required, as well as experience working with foreign code bases. To build your own model, you will have to be creative and resourceful.

If you are interested or want more information, feel free to contact Julian Lorenz.

 

Access to masks for objects in images is of great importance for many computer vision tasks. Manually annotating such object masks (for example with polygon drawings), however, takes an extensive amount of time. In addition, the annotation of finely jagged edges and delicate structures poses a considerable problem. Interactive segmentation systems try to drastically ease this task by using forms of user guidance that can be annotated cheaply in order to predict an object mask. Usually, this guidance takes the form of right/left mouse clicks to annotate single background/foreground pixels.

Semantic segmentation constitutes the task of classifying every single pixel into one of several predefined classes. In consequence, interactive semantic segmentation systems constitute a combination of the two tasks: the segmentation happens on the basis of user guidance, while the goal is to circumvent a costly annotation process. Instead of annotating single objects, the goal is to divide the entire input image into several class regions.
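The click-based guidance described above is commonly fed to the network as an extra input channel. The following pure-Python sketch shows one common encoding (a binary disk around each click; the radius is illustrative); foreground and background clicks would typically each get their own channel.

```python
def click_guidance_map(h, w, clicks, radius=3):
    """Encode user clicks as a binary disk map of size (h, w).

    clicks: list of (row, col) click positions; each click marks a small
    disk of ones in an otherwise zero map.
    """
    guidance = [[0.0] * w for _ in range(h)]
    for r, c in clicks:
        for y in range(h):
            for x in range(w):
                if (y - r) ** 2 + (x - c) ** 2 <= radius ** 2:
                    guidance[y][x] = 1.0
    return guidance

fg = click_guidance_map(32, 32, [(5, 5), (20, 25)], radius=2)
print(len(fg), len(fg[0]), max(max(row) for row in fg))  # 32 32 1.0
```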

 

Literature:

[1] : https://ceur-ws.org/Vol-2766/paper1.pdf

[2] : https://arxiv.org/abs/2003.14200

 

In case of interest, contact Robin Schön (robin.schoen@uni-a.de).





 
