Thesis Topics
Current Topics
Scene Graph Generation is about detecting relationships in images. Relationships are described as triplets of subject, predicate, and object, e.g. "person-driving-car". Current methods are evaluated using the Recall@k metric and its variants. However, these metrics suffer from various drawbacks, which this thesis aims to address.
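To make the metric concrete, here is a minimal sketch of Recall@k for scene graph triplets: it measures the fraction of ground-truth triplets that appear among the k highest-scoring predictions. The function and variable names are illustrative, not from any specific library.

```python
# Minimal sketch of the Recall@k metric for scene graph generation.
# A prediction is a (subject, predicate, object) triplet with a confidence
# score; Recall@k is the fraction of ground-truth triplets recovered
# among the k highest-scoring predictions.

def recall_at_k(predictions, ground_truth, k):
    """predictions: list of (triplet, score); ground_truth: set of triplets."""
    top_k = sorted(predictions, key=lambda p: p[1], reverse=True)[:k]
    found = {triplet for triplet, _ in top_k} & set(ground_truth)
    return len(found) / len(ground_truth)

preds = [
    (("person", "driving", "car"), 0.9),
    (("person", "wearing", "hat"), 0.7),
    (("car", "on", "road"), 0.4),
]
gt = {("person", "driving", "car"), ("car", "on", "road")}
print(recall_at_k(preds, gt, k=2))  # only 1 of 2 GT triplets in top 2 -> 0.5
```

One drawback is already visible in this sketch: frequent, easy predicates can fill the top-k list and crowd out rare ones, which motivates variants such as mean Recall@k.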
You will create an overview of existing scene graph generation metrics and compare their properties in various experiments on different datasets. Based on your findings, you will design a new metric that tackles the drawbacks of the existing ones. Your code will be published as a Python package on PyPI to make it accessible to other researchers in the field.
To succeed, you should be proactive, creative, and bring your own ideas. Additionally, solid knowledge of neural networks for computer vision and familiarity with PyTorch are recommended.
If you are interested and want more information, feel free to contact Julian Lorenz.
For a long time, computer vision relied exclusively on convolutional networks, but recently they are increasingly being replaced by, or combined with, the Transformer architecture. A major advantage of the Transformer architecture is its flexibility, which this thesis aims to exploit.
Once a neural network has been trained on a dataset, it can only make predictions for the classes contained in that dataset. To add further classes, the network has to be expensively retrained on all data. If the old data is no longer available, one additionally runs into the problem of "catastrophic forgetting": the network no longer performs as well on the old data. This thesis will exploit the flexibility of the Transformer architecture to counter these problems in the task of semantic segmentation. This both prevents catastrophic forgetting and makes it possible to learn new classes without retraining the entire network.
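The core idea of adding classes without full retraining can be illustrated with a toy sketch: a shared (frozen) feature representation, with each class owning its own output weights, so a new class is added without touching the old ones. All names and vectors here are illustrative, not part of any real segmentation model.

```python
# Toy sketch of class-incremental classification: each class owns its own
# output weights, so adding a class never overwrites previously learned
# ones. Real incremental segmentation methods are far more involved.

class IncrementalClassifier:
    def __init__(self):
        self.class_weights = {}  # label -> weight vector

    def add_class(self, label, weights):
        self.class_weights[label] = weights  # old classes stay untouched

    def predict(self, feature):
        def score(label):
            return sum(x * y for x, y in zip(feature, self.class_weights[label]))
        return max(self.class_weights, key=score)

clf = IncrementalClassifier()
clf.add_class("road", [1.0, 0.0])
clf.add_class("car", [0.0, 1.0])
clf.add_class("bicycle", [0.5, 0.6])  # added later, no retraining needed
print(clf.predict([0.2, 0.9]))  # "car"
```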
This topic is particularly suitable for the project module or a master's thesis. It is very research-oriented, meaning that the results are uncertain, but a large gain in insight is possible.
If you are interested, please contact Daniel Kienzle.
Human Pose Estimation is the task of detecting human keypoints in images or videos. 2D Human Pose Estimation means localizing these keypoints in 2D coordinates in the image or video frame. Convolutional neural networks are the most common architectures for such tasks. Recently, the Transformer architecture has made its way from natural language processing to vision tasks. It has the benefit of a global view, in contrast to the local view of convolution operations. As it was originally not designed for vision, some adaptations have to be made to make this architecture feasible for such tasks. A lot of variants have been proposed recently, but most of them have not been evaluated for Human Pose Estimation. Theses on this topic should analyze the performance of different Transformer variants for Human Pose Estimation. Variants could include different basic architectures, target heads, architecture nuances/hyperparameters, etc.
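The key adaptation mentioned above, turning an image into a token sequence, can be sketched in a few lines: the image is cut into fixed-size patches that are flattened into tokens, analogous to words in NLP. This is a pure-Python illustration with made-up names; real models additionally apply a learned linear embedding and positional encodings.

```python
# Sketch of the patchify step that makes Transformers applicable to
# images: split the image into fixed-size patches and flatten each
# patch into one token of the input sequence.

def patchify(image, patch):
    """image: H x W nested lists; returns a list of flattened patch tokens."""
    h, w = len(image), len(image[0])
    tokens = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tokens.append([image[y + dy][x + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
tokens = patchify(img, patch=2)
print(len(tokens), len(tokens[0]))  # 4 tokens of length 4
```

For pose estimation, a target head then maps the processed tokens back to keypoint heatmaps or coordinates; the choice of that head is one of the variants a thesis could compare.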
If you are interested and want more information, please contact Katja Ludwig.
Semi-Supervised Learning is an active research field in computer vision with the goal of training neural networks with only a small labeled dataset and a lot of unlabeled data. For human pose estimation, this means that a large dataset with images of people is available, but only a small subset has annotated keypoints. Semi-supervised human pose estimation uses different techniques to train jointly on labeled and unlabeled images in order to improve the detection performance of the network. Popular methods are pseudo labels - the usage of network predictions as annotations - and teacher-student approaches, where one network is enhanced by being trained by a second network.
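The pseudo-label idea can be sketched very compactly: predictions of the current model on unlabeled images are turned into training targets, typically keeping only the confident ones. The threshold and all names below are illustrative assumptions, not from a specific method.

```python
# Sketch of pseudo-labeling for semi-supervised learning: confident
# model predictions on unlabeled data are reused as training targets.

CONFIDENCE_THRESHOLD = 0.8  # illustrative value

def make_pseudo_labels(predictions, threshold=CONFIDENCE_THRESHOLD):
    """predictions: list of (keypoint, confidence) from the current model."""
    pseudo = []
    for keypoint, conf in predictions:
        if conf >= threshold:        # keep only confident predictions
            pseudo.append(keypoint)  # treat them as ground-truth labels
    return pseudo

preds = [((120, 45), 0.95), ((88, 210), 0.55), ((64, 30), 0.83)]
print(make_pseudo_labels(preds))  # [(120, 45), (64, 30)]
```

In a teacher-student setup, the teacher (often an averaged copy of the student) produces these pseudo labels, and the student is trained on them alongside the labeled data.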
If you are interested and want more information, please contact Katja Ludwig.
Convolutional Neural Networks have been widely used in Computer Vision applications for their ability to learn meaningful features from images. However, with the recent success of Transformer architectures in various Natural Language Processing tasks, there has also been growing interest in applying them to Computer Vision. Though Transformers offer improved performance over CNNs, they come at a much higher computational cost. This thesis aims to decrease the computational cost by implementing Token Matching. This technique shows promising results in the task of image classification. However, it cannot be applied directly to the task of semantic segmentation. In this thesis, we will explore multiple strategies to adjust the technique to the task of semantic segmentation.
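To illustrate the principle, here is a toy sketch of token matching: the most similar pair of tokens is merged into their mean, shortening the sequence and thereby reducing the quadratic attention cost. Published methods (e.g. bipartite matching in Token Merging) are considerably more refined; this only shows the underlying idea, with made-up toy vectors.

```python
# Toy sketch of token matching/merging: find the most similar pair of
# tokens (by cosine similarity) and replace it with its mean, so each
# merge shortens the token sequence by one.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def merge_most_similar(tokens):
    """Merge the two most similar tokens into their element-wise mean."""
    i, j = max(((i, j) for i in range(len(tokens))
                for j in range(i + 1, len(tokens))),
               key=lambda ij: cosine(tokens[ij[0]], tokens[ij[1]]))
    merged = [(x + y) / 2 for x, y in zip(tokens[i], tokens[j])]
    return [t for k, t in enumerate(tokens) if k not in (i, j)] + [merged]

toks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(len(merge_most_similar(toks)))  # 3 tokens reduced to 2
```

For semantic segmentation, the open question is how to undo or compensate such merges, since a prediction is needed for every pixel, not just one label per image.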
For this thesis, previous knowledge of Python and PyTorch is recommended. If you are interested, write an email to Daniel Kienzle.
Scene graph generation models can detect interactions and relationships in images. A relationship is defined as a triplet of subject-predicate-object, e.g. "person-driving-car".
However, current scene graph datasets that are used to train such models suffer from incomplete annotations and unbalanced predicate class distributions. Synthetic datasets avoid these problems because we can generate as many images as we want and decide what they should look like.
To generate a synthetic dataset for scene graph generation, you will use the Unity game engine and its Perception Package. The Perception Package enables us to construct and automatically annotate synthetic data. However, only standard annotations like depth masks or bounding boxes are supported. You will have to write a custom extension to the Perception Package to support annotations for scene graph datasets.
Additionally, you will develop algorithms to position objects in a virtual environment to create images that contain predefined sets of predicate classes, useful for scene graph generation.
Finally, you will evaluate your dataset with a state-of-the-art scene graph model to demonstrate its effectiveness.
Previous knowledge of C# and the Unity game engine is recommended to quickly get started. To succeed, you will have to bring in your own ideas and be able to understand and modify existing code bases like the Perception Package.
If you are interested or want more information, feel free to contact Julian Lorenz.
This topic is suitable for a master thesis.
Scene graph generation models are trained to find interactions and relationships in images. A relationship is defined as a triplet of subject-predicate-object, e.g. "person-playing-piano".
However, current scene graph models are still limited to a fixed set of subject/object classes, even though object detectors already exist for open vocabulary classification and detection. Open vocabulary means that a model is not restricted to a fixed set of classes but can handle arbitrary labels. Your task will be to integrate such an open vocabulary detection model into a scene graph generation pipeline.
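The open vocabulary idea can be sketched as follows: instead of a fixed classifier head, a detected region's feature is compared against text embeddings of arbitrary, free-form class names, and the best match wins. The toy two-dimensional embeddings below are pure illustration; real systems obtain them from a vision-language model such as CLIP.

```python
# Sketch of open-vocabulary classification: a region feature is scored
# against text embeddings of arbitrary labels, so new classes only
# require a new text embedding, not a retrained classifier head.

def classify_open_vocab(region_feature, text_embeddings):
    """text_embeddings: dict mapping a free-form label to its embedding."""
    def score(label):
        return sum(x * y for x, y in zip(region_feature, text_embeddings[label]))
    return max(text_embeddings, key=score)

labels = {"piano": [1.0, 0.0], "person": [0.0, 1.0], "guitar": [0.7, 0.1]}
print(classify_open_vocab([0.9, 0.2], labels))  # "piano"
```

The integration challenge for the thesis is to feed such open-vocabulary subject and object predictions into the predicate classification of a scene graph pipeline.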
Previous knowledge of PyTorch is required. Additionally, you should have experience with object detection models and with incorporating large foreign code bases into your own work.
If you are interested or want more information, feel free to contact Julian Lorenz.
Scene graph generation models are trained to find interactions and relationships in images. A relationship is defined as a triplet of subject-predicate-object, e.g. "person-playing-piano". Recently, the HiLo architecture achieved a drastic improvement on panoptic scene graph generation.
Your task will be to build a new scene graph generation architecture, based on the state of the art HiLo model. Analyse HiLo's different building blocks and their effectiveness to find out which parts can be improved or even removed.
Solid previous knowledge of PyTorch is required, as well as experience working with foreign code bases. To build your own model, you will have to be creative and resourceful.
If you are interested or want more information, feel free to contact Julian Lorenz.