Thesis Topics

Current Topics

For a long time, convolutional networks were used exclusively in image processing, but recently they are increasingly being replaced by or combined with the Transformer architecture. A major advantage of the Transformer architecture is that it can be used very flexibly. This thesis aims to exploit that flexibility.

When a neural network is trained on a dataset, it can afterwards make predictions only for the classes contained in that dataset. To add further classes, the network has to be retrained on all data, which is costly. If the old data is no longer available, one additionally runs into the problem of "catastrophic forgetting": the network no longer performs as well on the old data. In this thesis, the flexibility of the Transformer architecture is to be exploited to counteract these problems in the task of semantic segmentation. In this way, catastrophic forgetting is prevented on the one hand, and on the other hand new classes can be learned without retraining the complete network.

This topic is particularly suitable for the project module or a master's thesis. It is very close to current research, which means that the expected results are uncertain, but a large gain in knowledge is possible.

If you are interested, please contact Daniel Kienzle.

Human Pose Estimation is the task of detecting human keypoints in images or videos. 2D Human Pose Estimation means localizing these keypoints in 2D coordinates in the image or video frame. Convolutional neural networks are the most common architecture for such tasks. Recently, the Transformer architecture has spread from natural language processing to vision tasks. Its benefit is a global view of the input instead of the local view of convolution operations. As it was not originally designed for vision tasks, some adaptations have to be made to make this architecture feasible for them. Many variants have been proposed recently, but most of them have not been evaluated for Human Pose Estimation. Theses on this topic should analyze the performance of different Transformer variants for Human Pose Estimation. Variants could include different basic architectures, target heads, architecture nuances, hyperparameters, etc.

 

If you are interested and want more information, please contact Katja Ludwig.

Semi-Supervised Learning is an active research field in computer vision with the goal of training neural networks with only a small labeled dataset and a lot of unlabeled data. For human pose estimation, this means that a large dataset with images of people is available, but only a small subset has annotated keypoints. Semi-supervised human pose estimation uses different techniques to train jointly on labeled and unlabeled images in order to improve the detection performance of the network. Popular methods are pseudo labels (the use of network predictions as annotations) and teacher-student approaches, where one network is improved by being trained by a second network.
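The two ideas named above can be illustrated with a minimal sketch in plain Python. It is not the method of any particular paper: the function names, the confidence threshold of 0.8 and the exponential-moving-average (EMA) teacher update are assumptions chosen for illustration, the EMA teacher being just one common way to realize a teacher-student setup.

```python
def select_pseudo_labels(predictions, threshold=0.8):
    """Turn confident keypoint predictions into pseudo labels.

    predictions: list of (x, y, confidence) tuples, one per keypoint.
    Returns (x, y) for confident keypoints and None for the others,
    so that a training loss could skip the uncertain keypoints.
    The threshold value is a hypothetical choice.
    """
    return [(x, y) if conf >= threshold else None for x, y, conf in predictions]


def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Move the teacher weights slowly towards the student weights.

    Both arguments are flat lists of floats of equal length.
    This EMA rule is one assumed teacher-student variant, not the only one.
    """
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


if __name__ == "__main__":
    preds = [(10.0, 20.0, 0.95), (5.0, 7.0, 0.40)]
    print(select_pseudo_labels(preds))
    print(ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9))
```

In a real setup the teacher would produce the predictions on unlabeled images, and the student would be trained on the labeled data plus the selected pseudo labels.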

 

If you are interested and want more information, please contact Katja Ludwig.

The segmentation of object surfaces in images is an important task in computer vision. Methods based on deep neural networks excel at this task, but require large amounts of annotated data to perform well. In order to construct an object mask, the annotator has to mark every pixel belonging to the object's surface. To alleviate this laborious work, methods that partially automate it have been devised. These methods are called "Interactive Segmentation" methods. Many of them rely on simple user interactions (clicks, scribbles, etc.) in order to create a segmentation mask for object surfaces.
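To make the idea of click-based interaction more concrete, the following sketch shows one possible way to turn user clicks into an extra input map for a segmentation network. It is an illustrative assumption, not the encoding of a specific method: the function name `encode_clicks`, the (row, col) click format and the disk radius are hypothetical.

```python
def encode_clicks(clicks, height, width, radius=2):
    """Encode clicks as a height x width map of floats.

    clicks: list of (row, col) pixel positions chosen by the user.
    Every pixel within `radius` of a click is set to 1.0, all others stay 0.0.
    Such a map could be concatenated to the image as an additional channel.
    """
    click_map = [[0.0] * width for _ in range(height)]
    for row, col in clicks:
        # Only visit the bounding box of the disk, clipped to the image.
        for r in range(max(0, row - radius), min(height, row + radius + 1)):
            for c in range(max(0, col - radius), min(width, col + radius + 1)):
                if (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                    click_map[r][c] = 1.0
    return click_map


if __name__ == "__main__":
    for line in encode_clicks([(2, 2)], height=5, width=5, radius=1):
        print(line)
```

Positive clicks (on the object) and negative clicks (on the background) would typically be encoded in separate maps.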

 

One particular method, which is to be examined in this thesis, aims for improved performance by positioning the pixels on opposing ends of the object.


In case of interest, or for additional information, contact: Robin Schön

 

 

Literature:

[1] : Camille Dupont et al.,  "UCP-Net: Unstructured Contour Points for Instance Segmentation", https://arxiv.org/abs/2109.07592

[2] : Konstantin Sofiiuk et al.,  "Reviving Iterative Training with Mask Guidance for Interactive Segmentation", https://arxiv.org/abs/2102.06583

Unsupervised Test Time Adaptation

 

When neural networks are deployed, they are often confronted with images that are visually different from any image seen during training. In some cases this change in the images' visual details is due to a change in the weather, while in other cases it may be caused by image corruption or decreased camera quality. Additionally, we cannot expect labels to be available in these new environments, since the model has already been deployed.

 

In order to deal with these changes, methods for adapting the networks during test time (i.e. after deployment) have been explored. Some of these methods seek to change the network only slightly, in order to avoid forgetting knowledge previously acquired during training. This is necessary because the data the network was originally trained on is no longer available.
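As a rough illustration of changing a network only slightly at test time, the sketch below adapts nothing but the statistics of a normalization layer to an unlabeled test batch. This is a simple baseline chosen here as an assumption; it is not the method of the literature listed for this topic, and the function names and the momentum value are hypothetical.

```python
def adapt_statistics(source_mean, source_var, test_batch, momentum=0.1):
    """Blend stored training statistics with those of an unlabeled test batch.

    A small momentum keeps the result close to the training statistics,
    so the network changes only a little and no labels are required.
    """
    batch_mean = sum(test_batch) / len(test_batch)
    batch_var = sum((x - batch_mean) ** 2 for x in test_batch) / len(test_batch)
    new_mean = (1.0 - momentum) * source_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * source_var + momentum * batch_var
    return new_mean, new_var


def normalize(x, mean, var, eps=1e-5):
    """Normalize one activation with the given statistics."""
    return (x - mean) / (var + eps) ** 0.5


if __name__ == "__main__":
    # Activations of one feature channel on a shifted test batch.
    mean, var = adapt_statistics(0.0, 1.0, [2.0, 4.0], momentum=0.5)
    print(mean, var, normalize(3.0, mean, var))
```

All learned weights stay untouched in this sketch, which is the most conservative form of adaptation; methods that also update weights usually add a regularizer to stay close to the original model.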

 

The aim of this bachelor's thesis is to explore such methods and to test them on various datasets.


In case of interest, or for additional information, contact: Robin Schön

 

 

Literature:

Junha Song et al., "EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization", https://arxiv.org/abs/2303.01904

Convolutional Neural Networks have been widely used in Computer Vision applications for their ability to learn meaningful features from images. However, with the recent success of Transformer architectures in various Natural Language Processing tasks, there has been growing interest in applying them to Computer Vision as well. Although Transformers often outperform CNNs, they come at a much higher computational cost. This thesis aims to decrease the computational cost by implementing Token Matching. This technique shows promising results in the task of image classification. However, it cannot be applied directly to the task of semantic segmentation. In this thesis, we will explore multiple strategies to adjust the technique to the task of semantic segmentation.
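The text does not define Token Matching in detail. The sketch below therefore assumes, purely for illustration, that the technique reduces the number of tokens by merging the most similar ones. It also shows why semantic segmentation needs extra care: a prediction is required for every original position, so merged tokens have to be mapped back. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def merge_most_similar(tokens, groups):
    """Merge the two most similar tokens into their size-weighted mean.

    tokens: list of feature vectors.
    groups: for each token, the list of original positions it represents.
    Returns the reduced token list and the updated groups.
    """
    if len(tokens) < 2:
        return tokens, groups
    best = None
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            sim = cosine(tokens[i], tokens[j])
            if best is None or sim > best[0]:
                best = (sim, i, j)
    _, i, j = best
    wi, wj = len(groups[i]), len(groups[j])
    merged = [(wi * a + wj * b) / (wi + wj) for a, b in zip(tokens[i], tokens[j])]
    new_tokens = [t for k, t in enumerate(tokens) if k not in (i, j)] + [merged]
    new_groups = [g for k, g in enumerate(groups) if k not in (i, j)]
    new_groups.append(groups[i] + groups[j])
    return new_tokens, new_groups


def unmerge(tokens, groups, num_positions):
    """Copy each merged token back to every original position it covers."""
    restored = [None] * num_positions
    for token, group in zip(tokens, groups):
        for position in group:
            restored[position] = token
    return restored


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    groups = [[0], [1], [2]]
    reduced, reduced_groups = merge_most_similar(tokens, groups)
    print(reduced, reduced_groups)
    print(unmerge(reduced, reduced_groups, num_positions=3))
```

For image classification the reduced token list is enough, because only one label per image is predicted. For segmentation, a step like `unmerge` (or a more refined strategy) is needed to recover a dense output.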

For this thesis, prior experience with Python and PyTorch is recommended. If you are interested, send an email to Daniel Kienzle.
