Unsupervised Domain Adaption

Domain Adaption Schematic. Images of urban scenes are taken from [1] and [2]. CC BY-NC-ND

The success of deep learning has been greatly beneficial various computer vision tasks such as object classification, semantic segmentation and object detection. The majority of the progress can be attributed to the advancements in deep convolutional neural networks (CNNs) and recently to vision transformers. Such models are usually trained in a supervised fashion, which has been made possible due to the availability of large datasets having thousands of images annotated with ground-truth labels. However, one of the major drawbacks is the poor generalization capability of neural networks to visually distinct images compared to the training images. For instance, a detection model trained with a dataset collected in Germany may not necessarily perform well on images from Tokyo due to the changes in the appearance of scenes, objects and maybe weather conditions. But it is not just changes in location that have an impact; for example, different weather conditions or the time of day can also greatly influence the model's performance. For example, if the data set consisted only of images taken during the day and in good weather, the model will usually not perform well at rainy situations or at night.


Of great interest is also the synthetic to real world adaptation problem, which is mainly motivated by the extremely laborious and time-consuming process of annotating data, usually performed by humans. In a synthetic environment, such as a computer game or simulation, the acquisition of annotated data can be done automatically without the need for human annotation. A simulation also has the advantage that certain traffic situations, for example, can be depicted more easily. This use case of Domain Adaptation primarily attempts to replace real human annotated data with synthetic automatically annotated data. 

In the domain adaptation problem, we consider two domains, namely source and target, denoted as S and T , respectively. The source and target domains are assumed to have different data distributions. Most domain adaptation formulations consider that the source dataset is label-rich, while the target dataset is label-scarce or even label-less. The most practical and challenging formulation is the unsupervised domain adaptation formulation, which assumes that source domain is fully annotated while no annotations are available for target domain.


For more information please contact Sebastian Scherer.





Sebastian Scherer, Robin Schön, Katja Ludwig and Rainer Lienhart. in press. Unsupervised domain extension for nighttime semantic segmentation in urban scenes. In Conference proceedings. SciTePress, Setúbal



[1] Stephan R. Richter, Vibhav Vineet, Stefan Roth, and VladlenKoltun.Playing for data: Ground truth from computergames. In Bastian Leibe, Jiri Matas, Nicu Sebe, and MaxWelling, editors,European Conference on Computer Vision(ECCV), volume 9906 ofLNCS, pages 102–118. SpringerInternational Publishing, 2016


[2] Marius Cordts, Mohamed Omran, Sebastian Ramos, TimoRehfeld, Markus Enzweiler, Rodrigo Benenson, UweFranke, Stefan Roth, and Bernt Schiele.The cityscapes dataset for semantic urban scene understanding. InProc.of the IEEE Conference on Computer Vision and PatternRecognition (CVPR), 2016