Topics for Theses

Image Captioning

Image Captioning in computer vision. Image taken from [1].

Introduction

Generating captions that describe the content of an image is an emerging task in computer vision. Recently, Recurrent Neural Networks (RNNs) in the form of Long Short-Term Memory (LSTM) networks have shown great success in generating captions that match an image's content. Compared to traditional tasks like image classification or object detection, this task is more challenging: a model not only needs to identify a main class but must also recognize relationships between objects and describe them in a natural language such as English. An encoder/decoder network presented by Vinyals et al. [1] won the Microsoft COCO 2015 captioning challenge.
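The encoder/decoder idea can be sketched in a few lines: a CNN embedding of the image initializes an LSTM decoder, which then greedily emits one word at a time until an end token appears. The toy sketch below uses NumPy with random, untrained weights and a six-word vocabulary, so the emitted words are arbitrary; it only illustrates the data flow, not the actual Show and Tell implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; real models learn ~10k words from the training captions.
vocab = ["<start>", "<end>", "a", "dog", "on", "grass"]
V, D, H = len(vocab), 8, 16  # vocab size, embedding dim, hidden dim

# Randomly initialized parameters stand in for trained weights.
W_img = rng.normal(0, 0.1, (D, 32))   # projects CNN features into embedding space
E     = rng.normal(0, 0.1, (V, D))    # word embeddings
W_x   = rng.normal(0, 0.1, (4 * H, D))
W_h   = rng.normal(0, 0.1, (4 * H, H))
b     = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (V, H))    # hidden state -> vocabulary logits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    z = W_x @ x + W_h @ h + b
    i, f, o, g = np.split(z, 4)       # input, forget, output gates and candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def greedy_caption(cnn_features, max_len=10):
    h, c = np.zeros(H), np.zeros(H)
    # Encoder step: the projected image embedding is fed as the first LSTM input.
    h, c = lstm_step(W_img @ cnn_features, h, c)
    word, caption = "<start>", []
    for _ in range(max_len):
        h, c = lstm_step(E[vocab.index(word)], h, c)
        word = vocab[int(np.argmax(W_out @ h))]  # greedy: most likely next word
        if word == "<end>":
            break
        caption.append(word)
    return caption

caption = greedy_caption(rng.normal(size=32))
print(caption)  # untrained weights, so the word sequence is arbitrary
```

In the real model the greedy argmax is usually replaced by beam search, and the weights come from end-to-end training on image-caption pairs.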

 

Available Topics
 

Generating Image Captions for Unknown Objects

Examples of captions generated by LRCN and sentences improved by LSTM-C, which uses a copying mechanism that replaces unknown words in the caption with detected objects. Figure taken from [2].

 

As shown in [1], deep neural networks are capable of generating simple captions for given images when trained on a large dataset on the order of a million labeled image-caption pairs. However, like every data-driven machine learning approach, these models cannot create captions for objects they have never seen before, i.e., objects for which the dataset contains neither images nor captions. Training such a model takes days to weeks on modern GPUs, yet it may be required that the model can describe new object categories afterwards with little to no extra effort.

 

Yao et al. [2] use a hybrid model that combines an object detection model with a captioning model. Figure 1 shows a picture and its ground-truth caption. A standard captioning model (LRCN) may produce an inaccurate caption (e.g., “a red fire hydrant is parked in the middle of a city”). Combined with an object detection pipeline, which in this case detects a bus, the hybrid model can generate the much more accurate caption “a bus driving down a street next to a building”.
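The copying idea can be illustrated with a small sketch: at each decoding step, the decoder's vocabulary distribution is mixed with a distribution over words the object detector has found, controlled by a gate. The scalar gate, tiny vocabulary, and detection scores below are hypothetical simplifications of the actual LSTM-C architecture, which learns the gate jointly with the rest of the model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def copy_step(lstm_logits, vocab, detections, copy_gate):
    """Mix the decoder's vocabulary distribution with a 'copy' distribution
    over objects found by the detector (simplified sketch of [2])."""
    p_gen = softmax(lstm_logits)
    p_copy = np.zeros(len(vocab))
    for word, score in detections.items():  # detector confidence per object
        p_copy[vocab.index(word)] = score
    if p_copy.sum() > 0:
        p_copy /= p_copy.sum()
    # copy_gate in [0, 1]: how strongly to trust the detector this step.
    p = (1 - copy_gate) * p_gen + copy_gate * p_copy
    return vocab[int(np.argmax(p))]
```

With a high copy gate and a detector confident about “bus”, the mixed distribution selects “bus” even when the decoder alone prefers another word such as “hydrant”.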

 

Your tasks will be:

  • Familiarize yourself with the Tensorflow Framework and the Show and Tell model
  • Integrate the LSTM copying mechanism introduced in [2] into a given Tensorflow model (Show and Tell)
  • Train and evaluate your implementation on the MSCOCO dataset while holding out 8 objects from the dataset
  • Compare and analyze the performance of your implementation against the paper [2].
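The held-out setup in the third task amounts to a simple filter over the training data: every image-caption pair that mentions one of the held-out object words is excluded from training and kept for evaluation only. The eight words below follow the split commonly used in this line of work (cf. [2]); check the paper for the exact protocol before relying on this list.

```python
# Drop every training pair whose caption mentions a held-out object word;
# these pairs are used for evaluation only. Word list assumed from the
# standard novel-object split on MSCOCO (verify against [2]).
HELD_OUT = {"bottle", "bus", "couch", "microwave",
            "pizza", "racket", "suitcase", "zebra"}

def keep_for_training(caption: str) -> bool:
    words = set(caption.lower().replace(".", "").replace(",", "").split())
    return words.isdisjoint(HELD_OUT)
```

In practice the filter also needs to catch plural forms and synonyms (e.g. “buses”), which the papers handle with word lists derived from the MSCOCO object categories.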

Python and Numpy knowledge is advantageous as Tensorflow models are implemented in Python.

If you are interested and want more information, please contact  Philipp Harzig.
 

Visual Question Answering

Examples of question/image input pairs and generated answers. Picture taken from [3].

 

Building on top of general image captioning, another, more challenging task in computer vision has recently emerged: visual question answering. Here, a question referencing some of the input image's contents is part of the input, and the model tries to answer the question as accurately as possible. See figure 2 for image-question-answer examples. Most publications agree on one approach to tackle this problem: the question and the image are both embedded into vector representations, the two are combined in some way, and the answer is generated as the most likely out of 3,000 to 5,000 possible answers. The problem is thus modeled as a classification problem, i.e., every possible answer is assigned a probability and the most likely one is selected.
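A minimal sketch of this classification view, with random weights standing in for the trained CNN, LSTM, and classifier (the element-wise-product fusion and the tiny answer list are illustrative assumptions; real systems use ~3,000 answers and learned fusion):

```python
import numpy as np

rng = np.random.default_rng(0)
ANSWERS = ["yes", "no", "2", "red", "dog"]   # real systems use ~3000 answers
D_IMG, D_Q, H = 2048, 512, 256               # typical CNN / LSTM feature sizes

# Random weights stand in for a trained model; the CNN image feature
# and LSTM question feature are assumed to be given.
W_i = rng.normal(0, 0.01, (H, D_IMG))
W_q = rng.normal(0, 0.01, (H, D_Q))
W_a = rng.normal(0, 0.01, (len(ANSWERS), H))

def answer(img_feat, q_feat):
    # One common fusion choice: element-wise product of projected embeddings.
    joint = np.tanh(W_i @ img_feat) * np.tanh(W_q @ q_feat)
    logits = W_a @ joint                     # classification over all answers
    return ANSWERS[int(np.argmax(logits))]

print(answer(rng.normal(size=D_IMG), rng.normal(size=D_Q)))
```

The design choice to classify over a fixed answer set (rather than generate free text) trades expressiveness for a much easier learning problem, since most VQA answers are short and frequent.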

 

Your tasks will be:

  • Familiarize yourself with the Tensorflow Framework and a classification pipeline already implemented in Tensorflow (e.g. the Slim Framework)
  • Implement a model that embeds an image with a deep convolutional neural network (DCNN) like the ResNet or Inception network and a question with an LSTM network. This model should combine both representations and use a classification model that selects the most probable answer for this combination.
  • Train and evaluate your implementation on the VQA-v2 dataset.
  • Compare and analyze the performance vs. state-of-the-art approaches.

Python and Numpy knowledge is advantageous as Tensorflow models are implemented in Python. If you are interested and want more information, please contact  Philipp Harzig.

 

Literature for Image Captioning

[1] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
 

[2] Yao, Ting, et al. "Incorporating copying mechanism in image captioning for learning novel objects." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
 

[3] Teney, Damien, et al. "Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge." arXiv preprint arXiv:1708.02711 (2017).

Computer Aided Diagnosis

Introduction

Computer Aided Diagnosis (CADx) helps doctors interpret medical images such as two-dimensional X-rays, MRI scans, CT scans, and ultrasound images. CADx has made major progress thanks to the big advances in computer vision in recent years. Tasks range from image preprocessing, segmentation, detection, evaluation, and classification to the diagnosis of diseases.

 

Available Topics
 

Pneumonia Diagnosis

The National Institutes of Health (NIH) has released a dataset of 112,120 frontal-view thorax X-rays of 30,805 unique patients. The images are labeled with up to 14 thoracic diseases; the labels were generated by automated text processing with a combination of machine learning algorithms and rule-based approaches. One of the labeled diseases is pneumonia, an inflammatory condition of the lung caused by infection with viruses or bacteria. It affects about 7% of the worldwide population and results in about 4 million deaths per year. According to the WHO, chest X-rays are the best method of detecting pneumonia. The diagnosis depends on expert radiologists, who are scarce in developing countries, so developing efficient CADx algorithms for pneumonia detection could have a great positive impact on these countries.
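One detail worth noting from CheXNet [1]: because pneumonia-positive images are rare in the dataset, the paper weights the binary cross-entropy by the positive/negative class frequencies. A small NumPy sketch of that loss (the batch-level weighting below is a simplification; see the paper for the exact formulation):

```python
import numpy as np

def weighted_bce(y_true, p_pred):
    """Class-balanced binary cross-entropy in the spirit of CheXNet [1]:
    positive examples are upweighted by the fraction of negatives and vice
    versa, so the rare pneumonia-positive X-rays are not drowned out."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), 1e-7, 1 - 1e-7)
    n_pos, n_neg = y_true.sum(), (1 - y_true).sum()
    w_pos = n_neg / (n_pos + n_neg)   # weight on the positive log term
    w_neg = n_pos / (n_pos + n_neg)   # weight on the negative log term
    loss = -(w_pos * y_true * np.log(p_pred)
             + w_neg * (1 - y_true) * np.log(1 - p_pred))
    return loss.mean()
```

Confident correct predictions drive the loss toward zero, while an unweighted loss on such an imbalanced dataset would let a trivial "always negative" model look deceptively good.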

 

Your tasks will be:

  • Familiarize yourself with the Tensorflow Framework and SLIM
  • Implement a deep neural net using SLIM to detect pneumonia
  • Train and evaluate your implementation on the ChestXray-NIHCC dataset
  • Compare and analyze the performance of your implementation against the paper [1].

Python and Numpy knowledge is advantageous as Tensorflow models are implemented in Python. If you are interested and want more information, please contact  Stephan Brehm.

 

Literature
 

[1] Rajpurkar, Pranav, et al. "CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning." arXiv preprint arXiv:1711.05225 (2017). https://arxiv.org/abs/1711.05225

 

Multilabel Chest X-ray Diagnosis

The National Institutes of Health (NIH) has released a dataset of 112,120 frontal-view thorax X-rays of 30,805 unique patients. The images are labeled with up to 14 thoracic diseases; the labels were generated by automated text processing with a combination of machine learning algorithms and rule-based approaches. The diseases range from atelectasis and cardiomegaly to pneumonia and pneumothorax. One X-ray can show multiple diseases, so you will be facing a multi-label, multi-class classification task.
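The difference to a single-disease classifier lies in the output layer: instead of one softmax over mutually exclusive classes, a multi-label model applies an independent sigmoid per disease and thresholds each probability, so any subset of diseases can be predicted for one X-ray. A minimal sketch (the 0.5 threshold and the example logits are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_predict(logits, threshold=0.5):
    """Multi-label output: one independent sigmoid per disease, each
    thresholded on its own, instead of a single softmax over all classes."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return (probs >= threshold).astype(int)

# Four of the 14 disease labels, with made-up logits from a hypothetical net.
DISEASES = ["Atelectasis", "Cardiomegaly", "Pneumonia", "Pneumothorax"]
logits = np.array([2.1, -1.3, 0.8, -0.2])
present = [d for d, y in zip(DISEASES, multilabel_predict(logits)) if y]
```

Training accordingly uses a per-class sigmoid cross-entropy loss rather than a categorical cross-entropy, since the disease labels are not mutually exclusive.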

 

Your tasks will be:

Python and Numpy knowledge is advantageous as Tensorflow models are implemented in Python. If you are interested and want more information, please contact  Stephan Brehm.
