Medical Image Captioning

Image captioning has also become popular for automatically generating doctor’s reports for chest x-ray images. Annotating chest x-rays is a tedious and time-consuming job that requires a lot of domain knowledge. In recent years, more and more approaches have been introduced that try to automatically generate paragraphs of text that read like a doctor’s report. However, data is scarce for two reasons. First, annotations cannot be gathered as easily as for tasks like generic image captioning or image classification, because domain experts are needed to write a textual impression of a patient’s chest x-ray. Second, real medical data has to conform to privacy laws and must therefore be anonymized. The only publicly available dataset that combines chest x-ray images with doctor’s reports contains only 7470 samples, of which only half have a unique doctor’s report (most reports come with two chest x-ray images showing different views).


Two examples from the Indiana University Chest X-Ray collection. The upper row shows a normal case without findings, while the bottom row shows a case with findings. The sentences are marked according to our human abnormality annotation: normal sentences are shown in blue and abnormal sentences in green.


In our research, we focus on correctly identifying abnormalities, because sentences describing abnormalities are rare in the reports. We want to improve captioning quality as measured by the correct identification of abnormalities, rather than by a machine translation metric like BLEU.
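
To illustrate why an overlap metric can hide a missed abnormality, here is a minimal sketch in Python. It is not the evaluation code from our paper: the two reports are made-up examples, the per-sentence abnormality labels are assumed to be given (for instance by a human annotator), and clipped unigram precision stands in for full BLEU as its simplest building block. The function names are hypothetical.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the 1-gram building block of BLEU (not full BLEU)."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens)


def report_is_abnormal(report) -> bool:
    """A report counts as abnormal if any of its sentences is labeled abnormal."""
    return any(is_abnormal for _, is_abnormal in report)


def abnormality_recall(pairs) -> float:
    """Fraction of abnormal reference reports whose generated report is also abnormal."""
    abnormal_pairs = [(ref, gen) for ref, gen in pairs if report_is_abnormal(ref)]
    if not abnormal_pairs:
        return 0.0
    hits = sum(report_is_abnormal(gen) for _, gen in abnormal_pairs)
    return hits / len(abnormal_pairs)


# Made-up example: each report is a list of (sentence, is_abnormal) pairs.
reference = [
    ("the heart is normal in size", False),
    ("the lungs are clear", False),
    ("there is a small right pleural effusion", True),
]
generated = [
    ("the heart is normal in size", False),
    ("the lungs are clear", False),
    ("there is no pleural effusion", False),
]

overlap_score = unigram_precision(
    " ".join(sentence for sentence, _ in generated),
    " ".join(sentence for sentence, _ in reference),
)
recall = abnormality_recall([(reference, generated)])

print(f"unigram precision: {overlap_score:.2f}")
print(f"abnormality recall: {recall:.2f}")
```

In this toy case the generated report shares 14 of its 15 words with the reference, yet it misses the only abnormal finding, so the abnormality recall is zero while the overlap score stays high.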



  • Harzig, Philipp, et al. "Addressing data bias problems for chest x-ray image report generation." arXiv preprint arXiv:1908.02123 (2019). [PDF]

For more information, please contact Philipp Harzig.