We are developing a classification framework for digital images that identifies images belonging to a given class. In other words, we want to design filters that search a database for images with certain content (e.g. brand logos).
The framework should learn its class models in an unsupervised manner. The user only provides positive training examples, that is, images containing some common object or concept; no further annotation or knowledge is required.
The framework then finds properties shared by the positive training images, based on color and visual words. It therefore consists of two main stages: a color-based pre-filter (or region-of-interest detector) and a classifier trained on histograms of visual words ("bags-of-words").


To apply color-based filters, we have to assume that the objects we want to identify have a distinctive color distribution, that is, all instances of the object appear in a reasonably small number of different colors.
Since we want the color model to be learned in an unsupervised manner, we face two major problems: first, we have to identify the colors of the object without manual annotation; second, we have to deal with color deviations caused by different lighting conditions.
In addition, classifying images or localizing objects based on color models is not straightforward.
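To make the idea of a color-based pre-filter more concrete, here is a minimal sketch in Python. It is not our actual implementation: the brightness-normalized chromaticity coordinates, the number of bins, the histogram-intersection score and the threshold are all assumptions chosen for illustration, and every function name is hypothetical. Dividing each channel by the sum R+G+B is one common way to reduce sensitivity to lighting intensity; it does not handle changes in the color of the illumination.

```python
from collections import Counter


def chromaticity(pixel):
    """Map an (R, G, B) pixel to brightness-normalized (r, g) coordinates."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)  # treat pure black as neutral
    return (r / total, g / total)


def color_histogram(pixels, bins=8):
    """Normalized histogram over quantized (r, g) chromaticity values.

    `bins` is an illustrative choice, not a tuned parameter.
    """
    counts = Counter()
    for pixel in pixels:
        r, g = chromaticity(pixel)
        # min() keeps the value 1.0 inside the last bin.
        key = (min(int(r * bins), bins - 1), min(int(g * bins), bins - 1))
        counts[key] += 1
    n = len(pixels)
    return {key: count / n for key, count in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(value, h2.get(key, 0.0)) for key, value in h1.items())


def passes_prefilter(pixels, model, threshold=0.5):
    """Keep an image (or region) whose colors resemble the learned model.

    `threshold` is a hypothetical value; in practice it would be tuned.
    """
    return histogram_intersection(color_histogram(pixels), model) >= threshold


# Toy usage: a model built from bright red pixels also accepts darker red,
# because chromaticity discards overall brightness, but rejects blue.
model = color_histogram([(200, 0, 0)] * 4)
dark_red = [(100, 0, 0)] * 4
blue = [(0, 0, 200)] * 4
print(passes_prefilter(dark_red, model), passes_prefilter(blue, model))
```

The sketch assumes the object colors are already known. Finding them without annotation, which is the first problem named above, is not covered here.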



Unsupervised detection of regions of interest for brand logos, based on color histograms.


The second stage of our framework uses bag-of-words models to classify images. We compute spatial histograms of visual words for positive and negative training images and then train a binary classifier on these histograms. Since we want to find positive images in large-scale databases, we aim for a very low false positive rate. For classification we therefore opt for a cascade of AdaBoost classifiers.
Many design choices influence the classification performance. For instance, many different local feature descriptors exist that can be used for the bag-of-words model. In addition, both the clustering process that yields our visual vocabulary and the AdaBoost classifier depend on many parameters. Our main research focus is therefore on finding the optimal configuration and evaluating novel enhancements.
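The following Python sketch illustrates the two ingredients of this stage: building a histogram of visual words and evaluating a cascade of boosted classifiers. It is a simplified illustration under stated assumptions, not our implementation. It builds a single global histogram rather than spatial histograms, it uses decision stumps as weak classifiers, it omits training entirely, and all names, weights and thresholds are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    def dist(word):
        return sum((a - b) ** 2 for a, b in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: dist(vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        hist[nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def stump_vote(hist, word, cut, polarity):
    """Weak classifier: vote +1 or -1 based on one histogram bin."""
    return polarity if hist[word] >= cut else -polarity


def boosted_score(hist, stumps):
    """Weighted vote of the stumps in one boosted stage.

    Each stump is (word, cut, polarity, alpha); alpha is its weight.
    """
    return sum(alpha * stump_vote(hist, word, cut, polarity)
               for word, cut, polarity, alpha in stumps)


def cascade_predict(hist, stages):
    """Accept only if every stage accepts; reject at the first failure.

    Each stage is (stumps, stage_threshold). Early rejection is what keeps
    the false positive rate low on large databases.
    """
    for stumps, stage_threshold in stages:
        if boosted_score(hist, stumps) < stage_threshold:
            return False
    return True


# Toy usage with a two-word vocabulary and hand-picked (untrained) stages.
vocabulary = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.9, 1.0), (1.0, 0.8), (0.8, 0.9), (0.1, 0.0)]
hist = bow_histogram(descriptors, vocabulary)
stages = [
    ([(1, 0.5, 1, 1.0)], 0.0),   # stage 1: word 1 must be frequent
    ([(0, 0.5, -1, 1.0)], 0.0),  # stage 2: word 0 must be rare
]
print(hist, cascade_predict(hist, stages))
```

In a real system the vocabulary would come from clustering local feature descriptors, and the stumps, weights and stage thresholds would be learned by AdaBoost from the positive and negative training histograms.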


For more information, please contact Christian Ries.