ConceptMap: mining noisy web data for concept learning
We attack the problem of learning concepts automatically from noisy Web image search results. The idea is to discover common characteristics shared among subsets of images, using a method that organises the data while eliminating irrelevant instances. We propose a novel clustering and outlier detection method, namely Concept Map (CMAP). Given an image collection returned for a concept query, CMAP provides clusters pruned of outliers. Each cluster is used to train a model representing a different characteristic of the concept. The proposed method outperforms state-of-the-art methods on the task of learning from noisy web data for low-level attributes, as well as high-level object categories. It is also competitive with supervised methods in learning scene concepts. Moreover, results on naming faces support the generalisation capability of the CMAP framework to different domains. CMAP is capable of working at large scale with no supervision by exploiting the available sources. © 2014 Springer International Publishing.