Fine-grained object recognition in remote sensing imagery
Please cite this item using this persistent URL: http://hdl.handle.net/11693/47579
Fine-grained object recognition aims to determine the type of an object in domains with a large number of sub-categories. The steady increase in spatial and spectral resolution reveals new details in remote sensing image data and, consequently, more diversified target object classes with subtle differences, which makes this an emerging application. For approaches that use images from a single domain, widely used fully supervised algorithms are not well suited to this problem, since target object classes tend to have low between-class variance and high within-class variance with small sample sizes. For the even more demanding setting in which some classes have no training examples, a zero-shot learning (ZSL) method is proposed that identifies unseen sub-categories by associating them with previously learned seen sub-categories. More specifically, our method learns a compatibility function between the image representation obtained from a deep convolutional neural network and the semantics of target object sub-categories, described by auxiliary information gathered from complementary sources. Knowledge transfer to unseen classes is carried out by maximizing this function during inference. Furthermore, benefiting from multiple image sensors can overcome the drawbacks of closely intertwined sub-categories that limit object recognition performance. However, since multiple images may be acquired from different sensors under different conditions and at different spatial and spectral resolutions, they may not be correctly aligned geometrically due to seasonal changes, different viewing geometry, acquisition noise, sensor imperfections, different atmospheric conditions, etc. To address these challenges, a neural network model is proposed that aims to correctly align images acquired from different sources and to learn the classification rules simultaneously in a unified framework.
In this network, one of the sources is used as the reference, and the others are aligned with the reference image at the representation level through a learned weighting mechanism. Finally, classification of sub-categories is carried out with a feature-level fusion of representations from the source region and the estimated multiple target regions. Experimental analysis conducted on a newly proposed data set shows that both the zero-shot learning algorithm and the multi-source fine-grained object recognition algorithm give promising results.
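The two mechanisms summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the compatibility function or of the weighting mechanism, so the code below rests on assumptions: a bilinear compatibility score between an image feature and a class embedding, and softmax weights over candidate region features from a second source. All function names, shapes, and the matrix `W` are hypothetical and chosen only for illustration; in a trained model, `W` and the relevance logits would be learned rather than supplied by hand.

```python
import numpy as np


def zero_shot_predict(image_feat, W, class_embeddings):
    """Score each class with an assumed bilinear compatibility x^T W phi(c).

    image_feat: (d_img,) image representation, e.g. from a CNN.
    W: (d_img, d_sem) compatibility matrix (hypothetical; learned in practice).
    class_embeddings: (n_classes, d_sem) auxiliary semantic vectors.
    Returns the index of the highest-scoring class and all scores.
    """
    scores = image_feat @ W @ class_embeddings.T  # shape (n_classes,)
    return int(np.argmax(scores)), scores


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def align_and_fuse(ref_feat, candidate_feats, relevance_logits):
    """Weight candidate region features and fuse them with the reference.

    ref_feat: (d_ref,) representation of the reference-source region.
    candidate_feats: (n_candidates, d_tgt) features of candidate regions
        in another source that may contain the same object.
    relevance_logits: (n_candidates,) unnormalized scores (assumed to come
        from a learned sub-network; passed in directly here).
    Returns the concatenated (feature-level fused) vector and the weights.
    """
    weights = softmax(relevance_logits)       # sums to 1 over candidates
    aligned = weights @ candidate_feats       # weighted average, (d_tgt,)
    return np.concatenate([ref_feat, aligned]), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=8)                    # toy image feature
    W = rng.normal(size=(8, 5))               # toy compatibility matrix
    unseen = rng.normal(size=(3, 5))          # embeddings of 3 unseen classes
    label, scores = zero_shot_predict(x, W, unseen)
    print("predicted unseen class index:", label)

    fused, w = align_and_fuse(x, rng.normal(size=(4, 6)), rng.normal(size=4))
    print("fused feature length:", fused.shape[0], "weights:", w)
```

In this sketch, prediction for unseen classes needs only their semantic embeddings, which is what allows classes without training images to be scored. The fused vector would then be passed to a classifier; that step is omitted here.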