Contextual object detection via image inpainting
Abstract
Object detection in aerial imagery is a critical task in computer vision, with applications in urban monitoring, disaster management, and military surveillance. Current approaches often encounter challenges such as a lack of representative features for objects against cluttered backgrounds. To address these challenges, we introduce Contextual Object Detection via Image Inpainting (CODI), a novel approach that extracts representative features of the image's semantic context from the transformer units of an inpainting model; our proposed fusion module then learns to inject these contextual features into the representative object features obtained from the corresponding Feature Pyramid Network (FPN) of the object detection model. The outputs of the fusion module, which we call the Contextual Pyramid Network (CPN), replace the original FPN layers for better localization and labeling accuracy on objects. Our experiments were conducted on the DOTA and HRSC2016 datasets. Our model achieved promising mean average precision (mAP) results, prevailing over Oriented RCNN, one of the best-performing models on the DOTA dataset. Specifically, applying the same pre-processing and post-processing, CODI achieved an improvement of 0.67% in mAP at single scale and 0.6% at multi-scale over Oriented RCNN on the test set. On the HRSC2016 dataset, CODI achieved an impressive mAP of 90.57%, outperforming Oriented RCNN by 0.27 percentage points. This result places CODI among the top-performing models using a ResNet-50 backbone. In that sense, our approach provides a foundation for future applications, paving the way for more precise object localization and identification in remote sensing.
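The abstract does not specify the fusion module's exact architecture, so the following is only a rough, hypothetical sketch of the core idea: per pyramid level, concatenate the detector's FPN features with the inpainting model's contextual features and project them back to the FPN channel width with a learned 1x1 convolution. The function name `fuse_level` and all channel sizes are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def fuse_level(fpn_feat, ctx_feat, weight, bias):
    """Hypothetical fusion of one FPN level with contextual features.

    fpn_feat: (C_fpn, H, W) object features from the detector's FPN.
    ctx_feat: (C_ctx, H, W) contextual features from the inpainting
              model, assumed already resized to the same spatial size.
    weight:   (C_out, C_fpn + C_ctx) learned 1x1-convolution weights.
    bias:     (C_out,) learned bias.
    Returns one (C_out, H, W) fused map (a CPN level that would
    replace the corresponding FPN level).
    """
    # Stack object and context features along the channel axis.
    x = np.concatenate([fpn_feat, ctx_feat], axis=0)  # (C_fpn+C_ctx, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels.
    fused = np.einsum('oc,chw->ohw', weight, x) + bias[:, None, None]
    return np.maximum(fused, 0.0)  # ReLU nonlinearity

# Toy example: 256-channel FPN level, 64-channel context map, 8x8 grid.
rng = np.random.default_rng(0)
fpn = rng.standard_normal((256, 8, 8))
ctx = rng.standard_normal((64, 8, 8))
w = rng.standard_normal((256, 320)) * 0.01
b = np.zeros(256)
cpn = fuse_level(fpn, ctx, w, b)
print(cpn.shape)  # (256, 8, 8)
```

Because the output keeps the FPN's channel width and spatial size, such a fused map can be substituted for the original FPN level without changing the downstream detection heads.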