Learning visual similarity for image retrieval with global descriptors and capsule networks

Durmuş, Duygu; Güdükbay, Uğur; Ulusoy, Özgür

Date

2023-07-31

Authors

Durmuş, Duygu

Güdükbay, Uğur

Ulusoy, Özgür

Source Title

Multimedia Tools and Applications

Print ISSN

1380-7501

Electronic ISSN

1573-7721

Publisher

Springer

Volume

83

Pages

20243-20263

Language

en

Type

Article

Abstract

Finding matching images across large and unstructured datasets is vital in many computer vision applications. With the emergence of deep learning-based solutions, various visual tasks, such as image retrieval, have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. Capsule Networks enable learning richer information that describes the object without losing the essential spatial relationship between the object and its parts. In addition, global descriptors are widely used for representing images. We propose a framework that combines global descriptors and Capsule Networks, exploiting information from multiple views of each image to improve retrieval performance. The Spatial Grouping Enhance strategy, which enhances sub-features in parallel, and self-attention layers, which explore global dependencies within the internal representations of images, are used to strengthen the image representations. The approach captures resemblances between similar images and differences between dissimilar images using triplet loss and cost-sensitive regularized cross-entropy loss. On the Stanford Online Products Database, the approach outperforms state-of-the-art methods, reaching Recall@K of 85.0, 94.4, 97.8, and 99.3 for K = 1, 10, 100, and 1000, respectively.
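
For readers unfamiliar with the two quantities the abstract names, the following is a minimal sketch of a triplet loss and of the Recall@K retrieval metric. It is not the authors' implementation: the hinge form of the loss, the Euclidean distance, the margin of 0.2, the function names, and the toy embeddings are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is.
    The margin value is an illustrative assumption."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def recall_at_k(embeddings, labels, k):
    """Fraction of queries whose k nearest neighbours (excluding the query
    itself) contain at least one item with the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by distance to the query.
        ranked = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: euclidean(query, embeddings[j]),
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / len(embeddings)


# Hypothetical 2-D embeddings for four images from two product classes.
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.3, 0.0)]
labels = ["a", "a", "b", "b"]

print(recall_at_k(embeddings, labels, 1))
print(recall_at_k(embeddings, labels, 3))
print(triplet_loss(embeddings[3], embeddings[2], embeddings[1]))
```

In this toy set the last image sits closer to class "a" than to its own class, so it counts as a miss at K = 1 and yields a positive triplet loss; training with such a loss is meant to pull it toward its own class. The figures reported in the abstract are expressed as percentages of queries rather than fractions.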