Learning visual similarity for image retrieval with global descriptors and capsule networks

Durmuş, Duygu; Güdükbay, Uğur; Ulusoy, Özgür

buir.contributor.author	Durmuş, Duygu
buir.contributor.author	Güdükbay, Uğur
buir.contributor.author	Ulusoy, Özgür
buir.contributor.orcid	Güdükbay, Uğur|0000-0003-2462-6959
buir.contributor.orcid	Ulusoy, Özgür|0000-0002-6887-3778
dc.citation.epage	20263
dc.citation.issueNumber	7
dc.citation.spage	20243
dc.citation.volumeNumber	83
dc.contributor.author	Durmuş, Duygu
dc.contributor.author	Güdükbay, Uğur
dc.contributor.author	Ulusoy, Özgür
dc.date.accessioned	2025-02-21T19:14:47Z
dc.date.available	2025-02-21T19:14:47Z
dc.date.issued	2024-02
dc.department	Department of Computer Engineering
dc.description.abstract	Finding matching images across large and unstructured datasets is vital in many computer vision applications. With the emergence of deep learning-based solutions, various visual tasks, such as image retrieval, have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. Capsule Networks enable learning richer information that describes the object without losing the essential spatial relationship between the object and its parts. In addition, global descriptors are widely used for representing images. We propose a framework that combines the power of global descriptors and Capsule Networks, using information from multiple views of images to enhance image retrieval performance. The Spatial Grouping Enhance strategy, which enhances sub-features in parallel, and self-attention layers, which explore global dependencies within the internal representations of images, are used to strengthen the image representations. The approach captures resemblances between similar images and differences between dissimilar images using triplet loss and cost-sensitive regularized cross-entropy loss. The results are superior to state-of-the-art approaches on the Stanford Online Products dataset, with Recall@K of 85.0, 94.4, 97.8, and 99.3 for K = 1, 10, 100, and 1000, respectively.
dc.description.provenance	Submitted by Muhammed Murat Uçar (murat.ucar@bilkent.edu.tr) on 2025-02-21T19:14:47Z No. of bitstreams: 1 Learning_visual_similarity_for_image_retrieval_with_global_descriptors_and_capsule_networks.pdf: 2836223 bytes, checksum: 102c5b45297cd2b3caccfda3732dcf34 (MD5)	en
dc.description.provenance	Made available in DSpace on 2025-02-21T19:14:47Z (GMT). No. of bitstreams: 1 Learning_visual_similarity_for_image_retrieval_with_global_descriptors_and_capsule_networks.pdf: 2836223 bytes, checksum: 102c5b45297cd2b3caccfda3732dcf34 (MD5) Previous issue date: 2024-02	en
dc.identifier.doi	10.1007/s11042-023-16164-5
dc.identifier.eissn	1573-7721
dc.identifier.issn	1380-7501
dc.identifier.uri	https://hdl.handle.net/11693/116604
dc.language.iso	English
dc.publisher	Springer New York LLC
dc.relation.isversionof	https://dx.doi.org/10.1007/s11042-023-16164-5
dc.rights	CC BY 4.0 Deed (Attribution 4.0 International)
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.source.title	Multimedia Tools and Applications
dc.subject	Capsule networks
dc.subject	Cost-sensitive regularized cross-entropy loss
dc.subject	Deep learning
dc.subject	Global descriptors
dc.subject	Image retrieval
dc.subject	Neural networks
dc.subject	Triplet loss
dc.title	Learning visual similarity for image retrieval with global descriptors and capsule networks
dc.type	Article
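
The abstract names two quantities that are easy to state concretely: the triplet loss used in training and the Recall@K metric used in evaluation. The following is a minimal sketch of both under common textbook definitions, not the authors' implementation. The function names, the Euclidean distance, the margin value of 0.2, and the toy embeddings are all assumptions introduced for illustration; the cost-sensitive regularized cross-entropy term and the network itself are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the negative is farther from the
    anchor than the positive by at least `margin` (margin is an assumed value)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def recall_at_k(embeddings, labels, k):
    """Percentage of queries whose k nearest neighbours (query excluded)
    contain at least one item with the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by distance to the query.
        ranked = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: euclidean(query, embeddings[j]),
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(embeddings)


# Toy data (hypothetical): two tight clusters, one per class.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
toy_labels = [0, 0, 1, 1]
print(recall_at_k(toy_embeddings, toy_labels, k=1))
```

With a well-separated embedding such as the toy data, every query finds a same-class item as its nearest neighbour, so Recall@1 is at its maximum; larger K can only keep or raise the score, which matches the increasing values reported in the abstract.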

Files

Original bundle

Name: Learning_visual_similarity_for_image_retrieval_with_global_descriptors_and_capsule_networks.pdf
Size: 2.7 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Scholarly Publications - Computer Engineering