Learning visual similarity for image retrieval with global descriptors and capsule networks

buir.contributor.author: Durmuş, Duygu
buir.contributor.author: Güdükbay, Uğur
buir.contributor.author: Ulusoy, Özgür
buir.contributor.orcid: Güdükbay, Uğur|0000-0003-2462-6959
buir.contributor.orcid: Ulusoy, Özgür|0000-0002-6887-3778
dc.citation.epage: 20263
dc.citation.issueNumber: 7
dc.citation.spage: 20243
dc.citation.volumeNumber: 83
dc.contributor.author: Durmuş, Duygu
dc.contributor.author: Güdükbay, Uğur
dc.contributor.author: Ulusoy, Özgür
dc.date.accessioned: 2025-02-21T19:14:47Z
dc.date.available: 2025-02-21T19:14:47Z
dc.date.issued: 2024-02
dc.department: Department of Computer Engineering
dc.description.abstract: Finding matching images across large and unstructured datasets is vital in many computer vision applications. With the emergence of deep learning-based solutions, various visual tasks, such as image retrieval, have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. Capsule Networks enable learning richer information that describes an object without losing the essential spatial relationships between the object and its parts. In addition, global descriptors are widely used for representing images. We propose a framework that combines the power of global descriptors and Capsule Networks, exploiting information from multiple views of images to enhance image retrieval performance. The Spatial Grouping Enhance strategy, which enhances sub-features in parallel, and self-attention layers, which explore global dependencies within the internal representations of images, are utilized to strengthen the image representations. The approach captures resemblances between similar images and differences between dissimilar images using a triplet loss and a cost-sensitive regularized cross-entropy loss. The results are superior to state-of-the-art approaches on the Stanford Online Products Database, with Recall@K of 85.0, 94.4, 97.8, and 99.3 for K = 1, 10, 100, and 1000, respectively.
dc.identifier.doi: 10.1007/s11042-023-16164-5
dc.identifier.eissn: 1573-7721
dc.identifier.issn: 1380-7501
dc.identifier.uri: https://hdl.handle.net/11693/116604
dc.language.iso: English
dc.publisher: Springer New York LLC
dc.relation.isversionof: https://dx.doi.org/10.1007/s11042-023-16164-5
dc.rights: CC BY 4.0 Deed (Attribution 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source.title: Multimedia Tools and Applications
dc.subject: Capsule networks
dc.subject: Cost-sensitive regularized cross-entropy loss
dc.subject: Deep learning
dc.subject: Global descriptors
dc.subject: Image retrieval
dc.subject: Neural networks
dc.subject: Triplet loss
dc.title: Learning visual similarity for image retrieval with global descriptors and capsule networks
dc.type: Article

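Illustration of the training objective mentioned in the abstract: a minimal PyTorch sketch that combines a triplet loss on embedding descriptors with a class-weighted ("cost-sensitive") cross-entropy term. The margin, class weights, and balancing factor lambda_ce are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

class CombinedRetrievalLoss(nn.Module):
    """Sketch of a triplet + cost-sensitive cross-entropy objective (assumed weighting)."""

    def __init__(self, margin=0.2, class_weights=None, lambda_ce=1.0):
        super().__init__()
        # Triplet loss pulls similar images together and pushes dissimilar ones apart.
        self.triplet = nn.TripletMarginLoss(margin=margin)
        # Per-class weights make the cross-entropy term cost-sensitive.
        self.ce = nn.CrossEntropyLoss(weight=class_weights)
        self.lambda_ce = lambda_ce

    def forward(self, anchor, positive, negative, logits, labels):
        # anchor/positive/negative: global image descriptors (e.g., L2-normalized embeddings)
        # logits: classification outputs for the anchor images; labels: their class indices
        t = self.triplet(anchor, positive, negative)
        c = self.ce(logits, labels)
        return t + self.lambda_ce * c
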
Files

Original bundle

Name: Learning_visual_similarity_for_image_retrieval_with_global_descriptors_and_capsule_networks.pdf
Size: 2.7 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission