Diversity-aware strategies for static index pruning

buir.contributor.author: Ulusoy, Özgür
buir.contributor.orcid: Ulusoy, Özgür|0000-0002-6887-3778
dc.citation.epage: 22
dc.citation.issueNumber: 5
dc.citation.spage: 1
dc.citation.volumeNumber: 61
dc.contributor.author: Yiğit-Sert, Sevgi
dc.contributor.author: Altıngövde, İsmail Şengör
dc.contributor.author: Ulusoy, Özgür
dc.date.accessioned: 2025-02-21T11:32:11Z
dc.date.available: 2025-02-21T11:32:11Z
dc.date.issued: 2024-05-30
dc.department: Department of Computer Engineering
dc.description.abstract: Static index pruning aims to remove redundant parts of an index to reduce the file size and query processing time. In this paper, we focus on the impact of index pruning on the topical diversity of query results obtained over these pruned indexes, due to the emergence of diversity as an important metric of quality in modern search systems. We hypothesize that typical index pruning strategies are likely to harm result diversity, as the latter dimension has been vastly overlooked while designing and evaluating such methods. As a remedy, we introduce three novel diversity-aware pruning strategies aimed at maintaining the diversity effectiveness of query results. In addition to other widely used features, our strategies exploit document clustering methods and word-embeddings to assess the possible impact of index elements on the topical diversity, and to guide the pruning process accordingly. Our thorough experimental evaluations verify that typical index pruning strategies lead to a substantial decline (i.e., up to 50% for some metrics) in the diversity of the results obtained over the pruned indexes. Our diversity-aware approaches remedy such losses to a great extent, and yield more diverse query results, for which scores of the various diversity metrics are closer to those obtained over the full index. Specifically, our best-performing strategy provides gains in result diversity reaching up to 2.9%, 3.0%, 7.5%, and 3.9% wrt. the strongest baseline, in terms of the ERR-IA, alpha-nDCG, P-IA, and ST-Recall metrics (at the cut-off value of 20), respectively. The proposed strategies also yield better scores in terms of an entropy-based fairness metric, confirming the correlation between topical diversity and fairness in this setup.
dc.description.provenance: Submitted by Zeliha Bucak Çelik (zeliha.celik@bilkent.edu.tr) on 2025-02-21T11:32:11Z No. of bitstreams: 1 Diversity-aware_strategies_for_static_index_pruning.pdf: 1569961 bytes, checksum: c736ec54c82ba3682977f6eaddc6700a (MD5) [en]
dc.description.provenance: Made available in DSpace on 2025-02-21T11:32:11Z (GMT). No. of bitstreams: 1 Diversity-aware_strategies_for_static_index_pruning.pdf: 1569961 bytes, checksum: c736ec54c82ba3682977f6eaddc6700a (MD5) Previous issue date: 2024-05-30 [en]
dc.embargo.release: 2026-05-30
dc.identifier.doi: 10.1016/j.ipm.2024.103795
dc.identifier.eissn: 0306-4573
dc.identifier.issn: 0306-4573
dc.identifier.uri: https://hdl.handle.net/11693/116561
dc.language.iso: English
dc.publisher: Elsevier Ltd
dc.relation.isversionof: https://dx.doi.org/10.1016/j.ipm.2024.103795
dc.source.title: Information Processing & Management
dc.subject: Query result diversity
dc.subject: Static index pruning
dc.subject: Query processing efficiency
dc.title: Diversity-aware strategies for static index pruning
dc.type: Article

Files

Original bundle

Name: Diversity-aware_strategies_for_static_index_pruning.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: