Diversity-aware strategies for static index pruning

Limited Access
This item is unavailable until:
2026-05-30

Date

2024-05-30

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Citation Stats

Series

Abstract

Static index pruning aims to remove redundant parts of an index to reduce the file size and query processing time. In this paper, we focus on the impact of index pruning on the topical diversity of query results obtained over these pruned indexes, due to the emergence of diversity as an important metric of quality in modern search systems. We hypothesize that typical index pruning strategies are likely to harm result diversity, as the latter dimension has been vastly overlooked while designing and evaluating such methods. As a remedy, we introduce three novel diversity-aware pruning strategies aimed at maintaining the diversity effectiveness of query results. In addition to other widely used features, our strategies exploit document clustering methods and word-embeddings to assess the possible impact of index elements on the topical diversity, and to guide the pruning process accordingly. Our thorough experimental evaluations verify that typical index pruning strategies lead to a substantial decline (i.e., up to 50% for some metrics) in the diversity of the results obtained over the pruned indexes. Our diversity-aware approaches remedy such losses to a great extent, and yield more diverse query results, for which scores of the various diversity metrics are closer to those obtained over the full index. Specifically, our best-performing strategy provides gains in result diversity reaching up to 2.9%, 3.0%, 7.5%, and 3.9% wrt. the strongest baseline, in terms of the ERR-IA, alpha-nDCG, P-IA, and ST-Recall metrics (at the cut-off value of 20), respectively. The proposed strategies also yield better scores in terms of an entropy-based fairness metric, confirming the correlation between topical diversity and fairness in this setup.

Source Title

Information Processing & Management

Publisher

Elsevier Ltd

Course

Other identifiers

Book Title

Degree Discipline

Degree Level

Degree Name

Citation

Published Version (Please cite this version)

Language

English