Cascaded cross entropy-based search result diversification

buir.advisor: Can, Fazlı
dc.contributor.author: Köroğlu, Bilge
dc.date.accessioned: 2016-01-08T18:24:49Z
dc.date.available: 2016-01-08T18:24:49Z
dc.date.issued: 2012
dc.description: Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012. [en_US]
dc.description: Thesis (Master's) -- Bilkent University, 2012. [en_US]
dc.description: Includes bibliographical references (leaves 82-86). [en_US]
dc.description.abstract: Search engines are used to find information on the web. For ambiguous queries, which have more than one meaning, retrieving relevant documents based on query-document similarity alone does not satisfy users. In this study, a new method, cascaded cross entropy-based search result diversification (CCED), is proposed to place the web pages corresponding to different meanings of the query at higher rank positions. It combines modified reciprocal rank and cross entropy measures to balance the trade-off between query-document relevance and diversity among the retrieved documents. We use the Latent Dirichlet Allocation (LDA) algorithm to compute query-document relevance scores. The number of different meanings of an ambiguous query is estimated by complete-link clustering. We construct the first Turkish test collection for result diversification, BILDIV-2012. The performance of CCED is compared with that of the Maximum Marginal Relevance (MMR) and IA-Select algorithms, using the Ambient, TREC Diversity Track, and BILDIV-2012 test collections. We also compare the performance of these algorithms with that of Bing and Google. The results indicate that CCED is the most successful method at satisfying, in the higher rank positions of the result list, users interested in different meanings of the query. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T18:24:49Z (GMT). No. of bitstreams: 1 0006504.pdf: 1991735 bytes, checksum: 6703e4fba051ba997a072e0b395b6198 (MD5) [en]
dc.description.statementofresponsibility: Köroğlu, Bilge [en_US]
dc.format.extent: xi, 89 leaves [en_US]
dc.identifier.itemid: B133868
dc.identifier.uri: http://hdl.handle.net/11693/15799
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Ambiguous Query [en_US]
dc.subject: Cross Entropy [en_US]
dc.subject: IA-Select [en_US]
dc.subject: Latent Dirichlet Allocation (LDA) [en_US]
dc.subject.lcc: TK5105.884 .K67 2012 [en_US]
dc.subject.lcsh: Search engines--Programming. [en_US]
dc.subject.lcsh: Web search engines--Mathematical models. [en_US]
dc.subject.lcsh: Information storage and retrieval systems. [en_US]
dc.subject.lcsh: Information retrieval. [en_US]
dc.subject.lcsh: Internet searching. [en_US]
dc.title: Cascaded cross entropy-based search result diversification [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: 0006504.pdf
Size: 1.9 MB
Format: Adobe Portable Document Format