Document ranking by graph based lexical cohesion and term proximity computation

Gürkök, Hayrettin


buir.advisor	Karamüftüoğlu, Murat
dc.contributor.author	Gürkök, Hayrettin
dc.date.accessioned	2016-01-08T18:06:37Z
dc.date.available	2016-01-08T18:06:37Z
dc.date.issued	2008
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 48-52).	en_US
dc.description.abstract	During reading, the meaning of each word is processed in the context of the meanings of the preceding words in the text. Traditional IR systems usually adopt index terms to index and retrieve documents. Unfortunately, much of the semantics of a document or query is lost when the text is replaced with just a set of words (bag-of-words). This makes it necessary to adapt linguistic theories and incorporate language processing techniques into IR tasks. The occurrences of index terms in a document are motivated rather than random: frequently, the appearance of one word in a document attracts the appearance of another. This can take the form of short-distance relationships (proximity), such as common noun phrases, as well as long-distance relationships (transitivity), defined as lexical cohesion in text. Much of the work on determining context is based on estimating either long-distance or short-distance word relationships in a document. This work proposes a graph representation for documents and a new matching function based on this representation. Graphs make it possible to capture both short- and long-distance relationships in a single entity and to calculate an overall context score. Experiments on three TREC document collections showed significant performance improvements over the benchmark retrieval model, Okapi BM25. Additionally, linguistic insights were obtained about the nature and trend of cohesion between query terms.	en_US
dc.description.statementofresponsibility	Gürkök, Hayrettin	en_US
dc.format.extent	xi, 67 leaves, graphs	en_US
dc.identifier.itemid	BILKUTUPB109255
dc.identifier.uri	http://hdl.handle.net/11693/14737
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Information retrieval	en_US
dc.subject	Lexical cohesion	en_US
dc.subject	Term proximity	en_US
dc.subject	Collocation	en_US
dc.subject.lcc	QA76.9.T48 G87 2008	en_US
dc.subject.lcsh	Text processing (Computer science)	en_US
dc.subject.lcsh	Information storage and retrieval systems.	en_US
dc.subject.lcsh	Information retrieval.	en_US
dc.subject.lcsh	Collocation methods.	en_US
dc.title	Document ranking by graph based lexical cohesion and term proximity computation	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
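
The abstract contrasts short-distance relationships between terms (proximity) with long-distance ones (lexical cohesion). As a rough illustration of the short-distance side only, the sketch below computes one common proximity measure, the minimum token distance between occurrences of two distinct query terms in a document. This is an assumed, generic measure chosen for illustration; it is not the graph-based matching function proposed in the thesis, and the function names are hypothetical.

```python
from itertools import combinations


def min_pair_distance(tokens, a, b):
    """Smallest token distance between any occurrence of term a and any
    occurrence of term b; None if either term is absent (hypothetical helper)."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def min_dist(tokens, query_terms):
    """Assumed MinDist-style proximity: the smallest distance over all pairs of
    distinct query terms present in the document; None if fewer than two occur."""
    dists = []
    for a, b in combinations(sorted(set(query_terms)), 2):
        d = min_pair_distance(tokens, a, b)
        if d is not None:
            dists.append(d)
    return min(dists) if dists else None


tokens = "lexical cohesion links words and term proximity links nearby words".split()
print(min_dist(tokens, ["cohesion", "proximity"]))  # "cohesion" at 1, "proximity" at 6
```

A smaller distance suggests the query terms appear in a tighter local context; capturing long-distance (cohesive) links as well is what the thesis's graph representation is described as adding.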

Files

Original bundle

Name: 0003598.pdf
Size: 909.47 KB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science