Document ranking by graph based lexical cohesion and term proximity computation

Gürkök, Hayrettin

Files

0003598.pdf (909.47 KB)

Date

2008

Authors

Gürkök, Hayrettin

Advisor

Karamüftüoğlu, Murat

Abstract

During the course of reading, the meaning of each word is processed in the context of the meaning of the preceding words in the text. Traditional IR systems usually rely on index terms to index and retrieve documents. Unfortunately, much of the semantics of a document or query is lost when the text is reduced to a set of words (bag-of-words). This makes it necessary to adapt linguistic theories and incorporate language processing techniques into IR tasks. The occurrences of index terms in a document are not arbitrary: frequently, the appearance of one word in a document attracts the appearance of another. This can take the form of short-distance relationships (proximity), such as common noun phrases, as well as long-distance relationships (transitivity), known as lexical cohesion in text. Much of the work on determining context is based on estimating either long-distance or short-distance word relationships in a document. This work proposes a graph representation for documents and a new matching function based on this representation. Using graphs, it is possible to capture both short- and long-distance relationships in a single structure and calculate an overall context score. Experiments on three TREC document collections showed significant performance improvements over the benchmark retrieval model, Okapi BM25. Additionally, the experiments yielded linguistic insights into the nature and trend of cohesion between query terms.
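To make the distinction between short-distance (proximity) and long-distance (transitive) relationships concrete, the toy Python sketch below builds an undirected term graph by linking terms that occur within a fixed window of each other, then measures the shortest hop distance between two terms. This is an illustrative assumption only, not the matching function proposed in the thesis: the window size, the use of hop distance, and the names `build_term_graph`, `hop_distance` and `relation` are all hypothetical. A hop distance of 1 corresponds to direct co-occurrence, while a longer path shows terms connected transitively through shared neighbours, even when they are far apart in the text.

```python
from collections import deque


def build_term_graph(tokens, window=2):
    """Hypothetical sketch: link each token to every distinct token at most
    `window` positions after it. Repeated terms share a single node."""
    graph = {}
    for i, term in enumerate(tokens):
        graph.setdefault(term, set())
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != term:
                graph[term].add(other)
                graph.setdefault(other, set()).add(term)
    return graph


def hop_distance(graph, source, target):
    """Breadth-first search: number of edges on the shortest path between
    two terms, or None if either term is absent or they are unconnected."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relation(graph, a, b):
    """Label a term pair as direct proximity, transitive, or unrelated."""
    d = hop_distance(graph, a, b)
    if d is None:
        return "unrelated"
    return "proximity" if d <= 1 else "transitive"


if __name__ == "__main__":
    tokens = ["retrieval", "model", "ranks", "documents", "model", "uses", "proximity"]
    g = build_term_graph(tokens, window=1)
    print(hop_distance(g, "retrieval", "proximity"))  # 3 (via the repeated term "model")
    print(relation(g, "retrieval", "model"))          # proximity
    print(relation(g, "retrieval", "proximity"))      # transitive
```

Although "retrieval" and "proximity" are six positions apart, the repeated term "model" connects them in three hops, which is the kind of long-distance link a purely positional proximity measure would miss.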

Keywords

Information retrieval, Lexical cohesion, Term proximity, Collocation

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/14737

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
