Document ranking by graph based lexical cohesion and term proximity computation

buir.advisor: Karamüftüoğlu, Murat
dc.contributor.author: Gürkök, Hayrettin
dc.date.accessioned: 2016-01-08T18:06:37Z
dc.date.available: 2016-01-08T18:06:37Z
dc.date.issued: 2008
dc.description: Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008.
dc.description: Thesis (Master's) -- Bilkent University, 2008.
dc.description: Includes bibliographical references (leaves 48-52).
dc.description.abstract: During reading, the meaning of each word is processed in the context of the meanings of the preceding words in the text. Traditional IR systems usually rely on index terms to index and retrieve documents. Unfortunately, much of the semantics of a document or query is lost when the text is reduced to a set of words (bag-of-words). This makes it necessary to adapt linguistic theories and incorporate language processing techniques into IR tasks. The occurrences of index terms in a document are not random but motivated: frequently, the appearance of one word in a document attracts the appearance of another. This can take the form of short-distance relationships (proximity), such as common noun phrases, as well as long-distance relationships (transitivity), defined as lexical cohesion in text. Much of the previous work on determining context estimates either long-distance or short-distance word relationships in a document. This work proposes a graph representation for documents and a new matching function based on this representation. Graphs make it possible to capture both short- and long-distance relationships in a single structure and to calculate an overall context score. Experiments on three TREC document collections showed significant performance improvements over the benchmark Okapi BM25 retrieval model. Additionally, the experiments yielded linguistic insights into the nature and trend of cohesion between query terms.
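The abstract names Okapi BM25 as the benchmark retrieval model. For reference, here is a minimal Python sketch of the standard BM25 scoring function. It is the baseline only, not the thesis's graph-based matching function.

Several details are assumptions chosen for illustration and are not taken from the thesis:
- the function name `bm25_scores`;
- the defaults k1 = 1.2 and b = 0.75;
- the log(1 + ...) IDF variant;
- the toy documents.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.2,   # assumed term-frequency saturation parameter
    b: float = 0.75,   # assumed document-length normalization parameter
) -> List[float]:
    """Score each tokenized document in `docs` against `query` with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of documents that contain each term.
    df: Counter = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # raw term frequencies within this document
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue  # absent query terms contribute nothing
            n = df[term]
            # Assumed IDF variant; log(1 + ...) keeps the weight positive.
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Longer-than-average documents get a larger denominator.
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores


# Hypothetical toy collection, for illustration only.
docs = [
    ["graph", "based", "lexical", "cohesion"],
    ["term", "proximity", "in", "text"],
    ["lexical", "cohesion", "and", "term", "proximity"],
]
query = ["lexical", "cohesion"]
scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

BM25 treats the query as a bag of words. The first and third documents receive the same per-term contributions except for length normalization, so the shorter first document ranks higher. The positions of the matched terms play no role. That insensitivity to term positions is the gap that proximity- and cohesion-based scoring aims to address.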
dc.description.provenance: Made available in DSpace on 2016-01-08T18:06:37Z (GMT). No. of bitstreams: 1. 0003598.pdf: 931293 bytes, checksum: c6d094f013c027047df51965d6e0dd87 (MD5)
dc.description.statementofresponsibility: Gürkök, Hayrettin
dc.format.extent: xi, 67 leaves, graphs
dc.identifier.itemid: BILKUTUPB109255
dc.identifier.uri: http://hdl.handle.net/11693/14737
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Information retrieval
dc.subject: Lexical cohesion
dc.subject: Term proximity
dc.subject: Collocation
dc.subject.lcc: QA76.9.T48 G87 2008
dc.subject.lcsh: Text processing (Computer science)
dc.subject.lcsh: Information storage and retrieval systems.
dc.subject.lcsh: Information retrieval.
dc.subject.lcsh: Collocation methods.
dc.title: Document ranking by graph based lexical cohesion and term proximity computation
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: 0003598.pdf
Size: 909.47 KB
Format: Adobe Portable Document Format