Browsing by Subject "Lexical cohesion"
Now showing 1 - 6 of 6
- Results Per Page
- Sort Options
Item Open Access Document ranking by graph based lexical cohesion and term proximity computation(2008) Gürkök, HayrettinDuring the course of reading, the meaning of each word is processed in the context of the meaning of the preceding words in text. Traditional IR systems usually adopt index terms to index and retrieve documents. Unfortunately, a lot of the semantics in a document or query is lost when the text is replaced with just a set of words (bag-of-words). This makes it mandatory to adapt linguistic theories and incorporate language processing techniques into IR tasks. The occurrences of index terms in a document are motivated. Frequently, in a document, the appearance of one word attracts the appearance of another. This can occur in forms of short-distance relationships (proximity) like common noun phrases as well as long-distance relationships (transitivity) defined as lexical cohesion in text. Much of the work done on determining context is based on estimating either long-distance or short-distance word relationships in a document. This work proposes a graph representation for documents and a new matching function based on this representation. By the use of graphs, it is possible to capture both short- and long-distance relationships in a single entity to calculate an overall context score. Experiments made on three TREC document collections showed significant performance improvements over the benchmark, Okapi BM25, retrieval model. Additionally, linguistic implications about the nature and trend of cohesion between query terms were achieved.Item Open Access A graph based approach to estimating lexical cohesion(ACM, 2008) Gürkök, Hayrettin; Karamuftuoglu, Murat; Schaal, MarkusTraditionally, information retrieval systems rank documents according to the query terms they contain. However, even if a document may contain all query terms, this does not guarantee that it is relevant to the query. The query terms can occur together in the same document, but may have been used in different contexts, expressing separate topics. Lexical cohesion is a characteristic of natural language texts, which can be used to determine whether the query terms are used in the same context in the document. In this paper we make use of a graph-based approach to capture term contexts and estimate the level of lexical cohesion in a document. To evaluate the performance of our system, we compare it against two benchmark systems using three TREC document collections. Copyright 2008 ACM.Item Open Access Lexical cohesion and term proximity in document ranking(Elsevier Ltd, 2008-07) Vechtomova, O.; Karamuftuoglu, M.We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between original query terms, and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship information between query terms are used in document ranking: short-distance collocation relationship between query terms, and long-distance relationship, determined by the collocation of query terms with other words. The methods are evaluated on TREC corpora, and show improvements over baseline systems. © 2008 Elsevier Ltd. All rights reserved.Item Open Access Lexical cohesion based topic modeling for summarization(Springer, 2008-02) Ercan, Gönenç; Çiçekli, İlyasIn this paper, we attack the problem of forming extracts for text summarization. Forming extracts involves selecting the most representative and significant sentences from the text. Our method takes advantage of the lexical cohesion structure in the text in order to evaluate significance of sentences. Lexical chains have been used in summarization research to analyze the lexical cohesion structure and represent topics in a text. Our algorithm represents topics by sets of co-located lexical chains to take advantage of more lexical cohesion clues. Our algorithm segments the text with respect to each topic and finds the most important topic segments. Our summarization algorithm has achieved better results, compared to some other lexical chain based algorithms. © 2008 Springer-Verlag Berlin Heidelberg.Item Open Access On document relevance and lexical cohesion between query terms(Elsevier, 2006) Vechtomova, O.; Karamuftuoglu, M.; Robertson, S. E.Lexical cohesion is a property of text, achieved through lexical-semantic relations between words in text. Most information retrieval systems make use of lexical relations in text only to a limited extent. In this paper we empirically investigate whether the degree of lexical cohesion between the contexts of query terms' occurrences in a document is related to its relevance to the query. Lexical cohesion between distinct query terms in a document is estimated on the basis of the lexical-semantic relations (repetition, synonymy, hyponymy and sibling) that exist between there collocates - words that co-occur with them in the same windows of text. Experiments suggest significant differences between the lexical cohesion in relevant and non-relevant document sets exist. A document ranking method based on lexical cohesion shows some performance improvements. © 2006 Elsevier Ltd. All rights reserved.Item Open Access Query expansion with terms selected using lexical cohesion analysis of documents(Elsevier Ltd, 2007-07) Vechtomova, O.; Karamuftuoglu, M.We present new methods of query expansion using terms that form lexical cohesive links between the contexts of distinct query terms in documents (i.e., words surrounding the query terms in text). The link-forming terms (link-terms) and short snippets of text surrounding them are evaluated in both interactive and automatic query expansion (QE). We explore the effectiveness of snippets in providing context in interactive query expansion, compare query expansion from snippets vs. whole documents, and query expansion following snippet selection vs. full document relevance judgements. The evaluation, conducted on the HARD track data of TREC 2005, suggests that there are considerable advantages in using link-terms and their surrounding short text snippets in QE compared to terms selected from full-texts of documents. © 2006 Elsevier Ltd. All rights reserved.