Lexical cohesion analysis for topic segmentation, summarization and keyphrase extraction

buir.advisor: Can, Fazlı
dc.contributor.author: Ercan, Gönenç
dc.date.accessioned: 2016-07-01T11:10:12Z
dc.date.available: 2016-07-01T11:10:12Z
dc.date.issued: 2012
dc.department: Department of Computer Engineering [en_US]
dc.description: Cataloged from PDF version of article. [en_US]
dc.description.abstract: When we express an idea or tell a story, we inevitably use words that are semantically related to each other. Viewed from the perspective of the words in a language, this phenomenon makes it possible to infer the degree of semantic relatedness between words by observing their distribution and use in discourse. Viewed from the perspective of discourse, it makes it possible to model the structure of a document by observing changes in lexical cohesion, and thereby to address high-level natural language processing tasks. This research investigates lexical cohesion from both perspectives, first building methods for measuring the semantic relatedness of word pairs and then using these methods in the tasks of topic segmentation, summarization and keyphrase extraction. Measuring the semantic relatedness of words requires prior knowledge about the words. Two different knowledge bases are investigated in this research. The first is a manually built network of semantic relationships, while the second relies on the distributional patterns in raw text corpora. To discover which method is effective in lexical cohesion analysis, a comprehensive comparison of state-of-the-art semantic relatedness methods is made. For topic segmentation, several methods that use some form of lexical cohesion exist in the literature. While some of these confine the relationships to word repetition or to strong semantic relationships such as synonymy, no other work uses semantic relatedness measures that can be calculated for any pair of words in the vocabulary. Our experiments suggest that topic segmentation performance improves over methods that use classical relationships and word repetition. Furthermore, the experiments compare the performance of different semantic relatedness methods in a high-level task. The detected topic segments are used in summarization and achieve better results than a lexical-chain-based method that uses WordNet.
Finally, the use of lexical cohesion analysis in keyphrase extraction is investigated. Previous research shows that keyphrases are useful tools in document retrieval and navigation. While these findings point to a relation between keyphrases and document retrieval performance, no other work uses this relationship to identify the keyphrases of a given document. We aim to establish a link between the problems of query performance prediction (QPP) and keyphrase extraction. To this end, features used in QPP are evaluated in keyphrase extraction using a Naive Bayes classifier. Our experiments indicate that these features improve the effectiveness of keyphrase extraction in documents of different lengths. More importantly, the commonly used features of frequency and first position in text perform poorly on shorter documents, whereas QPP features are more robust and achieve better results. [en_US]
dc.description.degree: Ph.D. [en_US]
dc.description.statementofresponsibility: Ercan, Gönenç [en_US]
dc.format.extent: xviii, 151 leaves [en_US]
dc.identifier.itemid: B134797
dc.identifier.uri: http://hdl.handle.net/11693/29994
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Lexical Cohesion [en_US]
dc.subject: Semantic Relatedness [en_US]
dc.subject: Topic Segmentation [en_US]
dc.subject: Summarization [en_US]
dc.subject: Keyphrase Extraction [en_US]
dc.subject.lcc: QA76.9.T48 E73 2012 [en_US]
dc.subject.lcsh: Text processing (Computer science) [en_US]
dc.title: Lexical cohesion analysis for topic segmentation, summarization and keyphrase extraction [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: 0006230.pdf
Size: 2.21 MB
Format: Adobe Portable Document Format
Description: Full printable version