Erdem, Hayrettin2023-07-102023-07-102014-022014-022014-03-14https://hdl.handle.net/11693/112390Cataloged from PDF version of article.Thesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2014.Includes bibliogaphical references (leaves 57-63).Understanding the topical structure of text documents is important for effective retrieval and browsing, automatic summarization, and tasks related to identifying, clustering and tracking documents about their topics. Despite documents often display structural organization and contain explicit section markers, some lack of such properties thereby revealing the need for topical text segmentation systems. Examples of such documents are speech transcripts and inherently un-structured texts like newspaper columns and blog entries discussing several sub-jects in a discourse. A novel local-context based approach depending on lexical cohesion is presented for linear text segmentation, which is the task of dividing text into a linear sequence of coherent segments. As the lexical cohesion indicator, the proposed technique exploits relationships among terms induced from semantic space called HAL (Hyperspace Analogue to Language), which is built upon by examining co-occurrence of terms through passing a fixed-sized window over text. The proposed algorithm (BTS) iteratively discovers topical shifts by examining the most relevant sentence pairs in a block of sentences considered at each iteration. The technique is evaluated on both error-free speech transcripts of news broadcasts and documents formed by concatenating different topical regions of text. A new corpus for Turkish is automatically built where each document is formed by concatenating different news articles. For performance comparison, two state-of-the-art methods, TextTiling and C99, are leveraged, and the results show that the proposed approach has comparable performance with these two techniques. The results are also statistically validated by applying the ANOVA and Tukey post–hoc test.xii, 63 leaves : tables, graphics ; 30 cm.info:eu-repo/semantics/openAccessText segmentationTopic segmentationNatural language processingLexical cohesianSemantic relatednessLocal context based linear text segmentationYerel içerik tabanlı konusal metin bölümlendirmeThesisB125674