Local context based linear text segmentation

buir.advisor: Can, Fazlı
dc.contributor.author: Erdem, Hayrettin
dc.date.accessioned: 2023-07-10T06:53:15Z
dc.date.available: 2023-07-10T06:53:15Z
dc.date.copyright: 2014-02
dc.date.issued: 2014-02
dc.date.submitted: 2014-03-14
dc.department: Department of Computer Engineering
dc.description: Cataloged from PDF version of article.
dc.description: Thesis (Master's): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2014.
dc.description: Includes bibliographical references (leaves 57-63).
dc.description.abstract: Understanding the topical structure of text documents is important for effective retrieval and browsing, automatic summarization, and tasks related to identifying, clustering and tracking documents by topic. Although documents often display structural organization and contain explicit section markers, some lack such properties, which reveals the need for topical text segmentation systems. Examples of such documents are speech transcripts and inherently unstructured texts, such as newspaper columns and blog entries, that discuss several subjects in a single discourse. A novel local-context-based approach that relies on lexical cohesion is presented for linear text segmentation, the task of dividing text into a linear sequence of coherent segments. As the lexical cohesion indicator, the proposed technique exploits relationships among terms induced from a semantic space called HAL (Hyperspace Analogue to Language), which is built by examining the co-occurrence of terms while passing a fixed-size window over the text. The proposed algorithm (BTS) iteratively discovers topical shifts by examining the most relevant sentence pairs in the block of sentences considered at each iteration. The technique is evaluated on both error-free speech transcripts of news broadcasts and documents formed by concatenating different topical regions of text. A new corpus for Turkish is built automatically, in which each document is formed by concatenating different news articles. For performance comparison, two state-of-the-art methods, TextTiling and C99, are used as baselines, and the results show that the proposed approach performs comparably to these two techniques. The results are also statistically validated by applying ANOVA and the Tukey post-hoc test.
dc.description.degree: M.S.
dc.description.statementofresponsibility: by Hayrettin Erdem
dc.embargo.release: 2016-03-14
dc.format.extent: xii, 63 leaves : tables, graphics ; 30 cm.
dc.identifier.itemid: B125674
dc.identifier.uri: https://hdl.handle.net/11693/112390
dc.publisher: Bilkent University
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Text segmentation
dc.subject: Topic segmentation
dc.subject: Natural language processing
dc.subject: Lexical cohesion
dc.subject: Semantic relatedness
dc.title: Local context based linear text segmentation
dc.title.alternative: Yerel içerik tabanlı konusal metin bölümlendirme
dc.type: Thesis
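
The abstract describes HAL as a semantic space built by passing a fixed-size window over text and recording term co-occurrence. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes the commonly described HAL weighting, in which a term that precedes the target at distance d inside a window of size n contributes n - d + 1. The names `build_hal` and `relatedness`, the default window size, and the symmetric relatedness score are illustrative assumptions; the record does not state how the thesis derives term relationships from the space.

```python
from collections import defaultdict


def build_hal(tokens, window_size=5):
    """Build a HAL-style co-occurrence table (illustrative sketch).

    hal[target][preceding] accumulates a distance-weighted count for every
    term that appears up to `window_size` positions before `target`.
    The weighting (window_size - distance + 1) is an assumption.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window_size + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            hal[target][tokens[j]] += window_size - distance + 1
    return hal


def relatedness(hal, a, b):
    """Hypothetical symmetric score: sum of the weights in both directions."""
    return hal.get(a, {}).get(b, 0) + hal.get(b, {}).get(a, 0)


tokens = "the cat sat on the mat".split()
hal = build_hal(tokens, window_size=2)
print(dict(hal["sat"]))
print(relatedness(hal, "the", "sat"))
```

A segmentation method based on lexical cohesion could then score a pair of sentences by aggregating such term-level scores, although how BTS does this is not specified in this record.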

Files

Original bundle
Name: B125674.pdf
Size: 450.14 KB
Format: Adobe Portable Document Format