An information-based approach to punctuation

Say, Bilge

buir.advisor	Akman, Varol
dc.contributor.author	Say, Bilge
dc.date.accessioned	2016-01-08T20:19:37Z
dc.date.available	2016-01-08T20:19:37Z
dc.date.copyright	1998
dc.date.issued	1998
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 83-93).	en_US
dc.description.abstract	Punctuation marks have special importance in bringing out the meaning of a text. Geoffrey Nunberg's 1990 monograph bridged the gap between descriptive treatments of punctuation and prescriptive accounts by spelling out the features of a text-grammar for the orthographic sentence. His research inspired most of the recent work concentrating on punctuation marks in Natural Language Processing (NLP). Several grammars incorporating punctuation were then shown to reduce failures and ambiguities in parsing. Nunberg's approach to punctuation (and other formatting devices) was partially incorporated into natural language generation systems. However, little has been done concerning how punctuation marks bring semantic and discourse cues to the text and whether these can be exploited computationally. The aim of this thesis is to analyse the semantic and discourse aspects of punctuation marks, within the framework of Hans Kamp and Uwe Reyle's Discourse Representation Theory (DRT) and its extension by Nicholas Asher, Segmented Discourse Representation Theory (SDRT), drawing implications for NLP systems. The method used is the extraction of patterns for four common punctuation marks (dashes, semicolons, colons, and parentheses) from corpora, followed by formal modeling and a modest computational prototype. Our observations and results have revealed interesting occurrences of linguistic phenomena, such as anaphora resolution and presupposition, in conjunction with punctuation marks. Within the framework of SDRT, such occurrences are then tied to the overall discourse structure. The proposed model can be taken as a template by NLP software developers for making more effective use of punctuation marks. Overall, the thesis describes the contribution of punctuation at the orthographic sentence level to the information passed on to the reader of a text.
dc.description.statementofresponsibility	by Bilge Say	en_US
dc.format.extent	xvi, 96 leaves : charts ; 30 cm.	en_US
dc.identifier.itemid	BILKUTUPB045144
dc.identifier.uri	http://hdl.handle.net/11693/18467
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Punctuation
dc.subject	Discourse
dc.subject	(Segmented) Discourse Representation Theory [(S)DRT]
dc.subject	Information structure
dc.subject	Corpora
dc.subject	Natural Language Processing (NLP)
dc.title	An information-based approach to punctuation	en_US
dc.title.alternative	Noktalamaya enformasyon temelli bir yaklaşım
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Doctoral
thesis.degree.name	Ph.D. (Doctor of Philosophy)

Files

Original bundle

Name: B045144.pdf
Size: 3.14 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Graduate School of Engineering and Science