An information-based approach to punctuation

buir.advisor: Akman, Varol
dc.contributor.author: Say, Bilge
dc.date.accessioned: 2016-01-08T20:19:37Z
dc.date.available: 2016-01-08T20:19:37Z
dc.date.copyright: 1998
dc.date.issued: 1998
dc.department: Department of Computer Engineering [en_US]
dc.description: Ankara : Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1998. [en_US]
dc.description: Thesis (Ph. D.) -- Bilkent University, 1998. [en_US]
dc.description: Includes bibliographical references (leaves 83-93). [en_US]
dc.description: Cataloged from PDF version of article.
dc.description.abstract: Punctuation marks have special importance in bringing out the meaning of a text. Geoffrey Nunberg's 1990 monograph bridged the gap between descriptive treatments of punctuation and prescriptive accounts by spelling out the features of a text-grammar for the orthographic sentence. His research inspired most of the recent work concentrating on punctuation marks in Natural Language Processing (NLP). Several grammars incorporating punctuation were then shown to reduce failures and ambiguities in parsing. Nunberg's approach to punctuation (and other formatting devices) was partially incorporated into natural language generation systems. However, little has been done concerning how punctuation marks bring semantic and discourse cues to the text and whether these can be exploited computationally. The aim of this thesis is to analyse the semantic and discourse aspects of punctuation marks, within the framework of Hans Kamp and Uwe Reyle's Discourse Representation Theory (DRT) and its extension by Nicholas Asher, Segmented Discourse Representation Theory (SDRT), drawing implications for NLP systems. The method used is the extraction of patterns for four common punctuation marks (dashes, semicolons, colons, and parentheses) from corpora, followed by formal modeling and a modest computational prototype. Our observations and results have revealed interesting occurrences of linguistic phenomena, such as anaphora resolution and presupposition, in conjunction with punctuation marks. Within the framework of SDRT, such occurrences are then tied to the overall discourse structure. The proposed model can be taken as a template by NLP software developers for making more effective use of punctuation marks. Overall, the thesis describes the contribution of punctuation at the orthographic sentence level to the information passed on to the reader of a text.
dc.description.degree: Ph.D. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T20:19:37Z (GMT). No. of bitstreams: 1 1.pdf: 78510 bytes, checksum: d85492f20c2362aa2bcf4aad49380397 (MD5) [en]
dc.description.statementofresponsibility: by Bilge Say [en_US]
dc.format.extent: xvi, 96 leaves : charts ; 30 cm. [en_US]
dc.identifier.itemid: BILKUTUPB045144
dc.identifier.uri: http://hdl.handle.net/11693/18467
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Punctuation
dc.subject: Discourse
dc.subject: (Segmented) Discourse Representation Theory [(S)DRT]
dc.subject: Information structure
dc.subject: Corpora
dc.subject: Natural Language Processing (NLP)
dc.title: An information-based approach to punctuation [en_US]
dc.title.alternative: Noktalamaya enformasyon temelli bir yaklaşım
dc.type: Thesis [en_US]

Files

Original bundle
Name: B045144.pdf
Size: 3.14 MB
Format: Adobe Portable Document Format
Description: Full printable version