Browsing by Subject "Natural Language Processing (NLP)"
Now showing 1 - 1 of 1
- Results Per Page
- Sort Options
Item Open Access An information-based approach to punctuation(1998) Say, BilgePunctuation marks have special importance in bringing out the meaning of a text. Geoffrey Nunberg's 1990 monograph bridged the gap between descriptive treatments of punctuation and perspective accounts, by spelling out the features of a text-grammar for the orthographic sentence. His research inspired most of the recent work concentrating on punctuation marks in Natural Language Processing (NLP). Several grammars incorporating punctuation were then shown to reduce failures and ambiguities in parsing. Numberg's approach to punctuation (and other formatting devices) was partially incorporated into natural language generation systems. However, little has been done concerning how punctuation marks bring semantic and discourse cues to the text and whether these can be exploited computationally. The aim of this thesis is to analyse the semantic and discourse aspects of punctuation marks, within the framework of Hans Kamp and Uwe Reyle's Discourse Representation Theory (DRT) and its extension by Nicholas Asher, Segmented Discourse Representation Theory (SDRT), drawing implications for NLP systems. The method used is the extraction of patterns for four common punctuation marks (dashes, semicolons, colons, and parentheses) from corpora, followed by formal modeling and a modest computational prototype. Our observations and results have revealed interesting occurrences of linguistic phenomena, such as anaphora resolution and presupposition, in conjunction with punctuation marks. Within the framework of SDRT such occurrences are then tied with the overall discourse structure. The proposed model can be taken as a template for NLP software developers for making use of the punctuation marks more effectively. Overall, the thesis describes the contribution of punctuation at the orthographic sentence level to the information passed on to the reader of a text.