Browsing by Subject "Punctuation"

Open Access
An analysis of English punctuation: the special case of comma
(John Benjamins Publishing Co., 1998) Bayraktar, M.; Say, B.; Akman, V.
Punctuation has usually been ignored by researchers in computational linguistics over the years. Recently, it has been realized that a true understanding of written language will be impossible if punctuation marks are not taken into account. This paper contains the details of a computer-aided exercise to investigate English punctuation practice for the special case of comma (the most significant punctuation mark) in a parsed corpus. The study classifies the various ‘structural’ uses of comma according to the syntax-patterns in which comma occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences with no part-of-speech tag information about the individual words.
Open Access
Computer-aided analysis of English punctuation on a parsed corpus: the special case of comma
(1996) Bayraktar, Murat
Punctuation, an orthographical component of language, has usually been ignored in most research in computational linguistics over the years. One reason for this is the overall difficulty of the subject; another is the absence of a good theory. On the other hand, both ‘conventional’ and computational linguistics have paid increasing attention to punctuation in recent years, because it has been realized that true understanding and processing of written language will be almost impossible if punctuation marks are not taken into account. Apart from the lists of rules given in style manuals and usage books, we know little about punctuation. These books tell us how we should punctuate, but they are generally silent about actual punctuation practice. This thesis contains the details of a computer-aided experiment to investigate English punctuation practice for the special case of comma (the most significant punctuation mark) in a parsed corpus. The experiment attempts to classify the various uses of comma according to the syntax-patterns in which comma occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences with no part-of-speech tag information about individual words; ideally, this seems to be enough to classify ‘structural’ punctuation marks.
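The abstract does not say exactly how a comma's syntax-pattern is defined. As a rough illustration only, the Python sketch below assumes one plausible definition: a comma's pattern is the label of the constituent that directly contains it, followed by the sequence of that constituent's children (phrase labels, punctuation tokens, and `w` for a bare word, since there are no part-of-speech tags). The bracketed input format, the helper names `parse_tree` and `comma_patterns`, and the two sample trees are hypothetical; this is not the thesis's own code or classification scheme.

```python
from collections import Counter
import re

# Hypothetical sketch: tokens are "(", ")", or any run of non-space, non-bracket characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
PUNCT = {",", ".", ";", ":"}  # assumed set of punctuation tokens kept literally in patterns


def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings; with no POS tags, words sit directly under phrase labels.
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        if tokens[pos] in ("(", ")"):
            label = ""  # unlabeled wrapper bracket, as in "( (S ...) )"
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def child_category(child):
    """Phrase label for a subtree, the token itself for punctuation, 'w' for any other word."""
    if isinstance(child, tuple):
        return child[0]
    return child if child in PUNCT else "w"


def comma_patterns(tree):
    """Yield one 'PARENT -> c1 c2 ...' string per comma, keyed by its containing constituent."""
    label, children = tree
    n_commas = sum(1 for c in children if c == ",")
    if n_commas:
        rhs = " ".join(child_category(c) for c in children)
        for _ in range(n_commas):
            yield f"{label} -> {rhs}"
    for c in children:
        if isinstance(c, tuple):
            yield from comma_patterns(c)


# Two made-up skeletal trees (no POS tags) to show the counting step.
trees = [
    "( (S (NP John) , (SBAR who left early) , (VP missed the talk) .) )",
    "(S (NP apples , pears and plums) (VP were sold) .)",
]
counts = Counter(p for t in trees for p in comma_patterns(parse_tree(t)))
for pattern, n in counts.most_common():
    print(n, pattern)
```

Counting patterns over a whole corpus in this way would group commas that, for instance, set off a relative clause inside a sentence (`S -> NP , SBAR , VP .`) separately from commas inside a coordinated noun phrase (`NP -> w , w w w`); a real study would likely need a finer or differently normalized pattern definition.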
Open Access
Current approaches to punctuation in computational linguistics
(Springer, 1997) Say, B.; Akman, V.
Some recent studies in computational linguistics have aimed to take advantage of various cues presented by punctuation marks. This short survey is intended to summarise these research efforts and, additionally, to outline a current perspective on the usage and functions of punctuation marks. We conclude by presenting an information-based framework for punctuation, influenced by treatments of several related phenomena in computational linguistics. © 1997 Kluwer Academic Publishers.
Open Access
Information-based approach to punctuation
(AAAI, 1997-07) Say, Bilge
This thesis analyzes, in an information-based framework, the semantic and discourse aspects of punctuation, drawing computational implications for Natural Language Processing (NLP) systems. Discourse Representation Theory (DRT) is taken as the theoretical framework of the thesis. It is hoped that, by following this analysis, NLP software writers will be able to make effective use of punctuation marks and to reveal interesting linguistic phenomena in conjunction with them.
Open Access
An information-based approach to punctuation
(1998) Say, Bilge
Punctuation marks have special importance in bringing out the meaning of a text. Geoffrey Nunberg's 1990 monograph bridged the gap between descriptive treatments of punctuation and prescriptive accounts by spelling out the features of a text-grammar for the orthographic sentence. His research inspired most of the recent work concentrating on punctuation marks in Natural Language Processing (NLP). Several grammars incorporating punctuation were then shown to reduce failures and ambiguities in parsing. Nunberg's approach to punctuation (and other formatting devices) was partially incorporated into natural language generation systems. However, little has been done concerning how punctuation marks bring semantic and discourse cues to the text and whether these can be exploited computationally. The aim of this thesis is to analyse the semantic and discourse aspects of punctuation marks within the framework of Hans Kamp and Uwe Reyle's Discourse Representation Theory (DRT) and its extension by Nicholas Asher, Segmented Discourse Representation Theory (SDRT), drawing implications for NLP systems. The method used is the extraction of patterns for four common punctuation marks (dashes, semicolons, colons, and parentheses) from corpora, followed by formal modeling and a modest computational prototype. Our observations and results have revealed interesting occurrences of linguistic phenomena, such as anaphora resolution and presupposition, in conjunction with punctuation marks. Within the framework of SDRT, such occurrences are then tied to the overall discourse structure. The proposed model can be taken as a template for NLP software developers to make more effective use of punctuation marks. Overall, the thesis describes the contribution of punctuation at the orthographic sentence level to the information passed on to the reader of a text.