Computer-aided analysis of English punctuation on a parsed corpus: the special case of comma
Author(s)
Advisor
Date
1996
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Punctuation, an orthographical component of language, has largely been ignored
by research in computational linguistics over the years. One reason
for this is the overall difficulty of the subject; another is the absence of a
good theory. However, both ‘conventional’ and computational linguistics
have paid increasing attention to punctuation in recent years, because it
has become clear that true understanding and processing of written language
is almost impossible if punctuation marks are not taken into account.
Apart from the lists of rules given in style manuals and usage books, we know little
about punctuation. These books tell us how we should
punctuate, but they are generally silent about actual punctuation practice.
This thesis details a computer-aided experiment to investigate
English punctuation practice, for the special case of the comma (the most significant
punctuation mark), in a parsed corpus. The experiment attempts to
classify the various uses of the comma according to the syntax patterns in which
it occurs. The corpus (the Penn Treebank) consists of syntactically annotated
sentences with no part-of-speech tag information about individual words, and
such annotation ideally seems to be sufficient to classify ‘structural’ punctuation marks.
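The abstract does not spell out the thesis's actual procedure, so the following is only a minimal Python sketch of the general idea of classifying comma uses by syntax pattern. It assumes Penn-Treebank-style bracketed trees in which a comma appears either as a bare leaf `,` or as a `(, ,)` preterminal. Every constituent that has a comma among its immediate children is recorded as a rule of the form `parent -> child1 child2 ...`, and identical rules are counted. The function names (`tokenize`, `parse`, `child_symbol`, `comma_patterns`), the `w` placeholder for untagged words, and the two example trees are hypothetical illustrations, not taken from the thesis or the corpus.

```python
from collections import Counter
from typing import List, Tuple

# A tree node is (label, children); a leaf is a plain string token.
Tree = Tuple[str, list]


def tokenize(text: str) -> List[str]:
    # Put spaces around brackets so they become separate tokens.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> Tree:
    """Parse one bracketed tree, e.g. '(S (PP In 1996) , (NP he) (VP left))'."""
    pos = 0

    def node() -> Tree:
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        # Treebank files often wrap sentences in an unlabeled outer bracket.
        if tokens[pos] == "(":
            label = ""
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # word leaf (assumed untagged)
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return node()


def child_symbol(child) -> str:
    # Subtrees contribute their label; an untagged comma leaf contributes ",";
    # any other untagged word is collapsed to the hypothetical placeholder "w".
    if isinstance(child, tuple):
        return child[0]
    return "," if child == "," else "w"


def comma_patterns(tree: Tree, counts: Counter) -> Counter:
    """Count 'parent -> children' rules for every constituent containing a comma."""
    label, children = tree
    symbols = [child_symbol(c) for c in children]
    if "," in symbols:
        counts[f"{label} -> {' '.join(symbols)}"] += 1
    for c in children:
        if isinstance(c, tuple):
            comma_patterns(c, counts)
    return counts


# Hypothetical toy input: an appositive NP and a fronted prepositional phrase.
corpus = [
    "(S (NP (NP John) , (NP a doctor) ,) (VP left))",
    "(S (PP In 1996) , (NP he) (VP wrote (NP a thesis)))",
]
counts: Counter = Counter()
for line in corpus:
    comma_patterns(parse(tokenize(line)), counts)
for pattern, n in counts.most_common():
    print(n, pattern)
```

Grouping by the parent label together with its full child sequence keeps, for example, appositive commas (`NP -> NP , NP ,`) apart from commas after fronted adverbials (`S -> PP , NP VP`). A real study would need further decisions, such as how to treat traces, empty elements, and function tags, that this sketch does not address.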
Keywords
Computational Linguistics
Natural Language Processing
Punctuation
English
Corpus-based Analysis
Comma