An analysis of English punctuation: the special case of comma
International Journal of Corpus Linguistics
John Benjamins Publishing Co.
pp. 33-57
Over the years, researchers in computational linguistics have largely ignored punctuation. Recently, it has been recognized that a true understanding of written language is impossible unless punctuation marks are taken into account. This paper details a computer-aided exercise investigating English punctuation practice for the special case of the comma (the most significant punctuation mark) in a parsed corpus. The study classifies the various ‘structural’ uses of the comma according to the syntax patterns in which it occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences without part-of-speech tags for the individual words.
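The abstract does not spell out how syntax patterns are extracted. The Python sketch below shows one plausible way to collect them from bracketed parse trees. It assumes that a "syntax pattern" is a parent label followed by the sequence of its children's labels, with commas kept literally. Because the corpus has no part-of-speech tags, it also assumes that bare words are collapsed to a placeholder `w`. The function names (`comma_patterns`, `collect`), the placeholder, and the example trees (invented, not taken from the Penn Treebank) are all hypothetical. The sketch also assumes trees without the outer empty bracket found in some Treebank files. It illustrates the idea only and is not the authors' method.

```python
from collections import Counter

# Punctuation tokens kept literally in a pattern; every other bare word becomes "w".
# Assumption for this sketch: words carry no POS tags, so they have no label of their own.
PUNCT = {",", ".", ";", ":"}

def tokenize(s):
    """Split a bracketed tree string into '(', ')', labels and words."""
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens, i=0):
    """Parse the constituent starting at tokens[i] == '('.

    Returns (node, next_index), where node is a (label, children) tuple and
    each child is either another node or a bare word string.
    """
    assert tokens[i] == "("
    label, i = tokens[i + 1], i + 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
        else:
            child, i = tokens[i], i + 1
        children.append(child)
    return (label, children), i + 1

def collect(node, counts):
    """Count the child pattern of every constituent that directly contains a comma."""
    label, children = node
    symbols = []
    for c in children:
        if isinstance(c, tuple):
            symbols.append(c[0])
            collect(c, counts)  # recurse into sub-constituents
        else:
            symbols.append(c if c in PUNCT else "w")
    if "," in symbols:
        counts[f"{label} -> {' '.join(symbols)}"] += 1

def comma_patterns(tree_strings):
    """Return a Counter mapping comma-bearing syntax patterns to their frequencies."""
    counts = Counter()
    for s in tree_strings:
        tree, _ = parse(tokenize(s))
        collect(tree, counts)
    return counts

# Hypothetical example trees, not corpus data.
trees = [
    "(S (PP In (NP 1989)) , (NP the company) (VP reported (NP a loss)) .)",
    "(S (NP (NP John) , (NP a teacher) ,) (VP smiled) .)",
]
for pattern, n in comma_patterns(trees).most_common():
    print(n, pattern)
```

On these two toy trees, the sketch yields one pattern for a comma after an introductory phrase (`S -> PP , NP VP .`) and one for an appositive set off by commas (`NP -> NP , NP ,`). Grouping counts by such patterns is one way to arrive at the kind of structural classification the abstract describes.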