Computer-aided analysis of English punctuation on a parsed corpus: the special case of comma

buir.advisor: Akman, Varol
dc.contributor.author: Bayraktar, Murat
dc.date.accessioned: 2016-01-08T20:13:35Z
dc.date.available: 2016-01-08T20:13:35Z
dc.date.issued: 1996
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Includes bibliographical references (leaves 51-56). [en_US]
dc.description.abstract: Punctuation, an orthographical component of language, has usually been ignored by most research in computational linguistics over the years. One reason for this is the overall difficulty of the subject, and another is the absence of a good theory. On the other hand, both ‘conventional’ and computational linguistics have increased their attention to punctuation in recent years because it has been realized that true understanding and processing of written language will be almost impossible if punctuation marks are not taken into account. Apart from the lists of rules given in style manuals or usage books, we know little about punctuation. These books give us information about how we should punctuate, but they are generally silent about actual punctuation practice. This thesis contains the details of a computer-aided experiment to investigate English punctuation practice, for the special case of the comma (the most significant punctuation mark), in a parsed corpus. The experiment attempts to classify the various uses of the comma according to the syntax patterns in which it occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences with no part-of-speech tag information about individual words, and this ideally seems to be enough to classify ‘structural’ punctuation marks. [en_US]
dc.description.statementofresponsibility: Bayraktar, Murat [en_US]
dc.format.extent: ix, 92 leaves [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/17809
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Computational Linguistics [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Punctuation [en_US]
dc.subject: English [en_US]
dc.subject: Corpus-based Analysis [en_US]
dc.subject: Comma [en_US]
dc.subject.lcc: QA76.9.N38 B39 1996 [en_US]
dc.subject.lcsh: Natural language processing (Computer science). [en_US]
dc.subject.lcsh: Computational linguistics. [en_US]
dc.title: Computer-aided analysis of English punctuation on a parsed corpus: the special case of comma [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: B035250.pdf
Size: 2.18 MB
Format: Adobe Portable Document Format
Description: Full printable version