Tagging and morphological disambiguation of Turkish text

buir.advisorOflazer, Kemal
dc.contributor.authorKuruöz, İlker
dc.date.accessioned2016-01-08T20:11:49Z
dc.date.available2016-01-08T20:11:49Z
dc.date.issued1994
dc.descriptionAnkara : Department of Computer Engineering and Information Science and Institute of Engineering and Science, Bilkent University, 1994.en_US
dc.descriptionThesis (Master's) -- Bilkent University, 1994.en_US
dc.descriptionIncludes bibliographical references.en_US
dc.description.abstractA part-of-speech (POS) tagger is a system that uses various sources of information to assign a (possibly unique) POS to each word. Automatic text tagging is an important component in higher-level analysis of text corpora, and its output can also be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial step in tagging because the structures of many lexical forms are morphologically ambiguous. This thesis presents a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology. The tagger is augmented with a multi-word and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local lexical neighborhood constraints, heuristics, and a limited amount of statistical information. The tagger also has additional functionality for compiling statistics and fine-tuning the morphological analyzer, such as logging erroneous morphological parses, commonly used roots, etc. Test results indicate that the tagger can tag about 97% to 99% of the texts accurately with minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates, on average, 50% fewer ambiguous parses and does so almost 2.5 times faster.en_US
dc.description.provenanceMade available in DSpace on 2016-01-08T20:11:49Z (GMT). No. of bitstreams: 1 1.pdf: 78510 bytes, checksum: d85492f20c2362aa2bcf4aad49380397 (MD5)en
dc.description.statementofresponsibilityKuruöz, İlkeren_US
dc.format.extent94 leavesen_US
dc.identifier.itemidBILKUTUPB024372
dc.identifier.urihttp://hdl.handle.net/11693/17608
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectTaggingen_US
dc.subjectMorphological Analysisen_US
dc.subjectCorpus Developmenten_US
dc.subject.lccP98 .K87 1994en_US
dc.subject.lcshComputational linguistics.en_US
dc.subject.lcshNatural language processing (Computer science)en_US
dc.subject.lcshLinguistics--Data processing.en_US
dc.subject.lcshSystemic grammar.en_US
dc.subject.lcshText processing (Computer science).en_US
dc.titleTagging and morphological disambiguation of Turkish texten_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)

Files

Original bundle

Name: B024372.pdf
Size: 2.63 MB
Format: Adobe Portable Document Format
Description: Full printable version