Statistical morphological disambiguation for agglutinative languages

dc.citation.epage: 410 [en_US]
dc.citation.issueNumber: 4 [en_US]
dc.citation.spage: 381 [en_US]
dc.citation.volumeNumber: 36 [en_US]
dc.contributor.author: Hakkani-Tür, D. Z. [en_US]
dc.contributor.author: Oflazer, K. [en_US]
dc.contributor.author: Tür, G. [en_US]
dc.date.accessioned: 2019-02-01T12:13:18Z
dc.date.available: 2019-02-01T12:13:18Z [en_US]
dc.date.issued: 2002 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: We present statistical models for morphological disambiguation in agglutinative languages, with a specific application to Turkish. Turkish presents an interesting problem for statistical models because the potential tag set is very large as a result of its productive derivational morphology. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups and surface roots in trigram models. Among the four models that we have developed and tested, the simplest model, which ignores the local morphotactics within words, performs best. Our best trigram model achieves 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are interested only in syntactically relevant features and ignore a very small set of semantic features, the accuracy increases to 95.07%. [en_US]
dc.description.provenance: Submitted by Ülkü Ögel (ulkul@bilkent.edu.tr) on 2019-02-01T12:13:18Z. No. of bitstreams: 1. Statistical_morphological_disambiguation_for_agglutinative_languages.pdf: 186273 bytes, checksum: ff3e33615bae100f4ad53f098d9bbb62 (MD5) [en_US]
dc.description.provenance: Made available in DSpace on 2019-02-01T12:13:18Z (GMT). No. of bitstreams: 1. Statistical_morphological_disambiguation_for_agglutinative_languages.pdf: 186273 bytes, checksum: ff3e33615bae100f4ad53f098d9bbb62 (MD5). Previous issue date: 2002 [en_US]
dc.identifier.doi: 10.1023/A:1020271707826 [en_US]
dc.identifier.issn: 0010-4817
dc.identifier.issn: 1572-8412
dc.identifier.uri: http://hdl.handle.net/11693/48722 [en_US]
dc.language.iso: English [en_US]
dc.publisher: Springer/ [en_US]
dc.publisher: Kluwer Academic Publishers [en_US]
dc.relation.isversionof: https://doi.org/10.1023/A:1020271707826 [en_US]
dc.source.title: Computers and the Humanities [en_US]
dc.subject: Agglutinative Languages [en_US]
dc.subject: Morphological Disambiguation [en_US]
dc.subject: N-Gram Language Models [en_US]
dc.subject: Statistical Natural Language Processing [en_US]
dc.subject: Turkish [en_US]
dc.title: Statistical morphological disambiguation for agglutinative languages [en_US]
dc.type: Article [en_US]
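
The abstract's central idea, splitting a full morphosyntactic tag into inflectional groups (IGs) at each derivation boundary and keeping the surface root separate, can be sketched in Python as below. This is a minimal illustrative sketch, not the authors' code: the `+`-separated feature notation, the `^DB` derivation-boundary marker, the sample analyses, and the helper names `split_into_igs` and `join_igs` are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed marker for a derivational boundary inside one analysis string.
DERIVATION_BOUNDARY = "^DB"

# Hypothetical sample analyses in the assumed notation.
SAMPLE_ANALYSES = [
    "ev+Noun+A3sg+Pnon+Nom",
    "ev+Noun+A3sg+Pnon+Loc^DB+Adj+Rel",
    "saglam+Adj^DB+Verb+Become^DB+Verb+Caus+Pos",
]


def split_into_igs(analysis: str) -> Tuple[str, List[str]]:
    """Split one analysis into its root and its inflectional groups."""
    groups = analysis.split(DERIVATION_BOUNDARY)
    # The first group holds the root followed by the first IG's features.
    root, _, first_ig = groups[0].partition("+")
    # Later groups start with "+" right after the boundary marker; drop it.
    later_igs = [group.lstrip("+") for group in groups[1:]]
    return root, [first_ig] + later_igs


def join_igs(root: str, igs: List[str]) -> str:
    """Rebuild the full analysis string from a root and its IGs."""
    return root + "+" + (DERIVATION_BOUNDARY + "+").join(igs)


if __name__ == "__main__":
    for analysis in SAMPLE_ANALYSES:
        root, igs = split_into_igs(analysis)
        print(root, igs)
```

Separating the root from the IGs mirrors the abstract's description of models that gather statistics over individual inflectional groups and surface roots rather than over whole tags, which is what keeps the effective tag inventory manageable when derivations can be chained.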

Files

Original bundle
Name: Statistical_morphological_disambiguation_for_agglutinative_languages.pdf
Size: 181.91 KB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: