Statistical morphological disambiguation for agglutinative languages
dc.citation.epage | 410 | en_US |
dc.citation.issueNumber | 4 | en_US |
dc.citation.spage | 381 | en_US |
dc.citation.volumeNumber | 36 | en_US |
dc.contributor.author | Hakkani-Tür, D. Z. | en_US |
dc.contributor.author | Oflazer, K. | en_US |
dc.contributor.author | Tür, G. | en_US |
dc.date.accessioned | 2019-02-01T12:13:18Z | |
dc.date.available | 2019-02-01T12:13:18Z | en_US |
dc.date.issued | 2002 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | We present statistical models for morphological disambiguation in agglutinative languages, with a specific application to Turkish. Turkish presents an interesting problem for statistical models as the potential tag set size is very large because of the productive derivational morphology. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups and surface roots in trigram models. Among the four models that we have developed and tested, the simplest model ignoring the local morphotactics within words performs the best. Our best trigram model performs with 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are just interested in syntactically relevant features and ignore a very small set of semantic features, then the accuracy increases to 95.07%. | en_US |
dc.description.provenance | Submitted by Ülkü Ögel (ulkul@bilkent.edu.tr) on 2019-02-01T12:13:18Z No. of bitstreams: 1 Statistical_morphological_disambiguation_for_agglutinative_languages.pdf: 186273 bytes, checksum: ff3e33615bae100f4ad53f098d9bbb62 (MD5) | en_US |
dc.description.provenance | Made available in DSpace on 2019-02-01T12:13:18Z (GMT). No. of bitstreams: 1 Statistical_morphological_disambiguation_for_agglutinative_languages.pdf: 186273 bytes, checksum: ff3e33615bae100f4ad53f098d9bbb62 (MD5) Previous issue date: 2002 | en_US |
dc.identifier.doi | 10.1023/A:1020271707826 | en_US |
dc.identifier.issn | 0010-4817 | en_US |
dc.identifier.issn | 1572-8412 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/48722 | en_US |
dc.language.iso | English | en_US |
dc.publisher | Springer | en_US |
dc.publisher | Kluwer Academic Publishers | en_US |
dc.relation.isversionof | https://doi.org/10.1023/A:1020271707826 | en_US |
dc.source.title | Computers and the Humanities | en_US |
dc.subject | Agglutinative Languages | en_US |
dc.subject | Morphological Disambiguation | en_US |
dc.subject | N-Gram Language Models | en_US |
dc.subject | Statistical Natural Language Processing | en_US |
dc.subject | Turkish | en_US |
dc.title | Statistical morphological disambiguation for agglutinative languages | en_US |
dc.type | Article | en_US |
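The abstract's central idea is to split a full morphosyntactic tag into inflectional groups (IGs), one per (intermediate) derived form, so that statistics are gathered over a smaller IG vocabulary instead of over the very large set of full tags. The sketch below only illustrates that decomposition; it does not implement the paper's trigram models. It assumes analyses written in the notation of Oflazer's Turkish morphological analyzer, where `^DB` marks a derivation boundary. The function names `split_into_igs` and `vocab_sizes` and the toy analyses are hypothetical and chosen for illustration only.

```python
from itertools import product

# Assumption: "^DB" marks a derivation boundary in an analysis string,
# following the notation of Oflazer's Turkish morphological analyzer.
DB = "^DB"


def split_into_igs(analysis: str) -> tuple[str, list[str]]:
    """Return (root, inflectional groups) for one full analysis.

    The first IG holds the root's own part of speech and inflections;
    each later IG holds the features of one (intermediate) derived form.
    """
    segments = analysis.split(DB)
    root, _, first_ig = segments[0].partition("+")
    return root, [first_ig] + [seg.lstrip("+") for seg in segments[1:]]


def vocab_sizes(analyses: list[str]) -> tuple[int, int]:
    """Count distinct full tags (root removed) versus distinct IGs."""
    full_tags, igs = set(), set()
    for analysis in analyses:
        _, groups = split_into_igs(analysis)
        full_tags.add(tuple(groups))  # a full tag is the whole IG sequence
        igs.update(groups)
    return len(full_tags), len(igs)


if __name__ == "__main__":
    # Illustrative analysis of "saglamlastir-" (Adj -> Verb -> causative Verb).
    print(split_into_igs("saglam+Adj^DB+Verb+Become^DB+Verb+Caus+Pos"))
    # Hypothetical toy data: every nominal IG combined with every derived IG.
    nominal = ["Noun+A3sg+Pnon+Nom", "Noun+A3pl+Pnon+Nom", "Noun+A3sg+P1sg+Nom"]
    derived = ["Adj+With", "Adj+Without", "Verb+Zero+Pres+A3sg"]
    toy = [f"ev+{n}{DB}+{d}" for n, d in product(nominal, derived)]
    print(vocab_sizes(toy))  # (9, 6): 9 full tags built from only 6 IGs
```

Running it prints `('saglam', ['Adj', 'Verb+Become', 'Verb+Caus+Pos'])` and then `(9, 6)`. The toy gap between 9 full tags and 6 IGs is only a small illustration of how derivation multiplies the number of full tags; the abstract's motivation concerns this effect at corpus scale.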
Files
Original bundle
- Name: Statistical_morphological_disambiguation_for_agglutinative_languages.pdf
- Size: 181.91 KB
- Format: Adobe Portable Document Format
- Description: Full printable version
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission