Statistical morphological disambiguation for agglutinative languages

Hakkani-Tür, D. Z.; Oflazer, K.; Tür, G.

Files

Statistical_morphological_disambiguation_for_agglutinative_languages.pdf (181.91 KB)

Date

2002

Abstract

We present statistical models for morphological disambiguation in agglutinative languages, with a specific application to Turkish. Turkish presents an interesting problem for statistical models because its productive derivational morphology makes the potential tag set very large. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups and surface roots in trigram models. Among the four models that we have developed and tested, the simplest model, which ignores the local morphotactics within words, performs the best. Our best trigram model achieves 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are interested only in syntactically relevant features and ignore a very small set of semantic features, the accuracy increases to 95.07%.
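To make the idea in the abstract concrete, the following is a minimal sketch of splitting a parse into a root and its inflectional groups, then scoring a candidate parse with trigram statistics over roots and inflectional groups. It is an illustration under stated assumptions, not the authors' implementation. The `^DB` derivation-boundary marker, the `root+Feature+...` parse notation, the toy parses, the choice to condition each inflectional group on the last inflectional group of the two preceding words, and the additive smoothing with a fixed `vocab` size are all assumptions that the abstract does not specify. The function names `split_parse`, `train`, and `log_score` are hypothetical.

```python
from collections import Counter
from math import log

DB = "^DB"  # assumed derivation-boundary marker (not given in the abstract)
START = ("<s>", ["<s>"])  # padding "word" for sentence-initial context


def split_parse(parse):
    """Split 'root+Feat+...^DB+Feat+...' into (root, [IG, IG, ...])."""
    first, *rest = parse.split(DB)
    root, _, feats = first.partition("+")
    return root, [feats] + [ig.lstrip("+") for ig in rest]


def train(tagged_sentences):
    """Count root trigrams and IG events from disambiguated sentences.

    Assumption: every IG of a word is conditioned on the last IG of each
    of the two preceding words.
    """
    root_tri, root_ctx, ig_tri, ig_ctx = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        words = [START, START] + [split_parse(p) for p in sent]
        for i in range(2, len(words)):
            (r2, g2), (r1, g1), (r, igs) = words[i - 2], words[i - 1], words[i]
            rctx, gctx = (r2, r1), (g2[-1], g1[-1])
            root_tri[rctx + (r,)] += 1
            root_ctx[rctx] += 1
            for ig in igs:
                ig_tri[gctx + (ig,)] += 1
                ig_ctx[gctx] += 1
    return root_tri, root_ctx, ig_tri, ig_ctx


def log_score(model, prev2, prev1, cand, alpha=1.0, vocab=1000):
    """Log-probability of a candidate (root, IGs) given two previous words.

    Uses additive smoothing; alpha and vocab are illustrative values.
    """
    root_tri, root_ctx, ig_tri, ig_ctx = model
    (r2, g2), (r1, g1), (r, igs) = prev2, prev1, cand
    rctx, gctx = (r2, r1), (g2[-1], g1[-1])
    score = log((root_tri[rctx + (r,)] + alpha) / (root_ctx[rctx] + alpha * vocab))
    for ig in igs:
        score += log((ig_tri[gctx + (ig,)] + alpha) / (ig_ctx[gctx] + alpha * vocab))
    return score


# Toy usage: pick the better of two candidate parses for the second word.
model = train([["ev+Noun+A3sg+Pnon+Dat", "git+Verb+Pos+Past+A3sg"]])
prev = split_parse("ev+Noun+A3sg+Pnon+Dat")
candidates = ["git+Verb+Pos+Past+A3sg", "git+Noun+A3sg+Pnon+Nom"]
best = max(candidates, key=lambda p: log_score(model, START, prev, split_parse(p)))
```

Scoring over inflectional groups rather than whole tags keeps the event space small: a long derived form contributes several short, frequently observed groups instead of one rare composite tag, which is the motivation the abstract gives for the decomposition.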

Source Title

Computers and the Humanities

Publisher

Springer/Kluwer Academic Publishers

Keywords

Agglutinative Languages, Morphological Disambiguation, N-Gram Language Models, Statistical Natural Language Processing, Turkish

Permalink

http://hdl.handle.net/11693/48722

Published Version (Please cite this version)

https://doi.org/10.1023/A:1020271707826

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
