Automatic rule learning exploiting morphological features for named entity recognition in Turkish

Date
2011
Authors
Tatar, S.
Cicekli, I.
Source Title
Journal of Information Science
Print ISSN
0165-5515
Volume
37
Issue
2
Pages
137-151
Language
English
Abstract

Named entity recognition (NER) is one of the basic tasks in the automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify named entities in natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique can be applied successfully to automatic NER and that exploiting morphological features can significantly improve NER for Turkish, an agglutinative language. © The Author(s) 2011.
