Noun phrase chunker for Turkish using dependency parser
Noun phrase chunking is a sub-category of shallow parsing that can be used in many natural language processing tasks. In this thesis, we propose a noun phrase chunker for Turkish texts. We use a weighted constraint dependency parser to represent the relationships between sentence components and to determine noun phrases. The dependency parser uses a set of hand-crafted rules whose constraints can combine morphological and semantic information. Because of this flexibility, the rules are well suited to handling complex noun phrase structures. The developed dependency parser can easily be applied to shallow parsing of any phrase type by changing the rule set it employs. The lack of reliable human-annotated datasets is a significant problem for natural language studies on Turkish; therefore, we constructed the first noun phrase dataset for Turkish. According to our evaluation, our noun phrase chunker gives promising results on this dataset. Because the dependency parser can only be correct if the words have been morphologically disambiguated correctly, we also propose a hybrid morphological disambiguation technique that combines statistical information, hand-crafted grammar rules, and transformation-based learning rules. We have also constructed a dataset for testing the performance of our disambiguation system, and according to our tests, the disambiguation system is highly effective.
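The abstract does not give the rule set or the parser's search procedure, so the following is only a minimal, hypothetical sketch of the general idea: hand-crafted constraints, each carrying a penalty weight, score candidate dependent-to-head arcs, and noun phrase chunks are then read off the resulting tree as maximal noun-headed subtrees. It assumes multiplicative penalties in [0, 1] in the style of weighted constraint dependency grammars (a weight of 0 acts as a hard constraint), replaces a real parser's global search with a greedy per-word head choice restricted to words on the right, and uses invented tags, rules, weights, and names (`Token`, `CONSTRAINTS`, `arc_score`, `parse`, `noun_phrases`). None of these are taken from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Token:
    form: str
    pos: str           # hypothetical coarse tag: "Noun", "Adj", "Verb", ...
    case: str = "Nom"  # hypothetical case feature: "Nom", "Gen", "Acc", ...


# A rule returns True when the arc dependent -> head VIOLATES it.
# Each weight is a penalty in [0, 1]: 0.0 forbids the arc outright,
# values close to 1.0 only mildly discourage it. All values are illustrative.
Rule = Callable[[List[Token], int, int], bool]
CONSTRAINTS: List[Tuple[str, float, Rule]] = [
    ("adj_attaches_to_noun", 0.0,
     lambda s, d, h: s[d].pos == "Adj" and s[h].pos != "Noun"),
    ("genitive_attaches_to_noun", 0.2,
     lambda s, d, h: s[d].case == "Gen" and s[h].pos != "Noun"),
    ("prefer_adjacent_head", 0.9,
     lambda s, d, h: abs(d - h) > 1),
]
ROOT_PENALTY_NON_VERB = 0.05  # hypothetical penalty for a non-verb root


def arc_score(sent: List[Token], d: int, h: Optional[int]) -> float:
    """Multiply the weights of every constraint the arc d -> h violates."""
    if h is None:  # attaching word d to the artificial root
        return 1.0 if sent[d].pos == "Verb" else ROOT_PENALTY_NON_VERB
    score = 1.0
    for _name, weight, violated in CONSTRAINTS:
        if violated(sent, d, h):
            score *= weight
    return score


def parse(sent: List[Token]) -> List[Optional[int]]:
    """Greedily pick the best-scoring head for each word (None = root).

    Simplification: only words to the right are considered as heads
    (Turkish is largely head-final), which keeps the tree acyclic.
    """
    heads: List[Optional[int]] = []
    for d in range(len(sent)):
        best_head: Optional[int] = None
        best = arc_score(sent, d, None)
        for h in range(d + 1, len(sent)):
            score = arc_score(sent, d, h)
            if score > best:  # strict '>' keeps the nearer head on ties
                best_head, best = h, score
        heads.append(best_head)
    return heads


def noun_phrases(sent: List[Token], heads: List[Optional[int]]) -> List[str]:
    """Return every maximal noun-headed subtree as a chunk."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(sent))}
    for d, h in enumerate(heads):
        if h is not None:
            children[h].append(d)
    chunks: List[str] = []
    for i, tok in enumerate(sent):
        h = heads[i]
        if tok.pos != "Noun" or (h is not None and sent[h].pos == "Noun"):
            continue  # not a noun, or nested inside a larger noun phrase
        stack, members = [i], []
        while stack:  # collect the whole subtree rooted at word i
            n = stack.pop()
            members.append(n)
            stack.extend(children[n])
        chunks.append(" ".join(sent[j].form for j in sorted(members)))
    return chunks


# "Ali'nin kırmızı arabası bozuldu" = "Ali's red car broke down"
sentence = [
    Token("Ali'nin", "Noun", "Gen"),
    Token("kırmızı", "Adj"),
    Token("arabası", "Noun"),
    Token("bozuldu", "Verb"),
]
heads = parse(sentence)
print(heads)                          # [2, 2, 3, None]
print(noun_phrases(sentence, heads))  # ["Ali'nin kırmızı arabası"]
```

Restricting heads to the right keeps this toy parser acyclic, but it cannot make the rightward-pointing or long-distance attachments that a full constraint-based search would weigh against each other. In a real pipeline, the `pos` and `case` values would come from the morphological disambiguator rather than being written by hand.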
Keywords: Natural Language Processing; Noun Phrase Chunker