• About
  • Policies
  • What is openaccess
  • Library
  • Contact
Advanced search
      View Item 
      •   BUIR Home
      • University Library
      • Bilkent Theses
      • Theses - Department of Computer Engineering
      • Dept. of Computer Engineering - Ph.D. / Sc.D.
      • View Item
      •   BUIR Home
      • University Library
      • Bilkent Theses
      • Theses - Department of Computer Engineering
      • Dept. of Computer Engineering - Ph.D. / Sc.D.
      • View Item
      JavaScript is disabled for your browser. Some features of this site may not work without it.

      Automating information extraction task for Turkish texts

      Thumbnail
      View / Download
      1.4 Mb
      Author
      Tatar, Serhan
      Advisor
      Ulusoy, Özgür
      Date
      2011
      Publisher
      Bilkent University
      Language
      English
      Type
      Thesis
      Item Usage Stats
      114
      views
      49
      downloads
      Abstract
      Throughout history, mankind has often suffered from a lack of necessary resources. In today’s information world, the challenge can sometimes be a wealth of resources. That is to say, an excessive amount of information implies the need to find and extract necessary information. Information extraction can be defined as the identification of selected types of entities, relations, facts or events in a set of unstructured text documents in a natural language. The goal of our research is to build a system that automatically locates and extracts information from Turkish unstructured texts. Our study focuses on two basic Information Extraction (IE) tasks: Named Entity Recognition and Entity Relation Detection. Named Entity Recognition, finding named entities (persons, locations, organizations, etc.) located in unstructured texts, is one of the most fundamental IE tasks. Entity Relation Detection task tries to identify relationships between entities mentioned in text documents. Using supervised learning strategy, the developed systems start with a set of examples collected from a training dataset and generate the extraction rules from the given examples by using a carefully designed coverage algorithm. Moreover, several rule filtering and rule refinement techniques are utilized to maximize generalization and accuracy at the same time. In order to obtain accurate generalization, we use several syntactic and semantic features of the text, including: orthographical, contextual, lexical and morphological features. In particular, morphological features of the text are effectively used in this study to increase the extraction performance for Turkish, an agglutinative language. Since the system does not rely on handcrafted rules/patterns, it does not heavily suffer from domain adaptability problem. The results of the conducted experiments show that (1) the developed systems are successfully applicable to the Named Entity Recognition and Entity Relation Detection tasks, and (2) exploiting morphological features can significantly improve the performance of information extraction from Turkish, an agglutinative language.
      Keywords
      Information Extraction
      Turkish
      Named Entity Recognition
      Entity Relation Detection
      Permalink
      http://hdl.handle.net/11693/15166
      Collections
      • Dept. of Computer Engineering - Ph.D. / Sc.D. 75
      Show full item record

      Browse

      All of BUIRCommunities & CollectionsTitlesAuthorsAdvisorsBy Issue DateKeywordsTypeDepartmentsThis CollectionTitlesAuthorsAdvisorsBy Issue DateKeywordsTypeDepartments

      My Account

      Login

      Statistics

      View Usage StatisticsView Google Analytics Statistics

      Bilkent University

      If you have trouble accessing this page and need to request an alternate format, contact the site administrator. Phone: (312) 290 1771
      Copyright © Bilkent University - Library IT

      Contact Us | Send Feedback | Off-Campus Access | Admin | Privacy