Browsing by Subject "Natural language processing (Computer science)"

Open Access
Automating information extraction task for Turkish texts
(2011) Tatar, Serhan
Throughout history, mankind has often suffered from a lack of necessary resources. In today’s information world, the challenge can sometimes be a wealth of resources. That is to say, an excessive amount of information implies the need to find and extract necessary information. Information extraction can be defined as the identification of selected types of entities, relations, facts or events in a set of unstructured text documents in a natural language. The goal of our research is to build a system that automatically locates and extracts information from unstructured Turkish texts. Our study focuses on two basic Information Extraction (IE) tasks: Named Entity Recognition and Entity Relation Detection. Named Entity Recognition, which finds named entities (persons, locations, organizations, etc.) in unstructured texts, is one of the most fundamental IE tasks. The Entity Relation Detection task identifies relationships between entities mentioned in text documents. Using a supervised learning strategy, the developed systems start with a set of examples collected from a training dataset and generate extraction rules from these examples with a carefully designed coverage algorithm. In addition, several rule filtering and rule refinement techniques are used to maximize generalization and accuracy at the same time. To obtain accurate generalization, we use several syntactic and semantic features of the text, including orthographical, contextual, lexical and morphological features. In particular, morphological features are used effectively in this study to increase extraction performance for Turkish, an agglutinative language. Since the system does not rely on handcrafted rules or patterns, it does not suffer heavily from the domain adaptability problem.
The results of the conducted experiments show that (1) the developed systems are successfully applicable to the Named Entity Recognition and Entity Relation Detection tasks, and (2) exploiting morphological features can significantly improve the performance of information extraction from Turkish, an agglutinative language.
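The abstract does not spell out the coverage algorithm. As a rough illustration only, a generic sequential covering learner over token features could look like this; the feature names (`cap`, `prev`, `pos`), the greedy precision-first scoring and the function names are assumptions, not the method used in the thesis.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]
Example = Tuple[Features, bool]  # (token features, True if the token is part of a target entity)
Rule = Dict[str, str]            # conjunction of "feature == value" tests (hypothetical representation)


def covers(rule: Rule, features: Features) -> bool:
    """A rule covers a token when every feature test in the rule holds."""
    return all(features.get(name) == value for name, value in rule.items())


def learn_one_rule(examples: List[Example]) -> Rule:
    """Greedily specialise an empty rule until it covers no negative examples."""
    rule: Rule = {}
    covered = examples
    while any(not label for _, label in covered):
        # Candidate tests come from positive examples the rule still covers.
        candidates = sorted({(name, value)
                             for feats, label in covered if label
                             for name, value in feats.items() if name not in rule})
        if not candidates:
            break

        def score(test: Tuple[str, str]) -> Tuple[float, int]:
            name, value = test
            labels = [label for feats, label in covered if feats.get(name) == value]
            return sum(labels) / len(labels), len(labels)  # precision first, then coverage

        name, value = max(candidates, key=score)
        rule[name] = value
        covered = [(feats, label) for feats, label in covered if covers(rule, feats)]
    return rule


def sequential_covering(examples: List[Example]) -> List[Rule]:
    """Learn rules one at a time, removing the positive examples each rule covers."""
    rules: List[Rule] = []
    remaining = list(examples)
    while any(label for _, label in remaining):
        rule = learn_one_rule(remaining)
        if not any(label and covers(rule, feats) for feats, label in remaining):
            break  # no progress is possible, so stop instead of looping forever
        rules.append(rule)
        remaining = [(feats, label) for feats, label in remaining
                     if not (label and covers(rule, feats))]
    return rules
```

Morphological tags such as a proper-noun marker simply become one more feature here, which is one way such features could feed rule induction for an agglutinative language.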
Open Access
Design and implementation of a verb lexicon and verb sense disambiguator for Turkish
(1994) Yılmaz, Okan
The lexicon has a crucial role in all natural language processing systems and has special importance in machine translation systems. This thesis presents the design and implementation of a verb lexicon and a verb sense disambiguator for Turkish. The lexicon contains only verbs because verbs encode events in sentences and play the most important role in natural language processing systems, especially in parsing (syntactic analysis) and machine translation. The verb sense disambiguator uses the information stored in the verb lexicon that we developed. The main purpose of this tool is to disambiguate the senses of verbs that have several meanings, some of which are idiomatic. We also present a tool, implemented in Lucid Common Lisp under X-Windows, for adding, accessing, modifying, and removing lexicon entries, and a semantic concept ontology containing semantic features of commonly used Turkish nouns.
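The abstract does not say how a sense is selected. One plausible approach, sketched here as an assumption, tries idiomatic senses tied to a specific object first and then falls back to the semantic features of the object noun taken from an ontology. The miniature lexicon, the feature names and the `disambiguate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Sense:
    gloss: str
    idiom_object: Optional[str] = None              # exact object lemma an idiomatic sense requires
    object_features: FrozenSet[str] = frozenset()   # semantic features the object must carry


# Hypothetical miniature verb lexicon and noun ontology, for illustration only.
VERB_LEXICON: Dict[str, List[Sense]] = {
    "atmak": [
        Sense("to sign", idiom_object="imza"),
        Sense("to glance at", idiom_object="göz"),
        Sense("to throw", object_features=frozenset({"concrete"})),
    ],
}
NOUN_FEATURES: Dict[str, FrozenSet[str]] = {
    "top": frozenset({"concrete", "artifact"}),
    "taş": frozenset({"concrete", "natural"}),
    "göz": frozenset({"concrete", "body-part"}),
}


def disambiguate(verb: str, obj: str) -> Optional[str]:
    """Return the gloss of the first sense whose constraints the object satisfies."""
    senses = VERB_LEXICON.get(verb, [])
    # Idiomatic senses are the most specific, so they are tried first.
    for sense in senses:
        if sense.idiom_object == obj:
            return sense.gloss
    # Otherwise the object's semantic features must include the sense's constraints.
    features = NOUN_FEATURES.get(obj, frozenset())
    for sense in senses:
        if sense.idiom_object is None and sense.object_features <= features:
            return sense.gloss
    return None
```

With this toy data, "top atmak" resolves to "to throw", while "göz atmak" and "imza atmak" resolve to their idiomatic readings even though "göz" is itself concrete.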
Open Access
Noun phrase chunker for Turkish using dependency parser
(2010) Kutlu, Mücahid
Noun phrase chunking is a sub-category of shallow parsing that can be used for many natural language processing tasks. In this thesis, we propose a noun phrase chunker system for Turkish texts. We use a weighted constraint dependency parser to represent the relationships between sentence components and to determine noun phrases. The dependency parser uses a set of hand-crafted rules which can combine morphological and semantic information as constraints. Because of their flexibility, the rules are suitable for handling complex noun phrase structures. The developed dependency parser can easily be used for shallow parsing of all phrase types by changing the rule set it employs. The lack of reliable human-tagged datasets is a significant problem for natural language studies on Turkish. Therefore, we constructed the first noun phrase dataset for Turkish. According to our evaluation results, our noun phrase chunker gives promising results on this dataset. The dependency parser can only produce correct analyses if words are correctly morphologically disambiguated. Therefore, in this thesis, we propose a hybrid morphological disambiguation technique which combines statistical information, hand-crafted grammar rules, and transformation-based learning rules. We have also constructed a dataset for testing the performance of our disambiguation system. According to our tests, the disambiguation system is highly effective.
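To illustrate the weighted constraint idea, here is a toy sketch under stated assumptions: each violated constraint multiplies an analysis score by a penalty in [0, 1] (0 marks a hard constraint), the best dependency tree is found by brute force (feasible only for very short sentences), and noun phrases are read off as noun-headed subtrees. The constraints, weights and POS labels are hypothetical, not the rule set used in the thesis.

```python
from itertools import product
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, part of speech)


def is_tree(heads: List[int]) -> bool:
    """heads[i] is the 1-based head of token i + 1 (0 = root); reject cycles."""
    n = len(heads)
    for start in range(1, n + 1):
        node, steps = start, 0
        while node != 0:
            node = heads[node - 1]
            steps += 1
            if steps > n:
                return False  # walked more than n arcs: there is a cycle
    return True


def score(tokens: List[Token], heads: List[int]) -> float:
    """Multiply the penalties of all violated (hypothetical) constraints."""
    total = 1.0
    for i, h in enumerate(heads, start=1):
        pos = tokens[i - 1][1]
        if h == 0:
            total *= 1.0 if pos == "Verb" else 0.0  # hard: only the verb is the root
            continue
        head_pos = tokens[h - 1][1]
        if pos == "Verb":
            total *= 0.0  # hard: the verb must not depend on anything
        if pos == "Adj" and head_pos != "Noun":
            total *= 0.0  # hard: adjectives modify nouns
        if pos == "Noun" and head_pos != "Verb":
            total *= 0.2  # soft: nouns usually depend on the verb
        if h < i:
            total *= 0.1  # soft: Turkish is largely head-final
        total *= 0.9 ** (abs(h - i) - 1)  # soft: prefer short dependencies
    return total


def parse(tokens: List[Token]) -> List[int]:
    """Brute-force search for the best-scoring tree (only viable for short inputs)."""
    n = len(tokens)
    trees = (list(c) for c in product(range(n + 1), repeat=n)
             if all(c[i] != i + 1 for i in range(n)) and is_tree(list(c)))
    return max(trees, key=lambda heads: score(tokens, heads))


def noun_phrases(tokens: List[Token], heads: List[int]) -> List[str]:
    """Read off each maximal noun-headed subtree as a chunk."""
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens) + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)

    def subtree(i: int) -> List[int]:
        nodes = [i]
        for child in children[i]:
            nodes.extend(subtree(child))
        return sorted(nodes)

    return [" ".join(tokens[j - 1][0] for j in subtree(i))
            for i, h in enumerate(heads, start=1)
            if tokens[i - 1][1] == "Noun" and (h == 0 or tokens[h - 1][1] != "Noun")]
```

For "yaşlı adam kırmızı arabayı aldı" ("the old man bought the red car"), these weights pick the tree in which each adjective attaches to the following noun and both nouns attach to the verb, yielding the chunks "yaşlı adam" and "kırmızı arabayı". Swapping in a different constraint set changes which phrase types are extracted, in the spirit of the flexibility the abstract describes.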
Open Access
A performatory analysis of the overt use of the predicate "true"
(2013) Şenol, Mahmut Burak
The deflationary theory has been one of the most influential theories of truth in contemporary philosophy. This theory holds that there is no property of truth at all, and that overt uses of the predicate "true" in our sentences are redundant, having absolutely no effect on what we express. However, all the hypothetical examples used by deflationary theorists to exemplify the theory, in papers and books, have been taken out of context. Thus, there is no way to examine and analyze what the predicate adds to a sentence within its context. We oppose this theory not on philosophical grounds but on empirical grounds, with an "ordinary language philosophy" approach. We computationally collect 7610 occurrences of overt uses of the predicate "true" in the form "it is true that" from 10 influential periodicals (newspapers and a magazine) published in the United States. We classify and annotate these examples with respect to the positions of the coordinating and subordinating conjunctions they contain. We investigate the contextual relations between the proposition following the phrase "it is true that" and its surrounding propositions. We encounter 34 different syntactic patterns. We propose that in some occurrences of overt uses of the predicate "true", the presence of the predicate adds emphasis and performs an action in the same manner as a performatory verb does. We provide ordinary-language occurrences of overt uses of the predicate "true" which have been used in linguistically reliable media and constitute pragmatic 'counter-examples' to the deflationary theory of truth.
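The collection and classification step might be approximated along these lines. This sketch, under stated assumptions, finds "it is true that" in each sentence and records the conjunction immediately before the phrase and the first conjunction after it. The naive sentence splitter, the conjunction word lists and the function names are hypothetical simplifications, not the annotation scheme of the thesis.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small conjunction lists; ambiguous words such as
# "for" and "so" are left out to avoid obvious false positives.
COORDINATING = {"and", "but", "or", "yet", "nor"}
SUBORDINATING = {"although", "though", "while", "whereas", "because", "since", "if"}
CONJUNCTIONS = COORDINATING | SUBORDINATING

PHRASE = re.compile(r"\bit is true that\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def conjunction_pattern(text: str) -> List[Tuple[str, str]]:
    """For each "it is true that", return (conjunction just before it, first conjunction after it)."""
    patterns = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in PHRASE.finditer(sentence):
            before = [w.lower() for w in WORD.findall(sentence[:match.start()])]
            after = [w.lower() for w in WORD.findall(sentence[match.end():])]
            preceding = before[-1] if before and before[-1] in CONJUNCTIONS else ""
            following = next((w for w in after if w in CONJUNCTIONS), "")
            patterns.append((preceding, following))
    return patterns


def conjunction_type(word: str) -> str:
    """Label a conjunction as coordinating, subordinating, or none."""
    if word in COORDINATING:
        return "coordinating"
    if word in SUBORDINATING:
        return "subordinating"
    return "none"
```

Counting the resulting (preceding, following) pairs with `collections.Counter` would give a rough distribution of surface patterns; real annotation would still need a proper sentence splitter and human review.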
Open Access
Tagging and morphological disambiguation of Turkish text
(1994) Kuruöz, İlker
A part-of-speech (POS) tagger is a system that uses various sources of information to assign a (possibly unique) POS to each word. Automatic text tagging is an important component in higher-level analysis of text corpora, and its output can also be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial process in tagging because many lexical forms are morphologically ambiguous. This thesis presents a POS tagger for Turkish text based on a full-scale two-level specification of Turkish morphology. The tagger is augmented with a multi-word and idiomatic construct recognizer and, most importantly, a morphological disambiguator based on local lexical neighborhood constraints, heuristics and a limited amount of statistical information. The tagger also has additional functionality for compiling statistics and fine-tuning the morphological analyzer, such as logging erroneous morphological parses and commonly used roots. Test results indicate that the tagger can accurately tag about 97% to 99% of the texts with minimal user intervention. Furthermore, for sentences morphologically disambiguated with the tagger, an LFG parser developed for Turkish generates, on average, 50% fewer ambiguous parses and does so almost 2.5 times faster.
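A disambiguator driven by local neighborhood constraints with a statistical fallback could be sketched as follows. The two rules, the parse notation (`root+Tag+Tag`) and the frequency table are hypothetical; they only illustrate the general technique of pruning candidate parses using their neighbors and resolving whatever remains ambiguous by counts.

```python
from typing import Callable, Dict, List, Optional

Parses = List[str]        # candidate parses of one word, e.g. "gel+Verb+Aor+A3sg"
Sentence = List[Parses]
Rule = Callable[[Sentence, int], Optional[Parses]]


def tags(parse: str) -> List[str]:
    """Morphological tags of a parse: 'gel+Verb+Aor+A3sg' -> ['Verb', 'Aor', 'A3sg']."""
    return parse.split("+")[1:]


def keep(parses: Parses, tag: str) -> Parses:
    """Keep only the parses that carry the given tag."""
    return [p for p in parses if tag in tags(p)]


def numeral_before_noun(sentence: Sentence, i: int) -> Optional[Parses]:
    """Hypothetical rule: a numeral reading wins when the next word is unambiguously a noun."""
    nxt = sentence[i + 1] if i + 1 < len(sentence) else []
    if keep(sentence[i], "Num") and nxt and all("Noun" in tags(p) for p in nxt):
        return keep(sentence[i], "Num")
    return None


def sentence_final_verb(sentence: Sentence, i: int) -> Optional[Parses]:
    """Hypothetical rule: Turkish is largely verb-final, so prefer a verb reading at the end."""
    if i == len(sentence) - 1 and keep(sentence[i], "Verb"):
        return keep(sentence[i], "Verb")
    return None


RULES: List[Rule] = [numeral_before_noun, sentence_final_verb]


def disambiguate(sentence: Sentence, counts: Dict[str, int]) -> Sentence:
    """Apply neighborhood rules until nothing changes, then fall back to frequencies."""
    result = [list(parses) for parses in sentence]
    changed = True
    while changed:
        changed = False
        for i in range(len(result)):
            for rule in RULES:
                if len(result[i]) > 1:
                    narrowed = rule(result, i)
                    if narrowed and len(narrowed) < len(result[i]):
                        result[i] = narrowed
                        changed = True
    # Anything still ambiguous gets its most frequent parse (ties keep the first).
    return [p if len(p) == 1 else [max(p, key=lambda x: counts.get(x, 0))]
            for p in result]
```

For "yüz kişi gelir" ("a hundred people come"), the first rule selects the numeral reading of "yüz" and the second selects the verb reading of "gelir"; in "gelir yüksek" ("income is high"), neither rule fires and the frequency table decides.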