Browsing by Subject "Natural language processing (Computer science)."

Open Access
Computer-aided analysis of English punctuation on a parsed corpus: the special case of comma
(1996) Bayraktar, Murat
Punctuation, an orthographical component of language, has been ignored by most research in computational linguistics over the years. One reason for this is the overall difficulty of the subject, and another is the absence of a good theory. On the other hand, both ‘conventional’ and computational linguistics have paid increasing attention to punctuation in recent years, because it has been realized that true understanding and processing of written language will be almost impossible if punctuation marks are not taken into account. Except for the lists of rules given in style manuals or usage books, we know little about punctuation. These books tell us how we should punctuate, but they are generally silent about actual punctuation practice. This thesis contains the details of a computer-aided experiment to investigate English punctuation practice, for the special case of the comma (the most significant punctuation mark), in a parsed corpus. The experiment attempts to classify the various uses of the comma according to the syntax patterns in which it occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences with no part-of-speech tag information about individual words, and this ideally seems to be enough to classify ‘structural’ punctuation marks.
Open Access
Focusing for pronoun resolution in English discourse: an implementation
(1994) Ersan, Ebru
Anaphora resolution is one of the most active research areas in natural language processing. This study examines focusing as a tool for the resolution of pronouns, which are a kind of anaphora. Focusing, like anaphora, is a discourse phenomenon. Candy Sidner formalized focusing in her 1979 MIT PhD thesis and devised several algorithms to resolve definite anaphora, including pronouns. She presented her theory in a computational framework but did not generally implement the algorithms. Her algorithms related to focusing and pronoun resolution are implemented in this thesis. This implementation provides a better comprehension of the theory from both a conceptual and a computational point of view. The resulting program is tested on different discourse segments, and evaluation and analysis of the experiments are presented together with the statistical results.
Open Access
Turkish text to speech system
(2002) Eker, Barış
Scientists have been interested in producing human speech artificially for more than two centuries. Since computers were invented, they have been used to synthesize speech. With the help of this technology, Text To Speech (TTS) systems, which take text as input and produce speech as output, have been created. Some languages, such as English and French, have received most of the attention, while others, such as Turkish, have not been taken into consideration. This thesis presents a TTS system for Turkish that uses the diphone concatenation method. It takes a text as input and produces the corresponding speech in Turkish. In this system, the output is available in one male voice only. Since Turkish is a phonetic language, this system can also be used for other phonetic languages with some minor modifications. If this system is integrated with a pronunciation unit, it can also be used for languages that are not phonetic.
Open Access
Using multiple sources of information for constraint-based morphological disambiguation
(1996) Tür, Gökhan
This thesis presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. For morphologically complex languages like Turkish, automatic morphological disambiguation involves selecting, for each token, the morphological parse(s) with the right set of inflectional and derivational markers. Our system combines corpus-independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information obtained from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. In certain respects, our approach has been motivated by Brill’s recent work [6], but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our approach also handles unknown words in a novel way, employing a secondary morphological processor which recovers any relevant inflectional and derivational information from a lexical item whose root is unknown. With this approach, well below 1% of the tokens remain unknown in the texts we have experimented with. Our results indicate that by combining these hand-crafted, statistical and learned information sources, we can attain a recall of 96 to 97% with a corresponding precision of 93 to 94%, and ambiguity of 1.02 to 1.03 parses per token.