Browsing by Subject "Computational Linguistics"

Open Access
Computer-aided analysis of English punctuation on a parsed corpus: the special case of comma
(1996) Bayraktar, Murat
Punctuation, an orthographical component of language, has been ignored by most research in computational linguistics over the years. One reason for this is the overall difficulty of the subject, and another is the absence of a good theory. On the other hand, both ‘conventional’ and computational linguistics have increased their attention to punctuation in recent years because it has been realized that true understanding and processing of written language will be almost impossible if punctuation marks are not taken into account. Except for the lists of rules given in style manuals or usage books, we know little about punctuation. These books tell us how we should punctuate, but they are generally silent about actual punctuation practice. This thesis contains the details of a computer-aided experiment to investigate English punctuation practice, for the special case of the comma (the most significant punctuation mark), in a parsed corpus. The experiment attempts to classify the various uses of the comma according to the syntactic patterns in which it occurs. The corpus (Penn Treebank) consists of syntactically annotated sentences with no part-of-speech tag information about individual words, and this ideally seems to be enough to classify ‘structural’ punctuation marks.
Open Access
Design and evaluation of a new transaction execution model for multidatabase systems
(Elsevier, 1997) Devirmiş, T.; Ulusoy, Özgür
In this paper, we present a new transaction execution model that captures the formalism and semantics of various extended transaction models and adapts them to a multidatabase system (MDBS) environment. The proposed model covers nested transactions, various dependency types among transactions, and commit-independent transactions. The formulation of complex MDBS transaction types can be accomplished easily with the extended semantics captured in the model. A detailed performance model of an MDBS is employed in investigating the performance implications of the proposed transaction model. © Elsevier Science Inc. 1997.
Open Access
Learning translation templates for closely related languages
(Springer, Berlin, Heidelberg, 2003) Altıntaş, Kemal; Güvenir, H. Altay
Many researchers have worked on example-based machine translation, and different techniques have been investigated in the area. In the literature, a method of using translation templates learned from bilingual example pairs was proposed. The paper investigates the possibility of applying the same idea to closely related languages where word order is preserved. In addition to applying the original algorithm for example pairs, we believe that the similarities between the translated sentences may always be learned as atomic translations. Since the word order is almost always preserved, there is no need to have any previous knowledge to identify the corresponding differences. The paper concludes that applying this method to closely related languages may improve the performance of the system.
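The template-learning idea in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization, strictly preserved word order, and invented toy tokens rather than real language data. The function names (`segment`, `render`, `learn_template`) and the use of `difflib.SequenceMatcher` for finding similar and differing runs are choices made for this illustration, not the algorithm from the paper.

```python
from difflib import SequenceMatcher


def segment(a, b):
    """Split token lists a and b into aligned runs: (is_similar, run_a, run_b)."""
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag == "equal", a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def render(segments):
    """Keep similar runs as text and replace each differing run with a variable."""
    parts, n = [], 0
    for is_similar, run_a, _ in segments:
        if is_similar:
            parts.extend(run_a)
        else:
            n += 1
            parts.append(f"X{n}")
    return " ".join(parts)


def learn_template(pair1, pair2):
    """Generalize two (source, target) example pairs into one template.

    Assumption: word order is preserved, so the k-th run on the source side
    corresponds to the k-th run on the target side.
    """
    (src1, tgt1), (src2, tgt2) = pair1, pair2
    src = segment(src1.split(), src2.split())
    tgt = segment(tgt1.split(), tgt2.split())
    src_kinds = [s[0] for s in src]
    # Give up if the two sides do not split the same way, or nothing differs.
    if src_kinds != [t[0] for t in tgt] or all(src_kinds):
        return None
    atomic = set()
    for (_, s_a, s_b), (_, t_a, t_b) in zip(src, tgt):
        # Similar and differing runs alike are recorded as atomic translations.
        if s_a and t_a:
            atomic.add((" ".join(s_a), " ".join(t_a)))
        if s_b and t_b:
            atomic.add((" ".join(s_b), " ".join(t_b)))
    return (render(src), render(tgt)), atomic


if __name__ == "__main__":
    # Invented tokens; they do not belong to any real language.
    print(learn_template(("ka mo ti", "ga mu di"), ("ka re ti", "ga ru di")))
```

Because corresponding runs are paired purely by position, the sketch needs no dictionary or prior alignment, which is the property the abstract attributes to order-preserving language pairs.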
Open Access
Pragmatics in human-computer conversations
(Elsevier, 2002) Saygin, A. P.; Cicekli, I.
This paper provides a pragmatic analysis of some human-computer conversations carried out during the past six years within the context of the Loebner Prize Contest, an annual competition in which computers participate in Turing Tests. The Turing Test posits that to be granted intelligence, a computer should imitate human conversational behavior so well as to be indistinguishable from a real human being. We carried out an empirical study exploring the relationship between computers' violations of Grice's cooperative principle and conversational maxims, and their success in imitating human language use. Based on conversation analysis and a large survey, we found that different maxims have different effects when violated, but more often than not, when computers violate the maxims, they reveal their identity. The results indicate that Grice's cooperative principle is at work during conversations with computers. On the other hand, studying human-computer communication may require some modifications of existing frameworks in pragmatics because of certain characteristics of these conversational environments. Pragmatics constitutes a serious challenge to computational linguistics. While existing programs have other significant shortcomings, it may be that the biggest hurdle in developing computer programs which can successfully carry out conversations will be modeling the ability to 'cooperate'. © 2002 Elsevier Science B.V. All rights reserved.
Open Access
Prosody-based automatic segmentation of speech into sentences and topics
(Elsevier, 2000) Shriberg, E.; Stolcke, A.; Hakkani-Tür, D.; Tür, G.
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task- and corpus-dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
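One simple way to picture a probabilistic combination of prosodic and lexical information is linear interpolation of per-word boundary posteriors, followed by thresholding. The sketch below assumes that each model already outputs, for every word, a probability that a boundary follows it. The function names, the interpolation weight, the threshold, and the sample numbers are all hypothetical, and this scheme is only an illustration, not necessarily the combination method evaluated in the paper.

```python
def combine_posteriors(p_prosody, p_lexical, weight=0.5):
    """Linearly interpolate two lists of boundary posteriors.

    weight is the share given to the prosodic model (an assumed tunable value).
    """
    if len(p_prosody) != len(p_lexical):
        raise ValueError("both models must score the same word positions")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * pp + (1.0 - weight) * pl
            for pp, pl in zip(p_prosody, p_lexical)]


def split_at_boundaries(words, posteriors, threshold=0.5):
    """Close a unit after every word whose boundary posterior reaches threshold."""
    units, current = [], []
    for word, p in zip(words, posteriors):
        current.append(word)
        if p >= threshold:
            units.append(current)
            current = []
    if current:  # trailing words with no detected boundary
        units.append(current)
    return units


if __name__ == "__main__":
    # Made-up words and scores, for illustration only.
    words = ["okay", "thanks", "now", "the", "weather"]
    prosody = [0.2, 0.9, 0.1, 0.1, 0.8]
    lexical = [0.4, 0.6, 0.2, 0.1, 0.9]
    combined = combine_posteriors(prosody, lexical, weight=0.5)
    print(split_at_boundaries(words, combined))
```

In practice the weight and threshold would be tuned on held-out data for each task and corpus, consistent with the abstract's observation that cue usage depends on both.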