Browsing by Subject "Computational linguistics"
Now showing 1 - 20 of 21
Item Open Access: Architecture framework for mapping parallel algorithms to parallel computing platforms (CEUR-WS, 2013). Tekinerdogan, Bedir; Arkin, E.
Mapping parallel algorithms to parallel computing platforms requires several activities, such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, and the mapping of the algorithm to that logical configuration. Unfortunately, current parallel computing approaches do not seem to offer precise modeling approaches for supporting the mapping process. The lack of a clear and precise modeling approach for parallel computing impedes the communication and analysis of the decisions behind mapping parallel algorithms to parallel computing platforms. In this paper we present an architecture framework for modeling the various views that are related to the mapping process. An architecture framework organizes and structures the proposed architectural viewpoints. We propose a coherent set of five viewpoints for supporting the mapping of parallel algorithms to parallel computing platforms, and we illustrate the framework on the mapping of an array increment algorithm to a parallel computing platform. Copyright © 2013 for the individual papers by the papers' authors.

Item Open Access: Association rules for supporting hoarding in mobile computing environments (IEEE, 2000). Saygın, Yücel; Ulusoy, Özgür; Elmagarmid, A. K.
One of the features that a mobile computer should provide is disconnected operation, which is supported by hoarding. Hoarding can be described as loading the data items that will be needed in the future into the client cache prior to disconnection. Automated hoarding is the process of predicting the hoard set without any user intervention. In this paper, we describe an application-independent, generic technique for determining what should be hoarded prior to disconnection. Our method utilizes association rules, extracted with data mining techniques, to determine the set of items that should be hoarded to a mobile computer prior to disconnection. The proposed method was implemented and tested on synthetic data to estimate its effectiveness. Performance experiments showed that the proposed rule-based methods are effective in improving system performance in terms of the cache hit ratio of mobile clients, especially for small cache sizes.
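The rule-based hoarding approach described in the abstract above lends itself to a compact illustration. The following Python sketch is not the paper's implementation; it mines simple one-to-one association rules from a toy log of past client sessions and uses the items requested shortly before disconnection to fill a hoard set. The `min_support` and `min_confidence` thresholds, the session data and the cache size are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Mine simple a -> b association rules from past client sessions (toy Apriori-style pass)."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in sessions:
        items = set(items)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.setdefault(lhs, []).append((rhs, confidence))
    return rules

def hoard_set(rules, recent_requests, cache_size):
    """Add rule consequents triggered by the most recent requests, best confidence first."""
    scores = Counter()
    for item in recent_requests:
        for rhs, confidence in rules.get(item, []):
            scores[rhs] = max(scores[rhs], confidence)
    hoard = set(recent_requests)
    for item, _ in scores.most_common():
        if len(hoard) >= cache_size:
            break
        hoard.add(item)
    return hoard

if __name__ == "__main__":
    # Hypothetical request log: each set is one past session of a mobile client.
    past_sessions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}, {"a", "c"}]
    rules = mine_pair_rules(past_sessions)
    print(hoard_set(rules, recent_requests=["a"], cache_size=3))
```

In a real client the session log would come from the cache manager's request trace, and the mined rules would be refreshed periodically rather than recomputed at disconnection time.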
Item Open Access: Atatürk'ün el yazmalarının işlenmesi (Processing the manuscripts of Atatürk) (IEEE, 2010-04). Soysal, Talha; Adıgüzel, Hande; Öktem, Alp; Haman, Alican; Can, Ethem Fatih; Duygulu, Pınar; Kalpaklı, Mehmet
In this paper, as a first step towards an easy and convenient word-based search engine for accessing the manuscripts of Atatürk, the preprocessing of the digitized documents and their segmentation into lines and words are studied. Historical manuscripts pose various difficulties, and the techniques applied to printed documents may not yield satisfactory results. For this reason, more advanced solutions are adopted: a technique based on the Hough transform [1] is adapted for line segmentation, and a technique that accounts for the skewness of the writing is used for word segmentation. The results obtained on 30 pages of the documents provided by Afet İnan [2] prove to be highly accurate and promising for future research. ©2010 IEEE.

Item Open Access: Classifying fonts and calligraphy styles using complex wavelet transform (Springer-Verlag London Ltd, 2015). Bozkurt, A.; Duygulu, P.; Cetin, A. E.
Recognizing fonts has become an important task in document analysis, due to the increasing number of available digital documents in different fonts and emphases. A generic font recognition system independent of language, script and content is desirable for processing various types of documents. At the same time, categorizing calligraphy styles in handwritten manuscripts is important for paleographic analysis, but has not been studied sufficiently in the literature. We address the font recognition problem as the analysis and categorization of textures. We extract features using the complex wavelet transform and use support vector machines for classification. Extensive experimental evaluations on different datasets in four languages, and comparisons with state-of-the-art studies, show that our proposed method achieves higher recognition accuracy while being computationally simpler. Furthermore, on a new dataset generated from Ottoman manuscripts, we show that the proposed method can also be used for categorizing Ottoman calligraphy with high accuracy. © 2015, Springer-Verlag London.
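As a rough illustration of the texture-classification pipeline in the abstract above, the sketch below extracts subband statistics with an ordinary real wavelet transform (PyWavelets) as a stand-in for the complex wavelet transform actually used in the paper, and feeds them to a support vector machine. The wavelet choice, decomposition depth, synthetic data and SVM settings are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def texture_features(image, wavelet="db2", levels=3):
    """Mean absolute value and standard deviation of every detail subband as a texture descriptor."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    features = []
    for detail_level in coeffs[1:]:            # skip the approximation subband
        for band in detail_level:              # horizontal, vertical, diagonal details
            features.append(np.abs(band).mean())
            features.append(band.std())
    return np.array(features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in "pages": two texture classes generated with different roughness.
    X = np.array([texture_features(rng.normal(scale=s, size=(64, 64)))
                  for s in (1.0, 3.0) for _ in range(20)])
    y = np.array([0] * 20 + [1] * 20)
    clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
    print("training accuracy:", clf.score(X, y))
```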
Item Open Access: Correct-schema-guided synthesis of steadfast programs (IEEE, 1997-11). Flener, Pierre; Lau, K. K.; Ornaghi, M.
It can be argued that program schemas are indispensable for (semi-)automated software development, since they capture not only structured program design principles but also domain knowledge, both of which are of crucial importance for hierarchical program synthesis. Most researchers represent schemas purely syntactically (as higher-order expressions), which means that the knowledge captured by a schema is not formalized. We take a semantic approach and show that a schema can be formalized as an open (first-order) logical theory that contains an open logic program. By using a special kind of correctness for open programs, called steadfastness, we can define and reason about the correctness of schemas. We also show how to use correct schemas to synthesize steadfast programs.

Item Open Access: An English-to-Turkish interlingual MT system (Springer, 1998-10). Hakkani, Dilek Zeynep; Tür, Gökhan; Oflazer, Kemal; Mitamura, T.; Nyberg, E. H.
This paper describes the integration of a Turkish generation system with the KANT knowledge-based machine translation system to produce a prototype English-Turkish interlingua-based machine translation system. These two independently constructed systems were successfully integrated within a period of two months, through the development of a module that maps KANT interlingua expressions to Turkish syntactic structures. The combined system is able to translate completely and correctly 44 of 52 benchmark sentences in the domain of broadcast news captions. This study is the first known application of knowledge-based machine translation from English to Turkish, and our initial results show promise for future development. © Springer-Verlag Berlin Heidelberg 1998.

Item Open Access: Generating semantic similarity atlas for natural languages (IEEE, 2018-12). Şenel, Lütfi Kerem; Utlu, İhsan; Yücesoy, V.; Koç, A.; Çukur, Tolga
Cross-lingual studies attract growing interest in natural language processing (NLP) research, and several studies have shown that similar languages are more advantageous to work with than fundamentally different languages when transferring knowledge. Researchers from different domains have proposed different similarity measures for languages. However, a similarity measure focusing on the semantic structures of languages can be useful for selecting pairs or groups of languages to work with, especially for tasks requiring semantic knowledge such as sentiment analysis or word sense disambiguation. For this purpose, in this work we leverage a recently proposed word-embedding-based method to generate a language similarity atlas for 76 languages around the world. This atlas can help researchers select similar language pairs or groups in cross-lingual applications. Our findings suggest that the semantic similarity between two languages is strongly correlated with the geographic proximity of the countries in which they are used.

Item Open Access: Hareket geçmişi görüntüsü yöntemi ile Türkçe işaret dilini tanıma uygulaması (A Turkish sign language recognition application using the motion history image method) (IEEE, 2016-05). Yalçınkaya, Özge; Atvar, A.; Duygulu, P.
Sign language plays a very important role in enabling hearing- and speech-impaired individuals to communicate effectively with other members of society. Unfortunately, sign language is known in society only by people who are sensitive to the issue, and their number is notably small. The aim of this work is to improve the communication of hearing- or speech-impaired individuals with others through the system we have developed. To this end, the motion information of a sign captured by a camera is recognized, and its meaning is determined by comparing it with previously trained sign language motion data. The Motion History Image method is used to extract motion information from the camera frames, and the nearest neighbor algorithm is used for classification. As a result, the developed system predicts a text label for a sign language gesture using the training set. The overall classification accuracy is 95%.
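The abstract above names two concrete ingredients, a motion history image and nearest-neighbor classification. The sketch below is a generic, hedged reimplementation of that pipeline on synthetic clips, not the authors' system: the difference threshold, decay step and toy data are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def motion_history_image(frames, threshold=30, tau=255, decay=16):
    """Fold a grayscale clip into one image: recently moving pixels are bright, older motion fades."""
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    prev = frames[0].astype(np.int16)
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        moving = np.abs(cur - prev) > threshold
        mhi = np.where(moving, float(tau), np.maximum(mhi - decay, 0.0))
        prev = cur
    return mhi

if __name__ == "__main__":
    rng = np.random.default_rng(1)

    def make_clip(moving):
        # Toy stand-in "signs": 10 frames of 32x32 pixels, with or without large motion.
        if moving:
            return [rng.integers(0, 255, (32, 32)) for _ in range(10)]
        return [np.full((32, 32), 128) + rng.integers(-3, 3, (32, 32)) for _ in range(10)]

    X = [motion_history_image(make_clip(m)).ravel() for m in [True] * 10 + [False] * 10]
    y = [1] * 10 + [0] * 10
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print("predicted label:", knn.predict([motion_history_image(make_clip(True)).ravel()])[0])
```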
Item Open Access: Information-based approach to punctuation (AAAI, 1997-07). Say, Bilge
This thesis analyzes, in an information-based framework, the semantic and discourse aspects of punctuation, drawing computational implications for Natural Language Processing (NLP) systems. Discourse Representation Theory (DRT) is taken as the theoretical framework of the thesis. By following this analysis, it is hoped that NLP software writers will be able to make use of punctuation marks effectively, as well as reveal interesting linguistic phenomena in conjunction with punctuation marks.

Item Open Access: Lexical cohesion based topic modeling for summarization (Springer, 2008-02). Ercan, Gönenç; Çiçekli, İlyas
In this paper, we attack the problem of forming extracts for text summarization. Forming extracts involves selecting the most representative and significant sentences from the text. Our method takes advantage of the lexical cohesion structure in the text in order to evaluate the significance of sentences. Lexical chains have been used in summarization research to analyze the lexical cohesion structure and represent topics in a text. Our algorithm represents topics by sets of co-located lexical chains in order to exploit more lexical cohesion clues. It segments the text with respect to each topic and finds the most important topic segments. Our summarization algorithm has achieved better results than some other lexical-chain-based algorithms. © 2008 Springer-Verlag Berlin Heidelberg.
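To make the lexical-cohesion idea above concrete, here is a deliberately crude sketch: words are greedily grouped into chains when they share a WordNet synset, and sentences are scored by how many members of the strongest chains they contain. This captures only the general flavor of lexical-chain summarization; the paper's co-located chains and topic segmentation are not reproduced. It assumes NLTK with the WordNet corpus downloaded.

```python
import re
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def words(sentence):
    return [w.lower() for w in re.findall(r"[A-Za-z]+", sentence) if len(w) > 3]

def build_chains(sentences):
    """Greedy chaining: a word joins the first chain with which it shares a WordNet noun synset."""
    chains = []  # list of (synset set, word set) pairs
    for sent in sentences:
        for w in words(sent):
            syns = set(wn.synsets(w, pos=wn.NOUN))
            if not syns:
                continue
            for chain_syns, chain_words in chains:
                if syns & chain_syns:
                    chain_syns |= syns
                    chain_words.add(w)
                    break
            else:
                chains.append((syns, {w}))
    return chains

def summarize(sentences, n=2):
    """Score each sentence by its members in the three largest chains; return the top n sentences."""
    chains = build_chains(sentences)
    strong = [c for _, c in sorted(chains, key=lambda c: -len(c[1]))[:3]]
    score = lambda sent: sum(w in chain for w in words(sent) for chain in strong)
    return sorted(sentences, key=score, reverse=True)[:n]

if __name__ == "__main__":
    text = ["The cat chased a mouse across the garden.",
            "Gardens attract many animals, including cats and birds.",
            "The committee met on Tuesday to discuss the budget."]
    print(summarize(text, n=1))
```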
Item Open Access: A lexical-functional grammar for Turkish (1993). Güngördü, Zelal
Natural language processing is a research area that is becoming increasingly popular for both academic and commercial reasons. Syntactic parsing underlies most applications in natural language processing. Although there have been comprehensive studies of Turkish syntax from a linguistic perspective, this is one of the first attempts to investigate it extensively from a computational point of view. In this thesis, a lexical-functional grammar for Turkish syntax is presented. Our current work deals with regular Turkish sentences that are structurally simple or complex.

Item Open Access: Model-driven transformations for mapping parallel algorithms on parallel computing platforms (MDHPCL, 2013). Arkin, E.; Tekinerdoğan, Bedir
One of the important problems in parallel computing is the mapping of the parallel algorithm to the parallel computing platform, whereby the corresponding code must be implemented for each processing node. For platforms with a limited number of processing nodes this can be done manually. However, if the parallel computing platform consists of hundreds of thousands of processing nodes, then manual coding of the parallel algorithms becomes intractable and error-prone. Moreover, a change of the parallel computing platform requires considerable coding effort and time. In this paper we present a model-driven approach for generating the code of selected parallel algorithms to be mapped on parallel computing platforms. We describe the required platform-independent metamodel, and the model-to-model and model-to-text transformation patterns. We illustrate our approach for the parallel matrix multiplication algorithm. Copyright © 2013 for the individual papers by the papers' authors.

Item Open Access: An ontology-based approach to parsing Turkish sentences (Springer, 1998-10). Temizsoy, Murat; Çiçekli, İlyas
The main problem in natural language analysis is the ambiguity found at various levels of linguistic information. Syntactic analysis with word senses is frequently not enough to resolve all ambiguities found in a sentence. Although natural languages are highly connected to real-world knowledge, most parsing architectures do not make use of it effectively. In this paper, a new methodology is proposed for analyzing Turkish sentences that is heavily based on the constraints in an ontology. The methodology also makes use of the morphological marks of Turkish, which generally denote semantic properties. The analysis aims to find the propositional structure of the input utterance without constructing a deep syntactic tree; instead, it utilizes a weak interaction between syntax and semantics. The architecture then constructs a specific meaning representation on top of the analyzed propositional structure. © Springer-Verlag Berlin Heidelberg 1998.

Item Open Access: Ordering translation templates by assigning confidence factors (Springer Verlag, 1998). Öz, Zeynep; Çiçekli, İlyas
The TTL (Translation Template Learner) algorithm learns lexical-level correspondences between two translation examples by using analogical reasoning. The sentences used as translation examples have similar and different parts in the source language, which must correspond to the similar and different parts in the target language; these correspondences are therefore learned as translation templates. The learned translation templates are used in the translation of other sentences. However, we need to assign confidence factors to these translation templates in order to rank translation results with respect to the previously assigned confidence factors. This paper proposes a method for assigning confidence factors to translation templates learned by the TTL algorithm. Training data is used to collect statistical information, and in the assignment process each template is assigned a confidence factor according to the statistical information obtained from the training data. Furthermore, some template combinations are also assigned confidence factors in order to eliminate certain combinations that result in bad translations. © Springer-Verlag Berlin Heidelberg 1998.
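The confidence-factor idea above can be sketched very compactly: estimate, from a training log, how often each template participates in a correct translation, and rank candidate translations by the product of the factors of the templates they use. The data layout, smoothing constant and neutral default below are illustrative assumptions, not the bookkeeping of the TTL system itself.

```python
from collections import Counter

def confidence_factors(usage_log, smoothing=1.0):
    """usage_log: iterable of (template_id, was_correct) pairs gathered on training data."""
    used, correct = Counter(), Counter()
    for template, ok in usage_log:
        used[template] += 1
        if ok:
            correct[template] += 1
    # Smoothed ratio of correct uses to total uses, per template.
    return {t: (correct[t] + smoothing) / (used[t] + 2 * smoothing) for t in used}

def rank_translations(candidates, factors):
    """candidates: list of (translation, templates_used); order by the product of template factors."""
    def score(item):
        _, templates = item
        product = 1.0
        for t in templates:
            product *= factors.get(t, 0.5)   # unseen template: neutral factor
        return product
    return sorted(candidates, key=score, reverse=True)

if __name__ == "__main__":
    # Hypothetical training log and candidate translations.
    log = [("T1", True), ("T1", True), ("T1", False), ("T2", False), ("T2", False), ("T3", True)]
    factors = confidence_factors(log)
    candidates = [("translation A", ["T1", "T3"]), ("translation B", ["T2"])]
    for text, _ in rank_translations(candidates, factors):
        print(text)
```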
Item Open Access: OTAP Ottoman archives internet interface (IEEE, 2012). Şahin, Emre; Adıgüzel, Hande; Duygulu, Pınar; Kalpaklı, Mehmet
Within the Ottoman Text Archive Project, a web interface has been developed to aid in the uploading, binarization, line and word segmentation, labeling, recognition and testing of Ottoman Turkish texts. Through this interface it became possible to capture the expert knowledge of scholars working with the Ottoman archives, and to apply this knowledge in developing further technologies for the transliteration of historical manuscripts. © 2012 IEEE.

Item Open Access: Parsing Turkish using the lexical functional grammar formalism (Springer/Kluwer Academic Publishers, 1995). Güngördü, Z.; Oflazer, K.
This paper describes our work on parsing Turkish using the lexical-functional grammar formalism [11]. This work represents the first effort at wide-coverage syntactic parsing of Turkish. Our implementation is based on Tomita's parser developed at the Carnegie Mellon University Center for Machine Translation. The grammar covers a substantial subset of Turkish, including structurally simple and complex sentences, and deals with a reasonable amount of word-order freeness. The complex agglutinative morphology of Turkish lexical structures is handled using a separate two-level morphological analyzer, which has been incorporated into the syntactic parser. After a discussion of the key issues regarding Turkish grammar, we discuss aspects of our system and present results from our implementation. Our initial results suggest that our system can parse about 82% of the sentences directly and almost all of the remaining ones with very minor pre-editing. © 1995 Kluwer Academic Publishers.

Item Open Access: River: an intermediate language for stream processing (John Wiley & Sons Ltd., 2016). Soulé, R.; Hirzel, M.; Gedik, B.; Grimm, R.
This paper presents both a calculus for stream processing, named Brooklet, and its realization as an intermediate language, named River. Because River is based on Brooklet, it has a formal semantics that enables reasoning about the correctness of source translations and optimizations. River builds on Brooklet by addressing the real-world details that the calculus elides. We evaluated our system by implementing front-ends for three streaming languages, three important optimizations, and a back-end for the System S distributed streaming runtime. Overall, we significantly lower the barrier to entry for new stream-processing languages and thus grow the ecosystem of this crucial style of programming.

Item Open Access: Text summarization of Turkish texts using latent semantic analysis (ACM, 2010). Ozsoy, M. G.; Çiçekli, İlyas; Alpaslan, F. N.
Text summarization addresses the problem of extracting important information from huge amounts of text data. There are various methods in the literature that aim to produce well-formed summaries, and one of the most commonly used is Latent Semantic Analysis (LSA). In this paper, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish documents, and their performances are compared using their ROUGE-L scores. One of our algorithms produces the best scores.
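For readers unfamiliar with LSA-based extraction, the sketch below shows the generic pattern that such summarizers build on: a TF-IDF term-sentence matrix is factored with a truncated SVD and, for each leading latent topic, the sentence loading most strongly on it is selected. The toy sentences, topic count and selection rule are illustrative assumptions; the two algorithms proposed in the paper use their own, different selection strategies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_summary(sentences, n_topics=2):
    """Pick, for each of the leading latent topics, the sentence that loads on it most strongly."""
    tfidf = TfidfVectorizer().fit_transform(sentences)          # sentences x terms
    n_components = min(n_topics, tfidf.shape[0] - 1)
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    sentence_topic = svd.fit_transform(tfidf)                   # sentences x topics
    chosen = []
    for topic in range(sentence_topic.shape[1]):
        best = max(range(len(sentences)), key=lambda i: abs(sentence_topic[i, topic]))
        if best not in chosen:
            chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]               # keep original order

if __name__ == "__main__":
    sentences = [
        "Latent semantic analysis maps sentences into a low-dimensional topic space.",
        "The topic space is obtained from the singular value decomposition of a term matrix.",
        "Football fans celebrated the championship in the streets.",
    ]
    for s in lsa_summary(sentences, n_topics=2):
        print(s)
```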
Item Open Access: Topic-Centric Querying of Web Information Resources (Springer, Berlin, Heidelberg, 2001). Altıngövde, İsmail Şengör; Özel, Selma A.; Ulusoy, Özgür; Özsoyoğlu, G.; Özsoyoğlu, Z. M.
This paper deals with the problem of modeling web information resources using expert knowledge and personalized user information, and querying them in terms of topics and topic relationships. We propose a model for web information resources and a query language, SQL-TC (Topic-Centric SQL), to query the model. The model is composed of web-based information resources (XML or HTML documents on the web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized information about users (captured as user profiles that indicate users' preferences as to which expert advice they would like to follow and which to ignore). The query language SQL-TC makes use of the metadata information provided in expert advice repositories and embedded in information resources, and employs user preferences to further refine the query output. Query output objects/tuples are ranked with respect to the (expert-judged and user-preference-revised) importance values of the requested topics/metalinks, and the query output is limited to either the top n-ranked objects/tuples, or objects/tuples with importance values above a given threshold, or both. © Springer-Verlag Berlin Heidelberg 2001.

Item Open Access: Turing Test and conversation (1999). Saygın, Ayşe Pınar
The Turing Test is one of the most disputed topics in Artificial Intelligence, Philosophy of Mind and Cognitive Science. It was proposed 50 years ago as a method to determine whether machines can think or not. It embodies important philosophical issues as well as computational ones, and because of its characteristics it requires interdisciplinary attention. The Turing Test posits that, to be granted intelligence, a computer should imitate human conversational behavior so well that it is indistinguishable from a real human being. From this it follows that conversation is a crucial concept in its study. Surprisingly, focusing on conversation in relation to the Turing Test has not been a prevailing approach in previous research. This thesis first provides a thorough review of the 50 years of the Turing Test; philosophical arguments, computational concerns, and repercussions in other disciplines are all discussed. Furthermore, this thesis studies the Turing Test as a special kind of conversation. In doing so, the relationship between existing theories of conversation and human-computer communication is explored; in particular, the discussion concentrates on Grice's cooperative principle and conversational maxims. Viewing the Turing Test as conversation and computers as language users has significant effects on the way we look at Artificial Intelligence, and on communication in general.