Browsing by Subject "Natural language processing systems"

Open Access
Automatic rule learning exploiting morphological features for named entity recognition in Turkish
(2011) Tatar, S.; Cicekli, I.
Named entity recognition (NER) is one of the basic tasks in automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify named entities in natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique is applicable to the task of automatic NER and that exploiting morphological features can significantly improve NER for Turkish, an agglutinative language. © The Author(s) 2011.
Open Access
Collaborative workspaces for pathway curation
(CEUR-WS, 2016-08) Durupınar-Babur, F.; Siper, Metin Can; Doğrusöz, Uğur; Bahceci, İstemi; Babur, O.; Demir, E.
We present a web-based visual biocuration workspace that focuses on curating detailed mechanistic pathways. It was designed as a flexible platform where multiple humans, NLP agents and AI agents can collaborate in real time on a common model using an event-driven API. We will use this platform to explore disruptive technologies that can scale up biocuration, such as NLP, human-computer collaboration, crowd-sourcing, alternative publishing and gamification. As a first step, we are designing a pilot that adds an author-curation step to scientific publishing, in which the authors of an article create formal pathway fragments representing their discovery, heavily assisted by computer agents. We envision that this "microcuration" use case will create an excellent opportunity to integrate multiple NLP approaches and semi-automated curation. © 2016, CEUR-WS. All rights reserved.
Open Access
Information-based approach to punctuation
(AAAI, 1997-07) Say, Bilge
This thesis analyzes, in an information-based framework, the semantic and discourse aspects of punctuation, drawing computational implications for Natural Language Processing (NLP) systems. Discourse Representation Theory (DRT) is taken as the theoretical framework of the thesis. It is hoped that, by following this analysis, NLP software writers will be able to make effective use of punctuation marks and to reveal interesting linguistic phenomena associated with them.
Open Access
Measuring cross-lingual semantic similarity across European languages
(IEEE, 2017) Şenel, Lütfü Kerem; Yücesoy, V.; Koç, A.; Çukur, Tolga
This paper studies cross-lingual semantic similarity (CLSS) between five European languages (English, French, German, Spanish and Italian) via unsupervised word embeddings from a cross-lingual lexicon. The vocabulary in each language is projected onto a separate high-dimensional vector space, and these vector spaces are then compared using several different distance measures (e.g., correlation and cosine) to measure the pairwise semantic similarity between the languages. A substantial degree of similarity is observed between the vector spaces learned from corpora of the European languages. Null hypothesis testing and bootstrap methods (by resampling without replacement) are utilized to verify the results.
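One way to compare two separate vector spaces with a cosine measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the two word lists are aligned by translation, compares the within-language cosine similarities of all word pairs, and correlates the two resulting lists. The function names and the toy 2-D vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_similarities(vectors):
    """Cosine similarity of every unordered word pair within one language."""
    n = len(vectors)
    return [cosine(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def space_similarity(space_a, space_b):
    """Correlate the similarity structure of two translation-aligned spaces."""
    return pearson(pairwise_similarities(space_a),
                   pairwise_similarities(space_b))

# Toy data (assumption): row i in both lists is the same concept.
# The second space is the first rotated by 90 degrees, so the coordinates
# differ but the within-language similarity structure is identical.
english = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
french = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
score = space_similarity(english, french)
```

Comparing similarity structure rather than raw coordinates matters here because each language has its own vector space, so coordinates in one space are not directly comparable with coordinates in another.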
Open Access
Natural language querying for video databases
(Elsevier Inc., 2008-06-15) Erozel, G.; Cicekli, N. K.; Cicekli, I.
Video databases have become popular in various areas due to recent advances in technology. Video archive systems need user-friendly interfaces to retrieve video frames. In this paper, a user interface to a video database system based on natural language processing (NLP) is described. The video database is based on a content-based spatio-temporal video data model. The data model focuses on the semantic content, which includes objects, activities, and spatial properties of objects. Spatio-temporal relationships between video objects, as well as trajectories of moving objects, can be queried with this data model. In this video database system, a natural language interface enables flexible querying. The queries, which are given as English sentences, are parsed using a link parser. The semantic representations of the queries are extracted from their syntactic structures using information extraction techniques. The extracted semantic representations are used to call the related parts of the underlying video database system to return the results of the queries. With the help of the conceptual ontology module, the database returns not only exact matches but also similar objects and activities. This module is implemented using a distance-based method of semantic similarity search on the domain-independent semantic ontology WordNet. © 2008 Elsevier Inc. All rights reserved.
Open Access
Osmanlıca kelimeleri eşleme [Matching Ottoman words]
(IEEE, 2007-06) Ataer, Esra; Duygulu, Pınar
Large archives of Ottoman documents are of interest to researchers all over the world. However, these archives remain inaccessible because manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman script, systems based on character recognition may not yield satisfactory results. It is also desirable to store the documents in image form, since they may contain important elements such as miniatures and tughras (signatures). For these reasons, in this study we treat the problem as an image retrieval problem, viewing Ottoman words as images, and we propose a solution based on word matching techniques. The bag-of-visterms approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of the visual terms. The experiments are carried out on printed and handwritten documents containing over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and to capture the semantic similarities between words.
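The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the codebook is assumed to be given (it would normally come from clustering SIFT descriptors), 2-D toy descriptors stand in for real SIFT vectors, and histogram intersection is an assumed choice for comparing visual-term distributions. All function names are hypothetical.

```python
def nearest_visterm(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def visterm_histogram(descriptors, codebook):
    """Vector-quantize descriptors and return a normalized visual-term histogram.

    Assumes at least one descriptor was extracted from the word image.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_visterm(descriptor, codebook)] += 1
    total = sum(counts)
    return [count / total for count in counts]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy codebook with two visual terms and three word images (assumptions).
codebook = [(0.0, 0.0), (10.0, 10.0)]
word_a = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
word_b = [(0.0, 0.0), (1.0, 1.0), (10.0, 9.0)]
word_c = [(10.0, 10.0), (9.0, 9.0), (11.0, 10.0)]

hist_a = visterm_histogram(word_a, codebook)
hist_b = visterm_histogram(word_b, codebook)
hist_c = visterm_histogram(word_c, codebook)

similar = histogram_intersection(hist_a, hist_b)
dissimilar = histogram_intersection(hist_a, hist_c)
```

In this toy setup, word_a and word_b use the visual terms in the same proportions and therefore score higher than word_a and word_c, which is the behavior a word-matching system relies on.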
Open Access
Parsing Turkish using the lexical functional grammar formalism
(Springer/Kluwer Academic Publishers, 1995) Güngördü, Z.; Oflazer, K.
This paper describes our work on parsing Turkish using the lexical-functional grammar formalism [11]. This work represents the first effort for wide-coverage syntactic parsing of Turkish. Our implementation is based on Tomita's parser, developed at the Carnegie Mellon University Center for Machine Translation. The grammar covers a substantial subset of Turkish, including structurally simple and complex sentences, and deals with a reasonable amount of word order freeness. The complex agglutinative morphology of Turkish lexical structures is handled using a separate two-level morphological analyzer, which has been incorporated into the syntactic parser. After a discussion of the key relevant issues regarding Turkish grammar, we discuss aspects of our system and present results from our implementation. Our initial results suggest that our system can parse about 82% of the sentences directly and almost all of the remaining sentences with very minor pre-editing. © 1995 Kluwer Academic Publishers.
Open Access
Recognizing faces in news photographs on the web
(IEEE, 2009-09) Zitouni, Hilal; Bulut, Muhammed Fatih; Duygulu, Pınar
We propose a graph-based method to recognize faces that appear on the web using a small training set. First, to construct the data set, relevant pictures of the desired people are collected by querying their names in a text-based search engine. Then, the faces detected in these photographs are represented using SIFT features extracted from facial features. The similarities between faces are represented in a graph, which is then used in a random walk with restart algorithm to provide links between faces. These links are used for recognition with two different methods. © 2009 IEEE.
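A random walk with restart over a similarity graph can be sketched as below. This is a generic power-iteration version given only for illustration; the graph weights, the restart probability of 0.15, and the function name are assumptions, and the abstract does not say how the authors configured the algorithm.

```python
def random_walk_with_restart(adjacency, seed, restart=0.15, iterations=100):
    """Relevance of every node to the seed node.

    adjacency: symmetric matrix (list of lists) of non-negative similarities.
    At each step the walker follows an edge with probability 1 - restart
    (chosen in proportion to edge weight) or jumps back to the seed.
    """
    n = len(adjacency)
    column_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    scores = [0.0] * n
    scores[seed] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Probability mass arriving at node i from its neighbors.
            walk = sum(
                adjacency[i][j] / column_sums[j] * scores[j]
                for j in range(n)
                if column_sums[j] > 0
            )
            jump = restart if i == seed else 0.0
            updated.append((1.0 - restart) * walk + jump)
        scores = updated
    return scores

# Toy face-similarity graph (assumption): faces 0 and 1 look alike,
# faces 2 and 3 look alike, and the two groups are only weakly linked.
graph = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
scores = random_walk_with_restart(graph, seed=0)
```

Starting from face 0, the walk assigns more relevance to face 1 than to face 3, which is the kind of link between faces that a recognition step could then use.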
Open Access
Using lexical chains for keyword extraction
(Elsevier Ltd, 2007-11) Ercan, G.; Cicekli, I.
Keywords can be considered condensed versions of documents and short forms of their summaries. In this paper, the problem of automatically extracting keywords from documents is treated as a supervised learning task. A lexical chain holds a set of semantically related words of a text, and it can be said that a lexical chain represents the semantic content of a portion of the text. Although lexical chains have been used extensively in text summarization, their use for the keyword extraction problem has not been fully investigated. In this paper, a keyword extraction technique that uses lexical chains is described, and encouraging results are obtained. © 2007 Elsevier Ltd. All rights reserved.
Open Access
VISPool: enhancing transformer encoders with vector visibility graph neural networks
(Association for Computational Linguistics, 2024-08-16) Alikaşifoğlu, Tuna; Aras, Arda Can; Koç, Aykut
The emergence of transformers has revolutionized natural language processing (NLP), as evidenced in various NLP tasks. While graph neural networks (GNNs) have recently shown promise in NLP, they are not standalone replacements for transformers. Rather, recent research explores combining transformers and GNNs. Existing GNN-based approaches rely on static graph construction methods that require excessive text processing, and most of them do not scale with increasing document and word counts. We address these limitations by proposing a novel dynamic graph construction method for text documents based on vector visibility graphs (VVGs) generated from transformer output. Then, we introduce visibility pooler (VISPool), a scalable model architecture that seamlessly integrates VVG convolutional networks into transformer pipelines. We evaluate the proposed model on the General Language Understanding Evaluation (GLUE) benchmark datasets. VISPool outperforms the baselines with fewer trainable parameters, demonstrating the viability of the visibility-based graph construction method for enhancing transformers with GNNs. © 2024 Association for Computational Linguistics.
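To give a feel for visibility-based graph construction, the sketch below builds a natural visibility graph from a scalar sequence. This is a deliberately simplified stand-in and not the vector visibility graph used by VISPool, which operates on transformer output vectors; the function name and the toy series are assumptions.

```python
def natural_visibility_edges(series):
    """Edges (a, b) of the natural visibility graph of a scalar sequence.

    Positions a < b are connected when every value strictly between them
    lies below the straight line joining (a, series[a]) and (b, series[b]).
    """
    edges = []
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            visible = all(
                series[c]
                < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.append((a, b))
    return edges

# Toy sequence (assumption): the peak at position 1 hides position 0
# from everything to its right.
series = [1.0, 3.0, 2.0, 4.0]
edges = natural_visibility_edges(series)
```

The graph depends only on the sequence values, so it is rebuilt for every input, which is the sense in which such a construction is dynamic rather than static.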