Browsing by Subject "Query languages"


Open Access
Algorithms for within-cluster searches using inverted files
(Springer, 2006-11) Altıngövde, İsmail Şengör; Can, Fazlı; Ulusoy, Özgür
Information retrieval over clustered document collections has two successive stages: first identifying the best-clusters and then the best-documents in these clusters that are most similar to the user query. In this paper, we assume that an inverted file over the entire document collection is used for the latter stage. We propose and evaluate algorithms for within-cluster searches, i.e., to integrate the best-clusters with the best-documents to obtain the final output including the highest ranked documents only from the best-clusters. Our experiments on a TREC collection including 210,158 documents with several query sets show that an appropriately selected integration algorithm based on the query length and system resources can significantly improve the query evaluation efficiency. © Springer-Verlag Berlin Heidelberg 2006.
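The integration step described in this abstract can be illustrated with a minimal sketch. It is not one of the paper's algorithms: it assumes a toy inverted file of (document id, weight) postings, a hypothetical `doc_cluster` map, and a precomputed set of best clusters, and it simply skips postings of documents outside those clusters while accumulating scores.

```python
from collections import defaultdict

def within_cluster_search(query_terms, inverted_index, doc_cluster, best_clusters, top_k):
    """Rank only documents that belong to one of the best clusters.

    inverted_index: term -> list of (doc_id, weight) postings over the whole collection.
    doc_cluster:    doc_id -> cluster id (hypothetical lookup table).
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in inverted_index.get(term, []):
            # Integration step: ignore postings of documents outside the best clusters.
            if doc_cluster[doc_id] in best_clusters:
                scores[doc_id] += weight
    # Highest score first; ties broken by document id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]

# Toy data, invented for illustration only.
inverted_index = {
    "query": [(1, 0.5), (2, 0.9), (3, 0.4)],
    "language": [(1, 0.7), (3, 0.2), (4, 0.8)],
}
doc_cluster = {1: "A", 2: "B", 3: "A", 4: "C"}

result = within_cluster_search(["query", "language"], inverted_index,
                               doc_cluster, best_clusters={"A", "C"}, top_k=2)
top_doc_ids = [doc_id for doc_id, _ in result]
```

Document 2 has the single highest posting weight but is dropped because its cluster was not selected, which is the behavior the abstract calls returning documents "only from the best-clusters".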
Open Access
An archiving model for a hierarchical information storage environment
(Elsevier, 2000) Moinzadeh, K.; Berk, E.
We consider an archiving model for a database consisting of secondary and tertiary storage devices in which the query rate for a record declines as it ages. We propose a 'dynamic' archiving policy based on the number of records and the age of the records in the secondary device. We analyze the cases when the number of new records inserted in the system over time is either constant or follows a Poisson process. For both scenarios, we characterize the properties of the policy parameters and provide optimization results when the objective is to minimize the average record retrieval times. Furthermore, we propose a simple heuristic method for obtaining near-optimal policies in large databases when the record query rate declines exponentially with time. The effectiveness of the heuristic is tested via a numerical experiment. Finally, we examine the behavior of performance measures such as the average record retrieval time and the hit rate as system parameters are varied.
Open Access
Automatic Ranking of Retrieval Systems in Imperfect Environments
(ACM, 2003-07-08) Nuray, Rabia; Can, Fazlı
The empirical investigation of the effectiveness of information retrieval (IR) systems requires a test collection, a set of query topics, and a set of relevance judgments made by human assessors for each query. Previous experiments show that differences in human relevance assessments do not affect the relative performance of retrieval systems. Based on this observation, we propose and evaluate a new approach to replace the human relevance judgments by an automatic method. Ranking of retrieval systems with our methodology correlates positively and significantly with that of human-based evaluations. In the experiments, we assume a Web-like imperfect environment: the indexing information for all documents is available for ranking, but some documents may not be available for retrieval. Such conditions can be due to document deletions or network problems. Our method of simulating imperfect environments can be used for Web search engine assessment and in estimating the effects of network conditions (e.g., network unreliability) on IR system performance.
Open Access
Automatic rule learning exploiting morphological features for named entity recognition in Turkish
(2011) Tatar, S.; Cicekli, I.
Named entity recognition (NER) is one of the basic tasks in automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify the named entities located in the natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique is successfully applicable to the task of automatic NER and exploiting morphological features can significantly improve the NER from Turkish, an agglutinative language. © The Author(s) 2011.
Open Access
BilVideo: Design and implementation of a video database management system
(Springer, 2005) Dönderler, M. E.; Şaykol, E.; Arslan, U.; Ulusoy, Özgür; Güdükbay, Uğur
With the advances in information technology, the amount of multimedia data captured, produced, and stored is increasing rapidly. As a consequence, multimedia content is widely used for many applications in today's world, and hence, a need for organizing this data, and accessing it from repositories with vast amount of information has been a driving stimulus both commercially and academically. In compliance with this inevitable trend, first image and especially later video database management systems have attracted a great deal of attention, since traditional database systems are designed to deal with alphanumeric information only, thereby not being suitable for multimedia data. In this paper, a prototype video database management system, which we call BilVideo, is introduced. The system architecture of BilVideo is original in that it provides full support for spatio-temporal queries that contain any combination of spatial, temporal, object-appearance, external-predicate, trajectory-projection, and similarity-based object-trajectory conditions by a rule-based system built on a knowledge-base, while utilizing an object-relational database to respond to semantic (keyword, event/activity, and category-based), color, shape, and texture queries. The parts of BilVideo (Fact-Extractor, Video-Annotator, its Web-based visual query interface, and its SQL-like textual query language) are presented, as well. Moreover, our query processing strategy is also briefly explained. © 2005 Springer Science + Business Media, Inc.
Open Access
A comparison of epidemic algorithms in wireless sensor networks
(Elsevier BV, 2006-08-21) Akdere, M.; Bilgin, C. C.; Gerdaneri, O.; Korpeoglu, I.; Ulusoy, O.; Çetintemel, U.
We consider the problem of reliable data dissemination in the context of wireless sensor networks. For some application scenarios, reliable data dissemination to all nodes is necessary for propagating code updates, queries, and other sensitive information in wireless sensor networks. Epidemic algorithms are a natural approach for reliable distribution of information in such ad hoc, decentralized, and dynamic environments. In this paper we show the applicability of epidemic algorithms in the context of wireless sensor environments, and provide a comparative performance analysis of the three variants of epidemic algorithms in terms of message delivery rate, average message latency, and messaging overhead on the network. © 2006 Elsevier B.V. All rights reserved.
Open Access
A comparison of historical relational query languages
(ASME, 1994-07) Tansel, Abdullah Uz; Tin, E.
We introduce a historical relational data model in which N1NF relations are used and one level of nesting is allowed. An attribute can be either atomic or a temporal atom. An atomic attribute represents a time-invariant attribute. A temporal atom consists of two components, a value and a temporal set, which is a set of times denoting the validity period of the value. We define a relational tuple calculus for this model. We follow a comparative approach towards completeness of historical query languages.
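The temporal atom described in this abstract can be pictured as a small data structure. The sketch below is illustrative only: the class and function names are hypothetical, times are modeled as integer years, and a nested attribute is assumed to be a set of temporal atoms (one level of nesting).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union

@dataclass(frozen=True)
class TemporalAtom:
    """A value paired with the set of times during which it is valid."""
    value: str
    times: FrozenSet[int]

# Assumption: an attribute is either an atomic string or a set of temporal atoms.
Attribute = Union[str, FrozenSet[TemporalAtom]]

def value_at(attribute: Attribute, t: int) -> Optional[str]:
    """Return the attribute's value at time t, or None if no atom is valid then."""
    if isinstance(attribute, str):
        return attribute  # atomic attributes are time invariant
    for atom in attribute:
        if t in atom.times:
            return atom.value
    return None

# Toy tuple, invented for illustration.
employee = {
    "name": "Ada",
    "salary": frozenset({
        TemporalAtom("30K", frozenset(range(1990, 1993))),
        TemporalAtom("35K", frozenset(range(1993, 1995))),
    }),
}
```

Here `value_at(employee["salary"], 1991)` yields the salary valid in 1991, while the atomic `name` attribute returns the same value for any time.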
Open Access
A database model for querying visual surveillance videos by integrating semantic and low-level features
(Springer, Berlin, Heidelberg, 2005) Şaykol, Ediz; Güdükbay, Uğur; Ulusoy, Özgür
Automated visual surveillance has emerged as a trendy application domain in recent years. Many approaches have been developed on video processing and understanding. Content-based access to surveillance video has become a challenging research area. The results of a considerable amount of work dealing with automated access to visual surveillance have appeared in the literature. However, the event models and the content-based querying and retrieval components have significant gaps remaining unfilled. To narrow these gaps, we propose a database model for querying surveillance videos by integrating semantic and low-level features. In this paper, the initial design of the database model, the query types, and the specifications of its query language are presented. © Springer-Verlag Berlin Heidelberg 2005.
Open Access
A distributed and measurement-based framework against free riding in peer-to-peer networks
(IEEE, 2004) Karakaya, Murat; Korpeoglu, İbrahim; Ulusoy, Özgür
In this paper, we propose a distributed and measurement-based method to reduce the degree of free riding in P2P networks. We primarily focus on developing schemes to locate free riders and on determining policies that can be used to take actions against them. Our proposed schemes require each peer to monitor its neighboring peers, make decisions if they exhibit any kind of free riding, and take appropriate actions if required.
Open Access
Effect of inverted index partitioning schemes on performance of query processing in parallel text retrieval systems
(Springer, 2006-11) Cambazoğlu, B. Barla; Çatal, A.; Aykanat, Cevdet
Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate the effect of these two index partitioning schemes on query processing. We conduct experiments on a 32-node PC cluster, considering the case where the index is stored entirely on disk. Performance results are reported for a large (30 GB) document collection using an MPI-based parallel query processing implementation. © Springer-Verlag Berlin Heidelberg 2006.
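The two partitioning schemes named in this abstract can be contrasted with a short sketch. It is a toy illustration, not the paper's implementation: round-robin assignment of terms and documents to processors is an assumption made here for simplicity, and all function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: doc_id -> text. Returns term -> sorted list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)

def partition_by_term(index, k):
    """Term-based: each processor holds the complete posting lists of some terms."""
    parts = [dict() for _ in range(k)]
    for i, term in enumerate(sorted(index)):
        parts[i % k][term] = index[term]  # round-robin over terms (assumption)
    return parts

def partition_by_document(index, k):
    """Document-based: each processor holds a local index over its own documents."""
    parts = [dict() for _ in range(k)]
    for term, postings in index.items():
        for doc_id in postings:
            parts[doc_id % k].setdefault(term, []).append(doc_id)
    return parts

# Toy collection, invented for illustration.
docs = {
    0: "parallel text retrieval",
    1: "text index",
    2: "parallel index partitioning",
    3: "text retrieval",
}
index = build_inverted_index(docs)
term_parts = partition_by_term(index, 2)
doc_parts = partition_by_document(index, 2)
```

Under term-based partitioning a query touches only the processors that own its terms, whereas under document-based partitioning every processor may hold part of each term's posting list and contributes a partial answer.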
Open Access
Effective early termination techniques for text similarity join operator
(Springer, Berlin, Heidelberg, 2005) Özalp, S. A.; Ulusoy, Özgür
Text similarity join operator joins two relations if their join attributes are textually similar to each other, and it has a variety of application domains including integration and querying of data from heterogeneous resources; cleansing of data; and mining of data. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some shortcut evaluation techniques from the Information Retrieval domain, namely Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with continue and maximal similarity filter heuristics. © Springer-Verlag Berlin Heidelberg 2005.
Open Access
Efficiency and effectiveness of query processing in cluster-based retrieval
(Elsevier, 2004) Can, F.; Altingövde, I. S.; Demir, E.
Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends the cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of similarity calculations for document ranking during query processing and improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of FS. The experiments confirm that the approach is scalable and system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C3M), and the Financial Times database of TREC containing 210158 documents of size 564 MB defined by 229748 terms with total of 29545234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering. © 2003 Elsevier Ltd. All rights reserved.
Open Access
An efficient computation model for coarse grained reconfigurable architectures and its applications to a reconfigurable computer
(IEEE, 2010-07) Atak, Oğuzhan; Atalar, Abdullah
The mapping of high level applications onto coarse grained reconfigurable architectures (CGRA) is usually performed manually by using graphical tools, or, when automatic compilation is used, some restrictions are imposed on the high level code. Since high level applications do not contain parallelism explicitly, mapping the application directly to CGRA is very difficult. In this paper, we present a middle level Language for Reconfigurable Computing (LRC). LRC is similar to assembly languages of microprocessors, with the difference that parallelism can be coded in LRC. LRC is an efficient language for describing control data flow graphs. Several applications such as FIR, multirate, multichannel filtering, FFT, 2D-IDCT, Viterbi decoding, UMTS and CCSDC turbo decoding, Wimax LDPC decoding are coded in LRC and mapped to the Bilkent Reconfigurable Computer with a performance (in terms of cycle count) close to that of ASIC implementations. The applicability of the computation model to a CGRA having low cost interconnection network has been validated by using placement and routing algorithms. © 2010 IEEE.
Open Access
Exploiting index pruning methods for clustering XML collections
(Springer, Berlin, Heidelberg, 2010) Altıngövde, İsmail Şengör; Atılgan, Duygu; Ulusoy, Özgür
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that for certain cases, it is possible to prune up to 70% of the collection (or, more specifically, underlying document vectors) and still generate a clustering structure that yields the same quality as that of the original collection, in terms of a set of evaluation metrics. © 2010 Springer-Verlag Berlin Heidelberg.
Open Access
Exploiting query views for static index pruning in web search engines
(ACM, 2009-11) Altıngövde, İsmail Şengör; Özcan, Rıfat; Ulusoy, Özgür
We propose incorporating query views in a number of static pruning strategies, namely term-centric, document-centric and access-based approaches. These query-view based strategies considerably outperform their counterparts for both disjunctive and conjunctive query processing in Web search engines. Copyright 2009 ACM.
Open Access
Finding faces in news photos using both face and name information
(IEEE, 2006) Özkan, Derya; Duygulu, Pınar
We propose a method to associate names and faces for querying people in large news photo collections. On the assumption that a person's face is likely to appear when his/her name is mentioned in the caption, first all the faces associated with the query name are selected. Among these faces, there could be many faces corresponding to the queried person in different conditions, poses and times, but there could also be other faces corresponding to other people in the caption or some non-face images due to the errors in the face detection method used. However, in most cases, the number of corresponding faces of the queried person will be large, and these faces will be more similar to each other than to others. When the similarities of faces are represented in a graph structure, the set of most similar faces will be the densest component in the graph. In this study, we propose a graph-based method to find the most similar subset among the set of possible faces associated with the query name, where the most similar subset is likely to correspond to the faces of the queried person. © 2006 IEEE.
Open Access
First large-scale information retrieval experiments on Turkish texts
(ACM, 2006-08) Can, Fazlı; Koçberber, Seyit; Balcık, Erman; Kaynak, Cihan; Öcalan, H. Çağdaş; Vursavaş, Onur M.
We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, 72 ad hoc queries and has a size of about 800 MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
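The prefix-truncation stemming mentioned in this abstract is simple enough to sketch directly. The function names below are hypothetical, and the matching rule (any shared truncated stem) is an assumption for illustration rather than one of the matching functions evaluated in the paper.

```python
def truncate_stem(word, prefix_length=5):
    """Keep only the first prefix_length characters; shorter words are unchanged."""
    return word[:prefix_length]

def shares_stem(query, document, prefix_length=5):
    """True if the query and the document have at least one truncated stem in common."""
    query_stems = {truncate_stem(w, prefix_length) for w in query.split()}
    doc_stems = {truncate_stem(w, prefix_length) for w in document.split()}
    return bool(query_stems & doc_stems)

# "kitaplar" (books) and "kitaplık" (bookcase) both reduce to "kitap" (book).
stems = [truncate_stem(w) for w in ["kitap", "kitaplar", "kitaplık", "ev"]]
```

Because Turkish is agglutinative, many inflected forms share their first few characters with the root, which is why a fixed-length prefix can act as a cheap stemmer.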
Open Access
A graph based approach for naming faces in news photos
(IEEE Computer Society, 2006) Ozkan, D.; Duygulu, P.
We propose a method to associate names and faces for querying people in large news photo collections. On the assumption that a person's face is likely to appear when his/her name is mentioned in the caption, first all the faces associated with the query name are selected. Among these faces, there could be many faces corresponding to the queried person in different conditions, poses and times, but there could also be other faces corresponding to other people in the caption or some non-face images due to the errors in the face detection method used. However, in most cases, the number of corresponding faces of the queried person will be large, and these faces will be more similar to each other than to others. In this study, we propose a graph-based method to find the most similar subset among the set of possible faces associated with the query name, where the most similar subset is likely to correspond to the faces of the queried person. When the similarities of faces are represented in a graph structure, the set of most similar faces will be the densest component in the graph. We represent the similarity of faces using SIFT descriptors. The matching interest points on two faces are decided after the application of two constraints, namely the geometrical constraint and the unique match constraint. The average distance of the matching points is used to construct the similarity graph. The most similar set of faces is then found based on a greedy densest component algorithm. The experiments are performed on thousands of news photographs taken in real life conditions and, therefore, having a large variety of poses, illuminations and expressions. © 2006 IEEE.
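The idea of extracting the densest component from a face-similarity graph can be sketched with a standard greedy peeling heuristic. The abstract does not spell out its algorithm, so the version below is an assumption: it works on an unweighted graph without self-loops (edges taken to mean "similar enough"), repeatedly removes the lowest-degree node, and keeps the intermediate subgraph with the highest edges-per-node density.

```python
def greedy_densest_subgraph(edges):
    """Return (nodes, density) of the densest subgraph found by greedy peeling."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    num_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    best_nodes = set(adj)
    best_density = num_edges / len(adj) if adj else 0.0
    while len(adj) > 1:
        # Peel the node with the fewest remaining neighbors (ties broken by name).
        u = min(adj, key=lambda n: (len(adj[n]), str(n)))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        del adj[u]
        density = num_edges / len(adj)
        if density > best_density:
            best_density = density
            best_nodes = set(adj)
    return best_nodes, best_density

# Toy graph, invented for illustration: faces a-d are mutually similar,
# while e and f are loosely attached outliers.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("a", "e"), ("e", "f")]
nodes, density = greedy_densest_subgraph(edges)
```

In this toy graph the outliers are peeled first, leaving the four mutually similar faces as the densest component.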
Open Access
HandVR: a hand-gesture-based interface to a video retrieval system
(Springer UK, 2015) Genç, S.; Baştan, M.; Güdükbay, Uğur; Atalay, V.; Ulusoy, Özgür
Using one’s hands in human–computer interaction increases both the effectiveness of computer usage and the speed of interaction. One way of accomplishing this goal is to utilize computer vision techniques to develop hand-gesture-based interfaces. A video database system is one application where a hand-gesture-based interface is useful, because it provides a way to specify certain queries more easily. We present a hand-gesture-based interface for a video database system to specify motion and spatiotemporal object queries. We use a regular, low-cost camera to monitor the movements and configurations of the user’s hands and translate them to video queries. We conducted a user study to compare our gesture-based interface with a mouse-based interface on various types of video queries. The users evaluated the two interfaces in terms of different usability parameters, including the ease of learning, ease of use, ease of remembering (memory), naturalness, comfortable use, satisfaction, and enjoyment. The user study showed that querying video databases is a promising application area for hand-gesture-based interfaces, especially for queries involving motion and spatiotemporal relations.
Open Access
A histogram-based approach for object-based query-by-shape-and-color in image and video databases
(Elsevier, 2005) Şaykol, E.; Güdükbay, Uğur; Ulusoy, Özgür
Considering the fact that querying by low-level object features is essential in image and video data, an efficient approach for querying and retrieval by shape and color is proposed. The approach employs three specialized histograms (i.e., distance, angle, and color histograms) to store feature-based information that is extracted from objects. The objects can be extracted from images or video frames. The proposed histogram-based approach is used as a component in the query-by-feature subsystem of a video database management system. The color and shape information is handled together to enrich the querying capabilities for content-based retrieval. The evaluation of the retrieval effectiveness and the robustness of the proposed approach is presented via performance experiments. © 2005 Elsevier Ltd. All rights reserved.