Browsing by Subject "Information retrieval"

Open Access
Adaptive time-to-live strategies for query result caching in web search engines
(2012) Alıcı, Sadiye; Altıngövde, I. Ş.; Özcan, Rıfat; Cambazoğlu, B. Barla; Ulusoy, Özgür
An important research problem that has recently started to receive attention is the freshness issue in search engine result caches. In the current techniques in literature, the cached search result pages are associated with a fixed time-to-live (TTL) value in order to bound the staleness of search results presented to the users, potentially as part of a more complex cache refresh or invalidation mechanism. In this paper, we propose techniques where the TTL values are set in an adaptive manner, on a per-query basis. Our results show that the proposed techniques reduce the fraction of stale results served by the cache and also decrease the fraction of redundant query evaluations on the search engine backend compared to a strategy using a fixed TTL value for all queries. © 2012 Springer-Verlag Berlin Heidelberg.
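The abstract above does not say how the per-query TTL values are chosen, so the following is only a minimal sketch of one plausible rule, not the authors' method: the TTL of a query doubles each time a refresh finds its result unchanged and resets to the minimum when the result has changed. The class name `AdaptiveTTLCache`, the `backend` callable, and the `min_ttl`/`max_ttl` parameters are all hypothetical.

```python
class AdaptiveTTLCache:
    """Result cache with a per-query TTL (illustrative rule, assumed here)."""

    def __init__(self, backend, min_ttl=1, max_ttl=16):
        self.backend = backend      # callable: query -> result page
        self.min_ttl = min_ttl
        self.max_ttl = max_ttl
        self.entries = {}           # query -> {"result", "ttl", "expires"}

    def get(self, query, now):
        entry = self.entries.get(query)
        if entry is not None and now < entry["expires"]:
            return entry["result"]  # still fresh: serve from the cache

        fresh = self.backend(query)  # expired or missing: re-evaluate
        if entry is None:
            ttl = self.min_ttl
        elif fresh == entry["result"]:
            # The result was stable over the last TTL, so trust it longer.
            ttl = min(entry["ttl"] * 2, self.max_ttl)
        else:
            # The result changed, so fall back to the shortest TTL.
            ttl = self.min_ttl
        self.entries[query] = {"result": fresh, "ttl": ttl, "expires": now + ttl}
        return fresh
```

Under this rule, stable queries are re-evaluated less and less often, while queries whose results change are checked frequently, which is the trade-off between stale results and redundant evaluations that the abstract describes.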
Open Access
Authorship attribution: performance of various features and classification methods
(IEEE, 2007-11) Bozkurt, İlker Nadi; Bağlıoğlu, Özgür; Uyar, Erkan
Authorship attribution is the process of determining the writer of a document. In the literature, many classification techniques have been applied to this task. In this paper we explore information retrieval methods such as the tf-idf structure with support vector machines, and parametric and nonparametric methods with supervised and unsupervised (clustering) classification techniques in authorship attribution. We performed various experiments with articles gathered from the Turkish newspaper Milliyet. We performed experiments on different features extracted from these texts with different classifiers, and combined these results to improve our success rates. We identified which classifiers give satisfactory results on which feature sets. According to the experiments, the success rates change dramatically with different combinations; however, the best among them are the support vector classifier with bag of words and the Gaussian classifier with function words. ©2007 IEEE.
Open Access
Automatic detection of geospatial objects using multiple hierarchical segmentations
(Institute of Electrical and Electronics Engineers, 2008-07) Akçay, H. G.; Aksoy, S.
The object-based analysis of remotely sensed imagery provides valuable spatial and structural information that is complementary to pixel-based spectral information in classification. In this paper, we present novel methods for automatic object detection in high-resolution images by combining spectral information with structural information exploited by using image segmentation. The proposed segmentation algorithm uses morphological operations applied to individual spectral bands using structuring elements in increasing sizes. These operations produce a set of connected components forming a hierarchy of segments for each band. A generic algorithm is designed to select meaningful segments that maximize a measure consisting of spectral homogeneity and neighborhood connectivity. Given the observation that different structures appear more clearly at different scales in different spectral bands, we describe a new algorithm for unsupervised grouping of candidate segments belonging to multiple hierarchical segmentations to find coherent sets of segments that correspond to actual objects. The segments are modeled by using their spectral and textural content, and the grouping problem is solved by using the probabilistic latent semantic analysis algorithm that builds object models by learning the object-conditional probability distributions. The automatic labeling of a segment is done by computing the similarity of its feature distribution to the distribution of the learned object models using the Kullback-Leibler divergence. The performances of the unsupervised segmentation and object detection algorithms are evaluated qualitatively and quantitatively using three different data sets with comparative experiments, and the results show that the proposed methods are able to automatically detect, group, and label segments belonging to the same object classes. © 2008 IEEE.
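The labeling step described above, which assigns a segment to the learned object model whose distribution is closest under the Kullback-Leibler divergence, can be sketched as follows. This is an illustrative sketch only: the three-bin feature distributions, the model names, and the direction of the divergence (segment to model) are assumptions, not details taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two discrete distributions over the same bins."""
    # eps guards against division by zero when a model bin is empty.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def label_segment(segment_dist, object_models):
    """Return the name of the object model closest to the segment."""
    return min(
        object_models,
        key=lambda name: kl_divergence(segment_dist, object_models[name]),
    )


# Hypothetical learned object models over three feature bins.
object_models = {
    "building": [0.7, 0.2, 0.1],
    "road": [0.1, 0.2, 0.7],
}
segment = [0.6, 0.3, 0.1]
label = label_segment(segment, object_models)
```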
Open Access
Automatic detection of salient objects and spatial relations in videos for a video database system
(Elsevier BV, 2008-10) Sevilmiş, T.; Baştan, M.; Güdükbay, Uğur; Ulusoy, Özgür
Multimedia databases have gained popularity due to rapidly growing quantities of multimedia data and the need to perform efficient indexing, retrieval and analysis of this data. One downside of multimedia databases is the necessity to process the data for feature extraction and labeling prior to storage and querying. The huge amount of data makes it impossible to complete this task manually. We propose a tool for the automatic detection and tracking of salient objects, and derivation of spatio-temporal relations between them in video. Our system aims to significantly reduce the work of manually selecting and labeling objects by detecting and tracking the salient objects, so that the label for each object needs to be entered only once within each shot instead of being specified in every frame in which the object appears. This is also required as a first step in a fully automatic video database management system in which the labeling should also be done automatically. The proposed framework covers a scalable architecture for video processing and stages of shot boundary detection, salient object detection and tracking, and knowledge-base construction for effective spatio-temporal object querying. © 2008 Elsevier B.V. All rights reserved.
Open Access
Automatic performance evaluation of Web search engines
(Elsevier, 2004) Can, F.; Nuray, R.; Sevdik, A. B.
Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, both for business enterprises and people it is important to know the most effective Web search engines, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. In this study we introduce an automatic Web search engine evaluation method as an efficient and effective assessment tool for such systems. The experiments based on eight Web search engines, 25 queries, and binary user relevance judgments show that our method provides results consistent with human-based evaluations. It is shown that the observed consistencies are statistically significant. This indicates that the new method can be successfully used in the evaluation of Web search engines. © 2003 Elsevier Ltd. All rights reserved.
Open Access
Automatic ranking of information retrieval systems using data fusion
(Elsevier Ltd, 2006-05) Nuray, R.; Can, F.
Measuring effectiveness of information retrieval (IR) systems is essential for research and development and for monitoring search quality in dynamic environments. In this study, we employ new methods for automatic ranking of retrieval systems. In these methods, we merge the retrieval results of multiple systems using various data fusion algorithms, use the top-ranked documents in the merged result as the "(pseudo) relevant documents," and employ these documents to evaluate and rank the systems. Experiments using Text REtrieval Conference (TREC) data provide statistically significant strong correlations with human-based assessments of the same systems. We hypothesize that the selection of systems that would return documents different from the majority could eliminate the ordinary systems from data fusion and provide better discrimination among the documents and systems. This could improve the effectiveness of automatic ranking. Based on this intuition, we introduce a new method for the selection of systems to be used for data fusion. For this purpose, we use the bias concept that measures the deviation of a system from the norm or majority and employ the systems with higher bias in the data fusion process. This approach provides even higher correlations with the human-based results. We demonstrate that our approach outperforms the previously proposed automatic ranking methods. © 2005 Elsevier Ltd. All rights reserved.
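The procedure described above (merge the systems' results, treat the top of the merged list as pseudo-relevant, then score each system against that set) can be sketched as follows. The abstract mentions "various data fusion algorithms" without naming them, so the Borda count used here, the precision-at-cutoff measure, and the parameters `pool_depth` and `cutoff` are assumptions chosen for illustration. The toy runs are invented.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Merge ranked lists with a Borda count.

    A document at 0-based position i in a list of n documents earns n - i points.
    """
    scores = defaultdict(float)
    for ranking in rankings.values():
        n = len(ranking)
        for position, doc in enumerate(ranking):
            scores[doc] += n - position
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rank_systems(rankings, pool_depth=2, cutoff=3):
    """Rank systems by precision@cutoff against pseudo-relevant documents."""
    pseudo_relevant = set(borda_fuse(rankings)[:pool_depth])
    precision = {
        name: len(pseudo_relevant & set(ranking[:cutoff])) / cutoff
        for name, ranking in rankings.items()
    }
    order = sorted(precision, key=lambda s: (-precision[s], s))
    return order, precision


# Toy runs for one query from three hypothetical systems.
runs = {
    "A": ["d1", "d2", "d3", "d4"],
    "B": ["d2", "d1", "d4", "d3"],
    "C": ["d4", "d3", "d5", "d6"],
}
order, precision = rank_systems(runs)
```

In this toy example system C agrees least with the merged list and is ranked last. The bias-based selection of systems mentioned in the abstract is not modeled here.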
Open Access
BilKristal 4.0: A tool for crystal parameters extraction and defect quantification
(Elsevier, 2015) Okuyan, E.; Okuyan, C.
In this paper, we present a revised version of the BilKristal 3.0 tool. Raycast screenshot functionality is added to provide improved visual analysis. We added atomic distance analysis functionality to assess crystalline defects. We improved visualization capabilities by adding high-level cut function definitions. Discovered bugs are fixed and small performance optimizations are made. © 2015 Elsevier B.V. All rights reserved.
Open Access
Chat mining for gender prediction
(Springer, 2006-10) Küçükyılmaz, Tayfun; Cambazoğlu, B. Barla; Aykanat, Cevdet; Can, Fazlı
The aim of this paper is to investigate the feasibility of predicting the gender of a text document's author using linguistic evidence. For this purpose, term- and style-based classification techniques are evaluated over a large collection of chat messages. Prediction accuracies up to 84.2% are achieved, illustrating the applicability of these techniques to gender prediction. Moreover, the reverse problem is exploited, and the effect of gender on the writing style is discussed. © Springer-Verlag Berlin Heidelberg 2006.
Open Access
ChiBE: interactive visualization and manipulation of BioPAX pathway models
(Oxford University Press, 2010-02-01) Babur, Özgün; Doğrusöz, Uğur; Demir, Emek; Sander, C.
SUMMARY: Representing models of cellular processes or pathways in a graphically rich form facilitates interpretation of biological observations and generation of new hypotheses. Solving biological problems using large pathway datasets requires software that can combine data mapping, querying and visualization as well as providing access to diverse data resources on the Internet. ChiBE is an open source software application that features user-friendly multi-view display, navigation and manipulation of pathway models in BioPAX format. Pathway views are rendered in a feature-rich format, and may be laid out and edited with state-of-the-art visualization methods, including compound or nested structures for visualizing cellular compartments and molecular complexes. Users can easily query and visualize pathways through an integrated Pathway Commons query tool and analyze molecular profiles in pathway context. AVAILABILITY: http://www.bilkent.edu.tr/%7Ebcbi/chibe.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Open Access
Client-server synchronization and buffering for variable rate multimedia retrievals
(1996) Hui, J. Y.; Karasan, E.; Li, J.; Zhang, J.
We consider the use of large buffers and feedback as a mechanism to maintain loosely coupled synchronization between a multimedia server and a client. The multimedia stream is modeled as a fluid flow through rate-controlled valves and buffers with multiple thresholds. These thresholds are used to control the rates upstream. The quality of service for the multimedia connection is characterized in terms of the jitter in the received media stream due to buffer underflow and overflow. This quality of service is used to exercise rate and admission control in the presence of congestion. The feedback mechanism is implemented in GRAMS, an adaptive multimedia client-server system. Experimental statistics are gathered for the purpose of traffic engineering. We employ a fluid flow and first passage time analysis to understand the traffic process through the pipelines and the buffers and to estimate the amount of signaling required by the feedback mechanism.
Open Access
Conceptfusion: A flexible scene classification framework
(Springer, 2015-03-04) Saraç, Mustafa İlker; İşcen, Ahmet; Gölge, Eren; Duygulu, Pınar
We introduce ConceptFusion, a method that aims at high accuracy in categorizing a large number of scenes, while keeping the model relatively simple and efficient for scalability. The proposed method combines the advantages of both low-level representations and high-level semantic categories, and eliminates the distinctions between different levels through the definition of concepts. The proposed framework encodes the perspectives brought through different concepts by considering them in concept groups that are ensembled for the final decision. Experiments carried out on benchmark datasets show the effectiveness of incorporating concepts in different levels with different perspectives. © Springer International Publishing Switzerland 2015.
Open Access
Constrained min-cut replication for K-way hypergraph partitioning
(Institute for Operations Research and the Management Sciences (INFORMS), 2014) Yazici, V.; Aykanat, Cevdet
Replication is a widely-used technique in information retrieval and database systems for providing fault tolerance and reducing parallelization and processing costs. Combinatorial models based on hypergraph partitioning are proposed for various problems arising in information retrieval and database systems. We consider the possibility of using vertex replication to improve the quality of hypergraph partitioning. In this study, we focus on the constrained min-cut replication (CMCR) problem, where we are initially given a maximum replication capacity and a K-way hypergraph partition with an initial imbalance ratio. The objective in the CMCR problem is finding the optimal vertex replication sets for each part of the given partition such that the initial cut size of the partition is minimized, where the initial imbalance is either preserved or reduced under the given replication capacity constraint. In this study, we present a complexity analysis of the CMCR problem and propose a model based on a unique blend of coarsening and integer linear programming (ILP) schemes. This coarsening algorithm is derived from a novel utilization of the Dulmage-Mendelsohn decomposition. Experiments show that the ILP formulation coupled with the Dulmage-Mendelsohn decomposition-based coarsening provides high quality results in practical execution times for reducing the cut size of a given K-way hypergraph partition. © 2014 INFORMS.
Open Access
Context learning in Okapi
(1997) Göker, A.
A user who makes repeated use of a retrieval system may be assumed to have a context which is common to successive uses (even if the immediate need differs). An IR system which could make use of this context may be better able to match the specific need. A machine-learning approach to inferring the user's context is described, and the results of an evaluation experiment are given. There appears to be scope for IR systems to operate in this way.
Open Access
Cost-aware strategies for query result caching in Web search engines
(Association for Computing Machinery, 2011) Ozcan, R.; Altingovde, I. S.; Ulusoy, O.
Search engines and large-scale IR systems need to cache query results for efficiency and scalability purposes. Static and dynamic caching techniques (as well as their combinations) are employed to effectively cache query results. In this study, we propose cost-aware strategies for static and dynamic caching setups. Our research is motivated by two key observations: (i) query processing costs may significantly vary among different queries, and (ii) the processing cost of a query is not proportional to its popularity (i.e., frequency in the previous logs). The first observation implies that cache misses have different, that is, nonuniform, costs in this context. The latter observation implies that typical caching policies, solely based on query popularity, can not always minimize the total cost. Therefore, we propose to explicitly incorporate the query costs into the caching policies. Simulation results using two large Web crawl datasets and a real query log reveal that the proposed approach improves overall system performance in terms of the average query execution time. © 2011 ACM.
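The second observation above, that popularity alone does not minimize total cost, can be illustrated with a small static-cache sketch. The selection rule used here (rank queries by frequency multiplied by processing cost) is one simple way to incorporate cost and is an assumption for illustration, not necessarily the policy the authors propose. The query log statistics are invented, and every cache miss is assumed to cost the full processing time of the query.

```python
def select_static_cache(stats, capacity, cost_aware):
    """Pick the queries to keep in a static cache.

    stats maps each query to (frequency, cost_ms) observed in a past log.
    """
    def value(query):
        freq, cost = stats[query]
        return freq * cost if cost_aware else freq

    ranked = sorted(stats, key=lambda q: (-value(q), q))
    return set(ranked[:capacity])


def total_miss_cost(stats, cached):
    """Total backend time spent on queries that are not in the cache."""
    return sum(
        freq * cost for query, (freq, cost) in stats.items() if query not in cached
    )


# Hypothetical log statistics: query -> (frequency, processing cost in ms).
stats = {
    "weather": (100, 1),
    "news": (90, 2),
    "maps": (80, 1),
    "rare long query": (10, 50),
}

by_popularity = select_static_cache(stats, capacity=2, cost_aware=False)
by_cost = select_static_cache(stats, capacity=2, cost_aware=True)
```

With these numbers the popularity-only cache keeps "weather" and "news" and leaves 580 ms of miss cost, while the cost-aware cache keeps "rare long query" and "news" and leaves 180 ms.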
Open Access
Cover coefficient-based multi-document summarization
(Springer, 2009-04) Ercan, Gönenç; Can, Fazlı
In this paper we present a generic, language independent multi-document summarization system forming extracts using the cover coefficient concept. Cover Coefficient-based Summarizer (CCS) uses similarity between sentences to determine representative sentences. Experiments indicate that CCS is an efficient algorithm that is able to generate quality summaries online. © Springer-Verlag Berlin Heidelberg 2009.
Open Access
A database model for querying visual surveillance videos by integrating semantic and low-level features
(Springer, Berlin, Heidelberg, 2005) Şaykol, Ediz; Güdükbay, Uğur; Ulusoy, Özgür
Automated visual surveillance has emerged as a trendy application domain in recent years. Many approaches have been developed on video processing and understanding. Content-based access to surveillance video has become a challenging research area. The results of a considerable amount of work dealing with automated access to visual surveillance have appeared in the literature. However, the event models and the content-based querying and retrieval components have significant gaps remaining unfilled. To narrow these gaps, we propose a database model for querying surveillance videos by integrating semantic and low-level features. In this paper, the initial design of the database model, the query types, and the specifications of its query language are presented. © Springer-Verlag Berlin Heidelberg 2005.
Open Access
A model proposal for a web discovery tool in line with changing user habits [Değişen kullanıcı alışkanlıkları doğrultusunda bir web keşif aracı model önerisi]
(2017) Kaya, Ebru
The common expectation of today's library users is to reach the information they need as quickly and easily as possible. In recent years, the growth of internet use in every field and rising access speeds have changed users' habits of searching on the internet. With developments in information technologies, users have become a group that uses digital media tools such as the internet, computers, and mobile phones. The most distinctive feature of this group is its desire to reach information in the fastest and easiest way. Similarly, innovations in technology have brought diversity to the types of resources in library collections and opened new possibilities for information retrieval. Today, university libraries hold, at national and international levels, many different library catalogs and databases whose interfaces vary by subject. For this reason, users prefer to search on Google and similar search engines rather than use databases and library catalogs that each have a different user interface. Managing the rapidly growing number of resources in today's information environment, delivering information to users simultaneously, and presenting related resources together make the use of web-based information retrieval tools a necessity. In this way, users can be offered fast access to all library resources from a single interface. The aim of this research is to identify the main components of a web-based resource discovery tool that will ease users' access to information resources, and to present a model proposal containing the managerial and functional implementation steps suited to this infrastructure. To this end, data were collected by questionnaire from researchers and library directors in order to determine user expectations.
The data obtained from the research were evaluated and the research hypotheses were confirmed. Accordingly, users gravitate toward resources they can access directly from results that are listed and can be ranked by relevance, reached through a single, simple search box like Google's. Existing web discovery tools fall short in meeting researchers' information needs. University libraries in our country need a non-commercial platform that will meet users' information needs and that offers a fast, simple, single interface through which all information resources can be searched. In light of these confirmed hypotheses, the general components of a web discovery tool that would meet this need were identified, and the implementation steps were presented as a model proposal from managerial and functional perspectives.
Unknown
A distributed and measurement-based framework against free riding in peer-to-peer networks
(IEEE, 2004) Karakaya, Murat; Korpeoglu, İbrahim; Ulusoy, Özgür
In this paper, we propose a distributed and measurement-based method to reduce the degree of free riding in P2P networks. We primarily focus on developing schemes to locate free riders and on determining policies that can be used to take actions against them. Our proposed schemes require each peer to monitor its neighboring peers, make decisions if they exhibit any kind of free riding, and take appropriate actions if required.
Unknown
Diversity and novelty in information retrieval
(ACM, 2013-07-08) Santos, R. L. T.; Castells, P.; Altıngövde, I. S.; Can, Fazlı
This tutorial aims to provide a unifying account of current research on diversity and novelty in different IR domains, namely, in the context of search engines, recommender systems, and data streams.
Unknown
Diversity based Relevance Feedback for Time Series Search
(2013) Eravci, B.; Ferhatosmanoglu, H.
We propose a diversity based relevance feedback approach for time series data to improve the accuracy of search results. We first develop the concept of relevance feedback for time series based on dual-tree complex wavelet (CWT) and SAX based approaches. We aim to enhance the search quality by incorporating diversity in the results presented to the user for feedback. We then propose a method which utilizes the representation type as part of the feedback, as opposed to a human choosing based on a preprocessing or training phase. The proposed methods utilize a weighting to handle the relevance feedback of important properties for both single and multiple representation cases. Our experiments on a large variety of time series data sets show that the proposed diversity based relevance feedback improves the retrieval performance. Results confirm that representation feedback incorporates item diversity implicitly and achieves good performance even when using simple nearest neighbor as the retrieval method. To the best of our knowledge, this is the first study on diversification of time series search to improve retrieval accuracy and representation feedback. © 2013 VLDB Endowment.