Browsing by Subject "Knowledge management"

Now showing 1 - 16 of 16

Open Access
An automatic approach to construct domain-specific web portals
(ACM, 2007-11) Altıngövde, İsmail Şengör; Özcan, Rıfat; Çetintaş, Süleyman; Yılmaz, Hakan; Ulusoy, Özgür
We describe the architecture of an automatic domain-specific Web portal construction system. The system has three major components: i) a focused crawler that collects the domain-specific pages on the Web, ii) an information extraction engine that extracts useful fields from these Web pages, and iii) a query engine that allows both typical keyword based queries on the pages and advanced queries on the extracted data fields. We present a prototype system that works for the course homepages domain on the Web. A user study with the prototype system shows that our approach produces high quality results and achieves better precision figures than the typical keyword based search. Copyright 2007 ACM.
Open Access
BFSig: leveraging file significance in bus factor estimation
(Association for Computing Machinery, Inc., 2023) Haratian, Vahid; Evtikhiev, M.; Derakhshanfar, P.; Tüzün, Eray; Kovalenko, V.
Software projects experience the departure of developers due to various reasons. As developers are one of the main sources of knowledge in software projects, their absence will inevitably result in a certain degree of knowledge depletion. Bus Factor (BF) is a metric to evaluate how this knowledge loss can affect the project’s continuity
Open Access
Bus factor explorer
(IEEE, 2023-11-08) Klimov, E.; Ahmed, Muhammad Umair; Sviridov, N.; Derakhshanfar, P.; Tüzün, Eray; Kovalenko, V.
Bus factor (BF) is a metric that tracks knowledge distribution in a project. It is the minimal number of engineers that have to leave for a project to stall. Despite the fact that there are several algorithms for calculating the bus factor, only a few tools allow easy calculation of bus factor and convenient analysis of results for projects hosted on Git-based providers. We introduce Bus Factor Explorer, a web application that provides an interface and an API to compute, export, and explore the Bus Factor metric via treemap visualization, simulation mode, and chart editor. It supports repositories hosted on GitHub and enables functionality to search repositories in the interface and process many repositories at the same time. Our tool allows users to identify the files and subsystems at risk of stalling in the event of developer turnover by analyzing the VCS history. The application and its source code are publicly available on GitHub at https://github.com/JetBrains-Research/bus-factor-explorer. The demonstration video can be found on YouTube: https://youtu.be/uIoV79N14z8
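The definition above (the minimal number of engineers that have to leave for a project to stall) can be illustrated with a small greedy sketch. This is not the algorithm used by Bus Factor Explorer; the `file_owners` mapping, the `stall_ratio` threshold (a project counts as stalled once more than half of its files have no knowledgeable developer), and the greedy removal order are all assumptions made for illustration, and a greedy pass is not guaranteed to find the true minimum.

```python
from collections import Counter


def bus_factor(file_owners, stall_ratio=0.5):
    """Greedy bus factor estimate (illustrative assumption, not the tool's method).

    file_owners maps each file to the set of developers who know it.
    Developers are removed one at a time, always picking the one who knows
    the most files, until more than stall_ratio of the files have no
    knowledgeable developer left.
    Returns (number of developers removed, list of removed developers).
    """
    remaining = {path: set(devs) for path, devs in file_owners.items()}
    total = len(remaining)
    removed = []
    if total == 0:
        return 0, removed

    while True:
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned > stall_ratio * total:
            return len(removed), removed

        coverage = Counter(d for devs in remaining.values() for d in devs)
        if not coverage:
            # Nobody left to remove and the stall threshold was never crossed.
            return len(removed), removed

        # Pick the developer covering the most files; break ties by name.
        top = min(coverage, key=lambda d: (-coverage[d], d))
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)


if __name__ == "__main__":
    history = {
        "a.py": {"ann"},
        "b.py": {"ann"},
        "c.py": {"ann"},
        "d.py": {"bob"},
    }
    print(bus_factor(history))
```

In this hypothetical history, removing the single developer who knows three of the four files is enough to cross the assumed threshold.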
Open Access
Characterizing web search queries that match very few or no results
(ACM, 2012-11) Altıngövde, İ. Ş.; Blanco, R.; Cambazoğlu, B. B.; Özcan, Rıfat; Sarıgil, Erdem; Ulusoy, Özgür
Despite the continuous efforts to improve the web search quality, a non-negligible fraction of user queries end up with very few or even no matching results in leading web search engines. In this work, we provide a detailed characterization of such queries based on an analysis of a real-life query log. Our experimental setup allows us to characterize the queries with few/no results and compare the mechanisms employed by the major search engines in handling them.
Open Access
CoDet: Sentence-based containment detection in news corpora
(ACM, 2011) Varol, Emre; Can, Fazlı; Aykanat, Cevdet; Kaya, Oğuz
We study a generalized version of the near-duplicate detection problem which concerns whether a document is a subset of another document. In text-based applications, document containment can be observed in exact-duplicates, near-duplicates, or containments, where the first two are special cases of the third. We introduce a novel method, called CoDet, which focuses particularly on this problem, and compare its performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, and SimHash) that are adapted to containment detection. Our method is expandable to different domains, and especially suitable for streaming news. Experimental results show that CoDet effectively and efficiently produces remarkable results in detecting containments. © 2011 ACM.
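To make the containment problem concrete, the following is a naive sentence-overlap sketch. It is not CoDet and not any of the four compared methods; the sentence splitting rule and the function names `sentences` and `containment` are assumptions chosen only to show why exact duplicates and near-duplicates are special cases of containment.

```python
import re


def sentences(text):
    """Split text into a set of lowercased, whitespace-normalized sentences."""
    parts = re.split(r"[.!?]+", text.lower())
    return {" ".join(p.split()) for p in parts if p.strip()}


def containment(candidate, container):
    """Fraction of the candidate's sentences that also occur in the container.

    1.0 means the candidate is fully contained; two exact duplicates score
    1.0 in both directions.
    """
    cand = sentences(candidate)
    if not cand:
        return 0.0
    return len(cand & sentences(container)) / len(cand)


if __name__ == "__main__":
    short = "The cat sat. It rained."
    long = "It rained! The cat sat. Dogs barked."
    print(containment(short, long))
    print(containment(long, short))
```

The score is asymmetric by design: the shorter document can be fully contained in the longer one while the reverse is only partially true.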
Open Access
A color-based face tracking algorithm for enhancing interaction with mobile devices
(Springer, 2010-05) Bulbul, A.; Cipiloglu, Z.; Capin, T.
A color-based face tracking algorithm is proposed to be used as a human-computer interaction tool on mobile devices. The solution provides a natural means of interaction enabling a motion parallax effect in applications. The algorithm considers the characteristics of mobile use: constrained computational resources and varying environmental conditions. The solution is based on color comparisons and works on images gathered from the front camera of a device. In addition to color comparisons, the coherency of the facial pixels is considered in the algorithm. Several applications are also demonstrated in this work, which use the face position to determine the viewpoint in a virtual scene, or for browsing large images. The accuracy of the system is tested under different environmental conditions such as lighting and background, and the performance of the system is measured in different types of mobile devices. According to these measurements the system allows for accurate (7% RMS error) face tracking in real time (20-100 fps). © Springer-Verlag 2010.
Open Access
Differential privacy with bounded priors: Reconciling utility and privacy in genome-wide association studies
(ACM, 2015-10) Tramèr, F.; Huang, Z.; Hubaux, J.-P.; Ayday, Erman
Differential privacy (DP) has become widely accepted as a rigorous definition of data privacy, with stronger privacy guarantees than traditional statistical methods. However, recent studies have shown that for reasonable privacy budgets, differential privacy significantly affects the expected utility. Many alternative privacy notions which aim at relaxing DP have since been proposed, with the hope of providing a better tradeoff between privacy and utility. At CCS'13, Li et al. introduced the membership privacy framework, wherein they aim at protecting against set membership disclosure by adversaries whose prior knowledge is captured by a family of probability distributions. In the context of this framework, we investigate a relaxation of DP, by considering prior distributions that capture more reasonable amounts of background knowledge. We show that for different privacy budgets, DP can be used to achieve membership privacy for various adversarial settings, thus leading to an interesting tradeoff between privacy guarantees and utility. We re-evaluate methods for releasing differentially private χ²-statistics in genome-wide association studies and show that we can achieve a higher utility than in previous works, while still guaranteeing membership privacy in a relevant adversarial setting. © 2015 ACM.
Open Access
Exploiting query views for static index pruning in web search engines
(ACM, 2009-11) Altıngövde, İsmail Şengör; Özcan, Rıfat; Ulusoy, Özgür
We propose incorporating query views in a number of static pruning strategies, namely term-centric, document-centric and access-based approaches. These query-view based strategies considerably outperform their counterparts for both disjunctive and conjunctive query processing in Web search engines. Copyright 2009 ACM.
Open Access
A face tracking algorithm for user interaction in mobile devices
(IEEE, 2009-09) Bülbül, Abdullah; Çipiloğlu, Zeynep; Çapin, Tolga
A new face tracking algorithm, and a human-computer interaction technique based on this algorithm, are proposed for use on mobile devices. The face tracking algorithm considers the limitations of mobile use case - constrained computational resources and varying environmental conditions. The solution is based on color comparisons and works on images gathered from the front camera of a device. The face tracking system generates 2D face position as an output that can be used for controlling different applications. Two of such applications are also presented in this work; the first example uses face position to determine the viewpoint, and the second example enables an intuitive way of browsing large images. © 2009 IEEE.
Open Access
Incorporating the surfing behavior of web users into PageRank
(ACM, 2013-10-11) Ashyralyyev, Shatlyk; Cambazoğlu, B. B.; Aykanat, Cevdet
In large-scale commercial web search engines, estimating the importance of a web page is a crucial ingredient in ranking web search results. So far, to assess the importance of web pages, two different types of feedback have been taken into account, independent of each other: the feedback obtained from the hyperlink structure among the web pages (e.g., PageRank) or the web browsing patterns of users (e.g., BrowseRank). Unfortunately, both types of feedback have certain drawbacks. While the former lacks the user preferences and is vulnerable to malicious intent, the latter suffers from sparsity and hence low web coverage. In this work, we combine these two types of feedback under a hybrid page ranking model in order to alleviate the above-mentioned drawbacks. Our empirical results indicate that the proposed model leads to better estimation of page importance according to an evaluation metric that relies on user click feedback obtained from web search query logs. We conduct all of our experiments in a realistic setting, using a very large scale web page collection (around 6.5 billion web pages) and web browsing data (around two billion web page visits). Copyright is held by the owner/author(s).
Open Access
Linear MMSE-optimal turbo equalization using context trees
(IEEE, 2013) Kim, K.; Kalantarova, N.; Kozat, S. S.; Singer, A. C.
Formulations of the turbo equalization approach to iterative equalization and decoding vary greatly when channel knowledge is either partially or completely unknown. Maximum a posteriori probability (MAP) and minimum mean-square error (MMSE) approaches leverage channel knowledge to make explicit use of soft information (priors over the transmitted data bits) in a manner that is distinctly nonlinear, appearing either in a trellis formulation (MAP) or inside an inverted matrix (MMSE). To date, nearly all adaptive turbo equalization methods either estimate the channel or use a direct adaptation equalizer in which estimates of the transmitted data are formed from an expressly linear function of the received data and soft information, with this latter formulation being most common. We study a class of direct adaptation turbo equalizers that are both adaptive and nonlinear functions of the soft information from the decoder. We introduce piecewise linear models based on context trees that can adaptively approximate the nonlinear dependence of the equalizer on the soft information such that it can choose both the partition regions as well as the locally linear equalizer coefficients in each region independently, with computational complexity that remains of the order of a traditional direct adaptive linear equalizer. This approach is guaranteed to asymptotically achieve the performance of the best piecewise linear equalizer, and we quantify the MSE performance of the resulting algorithm and the convergence of its MSE to that of the linear minimum MSE estimator as the depth of the context tree and the data length increase.
Open Access
Strategies for setting time-to-live values in result caches
(ACM, 2013-10-11) Sazoğlu, Fethi Burak; Cambazoğlu, B. B.; Özcan, R.; Altıngövde, İsmail Şengör; Ulusoy, Özgür
In web query result caching, the staleness of queries is often bounded via a time-to-live (TTL) mechanism, which expires the validity of cached query results at some point in time. In this work, we evaluate the performance of three alternative TTL mechanisms: time-based TTL, frequency-based TTL, and click-based TTL. Moreover, we propose hybrid approaches obtained by pair-wise combination of these mechanisms. Our results indicate that combining time-based TTL with frequency-based TTL yields better performance (i.e., lower stale query traffic and less redundant computation) than using a particular mechanism in isolation. Copyright is held by the owner/author(s).
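A hybrid of time-based and frequency-based TTL might look like the sketch below. It rests on assumptions not stated in the abstract: that frequency-based TTL expires an entry after it has been served a fixed number of times, and that the hybrid expires an entry as soon as either limit is reached. The class name `HybridTTLCache` and its parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedResult:
    results: list
    cached_at: float  # time the results were computed
    hits: int = 0     # times this entry has been served from the cache


class HybridTTLCache:
    """Result cache whose entries expire by age OR by number of hits (assumed rule)."""

    def __init__(self, max_age_seconds, max_hits):
        self.max_age_seconds = max_age_seconds
        self.max_hits = max_hits
        self._entries = {}

    def put(self, query, results, now):
        self._entries[query] = CachedResult(results, now)

    def get(self, query, now):
        """Return cached results, or None if the entry is missing or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        too_old = now - entry.cached_at >= self.max_age_seconds   # time-based TTL
        too_used = entry.hits >= self.max_hits                    # frequency-based TTL
        if too_old or too_used:
            del self._entries[query]  # caller must recompute and put() again
            return None
        entry.hits += 1
        return entry.results


if __name__ == "__main__":
    cache = HybridTTLCache(max_age_seconds=60, max_hits=2)
    cache.put("bilkent", ["r1", "r2"], now=0)
    print(cache.get("bilkent", now=10))
    print(cache.get("bilkent", now=20))
    print(cache.get("bilkent", now=30))
```

Passing `now` explicitly keeps the sketch deterministic; a real cache would more likely read a monotonic clock.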
Open Access
A theoretical framework on the ideal number of classifiers for online ensembles in data streams
(ACM, 2016-10) Bonab, Hamed R.; Can, Fazlı
A priori determining the ideal number of component classifiers of an ensemble is an important problem. The volume and velocity of big data streams make this even more crucial in terms of prediction accuracies and resource requirements. There is a limited number of studies addressing this problem for batch mode and none for online environments. Our theoretical framework shows that using the same number of independent component classifiers as class labels gives the highest accuracy. We prove the existence of an ideal number of classifiers for an ensemble, using the weighted majority voting aggregation rule. In our experiments, we use two state-of-the-art online ensemble classifiers with six synthetic and six real-world data streams. The violation of providing independent component classifiers for our theoretical framework makes determining the exact ideal number of classifiers nearly impossible. We suggest upper bounds for the number of classifiers that gives the highest accuracy. An important implication of our study is that comparing online ensemble classifiers should be done based on these ideal values, since comparing based on a fixed number of classifiers can be misleading. © 2016 ACM.
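The weighted majority voting aggregation rule mentioned above can be sketched as follows. This is a generic illustration under the assumption that each component classifier emits one label and carries one non-negative weight; it is not taken from the paper, and the function name is hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions[i] is the label chosen by component classifier i and
    weights[i] is that classifier's weight. Ties go to the label that
    received a vote first.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per prediction and at least one vote")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # One strong classifier outvotes two weak ones.
    print(weighted_majority_vote(["a", "b", "b"], [0.9, 0.3, 0.3]))
    # With equal weights the rule reduces to plain majority voting.
    print(weighted_majority_vote(["a", "b", "b"], [1.0, 1.0, 1.0]))
```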
Open Access
A tool to enhance cooperation and knowledge transfer among software developers
(Springer, Berlin, Heidelberg, 2009) Aydın, Seçil; Mishra, D.
Software developers have been successfully tailoring software development methods according to the project situation, especially in small-scale software development organizations. There is a need to share this knowledge with other developers who may be facing the same project situation so that they can benefit from other people's experiences. In this paper, an approach to enhance cooperation among software developers, in terms of sharing the knowledge that was used successfully in past projects, is proposed. A web-based tool is developed that can assist in the creation, storage and extraction of methods related to the requirement elicitation phase. These methods are categorized according to certain criteria, which helps in searching for the method that will be most appropriate in a given project situation. This approach and tool can also be used for other software development activities. © 2009 Springer Berlin Heidelberg.
Open Access
Utilization of navigational queries for result presentation and caching in search engines
(ACM, 2008-10) Özcan, Rıfat; Altıngövde, İsmail Şengör; Ulusoy, Özgür
We propose result page models with varying granularities for navigational queries and show that this approach provides a better utilization of cache space and reduces bandwidth requirements.
Open Access
Which shape representation is the best for real-time hand interface system?
(Springer, Berlin, Heidelberg, 2009) Genç, Serkan; Atalay, V.
The hand is a very convenient interface for immersive human-computer interaction. Users can give commands to a computer by hand signs (hand postures, hand shapes) or hand movements (hand gestures). Such a hand interface system can be realized by using cameras as input devices, and software for analyzing the images. In this hand interface system, commands are recognized by analyzing the hand shapes and their trajectories in the images. Therefore, success of the recognition of hand shape is vital and depends on the discriminative power of the hand shape representation. There are many shape representation techniques in the literature. However, none of them works properly for all shapes. While a representation leads to a good result for one set of shapes, it may fail for another. Therefore, our aim is to find the most appropriate shape representation technique for hand shapes to be used in hand interfaces. Our candidate representations are Fourier Descriptors, Hu Moment Invariant, Shape Descriptors and Orientation Histogram. Based on widely-used hand shapes for an interface, we compared the representations in terms of their discriminative power and speed. © 2009 Springer-Verlag.