
dc.contributor.author: Acar, A.C. [en_US]
dc.contributor.author: Motro, A. [en_US]
dc.date.accessioned: 2016-02-08T12:17:32Z
dc.date.available: 2016-02-08T12:17:32Z
dc.date.issued: 2011 [en_US]
dc.identifier.issn: 0302-9743 [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/28325
dc.description.abstract: When gathering information from multiple independent data sources, users will generally pose a sequence of queries to each source, combine (union) or cross-reference (join) the results in order to obtain the information they need. Furthermore, when gathering information, there is a fair bit of trial and error involved, where queries are recursively refined according to the results of a previous query in the sequence. From the point of view of an outside observer, the aim of such a sequence of queries may not be immediately obvious. We investigate the problem of isolating and characterizing subsequences representing coherent information retrieval goals out of a sequence of queries sent by a user to different data sources over a period of time. The problem has two sub-problems: segmenting the sequence into subsequences, each representing a discrete goal; and labeling each query in these subsequences according to how they contribute to the goal. We propose a method in which a discriminative probabilistic model (a Conditional Random Field) is trained with pre-labeled sequences. We have tested the accuracy with which such a model can infer labels and segmentation on novel sequences. Results show that the approach is very accurate (> 95% accuracy) when there are no spurious queries in the sequence and moderately accurate even in the presence of substantial noise (∼70% accuracy when 15% of queries in the sequence are spurious). © 2011 Springer-Verlag. [en_US]
dc.language.iso: English [en_US]
dc.source.title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/978-3-642-25109-2_24 [en_US]
dc.subject: Data Management [en_US]
dc.subject: Information Integration [en_US]
dc.subject: Query Processing [en_US]
dc.subject: Conditional random field [en_US]
dc.subject: Data source [en_US]
dc.subject: Multidatabases [en_US]
dc.subject: Probabilistic models [en_US]
dc.subject: Query sequence [en_US]
dc.subject: Sub-problems [en_US]
dc.subject: Trial and error [en_US]
dc.subject: Image segmentation [en_US]
dc.subject: Information management [en_US]
dc.subject: Information retrieval [en_US]
dc.subject: Internet [en_US]
dc.subject: Search engines [en_US]
dc.title: Segmenting and labeling query sequences in a multidatabase environment [en_US]
dc.type: Conference Paper [en_US]
dc.department: Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey; Department of Computer Science, George Mason University, Fairfax, VA 22030, United States [en_US]
dc.citation.spage: 367 [en_US]
dc.citation.epage: 384 [en_US]
dc.citation.volumeNumber: 7044 LNCS [en_US]
dc.citation.issueNumber: PART 1 [en_US]
dc.identifier.doi: 10.1007/978-3-642-25109-2_24 [en_US]

