Segmenting and labeling query sequences in a multidatabase environment
dc.citation.epage | 384 | en_US |
dc.citation.issueNumber | PART 1 | en_US |
dc.citation.spage | 367 | en_US |
dc.citation.volumeNumber | 7044 | en_US |
dc.contributor.author | Acar, Aybar C. | en_US |
dc.contributor.author | Motro, A. | en_US |
dc.coverage.spatial | Hersonissos, Crete, Greece | en_US |
dc.date.accessioned | 2016-02-08T12:17:32Z | |
dc.date.available | 2016-02-08T12:17:32Z | |
dc.date.issued | 2011 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description | Conference name: Confederated International Conferences: CoopIS, DOA-SVI, and ODBASE 2011 | en_US |
dc.description | Date of Conference: October 17-21, 2011 | en_US |
dc.description.abstract | When gathering information from multiple independent data sources, users will generally pose a sequence of queries to each source and combine (union) or cross-reference (join) the results in order to obtain the information they need. Furthermore, when gathering information, there is a fair bit of trial and error involved, where queries are recursively refined according to the results of a previous query in the sequence. From the point of view of an outside observer, the aim of such a sequence of queries may not be immediately obvious. We investigate the problem of isolating and characterizing subsequences representing coherent information retrieval goals out of a sequence of queries sent by a user to different data sources over a period of time. The problem has two sub-problems: segmenting the sequence into subsequences, each representing a discrete goal; and labeling each query in these subsequences according to how it contributes to the goal. We propose a method in which a discriminative probabilistic model (a Conditional Random Field) is trained with pre-labeled sequences. We have tested the accuracy with which such a model can infer labels and segmentation on novel sequences. Results show that the approach is very accurate (> 95% accuracy) when there are no spurious queries in the sequence and moderately accurate even in the presence of substantial noise (∼70% accuracy when 15% of queries in the sequence are spurious). © 2011 Springer-Verlag. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T12:17:32Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2011 | en |
dc.identifier.doi | 10.1007/978-3-642-25109-2_24 | en_US |
dc.identifier.doi | 10.1007/978-3-642-25109-2 | en_US |
dc.identifier.issn | 0302-9743 | |
dc.identifier.uri | http://hdl.handle.net/11693/28325 | |
dc.language.iso | English | en_US |
dc.publisher | Springer, Berlin, Heidelberg | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1007/978-3-642-25109-2_24 | en_US |
dc.relation.isversionof | https://doi.org/10.1007/978-3-642-25109-2 | en_US |
dc.source.title | On the Move to Meaningful Internet Systems: OTM 2011 | en_US |
dc.subject | Data Management | en_US |
dc.subject | Information Integration | en_US |
dc.subject | Query Processing | en_US |
dc.subject | Conditional random field | en_US |
dc.subject | Data source | en_US |
dc.subject | Multidatabases | en_US |
dc.subject | Probabilistic models | en_US |
dc.subject | Query sequence | en_US |
dc.subject | Sub-problems | en_US |
dc.subject | Trial and error | en_US |
dc.subject | Image segmentation | en_US |
dc.subject | Information management | en_US |
dc.subject | Information retrieval | en_US |
dc.subject | Internet | en_US |
dc.subject | Search engines | en_US |
dc.title | Segmenting and labeling query sequences in a multidatabase environment | en_US |
dc.type | Conference Paper | en_US |
Files
Original bundle
- Name: Segmenting and labeling query sequences in a multidatabase environment.pdf
- Size: 332.29 KB
- Format: Adobe Portable Document Format
- Description: Full printable version