Effective early termination techniques for text similarity join operator
Author
Özalp, S. A.
Ulusoy, Özgür
Date
2005
Source Title
Computer and Information Sciences - ISCIS 2005
Print ISSN
0302-9743
Publisher
Springer, Berlin, Heidelberg
Volume
3733
Pages
791 - 801
Language
English
Type
Conference Paper
Item Usage Stats
145
views
139
downloads
Abstract
The text similarity join operator joins two relations if their join attributes are textually similar to each other. It has a variety of application domains, including integration and querying of data from heterogeneous resources, cleansing of data, and mining of data. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics. © Springer-Verlag Berlin Heidelberg 2005.
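As a rough illustration of why early termination can cut the cost of a similarity join, the sketch below joins two lists of strings on cosine similarity and abandons a pair as soon as an upper bound shows it cannot reach the threshold. This is not the Harman, quit, continue, or maximal similarity filter heuristic evaluated in the paper; it is a generic Cauchy-Schwarz bound chosen for brevity. The function names, the whitespace tokenizer, the raw term-count weighting, and the threshold-based join condition are all assumptions made for this example.

```python
import math
from collections import Counter


def unit_vector(text):
    """Lower-case, whitespace-tokenize, and L2-normalize raw term counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def similarity_join(left, right, threshold, eps=1e-12):
    """Return (i, j, cosine) for pairs whose cosine similarity >= threshold.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    right_vecs = [unit_vector(s) for s in right]
    results = []
    for i, r in enumerate(left):
        # Process the heaviest terms first so the bound shrinks quickly.
        terms = sorted(unit_vector(r).items(), key=lambda kv: -kv[1])
        # suffix_norm[k] is the L2 norm of the weights in terms[k:].
        suffix_norm = [0.0] * (len(terms) + 1)
        for k in range(len(terms) - 1, -1, -1):
            suffix_norm[k] = math.sqrt(suffix_norm[k + 1] ** 2 + terms[k][1] ** 2)
        for j, s_vec in enumerate(right_vecs):
            dot = 0.0
            pruned = False
            for k, (term, weight) in enumerate(terms):
                # Cauchy-Schwarz: because s_vec has unit length, the terms
                # not yet processed can add at most suffix_norm[k].
                if dot + suffix_norm[k] < threshold - eps:
                    pruned = True
                    break
                dot += weight * s_vec.get(term, 0.0)
            if not pruned and dot >= threshold - eps:
                results.append((i, j, dot))
    return results


left = ["text similarity join", "data mining"]
right = ["similarity join text", "query languages"]
pairs = similarity_join(left, right, threshold=0.9)
```

With these inputs, only the first string of each list shares enough terms to pass the threshold; the other three pairs are abandoned before all of their terms are examined. Sorting terms by descending weight is a design choice that tends to make the bound fail sooner for dissimilar pairs.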
Keywords
Bibliographic retrieval systems
Computation theory
Computer operating procedures
Data mining
Data reduction
Information retrieval
Integration
Query languages
Application domains
Data querying
Filter heuristics
Text similarity
Text processing
Permalink
http://hdl.handle.net/11693/27360
Published Version (Please cite this version)
https://doi.org/10.1007/11569596_81
https://doi.org/10.1007/11569596