
      Effective early termination techniques for text similarity join operator

      Author
      Özalp, S. A.
      Ulusoy, Özgür
      Date
      2005
      Source Title
      Computer and Information Sciences - ISCIS 2005
      Print ISSN
      0302-9743
      Publisher
      Springer, Berlin, Heidelberg
      Volume
      3733
      Pages
      791 - 801
      Language
      English
      Type
      Conference Paper
      Abstract
      The text similarity join operator joins two relations if their join attributes are textually similar to each other. It has a variety of application domains, including integration and querying of data from heterogeneous resources, data cleansing, and data mining. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics. © Springer-Verlag Berlin Heidelberg 2005.
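      The idea of skipping similarity computations can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`tfidf_vectors`, `similarity_join`), the `threshold` and `use_filter` parameters, whitespace tokenization, the TF-IDF weighting, and the cosine measure are all assumptions made for illustration. The pruning step is a simple upper-bound filter in the spirit of what the name "maximal similarity filter" suggests; the paper's exact definition is not given in this record.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(texts):
    """Return one L2-normalised TF-IDF vector (dict: term -> weight) per text."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequencies
    vectors = []
    for d in docs:
        raw = {t: tf * (1.0 + math.log(n / df[t])) for t, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors


def similarity_join(left, right, threshold, use_filter=True):
    """Nested-loop cosine similarity join (illustrative assumption).

    Returns (pairs, computed): the matching (i, j, sim) triples and the
    number of full similarity computations that were performed.
    """
    vecs = tfidf_vectors(left + right)  # weights over both relations
    lv, rv = vecs[:len(left)], vecs[len(left):]

    # Largest weight each term takes anywhere in the right relation.
    max_w = defaultdict(float)
    for v in rv:
        for t, w in v.items():
            max_w[t] = max(max_w[t], w)

    pairs, computed = [], 0
    for i, u in enumerate(lv):
        if use_filter:
            # No right tuple can score above this bound, so a left tuple
            # whose bound is below the threshold is skipped entirely.
            bound = sum(w * max_w.get(t, 0.0) for t, w in u.items())
            if bound < threshold:
                continue
        for j, v in enumerate(rv):
            computed += 1
            sim = sum(w * v.get(t, 0.0) for t, w in u.items())
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs, computed
```

      Calling `similarity_join(left, right, 0.3)` with and without `use_filter` returns the same pairs, because the bound never underestimates a true similarity; only the count of similarity computations differs. How much work is saved depends on the data and the threshold.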
      Keywords
      Bibliographic retrieval systems
      Computation theory
      Computer operating procedures
      Data mining
      Data reduction
      Information retrieval
      Integration
      Query languages
      Application domains
      Data querying
      Filter heuristics
      Text similarity
      Text processing
      Permalink
      http://hdl.handle.net/11693/27360
      Published Version (Please cite this version)
      https://doi.org/10.1007/11569596_81
      https://doi.org/10.1007/11569596
      Collections
      • Department of Computer Engineering