
      CoDet: Sentence-based containment detection in news corpora

      Author
      Varol, Emre
      Can, Fazlı
      Aykanat, Cevdet
      Kaya, Oğuz
      Date
      2011
      Source Title
      CIKM '11: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
      Publisher
      ACM
      Pages
      2049 - 2052
      Language
      English
      Type
      Conference Paper
      Abstract
      We study a generalized version of the near-duplicate detection problem: determining whether one document is a subset of another. In text-based applications, document overlap appears as exact duplicates, near-duplicates, or containments, where the first two are special cases of the third. We introduce a novel method, CoDet, designed specifically for this problem, and compare its performance with four well-known near-duplicate detection methods (DSC, full fingerprinting, I-Match, and SimHash) adapted to containment detection. Our method is extensible to different domains and is especially suitable for streaming news. Experimental results show that CoDet detects containments both effectively and efficiently. © 2011 ACM.
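      To make the notion of containment concrete, the following is a minimal sketch of a naive sentence-overlap measure. It is not the CoDet algorithm, whose details are not given in this record; the function names `split_sentences` and `containment`, the regex-based sentence splitter, and the sample texts are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naively split text into a set of normalized sentences.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    Normalization lowercases and collapses internal whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return {" ".join(p.lower().split()) for p in parts if p.strip()}


def containment(candidate, container):
    """Return the fraction of candidate's sentences that also occur in container."""
    a = split_sentences(candidate)
    b = split_sentences(container)
    if not a:
        return 0.0
    return len(a & b) / len(a)


# Hypothetical example documents.
short_doc = "The council met on Monday. A new budget was approved."
long_doc = (
    "The council met on Monday. Protesters gathered outside. "
    "A new budget was approved."
)

print(containment(short_doc, long_doc))  # 1.0: short_doc is fully contained
print(containment(long_doc, short_doc))  # about 0.667: only 2 of 3 sentences match
```

      The measure is asymmetric, which is what separates containment from duplicate detection: an exact duplicate scores 1.0 in both directions, while a contained document scores 1.0 in one direction only.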
      Keywords
      Corpus tree
      Document containment
      Duplicate detection
      Similarity
      Test Collection
      Knowledge management
      Software agents
      Permalink
      http://hdl.handle.net/11693/28250
      Published Version (Please cite this version)
      http://dx.doi.org/10.1145/2063576.2063887
      Collections
      • Department of Computer Engineering