
      Exploiting index pruning methods for clustering XML collections

      Author
      Altıngövde, İsmail Şengör
      Atılgan, Duygu
      Ulusoy, Özgür
      Date
      2010
      Source Title
      Focused Retrieval and Evaluation
      Print ISSN
      0302-9743
      Publisher
      Springer, Berlin, Heidelberg
      Volume
      6203
      Pages
      379 - 386
      Language
      English
      Type
      Conference Paper
      Abstract
      In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) to cluster XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that of the original collection, in terms of a set of evaluation metrics. © 2010 Springer-Verlag Berlin Heidelberg.
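      The idea of shrinking document vectors before clustering can be illustrated with a minimal sketch. It is not the pruning method evaluated in the paper, which the abstract does not specify; it assumes a simple document-centric rule that keeps only the highest-weighted terms of each vector. The function name `prune_vector`, the `keep_fraction` parameter, and the sample weights are all hypothetical.

```python
from typing import Dict


def prune_vector(vec: Dict[str, float], keep_fraction: float) -> Dict[str, float]:
    """Keep only the highest-weighted terms of one document vector.

    Illustrative assumption: terms are ranked by their weight within the
    document and the lowest-ranked ones are dropped.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Keep at least one term so that a non-empty document never becomes empty.
    n_keep = max(1, int(len(vec) * keep_fraction))
    # Rank by descending weight; break ties alphabetically for determinism.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n_keep])


# Hypothetical term weights for a single document.
doc = {"xml": 0.9, "index": 0.4, "the": 0.1, "cluster": 0.7}
pruned = prune_vector(doc, keep_fraction=0.5)
```

      With `keep_fraction=0.5`, half of the terms in this toy vector are discarded; the pruned vectors would then be passed to the clustering step in place of the originals.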
      Keywords
      Cover-coefficient based clustering
      Index pruning
      Clustering index
      Document vectors
      Evaluation metrics
      Pruning methods
      Pruning techniques
      Query languages
      XML
      Markup languages
      Quality control
      Permalink
      http://hdl.handle.net/11693/28561
      Published Version (Please cite this version)
      http://dx.doi.org/10.1007/978-3-642-14556-8_37
      https://doi.org/10.1007/978-3-642-14556-8
      Collections
      • Department of Computer Engineering 1398

      Related items

      Showing items related by title, author, creator and subject.

      • Parallel pruning for k-means clustering on shared memory architectures

        Gürsoy, Attila; Cengiz, Ilker (Springer Verlag, 2001)
        We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that ...
      • Site-based dynamic pruning for query processing in search engines

        Altıngövde, İsmail Şengör; Demir, Engin; Can, Fazlı; Ulusoy, Özgür (ACM, 2008-07)
        Web search engines typically index and retrieve at the page level. In this study, we investigate a dynamic pruning strategy that allows the query processor to first determine the most promising websites and then proceed ...
      • Improving the efficiency of search engines: strategies for focused crawling, searching, and index pruning

        Altıngövde, İsmail Şengör (Bilkent University, 2009)
        Search engines are the primary means of retrieval for text data that is abundantly available on the Web. A standard search engine should carry out three fundamental tasks, namely crawling the Web, indexing the crawled ...
