Exploiting interclass rules for focused crawling
Date
2004
Source Title
IEEE Intelligent Systems
Print ISSN
1541-1672 1941-1294
Publisher
IEEE
Volume
19
Issue
6
Pages
66 - 73
Language
English
Type
Review
Item Usage Stats
235
views
239
downloads
Abstract
A baseline crawler was developed at Bilkent University based on a focused-crawling approach. A focused crawler is an agent that targets a particular topic and visits and gathers only a relevant, narrow segment of the Web while trying not to waste resources on irrelevant material. The rule-based Web-crawling approach uses linkage statistics among topics to improve the baseline focused crawler's harvest rate and coverage. The crawler also employs a canonical topic taxonomy to train a naïve Bayesian classifier, which then helps determine the relevancy of crawled pages.
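The idea of using linkage statistics among topics to guide a focused crawler can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: the rule table `RULES`, its probabilities, the class names, and the helpers `link_score`, `Frontier`, and `crawl` are all hypothetical, and the `fetch` and `classify` callables stand in for a real downloader and a trained classifier. Links found on a page are queued with a priority equal to the assumed probability that a page of the source page's class links to a page of the target topic.

```python
import heapq

# Hypothetical interclass rules: P(class of linked page | class of source page).
# The values are illustrative only and are not taken from the paper.
RULES = {
    "sports": {"sports": 0.7, "news": 0.2, "shopping": 0.1},
    "news": {"sports": 0.4, "news": 0.5, "shopping": 0.1},
    "shopping": {"sports": 0.05, "news": 0.05, "shopping": 0.9},
}


def link_score(source_class, target_topic, rules=RULES):
    """Score out-links of a page by how often its class links to the target topic."""
    return rules.get(source_class, {}).get(target_topic, 0.0)


class Frontier:
    """Best-first URL queue: the highest-scoring unseen URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def crawl(seeds, fetch, classify, target_topic, max_pages=100):
    """Visit up to max_pages pages and return the URLs classified as on-topic.

    fetch(url) -> (text, out_links) and classify(text) -> class label are
    caller-supplied stand-ins (assumptions of this sketch).
    """
    frontier = Frontier()
    for url in seeds:
        frontier.push(url, 1.0)
    relevant = []
    visited = 0
    while len(frontier) and visited < max_pages:
        url, _ = frontier.pop()
        text, out_links = fetch(url)
        visited += 1
        page_class = classify(text)
        if page_class == target_topic:
            relevant.append(url)
        # Every out-link inherits a score from the source page's class.
        score = link_score(page_class, target_topic)
        for link in out_links:
            frontier.push(link, score)
    return relevant
```

In this sketch an off-topic page can still contribute useful links: a "news" page gives its out-links a score of 0.4 for the "sports" topic, so they are visited before links found on a "shopping" page. A crawler that scored links only by the relevance of the page containing them would treat both sources the same.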
Keywords
Best First Search
Breadth First Search
Domain Name Systems (DNS)
Web Crawling Approaches
Classification (of information)
Data Acquisition
Indexing (of information)
Knowledge Based Systems
Network Protocols
Online Searching
Queueing Theory
Websites