Exploiting interclass rules for focused crawling
Date
2004
Source Title
IEEE Intelligent Systems
Print ISSN
1541-1672
Electronic ISSN
1941-1294
Publisher
IEEE
Volume
19
Issue
6
Pages
66 - 73
Language
English
Abstract
A baseline crawler based on a focused-crawling approach was developed at Bilkent University. A focused crawler is an agent that targets a particular topic, visiting and gathering only a narrow, relevant segment of the Web while trying not to waste resources on irrelevant material. The rule-based Web-crawling approach uses linkage statistics among topics to improve the baseline focused crawler's harvest rate and coverage. The crawler also uses a canonical topic taxonomy to train a naïve Bayes classifier, which then helps determine the relevance of crawled pages.
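
The idea of using linkage statistics among topics can be illustrated with a minimal Python sketch. It is not the article's implementation: the function and class names (learn_interclass_rules, link_priority, Frontier), the toy topic labels, and the example.org URLs are all hypothetical, and the class of each page is assumed to be supplied by some classifier (such as the naïve Bayes classifier the abstract mentions) rather than computed here. The sketch estimates, from labeled parent-to-child links, how often a page of one class links to a page of another class, and uses that estimate to order the crawl frontier.

```python
import heapq
from collections import Counter, defaultdict


def learn_interclass_rules(labeled_links):
    """Estimate P(child_class | parent_class) from (parent, child) class pairs.

    Assumption: the pairs come from a training crawl whose pages were
    already assigned to topic classes.
    """
    counts = defaultdict(Counter)
    for parent_cls, child_cls in labeled_links:
        counts[parent_cls][child_cls] += 1
    rules = {}
    for parent_cls, children in counts.items():
        total = sum(children.values())
        rules[parent_cls] = {cls: n / total for cls, n in children.items()}
    return rules


def link_priority(page_class, target_class, rules):
    """Score the out-links of a page by how likely its class links to the target."""
    return rules.get(page_class, {}).get(target_class, 0.0)


class Frontier:
    """Priority queue of URLs; the highest-scoring URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._order, url))
        self._order += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Toy training data: (class of linking page, class of linked page).
training_links = [
    ("travel", "hotels"), ("travel", "hotels"), ("travel", "sports"),
    ("sports", "sports"), ("sports", "news"),
]
rules = learn_interclass_rules(training_links)

# Suppose the target topic is "hotels" and two pages have just been classified.
frontier = Frontier()
frontier.push("http://example.org/a", link_priority("travel", "hotels", rules))
frontier.push("http://example.org/b", link_priority("sports", "hotels", rules))
```

In this toy setup, a link found on a "travel" page is ranked ahead of a link found on a "sports" page when the target is "hotels", even though neither linking page is itself on-topic. A baseline focused crawler that scored links only by the relevance of the linking page would treat both as equally unpromising.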