Exploiting interclass rules for focused crawling

Date
2004
Source Title
IEEE Intelligent Systems
Print ISSN
1541-1672
Electronic ISSN
1941-1294
Publisher
IEEE
Volume
19
Issue
6
Pages
66 - 73
Language
English
Type
Review
Abstract

A baseline crawler was developed at Bilkent University based on a focused-crawling approach. A focused crawler is an agent that targets a particular topic and visits and gathers only the relevant, narrow segment of the Web, while trying not to waste resources on irrelevant material. The rule-based Web-crawling approach uses linkage statistics among topics to improve the baseline focused crawler's harvest rate and coverage. The crawler also employs a canonical topic taxonomy to train a naïve Bayes classifier, which then helps determine the relevance of crawled pages.
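
The idea of using linkage statistics among topics can be illustrated with a minimal sketch. It is not the authors' implementation: the function and class names (`learn_interclass_rules`, `url_score`, `Frontier`), the toy topic labels, and the example.org URLs are all hypothetical, and page classes are assumed to be supplied already (in the described system a trained classifier would provide them). The sketch estimates how often pages of one class link to pages of another, then uses those estimates to order a best-first crawl frontier.

```python
import heapq
from collections import Counter, defaultdict


def learn_interclass_rules(labeled_links):
    """Estimate P(child_class | parent_class) from (parent_class, child_class) pairs.

    Hypothetical helper: the pairs are assumed to come from a labeled training crawl.
    """
    counts = defaultdict(Counter)
    for parent_cls, child_cls in labeled_links:
        counts[parent_cls][child_cls] += 1
    rules = {}
    for parent_cls, children in counts.items():
        total = sum(children.values())
        rules[parent_cls] = {cls: n / total for cls, n in children.items()}
    return rules


def url_score(rules, parent_cls, target_cls):
    """Score a link by how likely a page of parent_cls points to target_cls."""
    return rules.get(parent_cls, {}).get(target_cls, 0.0)


class Frontier:
    """Best-first frontier: highest-scoring unseen URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._order, url))
        self._order += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Toy training data (assumed): class of the linking page, class of the linked page.
training_links = [
    ("bicycle", "bicycle"), ("bicycle", "bicycle"),
    ("bicycle", "bicycle"), ("bicycle", "travel"),
    ("travel", "bicycle"), ("travel", "travel"),
    ("travel", "travel"), ("travel", "travel"),
]
rules = learn_interclass_rules(training_links)

# Two links found on pages whose classes are assumed known; target topic is "bicycle".
frontier = Frontier()
frontier.push("http://example.org/a", url_score(rules, "travel", "bicycle"))
frontier.push("http://example.org/b", url_score(rules, "bicycle", "bicycle"))
first = frontier.pop()  # the link found on the "bicycle" page is visited first
```

In this toy setup a link found on an off-topic page still receives a nonzero score when the statistics say that class sometimes leads to the target topic, so such links are deprioritized rather than discarded.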

Keywords
Best First Search, Breadth First Search, Domain Name Systems (DNS), Web Crawling Approaches, Classification (of information), Data Acquisition, Indexing (of information), Knowledge Based Systems, Network Protocols, Online Searching, Queueing Theory, Websites