Exploiting interclass rules for focused crawling

Date

2004

Source Title

IEEE Intelligent Systems

Print ISSN

1541-1672

Electronic ISSN

1941-1294

Publisher

IEEE

Volume

19

Issue

6

Pages

66 - 73

Language

English

Abstract

A baseline crawler based on a focused-crawling approach was developed at Bilkent University. A focused crawler is an agent that targets a particular topic, visiting and gathering only a narrow, relevant segment of the Web while trying not to waste resources on irrelevant material. The rule-based Web-crawling approach uses linkage statistics among topics to improve the baseline focused crawler's harvest rate and coverage. The crawler also employs a canonical topic taxonomy to train a naïve Bayes classifier, which then helps determine the relevance of crawled pages.
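
The idea of using linkage statistics among topics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy topic labels, and the choice to estimate each rule as a simple relative frequency are assumptions made for illustration only. Given (source class, target class) pairs observed in a labeled training crawl, it estimates how often pages of one class link to pages of another, and uses that estimate to score an unvisited link. Harvest rate is computed under its usual definition, the fraction of crawled pages judged relevant.

```python
from collections import Counter, defaultdict


def learn_interclass_rules(labeled_links):
    """Estimate P(target class | source class) from (source, target) pairs.

    Hypothetical helper: relative-frequency estimate, no smoothing.
    """
    counts = defaultdict(Counter)
    for source_class, target_class in labeled_links:
        counts[source_class][target_class] += 1
    rules = {}
    for source_class, targets in counts.items():
        total = sum(targets.values())
        rules[source_class] = {t: n / total for t, n in targets.items()}
    return rules


def link_score(rules, page_class, target_class):
    """Score links on a page of `page_class` by how often such pages
    were observed to point to the target class (0.0 if never seen)."""
    return rules.get(page_class, {}).get(target_class, 0.0)


def harvest_rate(relevance_flags):
    """Fraction of crawled pages that were judged relevant."""
    flags = list(relevance_flags)
    return sum(flags) / len(flags) if flags else 0.0


# Toy training data (assumed): class labels of linked page pairs.
training_links = [
    ("sports", "sports"),
    ("sports", "sports"),
    ("sports", "news"),
    ("news", "sports"),
    ("news", "news"),
]
rules = learn_interclass_rules(training_links)

# A page classified as "news" still has some chance of leading to the
# target topic "sports", so its links need not be discarded outright.
score_from_news = link_score(rules, "news", "sports")
score_from_unknown = link_score(rules, "cooking", "sports")
rate = harvest_rate([True, False, True, True])
```

In this sketch a crawler would order its frontier by `link_score`, so that links found on off-topic pages are still followed when their class has historically pointed to the target topic.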
