Developing a text categorization template for Turkish news portals

Date

2011

Editor(s)

Advisor

Supervisor

Co-Advisor

Co-Supervisor

Instructor

BUIR Usage Stats
2
views
28
downloads

Citation Stats

Series

Abstract

In news portals, text category information is needed for news presentation. However, for many news stories the category information is unavailable, incorrectly assigned or too generic. This makes the text categorization a necessary tool for news portals. Automated text categorization (ATC) is a multifaceted difficult process that involves decisions regarding tuning of several parameters, term weighting, word stemming, word stopping, and feature selection. In this study we aim to find a categorization setup that will provide highly accurate results in ATC for Turkish news portals. We also examine some other aspects such as the effects of training dataset set size and robustness issues. Two Turkish test collections with different characteristics are created using Bilkent News Portal. Experiments are conducted with four classification methods: C4.5, KNN, Naive Bayes, and SVM (using polynomial and rbf kernels). Our results recommends a text categorization template for Turkish news portals and provides some future research pointers. © 2011 IEEE.

Source Title

2011 International Symposium on Innovations in Intelligent Systems and Applications

Publisher

IEEE

Course

Other identifiers

Book Title

Degree Discipline

Degree Level

Degree Name

Citation

Published Version (Please cite this version)

Language

English