
      Developing a text categorization template for Turkish news portals

      Author(s)
      Toraman, Çağrı
      Can, Fazlı
      Koçberber, Seyit
      Date
      2011
      Source Title
      2011 International Symposium on Innovations in Intelligent Systems and Applications
      Publisher
      IEEE
      Pages
      379 - 383
      Language
      English
      Type
      Conference Paper
      Abstract
      In news portals, text category information is needed for news presentation. However, for many news stories the category information is unavailable, incorrectly assigned, or too generic. This makes text categorization a necessary tool for news portals. Automated text categorization (ATC) is a multifaceted, difficult process that involves decisions regarding the tuning of several parameters, term weighting, word stemming, word stopping, and feature selection. In this study we aim to find a categorization setup that will provide highly accurate results in ATC for Turkish news portals. We also examine other aspects, such as the effects of training dataset size and robustness issues. Two Turkish test collections with different characteristics are created using Bilkent News Portal. Experiments are conducted with four classification methods: C4.5, KNN, Naive Bayes, and SVM (using polynomial and RBF kernels). Our results recommend a text categorization template for Turkish news portals and provide some future research pointers. © 2011 IEEE.
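      As a rough illustration of the kind of pipeline the abstract describes (word stopping followed by a classifier), the sketch below implements a small multinomial Naive Bayes categorizer in plain Python. It is not the setup evaluated in the paper: the stop-word list, the toy headlines, the category labels, and the names `tokenize` and `NaiveBayes` are all hypothetical, and stemming, term weighting, and feature selection are left out for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only (not from the paper).
STOP_WORDS = {"ve", "ile", "bir", "bu", "da", "de", "için"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()              # documents per category
        self.term_counts = defaultdict(Counter)  # term frequencies per category
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.class_docs[label] += 1
            for tok in tokenize(text):
                self.term_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        # Terms never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_docs[label] / total_docs)
            denom = (sum(self.term_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                count = self.term_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
TRAIN = [
    ("takım maçı kazandı gol attı", "spor"),
    ("teknik direktör maç sonrası konuştu", "spor"),
    ("merkez bankası faiz kararını açıkladı", "ekonomi"),
    ("dolar ve borsa güne yükselişle başladı", "ekonomi"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict("borsa ve faiz"))
```

      A realistic experiment would swap the toy data for a labelled news collection and compare this baseline against the other classifiers named in the abstract, with stemming and feature selection switched on and off.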
      Keywords
      Turkish news
      Automated text categorization
      Classification methods
      Naive Bayes
      News portals
      RBF kernels
      Robustness issues
      Term weighting
      Test Collection
      Text categorization
      Training dataset
      Turkish
      Word-stemming
      Feature extraction
      Intelligent systems
      Text processing
      Permalink
      http://hdl.handle.net/11693/28376
      Published Version (Please cite this version)
      http://dx.doi.org/10.1109/INISTA.2011.5946096
      Collections
      • Department of Computer Engineering