Enhancing feature selection with contextual relatedness filtering using Wikipedia

Baydar, Melih

Available

The embargo period has ended, and this item is now available.

Files

melih_tez.pdf (1.48 MB)

Date

2017-08

Authors

Baydar, Melih

Advisor

Can, Fazlı

Abstract

Feature selection is an important component of information retrieval and natural language processing applications. It is used to extract distinguishing terms for a group of documents; such terms can be used, for example, for clustering, multi-document summarization, and classification. The selected features are not always the best representatives of the documents because of noisy terms. Addressing this issue, our contribution is twofold. First, we present a novel approach for filtering noisy, unrelated terms out of the feature lists, using the contextual relatedness of terms to their topics in order to enhance feature set quality. Second, we propose a new method to assess the contextual relatedness of terms to the topic of their documents. Our approach automatically decides the contextual relatedness of a term to the topic of a set of documents using its co-occurrences with the distinguishing terms of the document set inside an external knowledge source, which is Wikipedia in our work. Deleting unrelated terms from the feature lists gives a better, more related set of features. We evaluate our approach on the cluster labeling problem, where feature sets for clusters can be used as label candidates. We work on the commonly used 20NG and ODP datasets for the cluster labeling problem, finding that our approach successfully detects the relevance of terms to topics, and that filtering out irrelevant label candidates results in significantly improved cluster labeling quality.
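The co-occurrence idea in the abstract can be illustrated with a small sketch. It is not the thesis's actual scoring method, which the abstract does not specify. It assumes a hypothetical score (the fraction of pages containing a term that also contain at least one other distinguishing term), a hypothetical threshold of 0.5, and a toy list of word sets standing in for Wikipedia pages. The names `relatedness` and `filter_features` are invented for illustration.

```python
from typing import List, Set


def relatedness(term: str, seed_terms: List[str], pages: List[Set[str]]) -> float:
    """Hypothetical score: share of pages containing `term` that also
    contain at least one *other* distinguishing (seed) term."""
    pages_with_term = [page for page in pages if term in page]
    if not pages_with_term:
        return 0.0
    other_seeds = set(seed_terms) - {term}
    co_occurring = sum(1 for page in pages_with_term if page & other_seeds)
    return co_occurring / len(pages_with_term)


def filter_features(features: List[str], seed_terms: List[str],
                    pages: List[Set[str]], threshold: float = 0.5) -> List[str]:
    """Keep only features whose score reaches the (assumed) threshold."""
    return [t for t in features if relatedness(t, seed_terms, pages) >= threshold]


# Toy stand-in for an external knowledge source: each set is one "page".
pages = [
    {"hockey", "puck", "ice", "team"},
    {"hockey", "goalie", "team", "season"},
    {"puck", "goalie", "ice"},
    {"email", "writes", "article"},
    {"writes", "article", "subject"},
]

# Candidate features for a cluster; "writes" and "article" play the noisy terms.
features = ["hockey", "puck", "goalie", "writes", "article"]
seeds = ["hockey", "puck", "goalie"]  # assumed distinguishing terms

kept = filter_features(features, seeds, pages)
print(kept)
```

In this toy setup the noisy terms never share a page with a distinguishing term, so they score 0.0 and are dropped. A real knowledge source would need a smoother score and a tuned threshold.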

Keywords

Feature Selection, Contextual Relatedness, Cluster Labeling

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/33564

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
