Enhancing feature selection with contextual relatedness filtering using Wikipedia

buir.advisorCan, Fazlı
dc.contributor.authorBaydar, Melih
dc.date.accessioned2017-08-29T07:53:06Z
dc.date.available2017-08-29T07:53:06Z
dc.date.copyright2017-08
dc.date.issued2017-08
dc.date.submitted2017-08-15
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionIncludes bibliographical references (leaves 37-40).en_US
dc.description.abstractFeature selection is an important component of information retrieval and natural language processing applications. It is used to extract distinguishing terms for a group of documents; such terms, for example, can be used for clustering, multi-document summarization and classification. The selected features are not always the best representatives of the documents due to some noisy terms. Addressing this issue, our contribution is twofold. First, we present a novel approach of filtering out the noisy, unrelated terms from the feature lists with the usage of contextual relatedness information of terms to their topics in order to enhance the feature set quality. Second, we propose a new method to assess the contextual relatedness of terms to the topic of their documents. Our approach automatically decides the contextual relatedness of a term to the topic of a set of documents using co-occurrences with the distinguishing terms of the document set inside an external knowledge source, Wikipedia for our work. Deletion of unrelated terms from the feature lists gives a better, more related set of features. We evaluate our approach for cluster labeling problem where feature sets for clusters can be used as label candidates. We work on commonly used 20NG and ODP datasets for the cluster labeling problem, finding that it successfully detects relevancy information of terms to topics, and filtering out irrelevant label candidates results in significantly improved cluster labeling quality.en_US
dc.description.statementofresponsibilityby Melih Baydar.en_US
dc.embargo.release2019-08-10
dc.format.extentxi, 43 leaves : charts (some color) ; 29 cm.en_US
dc.identifier.itemidB156099
dc.identifier.urihttp://hdl.handle.net/11693/33564
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectFeature Selectionen_US
dc.subjectContextual Relatednessen_US
dc.subjectCluster Labelingen_US
dc.titleEnhancing feature selection with contextual relatedness filtering using Wikipediaen_US
dc.title.alternativeWikipedia yolu ile bağlamsal ilişki filtrelemesi kullanarak geliştirilmiş özellik seçmeen_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelMaster's
thesis.degree.nameMS (Master of Science)
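
The abstract describes filtering feature lists by how often a candidate term co-occurs with a document set's distinguishing terms inside an external knowledge source. The record does not give the thesis's scoring formula, so the following is only a minimal sketch under stated assumptions: a handful of in-memory token sets stand in for Wikipedia articles, the score is the fraction of articles containing the candidate that also contain at least one distinguishing term, and the 0.5 threshold is arbitrary. The names `cooccurrence_score`, `filter_features`, `articles` and `seed_terms` are hypothetical and do not come from the thesis.

```python
def cooccurrence_score(term, seed_terms, articles):
    """Fraction of articles containing `term` that also contain a seed term.

    Assumption: this simple ratio stands in for the thesis's relatedness
    measure, which the abstract does not spell out.
    """
    containing = [a for a in articles if term in a]
    if not containing:
        return 0.0
    # Exclude the term itself so a seed term cannot trivially match itself.
    seeds = set(seed_terms) - {term}
    hits = sum(1 for a in containing if a & seeds)
    return hits / len(containing)


def filter_features(features, seed_terms, articles, threshold=0.5):
    """Keep only features whose score reaches the (arbitrary) threshold."""
    return [
        f for f in features
        if cooccurrence_score(f, seed_terms, articles) >= threshold
    ]


# Toy stand-in for the external knowledge source: each article is a token set.
articles = [
    {"hockey", "goal", "puck", "team"},
    {"hockey", "team", "season", "player"},
    {"puck", "ice", "player"},
    {"edu", "university", "mail"},
    {"mail", "server", "edu"},
]
seed_terms = ["hockey", "puck"]              # distinguishing terms of a cluster
features = ["team", "player", "edu", "mail"]  # candidate labels to filter

kept = filter_features(features, seed_terms, articles)
print(kept)
```

In this toy data, "team" and "player" appear only in articles that also mention a distinguishing term, while "edu" and "mail" never do, so only the first two survive the filter. A real setup would query a Wikipedia index rather than a hard-coded list.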

Files

Original bundle

Name:
melih_tez.pdf
Size:
1.48 MB
Format:
Adobe Portable Document Format
Description:
Full printable version

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission