Characteristics of Web-based textual communications

Files

0006246.pdf (1.01 MB)

Date

2012

Authors

Küçükyılmaz, Tayfun

Advisor

Aykanat, Cevdet

Abstract

In this thesis, we analyze different aspects of Web-based textual communications and argue that all such communications share some common properties. To provide practical evidence for this argument, we focus on two of these properties and examine them on various types of Web-based textual communications data. The two properties are: all Web-based communications contain features attributable to their author and receiver; and all Web-based communications exhibit similar heavy-tailed distributional properties. To support these claims, we present three real-life research problems and exploit the proposed common properties of Web-based textual communications to find practical solutions to them. First, we provide a feature-based result caching framework for real-life search engines. To this end, we mine attributes from user queries in order to classify queries and estimate a quality metric used to make admission and eviction decisions for the query result cache. Second, we analyze messages from an online chat server in order to predict user and message attributes. Our results show that several user- and message-based attributes can be predicted with significant accuracy using both chat-message-based and writing-style-based features of the chat users. Third, we provide a parallel framework for in-memory construction of term-partitioned inverted indexes. In this framework, to minimize the total communication time between processors, we provide a bucketing scheme based on the term-based distributional properties of Web page contents.
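To make the first problem in the abstract more concrete, the following is a minimal sketch of a result cache whose admission and eviction decisions depend on a per-query score. It is not the thesis's framework: the features (`num_terms`, `past_frequency`), the `quality_score` formula, the `admit_threshold` parameter, and the class name `FeatureBasedCache` are all hypothetical, chosen only to illustrate the general idea of scoring queries from mined attributes.

```python
def query_features(query, past_counts):
    """Hypothetical features: number of terms and how often the query was seen."""
    return {
        "num_terms": len(query.split()),
        "past_frequency": past_counts.get(query, 0),
    }


def quality_score(features):
    """Toy stand-in for a learned model: favor short, frequently seen queries."""
    return features["past_frequency"] / (1 + features["num_terms"])


class FeatureBasedCache:
    """Illustrative cache (an assumption, not the thesis's design)."""

    def __init__(self, capacity, admit_threshold):
        self.capacity = capacity
        self.admit_threshold = admit_threshold
        self.entries = {}       # query -> (score, results)
        self.past_counts = {}   # query -> number of times it was looked up

    def lookup(self, query):
        """Return cached results or None; always record the query occurrence."""
        self.past_counts[query] = self.past_counts.get(query, 0) + 1
        hit = self.entries.get(query)
        return hit[1] if hit is not None else None

    def offer(self, query, results):
        """Admit results if the score clears the threshold.

        When the cache is full, the lowest-scoring entry is evicted, but only
        if the new query scores strictly higher than that entry.
        """
        score = quality_score(query_features(query, self.past_counts))
        if score < self.admit_threshold:
            return False
        if query not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda q: self.entries[q][0])
            if self.entries[victim][0] >= score:
                return False
            del self.entries[victim]
        self.entries[query] = (score, results)
        return True
```

A caller would invoke `lookup` for every incoming query and, on a miss, call `offer` with the freshly computed results. In this sketch a query seen only once is typically rejected, which mirrors the intuition that one-off queries in a heavy-tailed query stream are poor candidates for caching.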

Keywords

Web search engine, result caching, cache, chat mining, data mining, index inversion, inverted index, posting list

Degree Discipline

Computer Engineering

Degree Level

Doctoral

Degree Name

Ph.D. (Doctor of Philosophy)

Permalink

http://hdl.handle.net/11693/15512

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
