Parallel sequence mining on distributed-memory systems

Karapınar, Embiya

buir.advisor	Gürsoy, Atilla
dc.contributor.author	Karapınar, Embiya
dc.date.accessioned	2016-07-01T11:11:37Z
dc.date.available	2016-07-01T11:11:37Z
dc.date.issued	2001
dc.description	Cataloged from PDF version of article.	en_US
dc.description.abstract	Discovering all the frequent sequences in very large databases is a time-consuming task. Moreover, a large database must be partitioned into chunks that can be processed in main memory. Most current algorithms require as many database scans as the length of the longest frequent sequence. Spade is a fast algorithm that reduces the number of database scans to three by using a lattice-theoretic approach to decompose the original problem into small pieces (equivalence classes) that can be processed in main memory independently. In this thesis, we present dSpade, a parallel algorithm based on Spade for discovering the set of all frequent sequences on distributed-memory systems. dSpade uses a horizontal database partitioning method, in which each processor stores an equal number of customer transactions. dSpade is synchronous while discovering frequent 1-sequences (F1) and frequent 2-sequences (F2): each processor performs the same computation on its local data to obtain local support counts and broadcasts the results to the other processors to find the globally frequent sequences. After all of F1 and F2 have been discovered, the frequent sequences are inserted into the lattice to decompose the original problem into equivalence classes. The equivalence classes are mapped by a greedy heuristic to the least-loaded processors in a round-robin manner. Finally, each processor asynchronously computes Fk on its assigned equivalence classes to find all frequent sequences. We present results of performance experiments conducted on a 32-node Beowulf cluster. The experiments show that dSpade delivers good speedup and scales linearly with the database size.	en_US
dc.description.statementofresponsibility	Karapınar, Embiya	en_US
dc.format.extent	50 leaves	en_US
dc.identifier.itemid	BILKUTUPB056066
dc.identifier.uri	http://hdl.handle.net/11693/30067
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Lattice	en_US
dc.subject	equivalence class	en_US
dc.subject	horizontal database partitioning method	en_US
dc.subject.lcc	QA76.9.D343 K37 2001	en_US
dc.subject.lcsh	Data mining.	en_US
dc.title	Parallel sequence mining on distributed-memory systems	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
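
Illustrative sketch (not taken from the thesis): the abstract describes two steps that can be pictured in a few lines of Python. The first function merges per-partition support counts to find the globally frequent items (F1); a plain loop over partitions stands in for the processors and their exchange of local counts. The second function maps equivalence classes to the least-loaded processor. The function names, the data layout, the per-class weight estimate, and the heaviest-first ordering are all assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter
import heapq


def global_frequent_items(partitions, min_support):
    """Return {item: support} for items whose global support >= min_support.

    partitions: list of local databases (hypothetical layout), each a dict
    mapping customer_id -> list of events, where an event is a set of items.
    """
    total = Counter()
    for local_db in partitions:          # one iteration stands in for one processor
        local = Counter()
        for events in local_db.values():
            seen = set()
            for event in events:
                seen.update(event)
            local.update(seen)           # a customer supports each item at most once
        total.update(local)              # stands in for exchanging local counts
    return {item: n for item, n in total.items() if n >= min_support}


def assign_classes(class_weights, num_procs):
    """Greedily map equivalence classes to the currently least-loaded processor.

    class_weights: {class_name: estimated_work} (the estimate is an assumption).
    Classes are taken heaviest first; ties are broken by name, then processor id.
    """
    heap = [(0, p) for p in range(num_procs)]   # (current load, processor id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}
    ordered = sorted(class_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, weight in ordered:
        load, proc = heapq.heappop(heap)        # least-loaded processor
        assignment[proc].append(name)
        heapq.heappush(heap, (load + weight, proc))
    return assignment
```

For example, with two partitions holding three customers in total and a minimum support of 2, `global_frequent_items` keeps only the items that occur in at least two customers' sequences, and `assign_classes({"A": 5, "B": 3, "C": 2, "D": 2}, 2)` spreads the four classes so that both processors end up with a load of 7 and 5 respectively.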

Files

Original bundle

Name: 0001607.pdf
Size: 465.42 KB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Graduate School of Engineering and Science