Browsing by Subject "hypergraph partitioning"

Open Access
Page-to-processor assignment techniques for parallel crawlers
(2004) Türk, Ata
In less than a decade, the World Wide Web has evolved from a research project into a cultural phenomenon that affects almost every facet of our society. The growing popularity and usage of the Web has demanded more efficient information retrieval techniques on the net. Crawling is one such technique and is used by search engines, web portals, and web caches. A crawler is a program that downloads and stores web pages, generally to feed a search engine or a web repository. To be useful for its target applications, a crawler must download huge amounts of data in a reasonable amount of time. Generally, the high download rates required for efficient crawling cannot be achieved by single-processor systems. Thus, existing large-scale applications use multiple parallel processors to solve the crawling problem. Apart from classical parallelization issues such as load balancing and minimization of the communication overhead, parallel crawling poses problems such as overlap avoidance and early retrieval of high-quality pages. This thesis addresses the parallelization of the crawling task, and its major contribution is on partitioning/page-to-processor assignment techniques for parallel crawlers. We propose two new page-to-processor assignment techniques based on graph and hypergraph partitioning, which respectively minimize the total communication volume and the number of messages while balancing the storage load and page download requests of the processors. We implemented the proposed models, and our theoretical approaches are supported by empirical findings. We also implemented an efficient parallel crawler that uses the proposed models.
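To make the two cost metrics concrete, the following Python sketch evaluates a given page-to-processor assignment. It does not implement the graph or hypergraph partitioning models themselves. The cost model is a simplifying assumption: every hyperlink whose target page is owned by a different processor counts as one unit of communication volume, and all such links from processor p to processor q are assumed to be batched into a single message. The function name `assignment_costs` and its inputs are hypothetical.

```python
def assignment_costs(links, assignment, page_size, num_procs):
    """Evaluate a page-to-processor assignment under a hypothetical cost model.

    links:      list of (src_page, dst_page) hyperlinks found while crawling
    assignment: dict page -> processor responsible for downloading that page
    page_size:  dict page -> storage weight of the page (e.g. bytes)
    Returns (communication volume, number of messages, per-processor load,
    load imbalance).
    """
    volume = 0
    pairs = set()
    for src, dst in links:
        p, q = assignment[src], assignment[dst]
        if p != q:
            # The discovering processor p must forward this link to owner q.
            volume += 1
            # Assumption: all links from p to q travel in one batched message.
            pairs.add((p, q))

    # Storage load per processor; imbalance = max load / average load - 1.
    load = [0] * num_procs
    for page, proc in assignment.items():
        load[proc] += page_size[page]
    avg = sum(load) / num_procs
    imbalance = max(load) / avg - 1 if avg else 0.0
    return volume, len(pairs), load, imbalance


if __name__ == "__main__":
    links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a")]
    assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
    sizes = {page: 1 for page in assignment}
    print(assignment_costs(links, assignment, sizes, 2))
```

A partitioning-based assignment would aim to keep `imbalance` small while reducing either the volume or the message count; this evaluator only scores a candidate assignment and leaves the search to a partitioner.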
Open Access
Parallel image restoration
(2004) Malas, Tahir
In this thesis, we are concerned with the image restoration problem, which has been formulated in the literature as a system of linear inequalities. With this formulation, the resulting constraint matrix is an unstructured sparse matrix, and even small images yield huge matrices. To solve the restoration problem, we have therefore used surrogate constraint methods, which work efficiently for large problems and are amenable to parallel implementation. Among the surrogate constraint methods, the basic method considers all of the violated constraints in the system and performs a single block projection in each step. The parallel method, on the other hand, considers subsets of the constraints and performs simultaneous block projections. Using several partitioning strategies and adopting different communication models, we have realized several parallel implementations of the two methods. We have used hypergraph-partitioning-based decomposition methods to minimize the communication costs while ensuring load balance among the processors. The implementations are evaluated on both per-iteration performance and overall performance. In addition, the effects of different partitioning strategies on the speed of convergence are investigated. The experimental results show that the proposed parallelization schemes are of practical use for the restoration problem and for many other real-world applications that can be modeled as a system of linear inequalities.
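As an illustration of the two surrogate constraint methods, here is a small pure-Python sketch for a system Ax ≤ b stored as dense lists. It assumes uniform weights over the violated constraints when forming the surrogate constraint, a relaxation parameter `lam` in (0, 2), and equal convex-combination weights for the block steps in the parallel variant. These choices and all function names are illustrative assumptions, not the thesis's implementation, which works with large sparse matrices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def surrogate_step(A, b, x, rows, lam):
    """Relaxed projection of x onto the surrogate of the violated rows.

    The surrogate constraint is s^T x <= beta with s = sum_i w*A[i] and
    beta = sum_i w*b[i] over the violated rows i (assumption: w uniform).
    Returns the displacement vector (zeros if nothing in `rows` is violated).
    """
    n = len(x)
    violated = [i for i in rows if dot(A[i], x) > b[i]]
    if not violated:
        return [0.0] * n
    w = 1.0 / len(violated)
    s = [sum(w * A[i][j] for i in violated) for j in range(n)]
    beta = sum(w * b[i] for i in violated)
    norm2 = dot(s, s)
    resid = dot(s, x) - beta
    if norm2 == 0.0 or resid <= 0.0:
        return [0.0] * n
    t = lam * resid / norm2
    return [-t * sj for sj in s]


def feasible(A, b, x, tol):
    return all(dot(A[i], x) <= b[i] + tol for i in range(len(A)))


def basic_method(A, b, x, lam=1.5, max_iter=1000, tol=1e-9):
    """One block projection per step using all violated constraints."""
    rows = range(len(A))
    for _ in range(max_iter):
        if feasible(A, b, x, tol):
            break
        d = surrogate_step(A, b, x, rows, lam)
        x = [xi + di for xi, di in zip(x, d)]
    return x


def parallel_method(A, b, x, blocks, lam=1.5, max_iter=1000, tol=1e-9):
    """Simultaneous block projections, combined by a convex combination."""
    tau = 1.0 / len(blocks)  # assumption: equal weight for every block
    for _ in range(max_iter):
        if feasible(A, b, x, tol):
            break
        # Each block step depends only on the current x, so these calls
        # could run on different processors.
        steps = [surrogate_step(A, b, x, blk, lam) for blk in blocks]
        x = [xi + tau * sum(d[j] for d in steps) for j, xi in enumerate(x)]
    return x


if __name__ == "__main__":
    # x >= 1, y >= 1, x + y <= 4, written as Ax <= b.
    A = [[-1, 0], [0, -1], [1, 1]]
    b = [-1, -1, 4]
    print(basic_method(A, b, [0.0, 0.0]))
    print(parallel_method(A, b, [0.0, 0.0], blocks=[[0, 1], [2]]))
```

Here the blocks are processed one after another. In a distributed setting, the assignment of rows to blocks is where a hypergraph-based decomposition would decide which processor owns which constraints.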
Open Access
Storage and access schemes for aggregate query processing on road networks
(2009) Demir, Engin
Road networks are a well-known example of spatial networks and form an integral part of many geographic information system applications, such as location-based services, intelligent traveling systems, vehicle telematics, and location-aware advertising. In practice, road network data is too large to fit into volatile memory. A considerable portion of the data must be stored on secondary storage, since several spatial and non-spatial attributes, as well as points-of-interest data, are associated with junctions and links. In network query processing, the spatial coherency that exists in accessing data leads to a temporal coherency; that is, connected junctions are accessed almost concurrently. Taking this into consideration, it is reasonable to place the data associated with connected junctions in the same disk pages so that the data can be fetched into memory with fewer disk accesses. We show that the state-of-the-art clustering graph model for allocating data to disk pages cannot correctly capture the disk access cost of successor retrieval operations. We propose clustering models based on hypergraph partitioning, which correctly encapsulate the spatial and temporal coherency in query processing by utilizing query logs, in order to minimize the number of disk accesses during aggregate query processing. We introduce the link-based storage scheme for road networks as an alternative to the widely used junction-based storage scheme. We present Get-Unevaluated-Successors (GUS) as a new successor retrieval operation for network queries, in which the candidate successors to be retrieved are pruned while a query is being processed. We investigate two instances of the GUS operation: the Get-unProcessed-Successors operation, which typically arises in Dijkstra's single-source shortest path algorithm, and the Get-unVisited-Successors operation, which typically arises in the incremental network expansion framework. The simulation results show that our storage and access schemes utilizing the proposed clustering hypergraph models are quite effective in reducing the number of disk accesses during aggregate query processing.
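The effect of pruning successors can be illustrated with a toy simulation. The Python sketch below runs Dijkstra's algorithm over an in-memory adjacency list and counts, for each settled junction, the distinct disk pages that hold the successors it retrieves. With `prune=True` it skips successors that are already settled, loosely mimicking the Get-unProcessed-Successors operation. The cost model (no buffer kept between operations, one access per distinct page touched) and all names are assumptions made for illustration only.

```python
import heapq


def dijkstra_disk_accesses(adj, page_of, source, prune=True):
    """Dijkstra's algorithm with a simulated disk-access counter.

    adj:     dict junction -> list of (successor, nonnegative link weight)
    page_of: dict junction -> id of the disk page storing its record
    prune:   if True, retrieve only successors that are not yet settled
    Assumption: no page buffer between operations, so each successor
    retrieval costs one access per distinct page it touches.
    """
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]
    accesses = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        succs = [(u, w) for u, w in adj.get(v, [])
                 if not (prune and u in settled)]
        # Distinct pages touched by this successor-retrieval operation.
        accesses += len({page_of[u] for u, _ in succs})
        for u, w in succs:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist, accesses


if __name__ == "__main__":
    adj = {
        "a": [("b", 1), ("c", 4)],
        "b": [("a", 1), ("c", 1)],
        "c": [("a", 4), ("b", 1), ("d", 1)],
        "d": [("c", 1)],
    }
    page_of = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(dijkstra_disk_accesses(adj, page_of, "a", prune=True))
    print(dijkstra_disk_accesses(adj, page_of, "a", prune=False))
```

On this four-junction example, the pruned run touches 4 pages in total versus 7 without pruning. Changing `page_of` shows how the page allocation, which the hypergraph models aim to optimize, also changes the count.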