
Browsing by Subject "Stochastic gradient descent"

Now showing 1 - 10 of 10
  • Blind federated learning at the wireless edge with low-resolution ADC and DAC (Open Access)
    (IEEE, 2021-06-15) Teğin, Büşra
    We study collaborative machine learning systems where a massive dataset is distributed across independent workers which compute their local gradient estimates based on their own datasets. Workers send their estimates through a multipath fading multiple access channel with orthogonal frequency division multiplexing to mitigate the frequency selectivity of the channel. We assume that there is no channel state information (CSI) at the workers, and the parameter server (PS) employs multiple antennas to align the received signals. To reduce the power consumption and the hardware costs, we employ complex-valued low-resolution digital-to-analog converters (DACs) and analog-to-digital converters (ADCs), at the transmitter and the receiver sides, respectively, and study the effects of practical low-cost DACs and ADCs on the learning performance. Our theoretical analysis shows that the impairments caused by low-resolution DACs and ADCs, including those of one-bit DACs and ADCs, do not prevent the convergence of the federated learning algorithms, and the multipath channel effects vanish when a sufficient number of antennas are used at the PS. We also validate our theoretical results via simulations, and demonstrate that using low-resolution, even one-bit, DACs and ADCs causes only a slight decrease in the learning accuracy.
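The setting above lends itself to a toy illustration. The sketch below is a deliberately idealized simulation (no fading, noise, OFDM, or antenna modeling; all variable names, dimensions, and step sizes are illustrative and not taken from the paper). It only shows how sign-quantized (one-bit) local gradients from several workers can still be averaged into a useful descent direction at a parameter server:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression task shared by all workers (illustrative only).
d, n_workers, n_local = 20, 8, 256
w_true = rng.normal(size=d)
w = np.zeros(d)                          # model held at the parameter server

def local_gradient(w):
    """One worker's gradient estimate computed from its own local batch."""
    X = rng.normal(size=(n_local, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_local)
    return X.T @ (X @ w - y) / n_local

for it in range(200):
    # Each worker sign-quantizes its gradient (standing in for a one-bit DAC);
    # the server averages the received values and takes a gradient step.
    quantized = [np.sign(local_gradient(w)) for _ in range(n_workers)]
    w -= 0.05 * np.mean(quantized, axis=0)

print("distance to the true model:", np.linalg.norm(w - w_true))
```

With a fixed step size, the sign quantization leaves a small residual error, which is consistent with the abstract's observation that one-bit converters cost only a slight loss in accuracy.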
  • Distributed caching and learning over wireless channels (Open Access)
    (2020-01) Tegin, Büşra
    Coded caching and coded computing have drawn significant attention in recent years due to their advantages in reducing the traffic load and in distributing the computational burden to edge devices. There have been many research results addressing different aspects of these problems; however, various challenges still need to be addressed. In particular, their use over wireless channels is not fully understood. With this motivation, this thesis considers these two distributed systems over wireless channels, taking into account realistic channel effects as well as practical implementation constraints.

    In the first part of the thesis, we study coded caching over a wireless packet erasure channel where each receiver encounters packet erasures independently with the same probability. We propose two different schemes for packet erasure channels: sending the same message (SSM) and a greedy approach. A simplified version of the greedy algorithm, called the grouped greedy algorithm, is also proposed to reduce the system complexity. For the grouped greedy algorithm, an upper bound on the transmission rate is derived, and it is shown that this upper bound is very close to the simulation results for small packet erasure probabilities. We then study coded caching over non-ergodic fading channels. As the multicast capacity of a broadcast channel is restricted by the user experiencing the worst channel conditions, we formulate an optimization problem to minimize the transmission time by grouping users based on their channel conditions and transmitting coded messages according to the worst channel in each group, as opposed to the worst channel among all users. We develop two algorithms to determine the user groups: a locally optimal iterative algorithm and a numerically more efficient solution through a shortest path problem.

    In the second part of the thesis, we study collaborative machine learning (ML) systems, also known as federated learning, where a massive dataset is distributed across independent workers that compute their local gradient estimates based on their own datasets. Workers send their estimates through a multipath fading multiple access channel (MAC) with orthogonal frequency division multiplexing (OFDM) to mitigate the frequency selectivity of the channel. We assume that the parameter server (PS) employs multiple antennas to align the received signals with no channel state information (CSI) at the workers. To reduce the power consumption and hardware costs, we employ complex-valued low-resolution analog-to-digital converters (ADCs) at the receiver side and study the effects of practical low-cost ADCs on the learning performance of the system. Our theoretical analysis shows that the impairments caused by a low-resolution ADC do not prevent the convergence of the learning algorithm, and fading effects vanish when a sufficient number of antennas are used at the PS. We also validate our theoretical results via simulations and further show that using one-bit ADCs causes only a slight decrease in the learning accuracy.
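For the user-grouping idea in the first part, here is a small, hypothetical sketch (not the thesis's algorithm): users are sorted by their achievable rates, each contiguous group is served at the rate of its worst user, and a shortest-path-style dynamic program picks the grouping that minimizes the total transmission time. The per-group payload `group_payload(k)` is an invented simplification; the thesis works with the actual coded-caching delivery load.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 12
rates = np.sort(rng.uniform(0.5, 4.0, size=K))     # per-user rates, sorted ascending

def group_payload(k):
    # Hypothetical payload for a group of k users; assumed linear in group size.
    return float(k)

# dp[i] = minimum total delivery time for the first i users (in sorted order).
dp = np.full(K + 1, np.inf)
dp[0] = 0.0
prev = np.zeros(K + 1, dtype=int)
for i in range(1, K + 1):
    for j in range(i):                               # users j..i-1 form one group
        t = dp[j] + group_payload(i - j) / rates[j]  # worst rate in the group
        if t < dp[i]:
            dp[i], prev[i] = t, j

# Walk back through the predecessor links to recover the chosen groups.
groups, i = [], K
while i > 0:
    groups.append(list(range(prev[i], i)))
    i = prev[i]
print("minimum total time:", dp[K], "groups:", groups[::-1])
```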
  • Federated learning with over-the-air aggregation over time-varying channels (Open Access)
    (Institute of Electrical and Electronics Engineers, 2023-01-17) Tegin, Büşra; Duman, Tolga Mete
    We study federated learning (FL) with over-the-air aggregation over time-varying wireless channels. Independent workers compute local gradients based on their local datasets and send them to a parameter server (PS) through a time-varying multipath fading multiple access channel via orthogonal frequency-division multiplexing (OFDM). We assume that the workers do not have channel state information, hence the PS employs multiple antennas to alleviate the fading effects. Wireless channel variations result in inter-carrier interference, which has a detrimental effect on the performance of OFDM systems, especially when the channel is rapidly varying. We examine the effects of the channel time variations on the convergence of the FL with over-the-air aggregation, and show that the resulting undesired interference terms have only limited destructive effects, which do not prevent the convergence of the learning algorithm. We also validate our results via extensive simulations, which corroborate the theoretical expectations.
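A heavily simplified sketch of the over-the-air aggregation idea follows; it ignores OFDM, fading, time variation, and any alignment scheme from the paper, and all sizes and the noise level are made up. The point is only that, because all workers transmit simultaneously, each receive antenna observes the sum of the local gradients plus its own noise, and averaging across many antennas suppresses that noise:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_workers, n_antennas = 16, 10, 64

# Stand-ins for the workers' local gradient estimates.
grads = rng.normal(size=(n_workers, d))
target = grads.mean(axis=0)                        # the average the PS wants

# Over-the-air superposition: every antenna sees the sum of all transmitted
# gradients plus independent receiver noise.
noise = 0.5 * rng.normal(size=(n_antennas, d))
received = grads.sum(axis=0) + noise               # shape (n_antennas, d)

# Average over antennas to suppress noise, then rescale the sum to a mean.
estimate = received.mean(axis=0) / n_workers

print("relative aggregation error:",
      np.linalg.norm(estimate - target) / np.linalg.norm(target))
```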
  • Hybrid parallelization of Stochastic Gradient Descent (Open Access)
    (2022-02) Büyükkaya, Kemal
    The purpose of this study is to investigate the efficient parallelization of the Stochastic Gradient Descent (SGD) algorithm for solving the matrix completion problem on a high-performance computing (HPC) platform in a distributed-memory setting. We propose a hybrid parallel decentralized SGD framework with asynchronous communication between processors to show the scalability of parallel SGD up to hundreds of processors. We utilize Message Passing Interface (MPI) for inter-node communication and POSIX threads for intra-node parallelism. We tested our method by using four different real-world benchmark datasets. Experimental results show that the proposed algorithm yields up to 6× better throughput on relatively sparse datasets, and displays comparable performance to available state-of-the-art algorithms on relatively dense datasets while providing a flexible partitioning scheme and a highly scalable hybrid parallel architecture.
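The serial kernel being parallelized here (and in the journal version listed further below) is plain SGD over the observed entries of a rating matrix. As a reference point, a minimal single-threaded sketch is shown below; the factor sizes, learning rate, and regularization constant are illustrative, not the study's settings:

```python
import random
import numpy as np

random.seed(3)
rng = np.random.default_rng(3)
m, n, f = 100, 80, 8                      # users, items, latent factors
P = 0.1 * rng.normal(size=(m, f))         # user-factor matrix
Q = 0.1 * rng.normal(size=(n, f))         # item-factor matrix

# Sparse set of observed ratings (i, j, r_ij), synthetic for this sketch.
observed = [(int(rng.integers(m)), int(rng.integers(n)), float(rng.uniform(1, 5)))
            for _ in range(2000)]

lr, lam = 0.01, 0.05                      # step size and L2 regularization
for epoch in range(20):
    random.shuffle(observed)              # visit nonzeros in random order
    for i, j, r in observed:
        err = r - P[i] @ Q[j]                     # prediction error for one rating
        P[i] += lr * (err * Q[j] - lam * P[i])    # update user factors
        Q[j] += lr * (err * P[i] - lam * Q[j])    # update item factors

rmse = np.sqrt(np.mean([(r - P[i] @ Q[j]) ** 2 for i, j, r in observed]))
print("training RMSE:", rmse)
```

Hybrid parallel frameworks like the one described above distribute such updates across MPI processes and threads; the sketch only shows the update rule itself.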
  • Load balanced locality-aware parallel SGD on multicore architectures for latent factor based collaborative filtering (Open Access)
    (Elsevier BV, North-Holland, 2023-04-20) Gülcan, Selçuk; Özdal, Muhammet Mustafa; Aykanat, Cevdet
    We investigate the parallelization of Stochastic Gradient Descent (SGD) for matrix completion on multicore architectures. We provide an experimental analysis of current SGD algorithms to find out their bottlenecks and limitations. Grid-based methods suffer from load imbalance among 2D blocks of the rating matrix, especially when datasets are skewed and sparse. Asynchronous methods, on the other hand, can face cache issues due to their memory access pattern. We propose bin-packing-based block balancing methods that are alternative to the recently proposed BaPa method. We then introduce Locality Aware SGD (LASGD), a grid-based asynchronous parallel SGD algorithm that efficiently utilizes cache by changing nonzero update sequence without affecting factor update order and carefully arranging latent factor matrices in the memory. Combined with our proposed load balancing methods, our experiments show that LASGD performs significantly better than alternative approaches in parallel shared-memory systems.
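The block-balancing issue mentioned above can be pictured with a generic greedy bin-packing (longest-processing-time) heuristic: blocks with skewed nonzero counts are handed, heaviest first, to the currently least-loaded thread. This is only a stand-in for the balancing methods actually proposed in the paper, and the block loads below are synthetic:

```python
import heapq
import numpy as np

rng = np.random.default_rng(4)

# Synthetic, heavy-tailed nonzero counts for the 2D blocks of a rating matrix.
block_loads = rng.zipf(1.5, size=64)
n_threads = 8

# Greedy LPT bin packing: give the next-heaviest block to the least-loaded thread.
heap = [(0, t) for t in range(n_threads)]          # (current load, thread id)
heapq.heapify(heap)
assignment = {}
for b in np.argsort(block_loads)[::-1]:            # heaviest block first
    load, t = heapq.heappop(heap)
    assignment[int(b)] = t
    heapq.heappush(heap, (load + int(block_loads[b]), t))

loads = [load for load, _ in heap]
print("max thread load:", max(loads), " average:", sum(loads) / n_threads)
```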
  • Matrix factorization with stochastic gradient descent for recommender systems (Open Access)
    (2019-02) Aktulum, Ömer Faruk
    Matrix factorization is an efficient technique used for disclosing latent features of real-world data. It finds applications in areas such as text mining, image analysis, social network analysis and, more recently and popularly, recommendation systems. Alternating Least Squares (ALS), Stochastic Gradient Descent (SGD) and Coordinate Descent (CD) are among the methods commonly used for factorizing large matrices. SGD-based factorization has proven to be the most successful of these methods after the Netflix and KDDCup competitions, where the winners' algorithms relied on SGD-based methods. Parallelization of SGD then became a hot topic and has been studied extensively in the literature in recent years. We focus on parallel SGD algorithms developed for shared-memory and distributed-memory systems. Shared-memory parallelizations include works such as HogWild, FPSGD and MLGF-MF, and distributed-memory parallelizations include works such as DSGD, GASGD and NOMAD. We present a survey containing an exhaustive analysis of these studies, and then focus in particular on DSGD, implementing it through the message-passing paradigm and testing its performance in terms of convergence and speedup. In contrast to existing works, the experiments use many real-world datasets that we produce from published raw data. We show that DSGD is a robust algorithm for large-scale datasets and achieves near-linear speedup with fast convergence rates.
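DSGD, the algorithm the thesis implements with message passing, partitions the rating matrix into a grid of blocks and, in each so-called stratum, updates a set of blocks that share no rows or columns, so different processes can work on them without conflicts. The sketch below runs that stratum schedule sequentially in a single process; the matrix sizes, the 3x3 block grid, and the hyperparameters are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, f, B = 90, 90, 8, 3                     # B x B block grid
U = 0.1 * rng.normal(size=(m, f))
V = 0.1 * rng.normal(size=(n, f))
ratings = [(int(rng.integers(m)), int(rng.integers(n)), float(rng.uniform(1, 5)))
           for _ in range(3000)]

def block_of(i, j):
    return (i * B // m, j * B // n)           # which block a rating falls into

lr, lam = 0.01, 0.05
for epoch in range(10):
    for s in range(B):
        # Stratum s: block (k, (k + s) % B) for each k. These blocks touch
        # disjoint rows and columns, so real DSGD workers update them in
        # parallel; here they are simply processed one after another.
        active = {(k, (k + s) % B) for k in range(B)}
        for i, j, r in ratings:
            if block_of(i, j) in active:
                err = r - U[i] @ V[j]
                U[i] += lr * (err * V[j] - lam * U[i])
                V[j] += lr * (err * U[i] - lam * V[j])

rmse = np.sqrt(np.mean([(r - U[i] @ V[j]) ** 2 for i, j, r in ratings]))
print("training RMSE:", rmse)
```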
  • Optimal stochastic gradient descent algorithm for filtering (Embargo)
    (Elsevier, 2024-12) Turalı, Mehmet Yiğit; Koç, Ali Taha; Kozat, Süleyman Serdar
    Stochastic Gradient Descent (SGD) is a fundamental optimization technique in machine learning, due to its efficiency in handling large-scale data. Unlike typical SGD applications, which rely on stochastic approximations, this work explores the convergence properties of SGD from a deterministic perspective. We address the crucial aspect of learning rate settings, a common obstacle in optimizing SGD performance, particularly in complex environments. In contrast to traditional methods that often provide convergence results based on statistical expectations (which are usually not justified), our approach introduces universally applicable learning rates. These rates ensure that a model trained with SGD matches the performance of the best linear filter asymptotically, applicable irrespective of the data sequence length and independent of statistical assumptions about the data. By establishing learning rates that scale as μ = O(1/t), we offer a solution that sidesteps the need for prior data knowledge, a prevalent limitation in real-world applications. To this end, we provide a robust framework for SGD's application across varied settings, guaranteeing convergence results that hold under both deterministic and stochastic scenarios without any underlying assumptions.
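The μ = O(1/t) scaling can be tried out on a toy linear-filtering stream. The following sketch is not the paper's construction (the constants and the offset in the step-size schedule are arbitrary choices made here for numerical stability); it only illustrates an SGD-trained linear filter whose step size decays as O(1/t):

```python
import numpy as np

rng = np.random.default_rng(6)
d, T = 5, 20000
w_star = rng.normal(size=d)                # best linear filter for this toy stream
w = np.zeros(d)                            # filter learned online by SGD

se_sgd, se_best = 0.0, 0.0
for t in range(1, T + 1):
    x = rng.normal(size=d)                 # input regressor at time t
    y = w_star @ x + 0.1 * rng.normal()    # desired signal
    err = y - w @ x
    se_sgd += err ** 2
    se_best += (y - w_star @ x) ** 2
    mu = 1.0 / (10.0 + t)                  # step size decaying as O(1/t)
    w += mu * err * x                      # LMS-style SGD update of the filter

print("time-averaged excess squared error:", (se_sgd - se_best) / T)
```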
  • Parallel stochastic gradient descent on multicore architectures (Open Access)
    (2020-09) Gülcan, Selçuk
    The focus of the thesis is efficient parallelization of the Stochastic Gradient Descent (SGD) algorithm for matrix completion problems on multicore architectures. Asynchronous methods and block-based methods utilizing 2D grid partitioning for task-to-thread assignment are commonly used approaches for shared-memory parallelization. However, asynchronous methods can have performance issues due to their memory access patterns, whereas grid-based methods can suffer from load imbalance especially when data sets are skewed and sparse. In this thesis, we first analyze parallel performance bottlenecks of the existing SGD algorithms in detail. Then, we propose new algorithms to alleviate these performance bottlenecks. Specifically, we propose bin-packing-based algorithms to balance thread loads under 2D partitioning. We also propose a grid-based asynchronous parallel SGD algorithm that improves cache utilization by changing the entry update order without affecting the factor update order and rearranging the memory layouts of the latent factor matrices. Our experiments show that the proposed methods perform significantly better than the existing approaches on shared-memory multi-core systems.
  • Parallel stochastic gradient descent with sub-iterations on distributed memory systems (Open Access)
    (2022-02) Çağlayan, Orhun
    We investigate parallelization of the stochastic gradient descent (SGD) algorithm for solving the matrix completion problem. Applications in the literature show that stale data usage and communication costs are important concerns that affect the performance of parallel SGD applications. We first briefly review the stochastic gradient descent algorithm and matrix partitioning for parallel SGD. Then we define the stale data problem and communication costs. In order to improve the performance of parallel SGD, we propose a new algorithm with intra-iteration synchronization (referred to as sub-iterations) to decrease communication costs and stale data usage. Experimental results show that using sub-iterations can decrease staleness by up to 95% and communication volume by up to 47%. Furthermore, using sub-iterations can improve test error by up to 60% when compared to the conventional parallel SGD implementation that does not use sub-iterations.
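The effect of sub-iterations can be mimicked in a toy, sequential simulation: each epoch is cut into S synchronization points at which workers reconcile their copies of the item-factor matrix, instead of synchronizing only once per epoch, so updates are computed from fresher (less stale) factors. Everything below (the two-worker split, averaging the copies, the sizes and hyperparameters) is an invented simplification, not the algorithm or the MPI communication scheme of the thesis:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, f, n_workers, S = 60, 60, 6, 2, 4    # S sub-iterations per epoch
P = 0.1 * rng.normal(size=(m, f))          # user factors (rows partitioned by worker)
Q = 0.1 * rng.normal(size=(n, f))          # item factors (replicated on each worker)

rows = np.array_split(np.arange(m), n_workers)
ratings = [(int(rng.integers(m)), int(rng.integers(n)), float(rng.uniform(1, 5)))
           for _ in range(2500)]
lr, lam = 0.01, 0.05

for epoch in range(10):
    Q_copies = [Q.copy() for _ in range(n_workers)]
    for s in range(S):
        for w_id in range(n_workers):
            Qw = Q_copies[w_id]            # this worker's possibly stale copy of Q
            for i, j, r in ratings[s::S]:  # the slice handled in this sub-iteration
                if i in rows[w_id]:        # worker only updates its own user rows
                    err = r - P[i] @ Qw[j]
                    P[i] += lr * (err * Qw[j] - lam * P[i])
                    Qw[j] += lr * (err * P[i] - lam * Qw[j])
        # Synchronize after every sub-iteration (rather than once per epoch),
        # so the next sub-iteration starts from fresher item factors.
        Q = np.mean(Q_copies, axis=0)
        Q_copies = [Q.copy() for _ in range(n_workers)]

rmse = np.sqrt(np.mean([(r - P[i] @ Q[j]) ** 2 for i, j, r in ratings]))
print("training RMSE:", rmse)
```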
  • Stochastic Gradient Descent for matrix completion: hybrid parallelization on shared- and distributed-memory systems (Open Access)
    (Elsevier BV, 2024-01-11) Büyükkaya, Kemal; Karsavuran, M. Ozan; Aykanat, Cevdet
    The purpose of this study is to investigate the hybrid parallelization of the Stochastic Gradient Descent (SGD) algorithm for solving the matrix completion problem on a high-performance computing platform. We propose a hybrid parallel decentralized SGD framework with asynchronous inter-process communication and a novel flexible partitioning scheme to attain scalability up to hundreds of processors. We utilize Message Passing Interface (MPI) for inter-node communication and POSIX threads for intra-node parallelism. We tested our method by using different real-world benchmark datasets. Experimental results on a hybrid parallel architecture showed that, compared to the state-of-the-art, the proposed algorithm achieves 6x higher throughput on sparse datasets, while it achieves comparable throughput on relatively dense datasets.
