Browsing by Subject "ALS"
Now showing 1 - 2 of 2
Item Open Access
Parafac-spark: parallel tensor decompositions on spark (2019-08) Bekçe, Selim Eren
Tensors are higher-order generalizations of matrices, widely used in many data science applications and scientific disciplines. The Canonical Polyadic Decomposition (also known as CPD/PARAFAC) is a widely adopted tensor factorization for discovering and extracting latent features of tensors, usually computed via the alternating least squares (ALS) method. Developing efficient parallelization methods for PARAFAC on commodity clusters is important because, as common tensor sizes reach billions of nonzeros, a naive implementation would require infeasibly large intermediate memory. Implementations of PARAFAC-ALS on shared- and distributed-memory systems are available, but these systems require expensive cluster setups, are too low level, are not compatible with modern tooling, and are not fault tolerant by design. Many companies and data science communities widely prefer Apache Spark, a modern distributed computing framework with in-memory caching, and the Hadoop ecosystem of tools for their ease of use, compatibility, ability to run on commodity hardware, and fault tolerance. We developed PARAFAC-SPARK, an efficient, parallel, open-source implementation of PARAFAC on Spark, written in Scala. It can decompose 3D tensors stored in the common coordinate format in parallel with a low memory footprint by partitioning them as grids and utilizing the compressed sparse row (CSR) format for efficient traversals. We followed and combined many of the algorithmic and methodological improvements of its predecessor implementations on Hadoop and distributed memory, and adapted them for Spark. During the kernel MTTKRP operation, by applying a multi-way dynamic partitioning scheme, we were also able to increase the number of reducers to be on par with the number of cores, achieving better utilization and a reduced memory footprint.
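For readers unfamiliar with the MTTKRP kernel named in the abstract, the following is a minimal single-machine sketch in plain Python. It is not the PARAFAC-SPARK code, which is written in Scala and runs on Spark, and it omits the grid partitioning and CSR storage the abstract describes. The function name `mttkrp_mode1`, the argument layout, and the tiny example tensor are all hypothetical, chosen only for illustration. It assumes a 3D tensor given as coordinate-format nonzeros `(i, j, k, value)` and computes the mode-1 product against two factor matrices.

```python
def mttkrp_mode1(coo, B, C, num_rows, rank):
    """Mode-1 MTTKRP for a 3D sparse tensor in coordinate (COO) format.

    coo: iterable of (i, j, k, value) nonzeros
    B, C: factor matrices as lists of rows, each row of length `rank`
    Returns a num_rows x rank matrix M where
    M[i][r] = sum over nonzeros (i, j, k) of value * B[j][r] * C[k][r].
    """
    M = [[0.0] * rank for _ in range(num_rows)]
    for i, j, k, value in coo:
        row_b, row_c, out = B[j], C[k], M[i]
        # Each nonzero contributes a scaled elementwise product of two rows.
        for r in range(rank):
            out[r] += value * row_b[r] * row_c[r]
    return M


# Hypothetical 2 x 2 x 2 tensor with three nonzeros, decomposed at rank 2.
coo = [(0, 0, 0, 1.0), (0, 1, 1, 2.0), (1, 0, 1, 3.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = mttkrp_mode1(coo, B, C, num_rows=2, rank=2)
```

Because the loop touches only the nonzeros, the work scales with their count rather than with the full tensor size, which is why sparse storage formats matter at the scales the abstract mentions.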
We ran PARAFAC-SPARK with some real-world tensors and evaluated the effectiveness of each improvement as a series of variants compared with each other, as well as with some synthetically generated tensors of up to billions of rows to measure its scalability. Our fastest variant (PS-CSRSX) is up to 67% faster than our baseline Spark implementation (PS-COO) and up to 10 times faster than the state-of-the-art Hadoop implementations.

Item Open Access
Revisiting the complex architecture of ALS in Turkey: expanding genotypes, shared phenotypes, molecular networks, and a public variant database (John Wiley and Sons, 2020) Tunca, C.; Şeker, T.; Akçimen, F.; Coşkun, C.; Bayraktar, E.; Palvadeau, R.; Zor, S.; Koçoğlu, C.; Kartal, E.; Şen, N. E.; Hamzeiy, H.; Özoğuz-Erimiş, A.; Norman, Utku; Karakahya, Oğuzhan; Olgun, Gülden; Akgün, T.; Durmuş, H.; Şahin, E.; Çakar, A.; Başar-Gürsoy, E.; Babacan-Yıldız, G.; İşak, B.; Uluç, K.; Hanağası, H.; Bilgiç, B.; Turgut, N.; Aysal, F.; Ertaş, M.; Boz, C.; Kotan, D.; İdrisoğlu, H.; Soysal, A.; Uzun-Adatepe, N.; Akalın, M. A.; Koç, F.; Tan, E.; Oflazer, P.; Deymeer, F.; Taştan, Ö.; Çiçek, A. Ercüment; Kavak, E.; Parman, Y.; Başak, A. N.
The last decade has proven that amyotrophic lateral sclerosis (ALS) is clinically and genetically heterogeneous, and that the genetic component in sporadic cases might be stronger than expected. This study investigates 1,200 patients to revisit ALS in the ethnically heterogeneous yet inbred Turkish population. Familial ALS (fALS) accounts for 20% of our cases. The rates of consanguinity are 30% in fALS and 23% in sporadic ALS (sALS). Major ALS genes explained the disease cause in only 35% of fALS, as compared with ~70% in Europe and North America. Whole exome sequencing resulted in a discovery rate of 42% (53/127). Whole genome analyses in 623 sALS cases and 142 population controls, sequenced within Project MinE, revealed well‐established fALS gene variants, solidifying the concept of incomplete penetrance in ALS.
Genome‐wide association studies (GWAS) with whole genome sequencing data did not indicate a new risk locus. Coupling GWAS with a coexpression network of disease‐associated candidates points to a significant enrichment for cell cycle‐ and division‐related genes. Within this network, literature text‐mining highlights DECR1, ATL1, HDAC2, GEMIN4, and HNRNPA3 as important genes. Finally, information on ALS‐related gene variants in the Turkish cohort sequenced within Project MinE was compiled in the GeNDAL variant browser (www.gendal.org).