SeGraM: A universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping

buir.contributor.author: Bingöl, Zülal
buir.contributor.author: Fırtına, Can
buir.contributor.author: Alkan, Can
buir.contributor.orcid: Bingöl, Zülal | 0000-0002-2828-9665
buir.contributor.orcid: Fırtına, Can | 0000-0002-6548-7863
buir.contributor.orcid: Alkan, Can | 0000-0002-5443-0706
dc.citation.epage: 655 (en_US)
dc.citation.spage: 638 (en_US)
dc.contributor.author: Cali, D.Ş
dc.contributor.author: Kanellopoulos, K.
dc.contributor.author: Lindegger, J.
dc.contributor.author: Bingöl, Zülal
dc.contributor.author: Kalsi, G.S.
dc.contributor.author: Zuo, Z.
dc.contributor.author: Fırtına, Can
dc.contributor.author: Cavlak, M.B.
dc.contributor.author: Kim, J.
dc.contributor.author: Ghiasi, N.M.
dc.contributor.author: Singh, G.
dc.contributor.author: Gómez-Luna, J.
dc.contributor.author: Almadhoun Alserr, N.
dc.contributor.author: Alser, M.
dc.contributor.author: Subramoney, S.
dc.contributor.author: Alkan, Can
dc.contributor.author: Ghose, S.
dc.contributor.author: Mutlu, O.
dc.date.accessioned: 2023-02-27T06:06:25Z
dc.date.available: 2023-02-27T06:06:25Z
dc.date.issued: 2020-06-11
dc.department: Department of Computer Engineering (en_US)
dc.description.abstract: A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more difficult computational problem, with a much smaller number of practical software tools currently available. We analyze two state-of-the-art sequence-to-graph mapping tools and reveal four key issues. We find that there is a pressing need to have a specialized, high-performance, scalable, and low-cost algorithm/hardware co-design that alleviates bottlenecks in both the seeding and alignment steps of sequence-to-graph mapping. Since sequence-to-sequence mapping can be treated as a special case of sequence-to-graph mapping, we aim to design an accelerator that is efficient for both linear and graph-based read mapping. To this end, we propose SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads. To our knowledge, SeGraM is the first algorithm/hardware co-design for accelerating sequence-to-graph mapping.
SeGraM consists of two main components: (1) MinSeed, the first minimizer-based seeding accelerator, which finds the candidate locations in a given genome graph; and (2) BitAlign, the first bitvector-based sequence-to-graph alignment accelerator, which performs alignment between a given read and the subgraph identified by MinSeed. We couple SeGraM with high-bandwidth memory to exploit low latency and highly-parallel memory access, which alleviates the memory bottleneck. We demonstrate that SeGraM provides significant improvements for multiple steps of the sequence-to-graph (i.e., S2G) and sequence-to-sequence (i.e., S2S) mapping pipelines. First, SeGraM outperforms state-of-the-art S2G mapping tools by 5.9×/3.9× and 106×/742× for long and short reads, respectively, while reducing power consumption by 4.1×/4.4× and 3.0×/3.2×. Second, BitAlign outperforms a state-of-the-art S2G alignment tool by 41×-539× and three S2S alignment accelerators by 1.2×-4.8×. We conclude that SeGraM is a high-performance and low-cost universal genomics mapping accelerator that efficiently supports both sequence-to-graph and sequence-to-sequence mapping pipelines. (en_US)
dc.identifier.doi: 10.1145/3470496.3527436 (en_US)
dc.identifier.eissn: 1557-735X
dc.identifier.uri: http://hdl.handle.net/11693/111783
dc.language.iso: English (en_US)
dc.publisher: Association for Computing Machinery (en_US)
dc.relation.isversionof: https://doi.org/10.1145/3470496.3527436 (en_US)
dc.source.title: ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture (en_US)
dc.subject: Genomics (en_US)
dc.subject: Genome analysis (en_US)
dc.subject: Genome graphs (en_US)
dc.subject: Read mapping (en_US)
dc.subject: Algorithm/hardware co-design (en_US)
dc.subject: Hardware accelerator (en_US)
dc.subject: Read alignment (en_US)
dc.subject: Seeding (en_US)
dc.subject: Minimizer (en_US)
dc.subject: Bitvector (en_US)
dc.title: SeGraM: A universal hardware accelerator for genomic sequence-to-graph and sequence-to-sequence mapping (en_US)
dc.type: Conference Paper (en_US)
Files
Original bundle
Name: SeGraM_a_universal_hardware_accelerator_for_genomic_sequence-to-graph_and_sequence-to-sequence_mapping.pdf
Size: 8.04 MB
Format: Adobe Portable Document Format
Description: Article file
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed upon to submission