Exploiting architectural features of a computer vision platform towards reducing memory stalls

buir.contributor.author: Mustafa, Naveed Ul
buir.contributor.author: O’Riordan
buir.contributor.author: Öztürk, Özcan
dc.citation.epage: 870 (en_US)
dc.citation.issueNumber: 4 (en_US)
dc.citation.spage: 853 (en_US)
dc.citation.volumeNumber: 17 (en_US)
dc.contributor.author: Mustafa, Naveed Ul (en_US)
dc.contributor.author: O’Riordan, M. J. (en_US)
dc.contributor.author: Rogers, S. (en_US)
dc.contributor.author: Öztürk, Özcan (en_US)
dc.date.accessioned: 2021-02-19T10:25:11Z
dc.date.available: 2021-02-19T10:25:11Z
dc.date.issued: 2020
dc.department: Department of Computer Engineering (en_US)
dc.description.abstract: Computer vision applications are becoming increasingly popular in embedded systems such as drones, robots, tablets, and mobile devices. These applications are both compute and memory intensive, with memory bound stalls (MBS) making up a significant part of their execution time. For maximum reduction in memory stalls, compilers need to consider the architectural details of a platform and utilize its hardware components efficiently. In this paper, we propose a compiler optimization for a vision-processing system that classifies memory references to reduce MBS. As the proposed optimization is based on the architectural features of a specific platform, i.e., Myriad 2, it can only be applied to other platforms having similar architectural features. The optimization consists of two steps: affinity analysis and affinity-aware instruction scheduling. We suggest two different approaches for affinity analysis, i.e., source code annotation and automated analysis. We use the LLVM compiler infrastructure to implement the proposed optimization. Applying the annotation-based approach to a memory-intensive program reduces stall cycles by 67.44%, leading to a 25.61% improvement in execution time. We use 11 different image-processing benchmarks to evaluate the automated analysis approach. Experimental results show that classification of memory references reduces stall cycles, on average, by 69.83%. As all benchmarks are both compute and memory intensive, we achieve an improvement in execution time of up to 30%, with a modest average of 5.79%. (en_US)
dc.description.sponsorship: This work is supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 687698 and by a Ph.D. scholarship from the Higher Education Commission (HEC) of Pakistan awarded to Naveed Ul Mustafa. (en_US)
dc.identifier.doi: 10.1007/s11554-018-0830-8 (en_US)
dc.identifier.issn: 1861-8200
dc.identifier.uri: http://hdl.handle.net/11693/75485
dc.language.iso: English (en_US)
dc.publisher: Springer (en_US)
dc.relation.isversionof: https://dx.doi.org/10.1007/s11554-018-0830-8 (en_US)
dc.source.title: Journal of Real-Time Image Processing (en_US)
dc.subject: Computer vision (en_US)
dc.subject: Compiler optimization (en_US)
dc.subject: Execution time (en_US)
dc.subject: Memory bound stalls (en_US)
dc.title: Exploiting architectural features of a computer vision platform towards reducing memory stalls (en_US)
dc.type: Article (en_US)

Files

Original bundle
Name: Exploiting_architectural_features_of_a_computer_vision_platform_towards_reducing_memory_stalls.pdf
Size: 1.57 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission