Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2307.04339 (cs)

[Submitted on 10 Jul 2023]

Title: Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Authors: Zhihe Zhao, Neiwen Ling, Nan Guan, Guoliang Xing

Abstract: Many applications, such as autonomous driving and augmented reality, require the concurrent execution of multiple deep neural networks (DNNs) with different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPUs. Miriam integrates two main components, an elastic-kernel generator and a runtime dynamic kernel coordinator, to support mixed-criticality DNN inference. To evaluate Miriam, we build a new DNN inference benchmark based on CUDA with diverse representative DNN workloads. Experiments on two edge GPU platforms show that Miriam can increase system throughput by 92% while incurring less than 10% latency overhead for critical tasks, compared to state-of-the-art baselines.
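The abstract states the coordination goal (protect critical tasks while raising throughput) only at a high level. The sketch below is a hypothetical toy model, not Miriam's elastic-kernel generator or coordinator, whose design is not given here. It assumes the GPU can be modeled as a fixed number of thread-block slots per time step, lets critical kernels claim slots first, and shrinks ("elastically" resizes) best-effort kernels to whatever slots remain. The names `Kernel`, `dispatch`, and `simulate`, and the slot-based resource model, are illustrative assumptions; the model ignores the memory-bandwidth and cache contention that a contention-aware framework must account for.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    # Hypothetical model of a DNN kernel; not Miriam's data structure.
    name: str        # assumed unique across kernels
    blocks: int      # thread blocks still to execute
    critical: bool   # True for a latency-critical task


def dispatch(kernels, capacity):
    """Assign block slots for one time step.

    Critical kernels are served first and may take every slot; best-effort
    kernels are shrunk to the leftover slots, so in this toy model they
    never delay a critical kernel.
    """
    plan = {}
    free = capacity
    # sorted() is stable: critical kernels first, arrival order otherwise kept.
    for k in sorted(kernels, key=lambda k: not k.critical):
        take = min(k.blocks, free)
        if take > 0:
            plan[k.name] = take
            free -= take
    return plan


def simulate(kernels, capacity):
    """Run until all kernels finish; return the step at which each one ends.

    Assumes every dispatched block completes within a single step.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    finish = {}
    step = 0
    while any(k.blocks > 0 for k in kernels):
        step += 1
        plan = dispatch([k for k in kernels if k.blocks > 0], capacity)
        for k in kernels:
            k.blocks -= plan.get(k.name, 0)
            if k.blocks == 0 and k.name not in finish:
                finish[k.name] = step
    return finish


if __name__ == "__main__":
    jobs = [Kernel("segmenter", 10, critical=False),
            Kernel("detector", 12, critical=True)]
    print(simulate(jobs, capacity=8))
```

Running the script prints `{'detector': 2, 'segmenter': 3}`: the critical kernel takes all 8 slots in step 1 even though the best-effort kernel arrived first, and the best-effort kernel then fills the 4 leftover slots in step 2 instead of leaving them idle.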

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2307.04339 [cs.DC]
	(or arXiv:2307.04339v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2307.04339

Submission history

From: Zhihe Zhao
[v1] Mon, 10 Jul 2023 04:30:44 UTC (31,021 KB)
