Embedded systems have become increasingly important for signal processing, machine learning, and multimedia applications in recent years. Although such systems are expected to play an important role in future signal processing and AI application designs, many challenging problems, such as large data computations and memory footprints, remain to be resolved, especially with respect to efficiency. Programming models, compilers, API designs, architecture designs, and software tools all need to contribute to the advancement of system designs. This special issue aims to bring together researchers in these areas to present the latest developments and technical solutions for system optimizations that support signal processing and AI applications.
This special issue consists of seven papers, which are briefly summarized as follows.
First, Wu et al.’s paper on accelerating OpenVX presents a compiler framework that generates efficient binary code. Specifically, given a program written in OpenVX, the proposed framework translates the program to Halide for scheduling and then converts it to MLIR. The results show that an order-of-magnitude speedup can be achieved for kernels used in image processing and AI applications by using well-designed MLIR dialects together with the Halide scheduler.
Hsieh et al. present DLOOPT, an optimization assistant that makes AutoTVM, the auto-tuning module of the well-known open-source deep learning compiler Apache TVM, more efficient. Although AutoTVM facilitates the tuning of optimization configurations (e.g., tiling size and loop order) for users, searching for the optimal configurations takes a long time. DLOOPT adopts a set of tuning strategies that simplify the tuning process, reducing the time needed to develop adequate optimizations for the operators in a model by more than 99% and making the optimization process much easier, which in turn facilitates the development of learning-based applications.
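To illustrate why searching over tiling sizes and loop orders is slow, the sketch below counts the configurations in a deliberately simplified, hypothetical search space: each loop axis is split once into an outer and an inner loop, the inner tile size is any divisor of the axis extent, and all resulting loops may be reordered freely. This is not AutoTVM's actual space definition or DLOOPT's strategy; the function names are illustrative, and real tuning templates can add further knobs.

```python
from math import factorial, prod

def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def tiling_space_size(extents: list[int]) -> int:
    """Count configurations in a simplified, hypothetical tuning space.

    Each loop axis is split into an outer and an inner loop, with the inner
    tile size drawn from the divisors of the axis extent; the resulting
    2 * len(extents) loops may then be placed in any order.
    """
    tile_choices = prod(len(divisors(e)) for e in extents)
    loop_orders = factorial(2 * len(extents))
    return tile_choices * loop_orders

# A 1024 x 1024 x 1024 matrix multiplication has three loop axes (i, j, k).
print(tiling_space_size([1024, 1024, 1024]))  # 958320
```

Even this toy space holds close to a million points for a single operator, so measuring each one on real hardware quickly becomes impractical, which is the motivation for smarter tuning strategies.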
Lee et al. propose an effective method that helps programmers improve the performance of matrix multiplication layers in DNN applications through Halide scheduling primitives. Built on the base version of the OpenCL framework, this work provides Halide scheduling primitives for sparse matrix compression, together with sparse matrix multiplication and sparse convolution methods. Experimental results show that a DNN model using the proposed Halide compression scheduler achieves almost a 2x speedup compared with the original model without any compression scheme.
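As a language-neutral illustration of why sparse compression saves work in matrix multiplication, the sketch below stores a matrix in the common compressed sparse row (CSR) format and multiplies it by a dense matrix, touching only the stored nonzeros. CSR is an assumption chosen for illustration; the summary does not state which compressed format the authors use, and their implementation is expressed through Halide scheduling primitives on OpenCL rather than plain Python.

```python
def to_csr(dense):
    """Compress a dense row-major matrix (list of lists) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i's nonzeros end in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matmul(values, col_idx, row_ptr, dense_b):
    """Compute A @ B, where A is given in CSR form and B is dense."""
    n_rows = len(row_ptr) - 1
    n_cols = len(dense_b[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Only the stored nonzeros of row i contribute to output row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a_val = values[k]
            b_row = dense_b[col_idx[k]]
            for j in range(n_cols):
                out[i][j] += a_val * b_row[j]
    return out

A = [[1, 0, 0], [0, 0, 2], [0, 3, 0]]
B = [[1, 2], [3, 4], [5, 6]]
print(to_csr(A))                   # ([1, 2, 3], [0, 2, 1], [0, 1, 2, 3])
print(csr_matmul(*to_csr(A), B))   # [[1, 2], [10, 12], [9, 12]]
```

The inner loops run once per nonzero instead of once per matrix entry, so the savings grow with the sparsity of the weights.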
Lai and Yang present a new binary translator named Rabbit, which is built on the LLVM framework (version 8.0) and introduces a novel optimization technique, platform-independent hyperchaining. The main idea is to chain statically and dynamically translated code blocks together; previously, translated code blocks could only be chained with translated blocks of the same type. The paper also introduces platform-dependent hyperchaining, which can recompile source binaries to x86-64 and RISC-V to gain further performance improvements.
Zhao et al. investigate how to use Halide to build a framework for fast prototyping and optimization of OpenVX DAG graphs. After OpenVX kernels are rewritten in Halide (covering different data access modes), the Halide auto-scheduler is applied systematically to accelerate them. Using Halide not only greatly shortens the code and reduces coding time but also improves performance. Experimental results show that Halide scheduling achieves substantial improvements for the OpenVX NNE.
Several approaches to sparse signal recovery have been developed to provide accurate recovery from a small portion of the available data. Zaric et al. propose an approach that combines gradient-based and threshold-based methods, enabling signal reconstruction that is both accurate and computationally efficient. They also provide a software tool that allows users to choose a signal from the database, select the sparsity domain, calculate the signal's initial transform in the selected domain, and perform the reconstruction using the proposed approach.
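A well-known generic method that pairs a gradient step with a thresholding step is iterative hard thresholding (IHT): each iteration takes a gradient step on the least-squares error and then keeps only the s largest-magnitude coefficients. The plain-Python sketch below illustrates that general idea only; it is not Zaric et al.'s algorithm, and the function names and parameters (step size, iteration count, tolerance) are illustrative assumptions.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero the rest."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out

def iht(A, y, s, step=1.0, iters=100, tol=1e-10):
    """Iterative hard thresholding for y = A x with x assumed s-sparse.

    Each iteration takes a gradient-descent step on 0.5 * ||y - A x||^2
    and then projects the result onto s-sparse vectors.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = y - A x.
        r = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        if sum(v * v for v in r) < tol:
            break
        # The gradient is -A^T r, so the descent step adds step * A^T r.
        at_r = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = hard_threshold([x[j] + step * at_r[j] for j in range(n)], s)
    return x

# Three measurements of a 4-dimensional signal that is 1-sparse.
A = [[1, 0, 0, 0.6], [0, 1, 0, 0.8], [0, 0, 1, 0]]
x_true = [0.0, 0.0, 0.0, 2.0]
y = [sum(A[i][j] * x_true[j] for j in range(4)) for i in range(3)]
print(iht(A, y, s=1))
```

Here the signal is recovered from fewer measurements (three) than unknowns (four), which is the setting the paragraph above describes; in general, IHT's success depends on properties of the measurement matrix and on the chosen step size.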
Wang et al. present Network Candidate Search, an approach that uses EfficientNets as the backbone to build lightweight versions and adjusts the architecture toward a more hardware-friendly structure. To improve the efficiency of the search process, grouping and elimination steps are also introduced. Building on the state of the art in hardware-friendly CNNs, this work focuses on the principle of scaling down along three dimensions, namely input resolution, depth, and channels, from which better-performing models with similar hardware cost can be derived.
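To make the three scaling dimensions concrete, the sketch below enumerates hypothetical depth, channel (width), and input-resolution multipliers and keeps those whose estimated cost fits a budget. The cost proxy uses the rule of thumb from the original EfficientNet work that convolutional FLOPs grow roughly linearly with depth and quadratically with width and resolution. The multiplier grid, the budget, and the function names are illustrative assumptions; this is not the Network Candidate Search procedure, and a hardware-aware search would more likely measure cost on the target device than rely on this proxy.

```python
from itertools import product

def relative_flops(depth_mult, width_mult, res_mult):
    """Approximate FLOPs relative to the baseline network.

    Rule of thumb: FLOPs scale about linearly with depth and
    quadratically with width (channels) and input resolution.
    """
    return depth_mult * width_mult ** 2 * res_mult ** 2

def scaled_down_candidates(budget, multipliers=(0.5, 0.75, 1.0)):
    """List (depth, width, resolution) multipliers whose estimated FLOPs
    do not exceed `budget`, given as a fraction of the baseline."""
    candidates = []
    for d, w, r in product(multipliers, repeat=3):
        cost = relative_flops(d, w, r)
        if cost <= budget:
            candidates.append(((d, w, r), cost))
    # Cheapest candidates first.
    return sorted(candidates, key=lambda item: item[1])

for config, cost in scaled_down_candidates(budget=0.1):
    print(config, round(cost, 4))
```

Each surviving configuration would then be trained and evaluated for accuracy, so pruning the candidate list early (as the grouping and elimination steps aim to do) directly reduces search time.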
Overall, these papers present frontier work on embedded computing, embedded compilers, and embedded programming tools. The first five papers cover language and compilation techniques, some of them focusing specifically on AI applications. The last two papers advance signal processing techniques and hardware-aware learning models.
We thank all the authors, the reviewers, the JSPS journal administrative staff, and the JSPS Editor-in-Chief for their contributions to making this high-quality JSPS Special Issue possible.
We hope you enjoy reading the articles.
About this article
Cite this article
Chen, KH., Lin, YC. & Lee, JK. Guest Editorial: Special Issue on Systems Optimizations for DSP and AI Applications. J Sign Process Syst 95, 569–570 (2023). https://doi.org/10.1007/s11265-023-01854-y
DOI: https://doi.org/10.1007/s11265-023-01854-y