2020 IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI) Best Paper Award to L. Benini and D. Rossi

Luca Benini and Davide Rossi have received the TVLSI Best Paper Award for the paper "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices".

Published on 20 November 2020 | Award

2020 TVLSI Best Paper Award of the IEEE Circuits and Systems (CAS) Society

The paper is the result of a partnership between the University of Bologna, GreenWaves Technologies, and the Integrated Systems Laboratory of ETH Zürich.

"Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 25, Issue 10, Oct. 2017.

Authors: Michael Gautschi, Pasquale Davide Schiavone, Andreas Traber, Igor Loi, Antonio Pullini, Davide Rossi, Eric Flammand, Frank K. Gürkaynak, Luca Benini

Abstract:

Endpoint devices for Internet-of-Things not only need to work under extremely tight power envelope of a few milliwatts, but also need to be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and the performance scalability can be gained through parallelism. In this paper, we describe the design of an open-source RISC-V processor core specifically designed for NT operation in tightly coupled multicore clusters. We introduce instruction extensions and microarchitectural optimizations to increase the computational density and to minimize the pressure toward the shared-memory hierarchy. For typical data-intensive sensor processing workloads, the proposed core is, on average, 3.5× faster and 3.2× more energy efficient, thanks to a smart L0 buffer to reduce cache access contentions and support for compressed instructions. Single Instruction Multiple Data extensions, such as dot products, and a built-in L0 storage further reduce the shared-memory accesses by 8× reducing contentions by 3.2×. With four NT-optimized cores, the cluster is operational from 0.6 to 1.2 V, achieving a peak efficiency of 67 MOPS/mW in a low-cost 65-nm bulk CMOS technology. In a low-power 28-nm FD-SOI process, a peak efficiency of 193 MOPS/mW (40 MHz and 1 mW) can be achieved.
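To put the efficiency figures quoted in the abstract into more familiar units, the short sketch below converts MOPS/mW into energy per operation. It is only a unit conversion of the published numbers, not code from the paper; the function names `pj_per_op` and `ops_per_cycle` are made up for this illustration, and the second function assumes that the quoted 40 MHz and 1 mW describe the same operating point as the 193 MOPS/mW figure.

```python
def pj_per_op(mops_per_mw: float) -> float:
    """Energy per operation in picojoules for an efficiency given in MOPS/mW.

    1 MOPS/mW = 1e6 ops/s per 1e-3 W = 1e9 operations per joule.
    """
    ops_per_joule = mops_per_mw * 1e9
    return 1e12 / ops_per_joule  # 1 J = 1e12 pJ


def ops_per_cycle(mops_per_mw: float, power_mw: float, freq_mhz: float) -> float:
    """Operations completed per clock cycle at a given operating point.

    Assumption: efficiency, power and frequency were all measured at the
    same operating point.
    """
    throughput_mops = mops_per_mw * power_mw
    return throughput_mops / freq_mhz


if __name__ == "__main__":
    # Peak efficiencies quoted in the abstract.
    for label, efficiency in [("65-nm bulk CMOS", 67.0), ("28-nm FD-SOI", 193.0)]:
        print(f"{label}: {pj_per_op(efficiency):.2f} pJ per operation")

    print(f"28-nm FD-SOI: {ops_per_cycle(193.0, 1.0, 40.0):.3f} operations per cycle")
```

Under these assumptions, 67 MOPS/mW corresponds to roughly 14.9 pJ per operation and 193 MOPS/mW to roughly 5.2 pJ per operation.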