### Davide Rossi, Ph.D.

e-mail: davide.rossi@unibo.it

# **Curriculum Vitae**

#### **Present Academic Positions**

- Associate Professor in Electronics at Energy Efficient Embedded Systems Lab, Department of Electrical, Electronic and Information Engineering "Guglielmo Marconi" (DEI) of University of Bologna (since November, 2022).
- Member of Center for Industrial Research on Information and Communication Technologies (CIRI ICT) of the University of Bologna (since 2016).
- Member of ARCES Advanced Research Center on Electronic Systems "Ercole De Castro" of the University of Bologna (since 2020).
- Member of Alma AI Alma Mater Research Institute for Human-Centered Artificial Intelligence of the University of Bologna (since 2021).

# Fellowships and Visiting Positions

- Visiting Assistant Professor at the Integrated Systems Laboratory (IIS) of Eidgenössische Technische Hochschule Zürich (ETHZ) (Sep. - Nov. 2017, Sep. - Nov. 2018).

### **Experience**

- *Technical Consultant* for the development of Ultra-Low-Power SoCs for IoT end-nodes processing. GreenWaves Technologies (2015-2021).
- *Tenure Track Assistant Professor* in Electronics at Energy Efficient Embedded Systems Lab, Department of Electrical, Electronic and Information Engineering "Guglielmo Marconi" (DEI) of University of Bologna (2018-2022).
- *Junior Assistant Professor* in Electronics at Energy Efficient Embedded Systems Lab, Department of Electrical, Electronic and Information Engineering "Guglielmo Marconi" (DEI) of University of Bologna (2015-2018).
- *Post Doc Researcher* at Energy Efficient Embedded Systems Lab, Department of Electrical, Electronic and Information Engineering "Guglielmo Marconi" (DEI) of University of Bologna (2013-2015).
- Ph.D. student at Advanced Research Center on Electronic Systems, Univ. Bologna (2009-2012).
- Junior Member of R&D Staff, STMicroelectronics, Agrate Brianza, Italy (2008-2012).
- Research Assistant at Tampere University of Technology, Tampere, Finland (2005).

#### **Education**

- Ph.D. in Electronics, Telecommunications, and Information Technologies Engineering, University of Bologna, 2012.
- M.Sc. in Electronics Engineering, University of Bologna/Tampere University of Technology, 2007 (grade: 104/110).
- B.Sc. Electronics Engineering, University of Bologna, 2004 (grade: 105/110).

# **Scientific Profile**

## **Research Activities**

My research focuses on computing systems architecture and the design of digital integrated systems, with a particular emphasis on low-power architectures, reconfigurable architectures, and their applications. My research journey began with an internship at Tampere University of Technology (TUT) during my Master's studies, culminating in earning my Master's degree from both TUT and the University of Bologna. My research at the time centered around embedded systems and coarse-grained reconfigurable architectures. During this period, I co-authored two journal papers and three conference papers.

Early in my career as a graduate researcher, I spent a year at STMicroelectronics before starting my Ph.D. at the Advanced Research Center on Electronic Systems (ARCES) at the University of Bologna. My Ph.D. research focused on heterogeneous reconfigurable architectures, specifically investigating flexible techniques for accelerating embedded multi-core Systems-on-Chip. I explored two distinct approaches: the first involved heterogeneous reconfigurable fabrics featuring functional units with varying granularities, while the second focused on techniques for configuring accelerators at both run-time and design-time, balancing trade-offs in performance, power, and cost. Between 2009 and 2012, I designed and taped out two chips (serving as the lead architect and chip designer) and co-authored four journal papers, two book chapters, and four conference papers.

In 2013, I joined the Energy Efficient Embedded Systems (EEES) laboratory as a postdoctoral researcher, where I began work on near-threshold multi-processing. I lead the PULP project (www.pulp-platform.org), an ambitious initiative aimed at developing and researching an open-source, ultra-low-power hardware-software platform for embedded processing in IoT end-nodes. This project spans silicon implementation, programming, and applications. As Chief Architect of the PULP platform, I am responsible for several chips produced under this initiative. PULP and its related intellectual property (IPs) are now used by over 30 companies and universities worldwide, including Google, Micron, IBM, CEVA, STMicroelectronics, Mentor, and Cadence. I am also actively involved in developing a product based on PULP in collaboration with the French startup GreenWaves Technologies.

Over the past five years, my research has focused primarily on the hardware/software co-design of low-power multi-core embedded platforms and their applications. I have made significant contributions to the design and programming of low-power multi-core architectures, to embedded applications for low-power systems (particularly in the areas of embedded vision and biometric signal processing), and to emerging low-power computing technologies such as brain-inspired computing, approximate computing, and transprecision computing.

To date, I have co-authored over 150 papers and three book chapters on the topics covered in this summary. I strongly believe in collaborative, multi-disciplinary research and have an extensive record of international cooperation with leading companies and institutions such as Meta, STMicroelectronics, NXP, Infineon, GlobalFoundries, QuickLogic, Tampere University of Technology, Thales, Braunschweig University, ETH Zurich, EPFL, and CEA.

# A) Teaching

# **Teaching Activities**

Over the past years I have taught 4 courses at the University of Bologna. The number of hours taught per academic year over the last 8 years has been:

- Academic year 2015-2016: 30 hours
- Academic year 2016-2017: 60 hours
- Academic year 2017-2018: 60 hours
- Academic year 2018-2019: 120 hours
- Academic year 2019-2020: 120 hours
- Academic year 2020-2021: 120 hours
- Academic year 2022-2023: 120 hours
- Academic year 2023-2024: 120 hours

More details about the courses can be found below:

- 73801 - LAB OF HARDWARE-SOFTWARE DESIGN M (84419 - LAB OF DIGITAL ELECTRONICS M) - 30 HOURS. University of Bologna - Academic years 2015-2024.

The aim of this course is to enrich the students' practical experience with advanced digital hardware design tools and methodologies. The students are expected to work on a practical project to deepen their knowledge of digital hardware design, integration of hardware modules into Systems on Chip, and prototyping of digital systems on FPGA devices. The course also covers aspects related to interactions between software and hardware components in Systems on Chip.

- 73731 - ARCHITETTURE E PROGRAMMAZIONE DEI SISTEMI ELETTRONICI T-A - Modulo 2 (29035 -LABORATORIO DI ARCHITETTURE E PROGRAMMAZIONE DEI SISTEMI ELETTRONICI INDUSTRIALI T-A- Modulo 2) – 30 HOURS. Academic years 2016-2021.

The aim of this course is to teach the architecture of microcontroller-based systems using ARM Cortex-M cores, and firmware programming for industrial applications. It covers both theoretical and practical aspects related to the architecture and programming of ARM Cortex-M microcontrollers.

- 73388 - DIGITAL SYSTEMS M - 60 HOURS CFU - 2018-2019

The aim of this course is to provide a view of digital circuits at transistor and gate level, giving a clear understanding of the main factors determining circuit performance, power consumption, signal integrity, and digital throughput.

- 93390 - DIGITAL SYSTEMS AND INTRODUCTION TO COMPUTER ARCHITECTURES M - Modulo 1 (84447 -INTRODUCTION TO COMPUTER ARCHITECTURES M) - 60 HOURS - 2019-2024

The aim of this course is to provide a view of digital circuits at transistor and gate level, giving a clear understanding of the main factors determining circuit performance, power consumption, signal integrity, and digital throughput. The course also gives an overview of digital circuits at logic and register transfer level, an overview of microprocessor and memory architectures, and the basics of testing, performance, and power consumption at system level.

- 35364 - ARCHITETTURE DIGITALI PER L'ELABORAZIONE DEL SEGNALE M (Modulo 2) - 30 HOURS - 2019-2024

The course illustrates the most commonly used digital architectures for signal processing. Starting from the study of some significant audio and video signal processing algorithms, the specifications that must be met by hardware architectures for signal processing are derived. The analysis of the most commonly used architectures, both serial and parallel, is carried out by observing the close correlation between algorithmic and architectural specifications within this class of machines.

#### **Tutor Activities**

- 28727 - PROGETTO DI SISTEMI ELETTRONICI T-A (Tutor). University of Bologna, Academic Years 2010-2012. The aim of this course is to teach the basics of digital hardware design, RTL hardware description languages, and implementation of digital circuits on FPGA devices.

For more information, visit my personal page: https://www.unibo.it/sitoweb/davide.rossi/didattica

### Summer Schools for Ph.D. Students

- D. Rossi, Parallel Ultra Low-Power Processing (PULP) Systems, Nips Summer school, 19/07/2018, Perugia, Italy.
- D. Rossi, <u>PULP: A Multi-Core Platform for Micropower In-Sensor Analytics</u>, Nips Summer school, 03/09/2019, Perugia, Italy.
- D. Rossi, Digital computing platforms for near-sensor processing at the extreme edge of the IoT, 05/07/2021, SIE Summer School, Trieste, Italy.
- L. Benini, D. Rossi, *Working with RISC-V: from open ISA to open Architecture to open Hardware*, HiPEAC ACACES Summer School, 13-18/09/2021, Fiuggi, Italy.
- D. Rossi, <u>PULP: Embedding AI at the Extreme Edge of the IoT</u>, CIS Edge AI School, 14/06/2022, Lausanne, Switzerland.
- D. Rossi, PULP: Energy-Efficient ML at the Extreme Edge of the IoT, SSIE Summer School, 13/07/2022, Bressanone, Italy.

# **Support Activities**

In recent years I have been tutor of one bachelor course, advisor of 30 bachelor theses and 21 master theses, co-advisor of 7 master theses, advisor of 8 Ph.D. students, co-advisor of 7 Ph.D. students, and responsible for several research grants. Moreover, I informally co-supervised 8 other Ph.D. students from UNIBO and ETH Zurich, as demonstrated by a number of joint publications reported in section B) Research. More information about teaching support activities can be found below.

### **Advisor of Bachelor Theses**

- Lorenzo Selvatici, Thesis Title: "CNN2FPGA A Convolutional Neural Network Compiler for FPGA", 23/07/2018.
- Annachiara Biguzzi, Thesis Title: "Towards Hardware Implementation of Real-Time Spike Sorting Algorithms", 05/10/2018.
- Armando Armerì, Thesis Title: "Implementazione ed ottimizzazione di una rete convoluzionale MobileNet quantizzata su architettura ARM Cortex-M7", 20/12/2018.
- Rea Dizdari, Thesis Title: "Modellazione del background basata su Gaussian mixture model su sistema a microcontrollore", 20/12/2018.
- Michele Gazzarri, Thesis Title: "Progettazione e sviluppo di un sistema di acquisizione dati per banchi prova sospensioni", 15/03/2019.
- Luca Serfilippi, Thesis Title: "Architettura di sistema del firmware di un nano-drone intelligente e analisi dell'integrabilità in un system-on-chip ad alta efficienza energetica", 3/10/2019.
- Nicolas Bruscoli, Thesis Title: "Progettazione e realizzazione di una toolchain Simulink per processori Zynq", 19/12/2019.
- Giovanni Giannone, Thesis Title: "Ottimizzazione di una libreria per il calcolo della FFT sviluppata su architettura PULP e adattata per la piattaforma ARM", 11/03/2020.
- Luca Barbieri, Thesis Title: "Analisi comparativa di algoritmi paralleli di decomposizione QR per l'elaborazione di biosegnali su architettura PULP", 09/10/2020.
- Veronica Gavagna, Thesis Title: "Progetto di un sistema automatico per il monitoraggio della produzione di rifiuti riciclabili basato su tecnologia IoT", 09/10/2020.
- Paolo Carboni, Thesis Title: "Estensione dell'ISA di un processore RISC per supportare reti neurali quantizzate a precisione mista", 10/03/2021.
- Arianna Aldrovandi, "Caratterizzazione di sensori PPG per applicazioni biomedicali indossabili", 02/12/2021.
- Luigi Giordano, "Gestione dell'orientamento del polso tramite segnali EMG in applicazioni di controllo protesico", 02/12/2021.
- Andrea Helga Bernardi, "Sviluppo di un sistema indossabile a microcontrollore basato su MicroPython per la raccolta di immagini e dati EMG in applicazioni di riconoscimento di gesti", 02/12/2021.
- Marzia Bianco, "Applicazione e analisi di data augmentation su segnali PPG ed accelerometrici per la predizione della frequenza cardiaca", 03/02/2022.
- Giovanni Orzalesi, "Sviluppo di un sistema embedded basato su elettrodi EMG dry per il controllo non invadente in applicazioni HMI", 21/03/2022.
- Edoardo Roda, "Sviluppo driver di controllo protesi di mano robotica", 21/03/2022.
- Simone Di Stasi, "Progettazione ed Ottimizzazione di Componenti Per il Trasferimento Autonomo di Dati nei Sistemi On Chip", 05/10/2022.
- Fabio Grimandi, "Low-cost, Low-Power Embedded Wireless Hand Gesture Tracking System for HMI applications", 05/10/2022.
- Fabio Armaroli, "Elettronica flessibile per display", 15/12/2022.
- Alice Afragoli, "Adattamento di un Ethernet MAC RGMII per integrazione in tecnologia CMOS 22nm FDX", 15/12/2022.
- Samuele Spadoni, "A Low-power Sterocamera System for Nano-drones", 15/12/2022.
- Salvatore Vangone, "Sviluppo di un ambiente di verifica per la periferica I3C", 15/12/2022.
- Matteo Buttazzi, "Progettazione di un ambiente di testing per periferiche in sistemi a microcontrollore", 22/03/2023.
- Alessandro Briccoli, "Progettazione di un ambiente di verifica per una periferica CAN", 22/03/2023.
- Mattia Girotti, "Prototipizzazione di una periferica ethernet RGMII per applicazioni embedded", 22/03/2023.
- Alexandru Trocan, "Design and testing of new sensor technologies for nanodrones", 22/03/2023.
- Lorenzo Ambrogiani, "Progetto di un sistema di elettrodi asciutti ad alta densità per elettroencefalogramma", 14/10/2023.
- Marco Monaco, "Progettazione di un modulo Ethernet per FPGA ed integrazione di un analizzatore logico per la verifica funzionale di un Ethernet IP", 2/12/2023.
- Nico Conti, "Design and Implementation of a Multimodal Wearable System for Real-Time Acquisition and Processing of Electrodermal Activity (EDA) and Electrocardiogram (ECG) Signals", 18/03/2024.

### **Advisor of Master Theses**

- Velu Prabhakar Kumaravel, Thesis Title: "Experimental Evaluation of BITalino: a low-cost modular platform for biosignals acquisition", 16/03/2018.
- Weiwei Liao, Thesis Title: "low power serial chip-to-chip communication link for ultra low power IoT end nodes", 24/07/2018.
- Hunaina Farid, Thesis Title: "Influence of Partial Dynamic Reconfiguration on Power Consumption of FPGA Based Implementations", 23/07/2019.
- Giovanni Landi, Thesis Title: "Design of the Park Assist Based on a Rear Corner Radars and Rear-View Camera Sensor Fusion Strategy", 24/10/2019.
- Gianmarco Ottavi, Thesis Title: "Sviluppo e Ottimizzazione di un Processore Configurabile con Unità di Calcolo a Precisione Variabile", 19/12/2019.
- Luca Bertaccini, Thesis Title: "Design of a Cluster-Coupled Hardware Accelerator for FFT Computation", 06/02/2020.
- Nazareno Bruschi, Thesis Title: "Accelerating Mixed-Precision Quantized Neural Networks on Parallel Ultra Low Power IoT End Nodes", 11/03/2020.
- Ilario Coppola, Thesis Title: "A Reconfigurable MIMD/SIMD RISC-V Cluster for Energy-Efficient Parallel Computing", 11/03/2020.
- Mattia Sinigaglia, "Progettazione ed implementazione di un Sistema On Chip per applicazioni audio", 21/07/2021.
- Michele Brugnara, "A Neural Network approach for EEG Bad Channel Detection in Real-Time", 07/10/2021.
- Alessandro Nadalini, "Progettazione ed ottimizzazione di un processore dedicato per accelerazione di reti neurali quantizzate a precisione mista", 07/10/2021.
- Nico Orlando, "Sviluppo di driver con interfaccia OS-independent per la piattaforma PULP", 21/03/2022.
- Maicol Ciani, "System-Level Integration of a Security Enclave on a RISC-V based Embedded System On Chip", 05/10/2022.
- Giulia Remondini, "Progettazione e ottimizzazione di un Turbo Encoder parallelo su dispositivo FPGA", 05/12/2022.
- Saeed Pourghasemy, "Programmability and Acceleration on Edge Computing Devices: From Embedded Microcontrollers to High-Performance and Efficient Heterogeneous Systems"
- Riccardo Tedeschi, "A low-cost fault tolerant technique for microcontroller-class RISC-V processors", 15/09/2023.
- Chaoqun Liang, "Design and Implementation of an FPGA-based CPI to CSI-2 Protocol Adapter", 14/10/2023.
- Andrea Helga Bernardi, "Smart Glasses as a Sensor Fusion Platform for Acquisition and Processing of ExG and Image Data", 18/03/2024.
- Marco Fontana, "Progettazione Hardware-Software di un modulo ethernet su Piattaforma FPGA", 18/03/2024.
- Edoardo Guerra, "progettazione di un acceleratore hardware integrato con pid e dithering per il controllo ottimizzato di convertitori di potenza", 18/03/2024.
- Lavinia Rossi, "Sviluppo di un modulo software di controllo per il calcolo in tempo reale del rapporto di sterzata di un carrello elevatore con assale anteriore bi-motore (differenziale elettronico)", 22/07/2024.

#### **Co-Advisor of Master Theses**

- Marco Donato Torsello, Thesis Title: "Utilizzo di metodi di configurazione automatica per un'applicazione di transprecision computing su piattaforma PULP", 06/10/2017.
- Riccardo Gandolfi, "Design of a memory-to-memory tensor reshuffle unit for ultra-low-power deep learning accelerators", 20/07/2021.
- Yvan Tortorella, "Design of a Low-Precision Floating-Point Matrix Multiplication Accelerator for On-Chip Deep Learning", 07/10/2021.
- Aurora Di Gianpietro, "Integrating a Tensor Datapath into a Small and Efficient Vector Processor", 01/02/2024.
- Luigi Ghionda, "Design of a Multi-Precision Floating-Point FFT Hardware Accelerator", 18/03/2024.
- Lorenzo Greco, "Progettazione di un cluster eterogeneo con acceleratore analogico per intelligenza artificiale basato su ePCM", 18/03/2024.
- Andrea Belano, "Softex: Softmax Computing Engine for Fast Exponential Acceleration", 23/07/2024.

### Advisor of Ph.D. Students

- Jie Chen (University of Bologna, ETIT, 34° Cycle), Research Topic: Design and Optimization of the Memory Hierarchy of Parallel-Ultra-Low Power Architectures.
- Nazareno Bruschi (University of Bologna, ETIT, 36° Cycle), Research Topic: Parallel computing architectures for heterogeneous acceleration of AI algorithms.
- Gianmarco Ottavi (University of Bologna, ETIT, 37° Cycle), Research Topic: High Performance RISC-V Processors for Embedded Applications.
- Maicol Ciani (University of Bologna, EIT4SEMM, 38° Cycle), Research Topic: Secure High Performance Computing for Autonomous Drone Navigation.
- Mattia Sinigaglia (University of Bologna, EIT4SEMM, 38° Cycle), Research Topic: Heterogeneous Platform for High-Performance Embedded Computing.
- Riccardo Tedeschi (University of Bologna, ETIT, 39° Cycle), Research Topic: Dependable Processor for High-Performance Automotive Applications.
- Chaoqun Liang (University of Bologna, EIT4SEMM, 39° Cycle), Research Topic: Dependable Interconnect Systems for Automotive Applications.
- Victor Isachi (University of Bologna, EIT4SEMM, 39° Cycle), Research Topic: High Performance Accelerators for AI.

#### Co-Advisor of Ph.D. Students

- Enrico Tabanelli (University of Bologna, ETIT, 35° Cycle), Research Topic: Acceleration of learning algorithms for parallel ultra-low power processing systems.
- Alessio Burello (University of Bologna, ETIT, 35° Cycle), Research Topic: Edge machine learning algorithms for the IoT.
- Luca Valente (University of Bologna, ETIT, 36° Cycle), Research Topic: Parallel Ultra-Low Power Processing for the IoT-Ultra Low-Power electronic systems.
- Yvan Tortorella (University of Bologna, ETIT, 37° Cycle), Research Topic: Heterogeneous Computing Platform for Space Applications.
- Amir Kiamarzi (University of Bologna, EIT4SEMM, 39° Cycle), Research Topic: Vector Processors for Frequency domain applications.
- Alessandro Nadalini (University of Bologna, ETIT, 39° Cycle), Research Topic: Reduced Precision Accelerators for AI.
- Massimo Micolitti (University of Bologna, MUNER, 39° Cycle), Research Topic: Embedded Processing for Automotive Applications.

### Other supervision activities with Ph.D. Students

- Francesco Conti (University of Bologna). Hardware blocks for acceleration of convolutional neural networks in programmable deeply embedded system-on-chip.
- Erfan Azarkhish (University of Bologna). Exploration of hardware architectures and systems for near-memory computing, exploiting capabilities delivered by new generation 3D stacked memories such as Hybrid Memory Cube.
- Manuele Rusci (University of Bologna, FBK). Deeply embedded systems for smart vision coupling analog imagers with energy efficient digital processing platforms.
- Renzo Andri (ETH Zurich). Binary convolutional accelerators to enable execution of convolutional neural networks in 10mW power envelope.
- Satyajit Das (Université de Bretagne-Sud, University of Bologna). Coarse-grained reconfigurable architectures for acceleration of both data- and control-intensive tasks in the field of IoT applications.
- Fabio Montagna (University of Bologna). Design and optimization of bio-signals processing applications on parallel embedded computing platforms.
- Angelo Garofalo (University of Bologna). Hardware-Software design of Ultra-Low power Multiprocessor Systems on Chip.
- Pasquale Davide Schiavone (ETH Zurich). Design of microprocessors for IoT end-nodes.

# B) Research

### **Research Activities**

### **Impact of Publications**

In recent years, I have published (international, peer-reviewed) **63 journal papers**, **101 conference papers**, and **4 book chapters**. My **h-index is 39**, my **i10-index is 103**, and the total number of citations is **5953** (from Google Scholar, Sept. 23<sup>rd</sup>, 2024). I am the author of several papers in top Solid-State Circuits Society conferences and journals, including **2 ISSCC**, **6 JSSC**, **1 VLSI Symposium**, **3 Hot Chips**, **5 ESSCIRC**, **1 CICC**.

Scopus Page: https://www.scopus.com/authid/detail.uri?origin=resultslist&authorId=7103169675&zone=

Google Scholar Page: https://scholar.google.it/citations?user=FOkQ6qMAAAAJ&hl=it

# Participation in International Conferences as Speaker

- D. Rossi, F. Campi, A. Deledda, S. Spolzino, S. Pucillo, <u>A Heterogeneous Digital Signal Processor Implementation for Dynamically Reconfigurable Computing</u>, Custom Integrated Circuits Conference (CICC), 2009.
- D. Bortolotti, D. Rossi, A. Bartolini, L. Benini, <u>A Variation Tolerant Architecture for Ultra Low Power Multi-processor Cluster</u>, International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013.
- D. Rossi, A. Pullini, I. Loi, F. Conti, G. Tagliavini and A. Marongiu, <u>Energy efficient parallel computing on the PULP platform with support for OpenMP</u>, 2014 IEEE 28th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 2014.
- D. Rossi, F. Conti, A. Marongiu, A. Pullini, I. Loi, M. Gautschi, G. Tagliavini, A. Capotondi, P. Flatresse, L. Benini, <u>PULP: A Parallel Ultra-Low-Power Platform for Next Generation IoT Applications</u>, Hot Chips: A Symposium on High Performance Chips, 2015.
- D. Rossi, A. Pullini, M. Gautschi, I. Loi, F. K. Gurkaynak, P. Flatresse, L. Benini, <u>A -1.8V to 0.9V body bias, 60 GOPS/W 4-core cluster in low-power 28nm UTBB FD-SOI technology</u>, IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), pp. 1-3, 5-8 Oct. 2015.
- R. Andri, L. Cavigelli, D. Rossi, L. Benini, YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights, ISVLSI 2016.
- D. Rossi et al., "4.4 A 1.3TOPS/W @ 32GOPS Fully Integrated 10-Core SoC for IoT End-Nodes with 1.7μW Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode," 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2021.

## Participation in National and International Conferences and Workshops as Invited Speaker and Keynote

- D. Rossi, <u>A 0.44 to 1.2V Voltage, -1.8V to 0.9V Body Bias, 60 GOPS/W 4-core Cluster in conventional-well 28nm UTBB FD-SOI technology</u>, LetiWorkshop FDSOICE 2015, 22/06/2015, CEA-LETI, Minatec Campus, Grenoble, France. (Invited Talk).
- D. Rossi, <u>Sub-pJ per Operation Scalable Computing with the PULP platform</u>, Workshop Commissione Calcolo e Reti Istituto Nazionale di Fisica Nucleare (INFN), 19/05/2016, La Biodola, Isola d'Elba, Italy. (Invited Talk).
- D. Rossi, <u>Sub-pJ per Operation Scalable Computing: the PULP experience</u>, IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 11/10/2016, San Francisco (CA), USA. (Invited Talk).
- D. Rossi, <u>Sub-pJ per Operation Scalable Computing: the next challenge</u>, 2016 ICSEE International Conference on the Science of Electrical Engineering, 17/11/2016, Eilat, Israel. (Invited Talk).
- D. Rossi, *Smart Integrated Microsystems for the IoT: The Energy Efficiency Challenge*, WEEE 2017, 12/06/2017, Ystad, Sweden. (Invited Talk).
- D. Rossi, *Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes*, 23/06/2017, Fraunhofer ITWM, Kaiserslautern, Germany. (Invited Talk).
- D. Rossi, <u>Sub-pJ per Operation Scalable Computing with the PULP Platform</u>, MCC2017, 30/11/2017, Uppsala, Sweden. (Invited Keynote).
- P. Stenström, D. Rossi, J. Grönqvist, S. Kaxiras, *Is the Multicore dead?*, Multicore Day 2017, November 29, 2017, Uppsala, Sweden. (Invited Panel).
- D. Rossi, <u>Ultra-Low-Power Digital Architectures for the Internet of Things</u>, 14/03/2018, ISQED, Santa Clara, CA, USA (Invited Tutorial).
- D. Rossi, <u>Quentin: A Near-Threshold SoC for Energy-Efficient IoT End-Nodes in 22nm FDX Technology</u>, DATE 2018, 22/03/2018, Dresden, Germany. (Invited Talk).
- D. Rossi, *Quentin: An Ultra-Low-Power PULPissimo SoC in 22nm FDX*, IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference, 2018, 15/10/2018 San Francisco, USA. (Invited Talk).
- D. Rossi, *PULP: A Transprecision Multi-Core Platform for Micropower In-Sensor Analytics*, The 16th International System-on-Chip (SoC) Conference, Exhibit, and Workshops, 17/10/2018, Irvine, California. (Invited Talk).
- D. Rossi, *Parallel Ultra-Low-Power Systems*, Bosch, 14/11/2018, Nuremberg, Germany. (Invited Talk).
- D. Rossi, <u>Mr. Wolf: A RISC-V Parallel Ultra Low Power SoC for IoT Edge Processing</u>, Multicore Day 2018, Monday, November 26, 2018, Uppsala, Sweden. (Invited Talk).
- D. Rossi, PULP RISC-V Training, 24-25/07/2018, Silicon Laboratories (SILABS), Austin, Texas, USA. (Invited Tutorial).
- D. Rossi, PULP RISC-V Training, 18-19/09/2018, Silicon Laboratories (SILABS), Rennes, France. (Invited Tutorial).
- D. Rossi, <u>PULP: An Open-Source RISC-V based Multi-Core Platform for In-Sensor Analytics</u>, Workshop on Open Source Design Automation (OSDA) at DATE 2019, 29/03/2019, Florence, Italy. (Invited Keynote).
- D. Rossi, PULP Tutorial, 13/06/2019, WOSH, ETHZ, Switzerland. (Invited Tutorial).
- D. Rossi, *PULP RISC-V Training*, WISEKEY SA, 18-19/06/2019, Aix-en-Provence, FR. (Invited Tutorial).
- D. Rossi, PULP: Open Hardware at the Edge of the IoT, WOC, HiPEAC 2020. (Invited Talk).
- D. Rossi, R. Aitken, E. Alon, B. Khailany, S. Kottapalli, S. Ouyang, D. Patterson, *Is an Open Source Hardware Revolution on the Horizon?*, 18/02/2020, ISSCC 2020, San Francisco, California. (Invited Panel).
- D. Rossi, *PULP Platform Overview*, 03/02/2020, OFA Workshop, Brussels. (Invited Talk).
- D. Rossi, <u>Extending RISC-V Platforms for ML at the Extreme Edge of the IoT</u>, 21/02/2021, ISSCC 2021 FORUM. (Invited Talk).
- D. Rossi, T. Benz, L. Bertaccini, F. Zaruba, *Special Session: Open Source On-Chip Communication from Edge to Cloud: the PULP experience*, NOCS 2021, 15/10/2021, Virtual. (Invited Talk).
- D. Rossi, *Open Source HW for IoT and its impact on the Industrial Ecosystem: the PULP Experience*, OSHEAN Open Source Hardware European Alliances and iNitiatives Workshop, DATE 2022, 18/03/2022. (Invited Talk).
- D. Rossi, <u>PULP: Open-Source Heterogeneous Parallel Computing from IoT to HPC</u>, 2023 European Innovation Stars Workshop on Future Wireless Communication Technologies and their Challenges on IC Design, Munich, 09/03/2023. (Invited Talk).
- D. Rossi, <u>Demystifying NV-AIMC: Towards End-to-End DNN Inference on Massively Parallel Analog In-Memory Computing Architectures</u>, Brain-Inspired Computing Workshop 2023, 9/6/2023, Modena. (Invited Talk).
- D. Rossi, <u>Ten years of PULP: The Evolution of the Species from IoT to HPC</u>, ORCONF 2023, Munich, 16/09/2023. (Invited Talk).
- D. Rossi, <u>Ten years of PULP: The Evolution of the Species from IoT to HPC</u>, Univ. Louisiana, Virtual, 29/09/2023. (Invited Talk).
- D. Rossi and others, *The Chiplet Revolution: Building Tomorrow's Electronics Piece by Piece*, CadenceLIVE Europe 2023, Munich, 10/10/2023. (Invited Panel).
- D. Rossi, <u>PULP: RISC-V based Heterogeneous Parallel Computing from IoT to HPC</u>, SSCS Texas chapter, 16/11/2023, Virtual. (Invited Talk).
- D. Rossi, <u>PULP: Open Source Heterogeneous Parallel Computing from IoT to HPC</u>, Huawei 2024 future Wireless Terminal Technology, Lund, 23/05/2024. (Invited Talk).
- D. Rossi, <u>PULP: A Heterogeneous RISC-V Platform for AI from IoT to HPC</u>, NOVCA-FCT workshop on edge computing, Lisbon, 11/07/2024. (Invited Talk).
- D. Rossi, Past, Present, Future of RISC-V: The PULP Perspective, Tristan Workshop, 11/09/2024 (Invited Keynote).

### Organizer and Chair of Special Sessions at National and International Conferences

- L. Benini, D. Rossi, Parallel Ultra-Low-Power Computing for the IoT: Applications, Platforms, Circuits, DATE 2017.
- J. Nurmi, D. Rossi, C. Malossi, Approximate and Transprecision Computing Circuits and Systems, ISCAS 2018.
- D. Rossi, F. Zaruba, T. Benz, L. Bertaccini, *Open Source On-Chip Communication from Edge to Cloud: the PULP experience*, ESWEEK NoCs 2021.
- Session Chair at 2019 26th IEEE International Conference on Electronics, Circuits and Systems Conference in Genova, Italy from 27-29 November 2019, Session: Machine Learning.
- OSHEAN Open Source Hardware European Alliances and iNitiatives, in conjunction with DATE 2022, Friday, 18 March 2022.
- 3 days on RISC-V and Open-Source Hardware! Tuesday-Thursday, CICSU, Campus Pierre et Marie Curie, Paris, May 3-5, 2022.
- SSH-SoC: Safety and Security in Heterogeneous Open System-on-Chip Platforms, in conjunction with Design Automation Conference (DAC) 2023, July 9th, 2023, San Francisco, CA.
- Safety and Security in Heterogeneous Open System-on-Chip Platforms, in conjunction with Design Automation Conference (DAC) 2024, June 23rd, 2024.

# National and International Grants (As Principal Investigator)

- <u>Principal Investigator</u> (March 2016 - July 2018): OPEN-NEXT Strutture software real-time e open-source per piattaforme embedded industriali di prossima generazione (POR-FESR 2014-2020, funded by Regione Emilia-Romagna, overall project funding: € 739.250, UNIBO funding: € 213.000). OPEN-NEXT aims at the deployment of multi- and many-core architectures in industrial automation and automotive applications. The key challenge addressed in this project is to provide effective architectural and programming model support that lets parallel platforms meet the strict real-time constraints of industrial applications while delivering significantly higher performance and energy efficiency than the single-core architectures traditionally employed in applications with real-time constraints. In this project I was the main principal investigator and responsible for the activities related to WP2 programming models, leading a team of 2 post-doc researchers.
- <u>Principal Investigator</u> (June 2018 - May 2019): EverMORE Energy-efficient Variation awarE MulticORE (Financial Support to Third Parties from TETRAMAX TEchnology TRAnsfer via Multinational Application eXperiments, Grant Agreement 761349, overall project funding: € 42.000, UNIBO funding: € 37.000). The EverMORE TTX experiment aims at developing the next generation GAP-8 IoT processor from GreenWaves Technologies (https://greenwaves-technologies.com/). Exploiting the adaptive management architecture for process and temperature compensation developed at University of Bologna, coupled to the low-voltage capabilities of 22nm FD-SOI technology, is expected to significantly improve the energy efficiency of current generation GreenWaves Technologies processors, enabling new applications and opening new market opportunities. In this project I was principal investigator and responsible for all the technical and management activities related to the academic partner.

- <u>Principal Investigator</u> (Oct 2019 - present): WiPLASH: Architecting More Than Moore Wireless Plasticity for Heterogeneous Massive Computer Architectures (GA 863337, overall project funding: € 3 M, UNIBO funding: € 328750). The main design principles in computer architecture have shifted from a monolithic scaling-driven approach towards heterogeneous architectures that tightly co-integrate multiple specialized computing and memory units. Hardware specialization, however, requires interconnection mechanisms that integrate the architecture, and traditional wired interconnects are unable to provide the efficiency and architectural flexibility required by current and future key ICT applications. The WiPLASH project aims to pioneer an on-chip wireless communication plane able to provide architectural plasticity, reconfigurability and adaptation to the application requirements with near-ASIC efficiency but without loss of generality. In this project I am principal investigator and responsible for WP4 (architecture design), leading a team of 3 Ph.D. students and research fellows.
- <u>Principal Investigator</u> (starting in Nov 2021): The European Pilot: Pilot using Independent Local & Open Technologies (expected overall project funding: € 30 M, expected UNIBO funding: € 842375). The European PILOT (Pilot using Independent Local & Open Technology) will be the first demonstration of two all-European HPC and High Performance Data Analytics (HPDA) (AI, ML, DL) accelerators, designed, implemented, manufactured, and owned by Europe. The European PILOT combines open source software (SW) and open and proprietary hardware (HW) to deliver the first completely European full stack software, accelerator, and integrated ecosystem based on RISC-V accelerators coupled to any general purpose processor. In this project I will be principal investigator for UNIBO and responsible for all the activities related to the design of machine learning accelerators, leading a team of 6 Ph.D. students and research fellows.

### Research Contracts with National and International Private Entities (As Principal Investigator)

- <u>Principal Investigator</u> (Jan 2021 - present): Elettronica S.p.A. https://www.elt-roma.com/ (€ 25.000). The project aims at the definition of the specifications for a heterogeneous reconfigurable SoC for defense applications. In this project I am responsible for the overall contract, defining the hardware and software specifications of the system-on-chip architecture as well as the requirements for testing.
- <u>Principal Investigator</u> (April 2021 - present): Technology Innovation Institute https://www.tii.ae/ (€ 750'000). The goal of the project is to develop an open RISC-V-based SoC architecture and software stack for adoption on secure application processors for drone flight computer applications, innovating in processor and platform design by co-optimizing security, resilience, power efficiency and real-time performance. In this project I am the coordinator of all the activities of the project, which involves 3 other universities, principal investigator for UNIBO, and responsible for the definition of the specifications of the overall SoC architecture, design and implementation, leading a team of 4 Ph.D. students and research fellows.
- <u>Principal Investigator</u> (Jan 2022 - present): Technology Innovation Institute https://www.tii.ae/ (€ 400'000). The goal of the project is to enhance the architecture developed in the previous year's project with additional security features, such as control flow integrity, IO MMU and PMP, virtualized interrupts and accelerators for anomaly detection. In this project I am the coordinator of all the activities of the project, which involves 6 other universities, principal investigator for UNIBO, and responsible for the definition of the specifications of the overall SoC architecture, design and implementation, leading a team of 4 Ph.D. students and research fellows.
- <u>Principal Investigator</u> (Nov 2022 - present): ST Microelectronics https://www.st.com (€ 100'000 / year). The project aims at developing RISC-V based processors for automotive (scalar, vector, high-performance) and interconnect, along with the related reliability solutions. In this project I am principal investigator, supervising the activities of three Ph.D. students.

# Other Activities in National and European Research Projects

- <u>Responsible for UNIBO Activities in Accelerator Stream</u> (Nov 2018 - present). European Processor Initiative (EPI-SGA1, SGA2) (overall project funding: € 120 M). The European Processor Initiative (EPI) is a project currently implemented under the first stage of the Framework Partnership Agreement signed by the Consortium with the European Commission (FPA: 800928), whose aim is to design and implement a roadmap for a new family of low-power European processors for extreme scale computing, high-performance Big-Data and a range of emerging applications. In this project I am responsible for UNIBO activities in the accelerator stream of the project, designing machine learning accelerators for high performance computing and leading a team of 3 Ph.D. students and research fellows.
- <u>Part of the project technical board</u> (Jan 2017 - Dec 2020). OPRECOMP (Open transPREcision COMPuting, co-funded by the European Union's H2020-EU.1.2.2. FET Proactive research and innovation programme, funding: € 4 M). OPRECOMP aims at demolishing the ultra-conservative "precise" computing abstraction and replacing it with a more flexible and efficient one, namely transprecision computing. In this project I was a senior member of the project technical board, supervising all the activities related to the hardware design of a low-power demonstrator for IoT applications.

- <u>Work Package Leader</u> (Jan. 2008 - Oct. 2009). WP4 Platform Integration and monitoring of project "MORPHEUS" (Multi-purpose dynamically Reconfigurable Platform for intensive Heterogeneous processing, Sixth Framework Programme, overall funding: € 2.463.865). The project aims at developing a global solution based on a modular SoC platform providing the disruptive technology of embedded dynamically reconfigurable computing completed by a software (SW) oriented design flow. In this project I was responsible for the design of the silicon demonstrator of the project in WP3.
- <u>Task Leader</u> (March 2009 - February 2012). Task T4.4 Design of regular architectures and circuits for high manufacturability and yield of MODERN (MOdeling and DEsign of Reliable, process variation-aware Nanoelectronic devices, circuits and systems, ENIAC-2008-1, overall funding: € 11.816.000). The objective of the MODERN project is to develop new paradigms in integrated circuit design, with the aim of enabling the manufacturing of reliable, low cost, low EMI, high-yield complex products using unreliable and variable devices. In this project I was responsible for the reconfigurable fabrics demonstrators.

## Industrial and Academic Collaborations in Chip Design Track Record

### ST Microelectronics (2008 - 2009)

The project introduced one of the first embedded system-on-chip architectures exploiting heterogeneous reconfigurable computing, where a general-purpose processor is accelerated by multiple flavors of specialized reconfigurable engines for embedded signal processing. This approach is now employed in several fully reconfigurable SoCs such as Xilinx Zynq.

#### Fabricated Prototypes



## Morpheus

| Process Technology  | 90 nm CMOS90GP Process,<br>7-metal layers |
|---------------------|-------------------------------------------|
| Power Supply        | 1.0 V for core, 3.3 V for I/O |
| Area                | 110 mm<sup>2</sup> |
| Transistor Count    | 44M Logic<br>1.1 Mbyte SRAM |
| Pinout              | 256, 163 I/O |
| Operating Frequency | ARM, BUS, NoC: 250 MHz<br>XPP: 0 - 160 MHz<br>DREAM: 0 - 200 MHz<br>eFPGA: 0 - 140 MHz |
| Power Consumption   | Static Power: 235 mW<br>ARM + NoC: 600 mW @ full speed<br>XPP: 1200 mW @ full occupation - full speed<br>DREAM: 420 mW @ full occupation - full speed<br>eFPGA: 112 mW @ full occupation - full speed |

#### **Publications**

- D. Rossi, F. Campi, A. Deledda, S. Spolzino and S. Pucillo, <u>A heterogeneous digital signal processor implementation for dynamically reconfigurable computing</u>, 2009 IEEE Custom Integrated Circuits Conference, 2009, pp. 641-644.
- D. Rossi, F. Campi, S. Spolzino, S. Pucillo, R. Guerrieri, <u>A Heterogeneous Digital Signal Processor for Dynamically Reconfigurable Computing</u>, IEEE Journal of Solid-State Circuits (JSSC), vol. 45, no. 8, pp. 1615-1626, Aug. 2010.

### ST Microelectronics, ETH Zurich (2013-2014)

The project explores new programmable multicore architectures that ease the exploitation of application data-parallelism thanks to an efficient, extremely low-overhead multi-core cluster architecture exploiting the body-bias technique and the low-voltage capabilities of STMicroelectronics 28nm FD-SOI technology. In this project, carried out in collaboration with ETH Zurich, which led to 3 joint tape-outs in 28nm FD-SOI and related publications, I led a team of 3 Ph.D. students and research fellows.

#### Fabricated Prototype



| Technology      | UTB FDSOI 28nm                              |
|-----------------|---------------------------------------------|
| Transistors     | Conventional well<br>L = 24 nm              |
| Chip area       | 3 mm <sup>2</sup>                           |
| VDD range       | 0.44V - 1.2V                                |
| BB range        | -1.8V - 0.9V                                |
| #SRAM macros    | 72 x 4 Kbit                                 |
| Gates           | 180K                                        |
| Frequency range | NO BB: 0.74 - 452 MHz<br>FBB: 1.8 - 475 MHz |
| Power range     | NO FBB: 0.1 - 119 mW<br>FBB: 0.11 - 127 mW  |

#### **Publications**

D. Rossi, A. Pullini, M. Gautschi, I. Loi, F. K. Gurkaynak, P. Flatresse, L. Benini, <u>A -1.8V to 0.9V body bias, 60 GOPS/W 4-core cluster in low-power 28nm UTBB FD-SOI technology</u>, in SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2015 IEEE, pp. 1-3, 5-8 Oct. 2015.

D. Rossi, A. Pullini, I. Loi, M. Gautschi, F. K. Gürkaynak, A. Bartolini, P. Flatresse, L. Benini, <u>A 60 GOPS/W, -1.8V to 0.9V Body Bias ULP Cluster in 28nm UTBB FD-SOI technology</u>, Elsevier Journal of Solid State Electronics, 2016.

## ST Microelectronics, ETH Zurich, CEA, EPFL (2014-2016)

Project aimed at the development of a near-threshold parallel architecture exploiting low-power IPs such as low-power processors, memories, power management IPs, and dynamic reconfiguration techniques to break the pJ/OP wall in next generation computing architectures for IoT applications. In this project, which led to a joint tape-out in 28nm FD-SOI and related publications, I was leading a team of 5 people from CEA, EPFL, ETHZ and UNIBO.

# Fabricated Prototypes

### PULP2



### **Publications**

D. Rossi, A. Pullini, I. Loi, M. Gautschi, F. K. Gurkaynak, A. Teman, J. Constantin, A. Burg, I. M. Panades, E. Beignè, F. Clermidy, F. Abouzeid, P. Flatresse, L. Benini, 193 MOPS/mW @ 162 MOPS, 0.32V to 1.15V Voltage Range Multi-Core Accelerator for Energy-Efficient Parallel and Sequential Digital Processing, Cool Chips, 2016.

D. Rossi, A. Pullini, I. Loi, M. Gautschi, F. K. Gürkaynak, A. Teman, J. Constantin, A. Burg, I. Miro-Panades, E. Beignè, F. Clermidy, P. Flatresse, L. Benini, <u>Energy-Efficient Near-Threshold Parallel Computing: The PULPv2 Cluster</u>, in IEEE Micro, vol. 37, no. 5, pp. 20-31, September/October 2017.

#### STMicroelectronics, SOITEC, ETHZ, EPFL (2015 - 2018)

Process and Temperature Compensation with body-biasing in 28nm FD-SOI. This project, in collaboration with ETH Zurich, aimed at building a demonstrator of a system with in-the-loop process and temperature compensation exploiting the body bias capabilities of 28nm FD-SOI technology. In this project I was supervising the activities of a Ph.D. student at ETH Zurich.

### Fabricated Prototypes

#### PULP3



#### **Publications**

D. Rossi, A. Pullini, C. Muller, I. Loi, F. Conti, A. Burg, P. Flatresse, L. Benini, <u>A Self-Aware Architecture for PVT Compensation and Power Nap in Near Threshold Processors</u>, in IEEE Design & Test, vol. 34, no. 6, pp. 46-53, Dec. 2017.

A. Di Mauro, D. Rossi, A. Pullini, P. Flatresse and L. Benini, <u>Temperature and process-aware performance monitoring and compensation for an ULP multi-core cluster in 28nm UTBB FD-SOI technology</u>, 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), Thessaloniki, Greece, 2017, pp. 1-8.

A. Di Mauro, D. Rossi, A. Pullini, P. Flatresse and L. Benini, <u>Live Demonstration: Body-Bias Based Performance Monitoring and Compensation for a Near-Threshold Multi-Core Cluster in 28nm FD-SOI Technology</u>, 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 2018, pp. 1-1.

## NXP (2013-2017)

Project exploring variation-aware multi-core architectures exploiting system monitors (temperature monitors, timing monitors, power monitors) for applying run-time management techniques to achieve the user/application goals in terms of Quality of Service, energy consumption and results reliability/accuracy in next generation low-power microcontrollers. In this project, which led to a joint patent, I was supervising one research fellow.

#### **Publications**

A. Gomez, C. Pinto, A. Bartolini, D. Rossi, H. Fatemi, J. Pineda de Gyvez, and L. Benini, <u>Reducing Energy Consumption in Microcontroller-based Platforms with Low Design Margin Co-Processors</u>, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015.

A. Gomez, A. Bartolini, D. Rossi, B. Can Kara, H. Fatemi, J. P. de Gyvez, L. Benini, <u>Increasing the Energy Efficiency of Microcontroller Platforms with Low-Design Margin Co-Processors</u>, Microprocessors and Microsystems, Available online 24 May 2017, ISSN 0141-9331, <a href="https://doi.org/10.1016/j.micpro.2017.05.012">https://doi.org/10.1016/j.micpro.2017.05.012</a>.

#### **Patents**

<u>Event-Based Power Manager</u>, published on 2019-05-16 to USPTO (United States Patent and Trademark Office: https://uspto.report/patent/app/20190146566).

## ETHZ (2016)

Developing the first Parallel Ultra Low Power (PULP) SoC featuring a deep neural network accelerator. In this project I was co-supervising a team of 3 Ph.D. students at ETH Zurich and I was responsible for the design of the architecture specifications and the physical implementation.

#### Mia Wallace




#### **Publications**

A. Pullini, F. Conti, D. Rossi, I. Loi, M. Gautschi and L. Benini, <u>A Heterogeneous Multicore System on Chip for Energy Efficient Brain Inspired Computing</u>, in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 65, no. 8, pp. 1094-1098, Aug. 2018, doi: 10.1109/TCSII.2017.2652982.

# ETHZ, ST Microelectronics (2016)

This project focuses on the development of a wide-bandwidth lossless current sensor system-on-chip (SoC) designed for applications such as current monitoring in DC-DC converters and non-invasive load monitoring. The core of the system is a broadband CMOS Hall sensor, which provides a low-cost and easily integrable solution for current sensing in mixed-signal SoCs, extended with a multi-mode digital compressive sensing encoder to reduce the data rate. In this project I was supervising a Ph.D. student responsible for the digital part of the SoC.

### Fabricated Prototypes

## **Current Sensor SoC with Digital Compressed Sensing**



# Publications

M. Crescentini et al., "A 2 MS/s 10A Hall current sensor SoC with digital compressive sensing encoder in 0.16 µm BCD," ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference, Lausanne, Switzerland, 2016, pp. 393-396, doi: 10.1109/ESSCIRC.2016.7598324.

This project developed Fulmine, a Parallel Ultra Low Power SoC combining analytics and encryption workloads in a tight power envelope. The SoC, based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supports software programmability for regular computing tasks.

### Fabricated Prototypes

### Fulmine



#### **Publications**

F. Conti, R. Schilling, P. D. Schiavone, A. Pullini, D. Rossi, F. K. Gürkaynak, M. Muehlberghuber, M. Gautschi, I. Loi, G. Haugou, S. Mangard, L. Benini, "An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics", in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 9, pp. 2481-2494, Sept. 2017.

# ETHZ (2017 - 2018)

This project aims at developing an ultra-low-power, fully programmable IoT system-on-chip (SoC) capable of executing Binary Neural Networks (BNNs) at ultra-low voltages. By utilizing a hybrid memory scheme, it combines error-prone SRAMs with reliable standard-cell memories to ensure data integrity under aggressive voltage scaling. Implemented in 22nm FDX technology, the SoC operates at 0.5V without accuracy loss on a BNN trained for the CIFAR-10 dataset, achieving a 2.2X improvement in energy efficiency.

### **Fabricated Prototypes**

### Quentin



#### **Publications**

P. D. Schiavone, D. Rossi, A. Pullini, A. Di Mauro, F. Conti and L. Benini, "Quentin: an Ultra-Low-Power PULPissimo SoC in 22nm FDX," 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), Burlingame, CA, USA, 2018, pp. 1-3, doi: 10.1109/S3S.2018.8640145.

A. Di Mauro, F. Conti, P. D. Schiavone, D. Rossi and L. Benini, "Always-On 674µW@4GOP/s Error Resilient Binary Neural Networks With Aggressive SRAM Voltage Scaling on a 22-nm IoT End-Node," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 11, pp. 3905-3918, Nov. 2020, doi: 10.1109/TCSI.2020.3012576.

### ETHZ (2017 - 2018)

This project aims at developing a Binary-Weight Neural Network (BWN) accelerator optimized for ultra-low-power IoT devices. It introduces a novel binary-weight streaming approach to reduce I/O bandwidth while using a systolic-scalable architecture to handle high-resolution images.

## Fabricated Prototypes

**Hyperdrive**





| Operating Point [V]          | 0.65  | 0.8   | 0.9   |
|------------------------------|-------|-------|-------|
| Op. Frequency [MHz]          | 160   | 282   | 344   |
| Power [mW]                   | 25.9  | 108.6 | 171   |
| Throughput [Op/cycle]        | 1568  | 1568  | 1568  |
| Throughput [GOp/s]           | 250.9 | 442.2 | 539.4 |
| Energy Eff. [TOp/s/W]        | 6.1   | 4.1   | 3.2   |
| Core Area [mm <sup>2</sup> ] | 1.92  | 1.92  | 1.92  |
| Memory [Mbit]                | 6.4   | 6.4   | 6.4   |

#### **Publications**

R. Andri, L. Cavigelli, D. Rossi and L. Benini, "Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes," 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Hong Kong, China, 2018, pp. 509-515, doi: 10.1109/ISVLSI.2018.00099.

R. Andri, L. Cavigelli, D. Rossi and L. Benini, "Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 309-322, June 2019, doi: 10.1109/JETCAS.2019.2905654.

## **QuickLogic, ETHZ (2017 - 2019)**

Heterogeneous SoC with embedded FPGA. This project, in collaboration with ETH Zurich, aimed at developing a demonstrator of a heterogeneous SoC integrating an IoT processor with an embedded FPGA from the industrial partner. In this project, which led to a tape-out in 22nm FD-SOI technology and a joint TVLSI publication, I was supervising a Ph.D. student at ETH Zurich responsible for the design of the architecture and the physical implementation.

### Fabricated Prototypes

#### Arnold



Area distribution of the main components of Arnold

| Module          | Area [µm²] | Percentage |
|-----------------|------------|------------|
| CPU             | 27'186     | 0.54%      |
| Main Memory     | 734'232    | 14.46%     |
| I/O DMA         | 21'755     | 0.43%      |
| eFPGA subsystem | 63'946     | 1.26%      |
| PAD Frame       | 229'519    | 4.52%      |
| eFPGA Macro     | 4'000'000  | 78.79%     |
#### **Publications**

P. D. Schiavone et al., "Arnold: An eFPGA-Augmented RISC-V SoC for Flexible and Low-Power IoT End Nodes," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 4, pp. 677-690, April 2021, doi: 10.1109/TVLSI.2021.3058162.

### Dolphin Integration, ETHZ (2017 - 2019)

Power-performance scalable processor for IoT. This project, in collaboration with ETH Zurich, aimed at the design of a power-performance scalable processor for IoT applications, demonstrating at system level the low-power capabilities of the analog IPs provided by the industrial partner. In this project, which led to a joint ESSCIRC publication in 2018 and a JSSC publication in 2019, I was leading a team of 3 Ph.D. students and research fellows.

### Fabricated Prototypes



### Mr. Wolf

Mr.Wolf SoC features

| Technology               | CMOS 40nm LP     |
|--------------------------|------------------|
| Chip Area                | 10 mm²           |
| Memory Transistors       | 576 kB           |
| Equivalent Gates (NAND2) | 1.8 Mgates       |
| Voltage Range            | 0.8 V - 1.1 V    |
| Frequency Range          | 32 kHz - 450 MHz |
| Power Range              | 72 μW - 153 mW   |

### **Publications**

A. Pullini, D. Rossi, I. Loi, A. Di Mauro, L. Benini, <u>Mr.Wolf: a 1 GFLOP/S Energy-Proportional Parallel Ultra Low Power SoC for IoT Edge Processing</u>, ESSCIRC 2018.

A. Pullini, D. Rossi, I. Loi, G. Tagliavini and L. Benini, "Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing," in IEEE Journal of Solid-State Circuits, vol. 54, no. 7, pp. 1970-1981, July 2019, doi: 10.1109/JSSC.2019.2912307.

# GreenWaves Technologies, ETHZ (2016-2021)

Technology transfer (third mission) activity related to the deployment of the PULP open-source platform into a commercial system-on-chip (GAP-8) for volume production, and its revision (GAP-9). The main applications targeted by the design are near-sensor processing, software-defined modems for wireless communication and low-power wireless communication for IoT applications (<a href="http://greenwaves-technologies.com">http://greenwaves-technologies.com</a>). In this project, which led to a joint ISSCC publication in 2021, I supported technical activities related to the development of the company's product.

# Fabricated Prototypes

### **GAP8**



| CMOS 55nm LP                                       |
|----------------------------------------------------|
| 10 mm²                                             |
| 4608 Kbit<br>(4096kbit state-retentive)            |
| 2 Mgates                                           |
| 1 DC/DC, 1 LDO, 2 FLLs,<br>embedded power switches |
| 0.8V – 1.2V                                        |
| 32kHz – 250 MHz                                    |
| 3.6 μW – 75mW                                      |

### **VEGA**



VEGA SoC features

| Technology               | CMOS 22nm FD-SOI  |
|--------------------------|-------------------|
| Chip Area                | 12 mm²            |
| SRAM Memory              | 1728 kB           |
| MRAM Memory              | 4 MB              |
| Equivalent Gates (NAND2) | 1.8 Mgates        |
| Voltage Range            | 0.6 V - 0.8 V     |
| Frequency Range          | 32 kHz – 450 MHz  |
| Power Range              | 1.2 μW – 49.4 mW  |

## **Publications**

E. Flamand, D. Rossi, F. Conti, A. Pullini, I. Loi, F. Rotenberg and L. Benini, <u>GAP8: A RISC-V SoC for AI at the Edge of the IoT</u>, 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2018.

D. Rossi et al., <u>4.4 A 1.3TOPS/W @ 32GOPS Fully Integrated 10-Core SoC for IoT End-Nodes with 1.7μW Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode</u>, 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2021, pp. 60-62, doi: 10.1109/ISSCC42613.2021.9365939.

D. Rossi et al., <u>Vega: A Ten-Core SoC for IoT Endnodes With DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode</u>, in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2021.3114881.

### ETHZ (2020-2021)

The project involves the development of an ultra-low-power System-on-Chip (SoC) for nano-sized UAVs, enabling onboard autonomy for complex tasks such as object detection and navigation. Fabricated in 22 nm FDX technology, it integrates event-based and frame-based visual sensors with three accelerators for efficient processing. The SoC achieves sub-microjoule SNN inference, supported by a high-performance RISC-V cluster and TNN accelerator, reaching up to 1036 TOp/s/W. My contribution includes mentoring three Ph.D. students who were integral to the chip's design.

### Fabricated Prototypes



| Technology             | GF 22 nm FDX  |
|------------------------|---------------|
| Chip area              | 9 mm²         |
| L2 memory (SRAM)       | 1 MiB         |
| L1 Memory (SRAM)       | 128 KiB       |
| VDD Range              | 0.5 V - 0.8 V |
| Cluster Max. Frequency | 330 MHz       |
| EHWPE Max. Frequency   | 330 MHz       |
| FC Max. Frequency      | 330 MHz       |
| Power Range            | 2 mW - 300 mW |

### **Publications**

A. Di Mauro, M. Scherer, D. Rossi and L. Benini, "Kraken: A Direct Event/Frame-Based Multi-sensor Fusion SoC for Ultra-Efficient Visual Processing in Nano-UAVs," 2022 IEEE Hot Chips 34 Symposium (HCS), Cupertino, CA, USA, 2022, pp. 1-19, doi: 10.1109/HCS55958.2022.9895621.

## ETHZ (2022)

This project focuses on developing a fully programmable compute cluster optimized for running deep learning algorithms on resource-constrained, battery-powered edge devices. It introduces the concept of Vector Lockstep Execution Mode (VLEM), which aims at improving energy efficiency in single instruction multiple data (SIMD) portions of code. I led this project, supervising 3 Ph.D. students.

### Fabricated Prototypes



### **Publications**

G. Ottavi *et al.*, "Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 70, no. 6, pp. 2450-2463, June 2023, doi: 10.1109/TCSI.2023.3254810.

# Dolphin Integration, ETHZ (2019 - 2023)

AI-capable edge processor. This project, in collaboration with ETH Zurich, aims at developing a low-power digital signal processor for embedded video and audio applications, capable of running embedded machine learning and artificial intelligence workloads. In this project, which led to a joint ISSCC publication in 2023 and a JSSC publication in 2024, I am leading a team of 5 Ph.D. students and research fellows.

### Fabricated Prototypes

#### **MARSELLUS**



| Technology | Area                                      | SoC + Clus SRAM | Vdd range   |
|------------|-------------------------------------------|-----------------|-------------|
| GF 22FDX   | 18.7 mm² (full chip)<br>1.9 mm² (cluster) | 1152 KiB        | 0.5V - 0.8V |

| Active Power Range | SW 8b Perf/Eff @0.8V  | RBE 4x4b Perf/Eff @0.8V | RBE 2x2b Perf/Eff @0.8V |
|--------------------|-----------------------|-------------------------|-------------------------|
| 12.8 mW – 123 mW   | 21 GOPS<br>180 GOPS/W | 373 GOPS<br>3.2 TOPS/W  | 569 GOPS<br>5.4 TOPS/W  |

### **Publications**

F. Conti et al., "22.1 A 12.4TOPS/W @ 136GOPS AI-IoT System-on-Chip with 16 RISC-V, 2-to-8b Precision-Scalable DNN Acceleration and 30%-Boost Adaptive Body Biasing," 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2023, pp. 21-23, doi: 10.1109/ISSCC42615.2023.10067643.

F. Conti et al., "Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC With 2–8 b DNN Acceleration and 30%-Boost Adaptive Body Biasing," in IEEE Journal of Solid-State Circuits, vol. 59, no. 1, pp. 128-142, Jan. 2024, doi: 10.1109/JSSC.2023.3318301.

### Meta, ETHZ (2020 – present)

Low-power processor for augmented reality. This project, in collaboration with ETH Zurich, aims to develop an embedded processor for augmented reality applications, featuring embedded non-volatile memory and capable of running both traditional linear algebra algorithms and artificial intelligence workloads with unprecedented performance and energy efficiency. In this project I'm co-leading a team of 3 Ph.D. students. A tape-out in an advanced technology node was achieved in December 2021, leading to one ESSCIRC and one JSSC publication.

# Fabricated Prototypes

### **SIRACUSA**



#### **Publications**

M. Scherer et al., "Siracusa: A Low-Power On-Sensor RISC-V SoC for Extended Reality Visual Processing in 16nm CMOS," ESSCIRC 2023- IEEE 49th European Solid State Circuits Conference (ESSCIRC), Lisbon, Portugal, 2023, pp. 217-220, doi: 10.1109/ESSCIRC59616.2023.10268718.

A. S. Prasad et al., "Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality With At-MRAM Neural Engine," in IEEE Journal of Solid-State Circuits, vol. 59, no. 7, pp. 2055-2069, July 2024, doi: 10.1109/JSSC.2024.3385987.

The student project focused on developing a cutting-edge System-on-Chip (SoC) designed for efficient deep neural network (DNN) inference and training at the Extreme-Edge (TinyML). To address critical demands such as low latency, high throughput, accuracy, and flexibility, the design employs a heterogeneous cluster of eight RISC-V cores with mixed-precision integer arithmetic (2-bit to 32-bit). To enhance performance and energy efficiency on compute-intensive DNN tasks, the cluster is integrated with three specialized accelerators: a high-throughput engine for depthwise convolution, an efficient data mover for flexible data handling, and a 16-bit floating-point tensor product engine for matrix-multiplication acceleration. This architecture is tailored to optimize DNN processing in resource-constrained environments. In this project I supervised a team of 4 master students and 3 Ph.D. students responsible for the chip tape-out.

#### Fabricated Prototypes

### **DARKSIDE**



## **Publications**

A. Garofalo *et al.*, "Darkside: 2.6GFLOPS, 8.7mW Heterogeneous RISC-V Cluster for Extreme-Edge On-Chip DNN Inference and Training," *ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC)*, Milan, Italy, 2022, pp. 273-276, doi: 10.1109/ESSCIRC55480.2022.9911384.

A. Garofalo *et al.*, "DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training," in *IEEE Open Journal of the Solid-State Circuits Society*, vol. 2, pp. 231-243, 2022, doi: 10.1109/OJSSCS.2022.3210082.

### ETHZ (2021)

The project focuses on developing a cutting-edge System-On-a-Chip (SoC) tailored for the next generation of ultra-low-power and high-performance Internet of Things (IoT) end-nodes. Designed to meet the demanding requirements of complex near-sensor data analytics in domains such as audio processing, radar systems, and Structural Health Monitoring, Echoes enables highly efficient computations in the frequency domain. In this project I was supervising the master and Ph.D. students responsible for the tape out.

### Fabricated Prototypes



## **Publications**

M. Sinigaglia *et al.*, "ECHOES: a 200 GOPS/W Frequency Domain SoC with FFT Processor and I2S DSP for Flexible Data Acquisition from Microphone Arrays," *2023 IEEE International Symposium on Circuits and Systems (ISCAS)*, Monterey, CA, USA, 2023, pp. 1-5, doi: 10.1109/ISCAS46773.2023.10181862.

# TII, ZERO-DAY, Khalifa University, NYUAD (2021 - present)

The project aims to develop a highly energy-efficient and compact SoC to enable advanced computational capabilities and enhanced security in nano-sized UAVs. By addressing power, memory, processing, and security constraints, it supports real-time machine learning, digital signal processing, and autonomous functions. Additionally, the design incorporates virtualization and timing channel protection to ensure secure operations within resource-constrained environments. The goal is to create a versatile platform that combines high-performance computing, low power consumption, and robust security for the next generation of autonomous aerial systems. In this project, I coordinated the activities of all involved entities, managing a team of over 15 Ph.D. students and researchers from various partner organizations.

### Fabricated Prototypes

## **SHAHEEN**



### **Publications**

- L. Valente *et al.*, "Shaheen: An Open, Secure, and Scalable RV64 SoC for Autonomous Nano-UAVs," 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, CA, USA, 2023, pp. 1-12, doi: 10.1109/HCS59251.2023.10254698.
- L. Valente et al., "A Heterogeneous RISC-V Based SoC for Secure Nano-UAV Navigation," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 71, no. 5, pp. 2266-2279, May 2024, doi: 10.1109/TCSI.2024.3359044.

Chiplet-based architecture for high-performance computing: This project, in collaboration with ETH Zurich and various industrial partners, aimed to develop a demonstrator using Global Foundries' 12nm technology for high-performance computing, with a focus on sparse linear algebra. The design included 2x432 RISC-V core chiplets on a 12nm node and two high-bandwidth memories integrated on a 65nm passive interposer. In this project, I provided technical management and guidance to over 20 Ph.D. students. A tape-out on advanced technology nodes was achieved in September 2023, resulting in one ESSCIRC and one JSSC publication.

### Fabricated Prototypes





#### **Publications**

G. Paulin et al., "Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-Based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET," 2024 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, 2024, pp. 1-2, doi: 10.1109/VLSITechnologyandCir46783.2024.10631529.

#### **Patents**

- <u>Event-Based Power Manager</u>, published on 2019-05-16 by the USPTO (United States Patent and Trademark Office: https://uspto.report/patent/app/20190146566). The patent is in collaboration with NXP. It concerns power management policies in ultra-low-power architectures performing digital signal processing. In particular, it envisions adapting the functional characteristics of microcontrollers (e.g. supply voltage, operating frequency) on the basis of information collected from performance counters available within the architecture. This eases power management for the end user and makes the policies pro-active, hence more effective.
- <u>Temporal Lockstep</u> 23-CACO-1204US01: LF Ref 072878/607266. The patent is in collaboration with STMicroelectronics. It concerns a low-cost fault mitigation technique targeting single-event transients (SETs), called Temporal Lockstep (TL), which combines temporal redundancy and minimal spatial repetition to reduce the area overhead with respect to state-of-the-art solutions.

### **Awards**

- 2019 IEEE TCAD Donald O. Pederson Best Paper Award: R. Andri, L. Cavigelli, D. Rossi and L. Benini, "YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 48-60, Jan. 2018.
- 2019 ISLPED Design Contest 2<sup>nd</sup> prize award: Daniele Palossi, Francesco Conti, Davide Rossi, Luca Benini, "PULP-DroNet: Open Source and Open Hardware Artificial Intelligence for Fully Autonomous Navigation on Nano-UAVs", ISLPED 2019: ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED 2019).
- I4MS-SAE label received by the EVErMORE project, recognising its excellent implementation, high potential for further deployment and innovative aspect.
- 2020 IEEE Transactions on Circuits and Systems Darlington Best Paper Award: F. Conti, R. Schilling, P. D. Schiavone, A. Pullini, D. Rossi, F. K. Gürkaynak, M. Muehlberghuber, M. Gautschi, I. Loi, G. Haugou, S. Mangard, L. Benini, "An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics", in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 9, pp. 2481-2494, Sept. 2017.
- 2020 IEEE Transactions on Very Large Scale Integration Systems Prize Paper Award: M. Gautschi, P. D. Schiavone, A. Traber, I. Loi, A. Pullini, D. Rossi, E. Flamand, F. K. Gurkaynak, L. Benini, "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct. 2017.

- 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Best Paper Award: A. Nadalini et al., "A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks," 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Foz do Iguacu, Brazil, 2023, pp. 1-6, doi: 10.1109/ISVLSI59464.2023.10238679.

#### **Professional Services**

- Member of the editorial board of Elsevier Microelectronics Journal.
- Member of the technical program committee in a number of international conferences and symposia (DATE, DSD, ISCAS, VLSI-SOC, MicDAT, MWCAS).
- Publicity chair at ISLPED 2019.
- Reviewer for a number of international journals (IEEE TCAS-I, IEEE TCAS-II, IEEE TCAD, IEEE TCSVT, IEEE TECS, IEEE TETC, JLPEA, Microprocessors and Microsystems, TACO).
- Organizer of the main international conference on Open-Source hardware in 2016: ORCONF2016, Oct 7-9 2016, Bologna (orconf.org).
- Member of FOSSI foundation "The Free and Open Source Silicon Foundation" promoting and assisting free and open digital hardware designs and their related ecosystems (<u>fossi-foundation.org</u>).

## **List of Publications**

# **Journal Papers**

- 1. Claudio Brunelli, Fabio Campi, Claudio Mucci, Davide Rossi, Tapani Ahonen, Juha Kylliäinen, Fabio Garzia, Jari Nurmi, Design space exploration of an open-source, IP-reusable, scalable floating-point engine for embedded applications, Journal of Systems Architecture, Volume 54, Issue 12, 2008, Pages 1143-1154, ISSN 1383-7621, https://doi.org/10.1016/j.sysarc.2008.05.005.
- 2. D. Rossi, F. Campi, S. Spolzino, S. Pucillo and R. Guerrieri, "A Heterogeneous Digital Signal Processor for Dynamically Reconfigurable Computing," in IEEE Journal of Solid-State Circuits, vol. 45, no. 8, pp. 1615-1626, Aug. 2010, doi: 10.1109/JSSC.2010.2048149.
- 3. Claudio Brunelli, Fabio Garzia, Davide Rossi, and Jari Nurmi. 2010. A coarse-grain reconfigurable architecture for multimedia applications supporting subword and floating-point calculations. J. Syst. Archit. 56, 1 (January, 2010), 38–47. DOI:https://doi.org/10.1016/j.sysarc.2009.11.003.
- 4. A. Grasset, P. Millet, P. Bonnot, S. Yehia, W. Putzke-Roeming, F. Campi, A. Rosti, M. Huebner, N. S. Voros, D. Rossi, "The MORPHEUS Heterogeneous Dynamically Reconfigurable Platform", Int J Parallel Prog 39, 328–356 (2011). https://doi.org/10.1007/s10766-010-0160-3.
- 5. D. Rossi, C. Mucci, F. Campi, S. Spolzino, L. Vanzolini, H. Sahlbach, S. Whitty, R. Ernst, W. Putzke-Röming, and R. Guerrieri, "Application Space Exploration of a Heterogeneous Run-Time Configurable Digital Signal Processor," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 2, pp. 193-205, Feb. 2013, doi: 10.1109/TVLSI.2012.2185963.
- 6. D. Rossi, C. Mucci, M. Pizzotti, L. Perugini, R. Canegallo and R. Guerrieri, "Multicore Signal Processing Platform With Heterogeneous Configurable Hardware Accelerators," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 9, pp. 1990-2003, Sept. 2014, doi: 10.1109/TVLSI.2013.2280295.
- 7. E. Azarkhish, D. Rossi, I. Loi and L. Benini, "A Modular Shared L2 Memory Design for 3-D Integration," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 8, pp. 1485-1498, Aug. 2015, doi: 10.1109/TVLSI.2014.2340013.
- 8. Conti, F., Rossi, D., Pullini, A. et al. PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision. J Sign Process Syst 84, 339–354 (2016). https://doi.org/10.1007/s11265-015-1070-9.
- 9. Davide Rossi, Antonio Pullini, Igor Loi, Michael Gautschi, Frank K. Gürkaynak, Andrea Bartolini, Philippe Flatresse, Luca Benini, "A 60 GOPS/W, -1.8 V to 0.9 V body bias ULP cluster in 28 nm UTBB FD-SOI technology", Solid-State Electronics, Volume 117, March 2016, Pages 170-184, ISSN 0038-1101, http://dx.doi.org/10.1016/j.sse.2015.11.015.

- 10. Adam Teman, Davide Rossi, Pascal Meinerzhagen, Luca Benini, and Andreas Burg. 2016. Power, Area, and Performance Optimization of Standard Cell Memory Arrays Through Controlled Placement. ACM Trans. Des. Autom. Electron. Syst. 21, 4, Article 59 (September 2016), 25 pages. DOI:https://doi.org/10.1145/2890498.
- 11. M. Rusci, D. Rossi, M. Lecca, M. Gottardi, E. Farella and L. Benini, "An Event-Driven Ultra-Low-Power Smart Visual Sensor," in IEEE Sensors Journal, vol. 16, no. 13, pp. 5344-5353, July 1, 2016, doi: 10.1109/JSEN.2016.2556421.
- 12. E. Azarkhish, C. Pfister, D. Rossi, I. Loi and L. Benini, "Logic-Base Interconnect Design for Near Memory Computing in the Smart Memory Cube," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 1, pp. 210-223, Jan. 2017, doi: 10.1109/TVLSI.2016.2570283.
- 13. M. Gautschi et al., "Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 10, pp. 2700-2713, Oct. 2017, doi: 10.1109/TVLSI.2017.2654506.
- 14. A. Pullini, F. Conti, D. Rossi, I. Loi, M. Gautschi and L. Benini, "A Heterogeneous Multicore System on Chip for Energy Efficient Brain Inspired Computing," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 65, no. 8, pp. 1094-1098, Aug. 2018, doi: 10.1109/TCSII.2017.2652982.
- 15. F. Conti, R. Schilling, P. D. Schiavone, A. Pullini, D. Rossi, F. K. Gürkaynak, M. Muehlberghuber, M. Gautschi, I. Loi, G. Haugou, S. Mangard, L. Benini, "An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 9, pp. 2481-2494, Sept. 2017, doi: 10.1109/TCSI.2017.2698019.
- 16. M. Rusci, D. Rossi, E. Farella, L. Benini, "A Sub-mW IoT-Endnode for Always-On Visual Monitoring and Smart Triggering," in IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1284-1295, Oct. 2017, doi: 10.1109/JIOT.2017.2731301.
- 17. A. Gomez, A. Bartolini, D. Rossi, B. Can Kara, H. Fatemi, J. P. de Gyvez, L. Benini, "Increasing the Energy Efficiency of Microcontroller Platforms with Low-Design Margin Co-Processors", Microprocessors and Microsystems, Available online 24 May 2017, ISSN 0141-9331, https://doi.org/10.1016/j.micpro.2017.05.012.
- 18. F. Montagna, S. Benatti, D. Rossi, "Flexible, Scalable and Energy Efficient Bio-Signals Processing on the PULP Platform: A Case Study on Seizure Detection", Journal of Low Power Electronics and Applications, Vol. 7, Num. 2, Art. Num. 16, 2017, DOI: http://dx.doi.org/10.3390/jlpea7020016.
- 19. F. Montagna, M. Buiatti, S. Benatti, D. Rossi, E. Farella, L. Benini, "A Machine Learning Approach for Automated Wide-Range Frequency Tagging Analysis in Embedded Neuromonitoring Systems", Methods, Volume 129, 2017, Pages 96-107, DOI: https://doi.org/10.1016/j.ymeth.2017.06.019.
- 20. D. Rossi, A. Pullini, I. Loi, M. Gautschi, F. K. Gürkaynak, A. Teman, J. Constantin, A. Burg, I. Miro-Panades, E. Beignè, F. Clermidy, P. Flatresse, L. Benini, "Energy-Efficient Near-Threshold Parallel Computing: The PULPv2 Cluster," in IEEE Micro, vol. 37, no. 5, pp. 20-31, September/October 2017, doi: 10.1109/MM.2017.3711645.
- 21. D. Rossi, A. Pullini, C. Muller, I. Loi, F. Conti, A. Burg, P. Flatresse, L. Benini, "A Self-Aware Architecture for PVT Compensation and Power Nap in Near Threshold Processors," in IEEE Design & Test, vol. 34, no. 6, pp. 46-53, Dec. 2017, doi: 10.1109/MDAT.2017.2750907.
- 22. G. Tagliavini, D. Rossi, A. Marongiu, L. Benini, "Synergistic HW/SW Approximation Techniques for Ultra-Low-Power Parallel Computing", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 5, pp. 982-995, May 2018, doi: 10.1109/TCAD.2016.2633474.
- 23. R. Andri, L. Cavigelli, D. Rossi and L. Benini, "YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 48-60, Jan. 2018, doi: 10.1109/TCAD.2017.2682138.
- 24. E. Azarkhish, D. Rossi, I. Loi and L. Benini, "Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes," in IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 2, pp. 420-434, Feb. 1 2018, doi: 10.1109/TPDS.2017.2752706.

- 25. Victor Javier Kartsch, Simone Benatti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini, "A sensor fusion approach for drowsiness detection in wearable ultra-low-power systems", Information Fusion, Volume 43, 2018, Pages 66-76, ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2017.11.005.
- 26. I. Loi, A. Capotondi, D. Rossi, A. Marongiu and L. Benini, "The Quest for Energy-Efficient I\$ Design in Ultra-Low-Power Clustered Many-Cores," in IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 2, pp. 99-112, 1 April-June 2018, doi: 10.1109/TMSCS.2017.2769046.
- 27. Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, and Luca Benini. 2018. NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs. ACM Trans. Reconfigurable Technol. Syst. 11, 3, Article 18 (December 2018), 24 pages. DOI:https://doi.org/10.1145/3284357.
- 28. Sajjad Nouri, Davide Rossi, Jari Nurmi, Power mitigation of a heterogeneous multicore architecture on FPGA/ASIC by DFS/DVFS techniques, Microprocessors and Microsystems, Volume 63, 2018, Pages 259-268, ISSN 0141-9331, https://doi.org/10.1016/j.micpro.2018.09.010.
- 29. S. Das, K. J. M. Martin, D. Rossi, P. Coussy and L. Benini, "An Energy-Efficient Integrated Programmable Array Accelerator and Compilation Flow for Near-Sensor Ultralow Power Processing," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 6, pp. 1095-1108, June 2019, doi: 10.1109/TCAD.2018.2834397.
- 30. R. Andri, L. Cavigelli, D. Rossi and L. Benini, "Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 309-322, June 2019, doi: 10.1109/JETCAS.2019.2905654.
- 31. A. Pullini, D. Rossi, I. Loi, G. Tagliavini and L. Benini, "Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing," in IEEE Journal of Solid-State Circuits, vol. 54, no. 7, pp. 1970-1981, July 2019, doi: 10.1109/JSSC.2019.2912307.
- 32. S. Benatti, F. Montagna, V. Kartsch, A. Rahimi, D. Rossi and L. Benini, "Online Learning and Classification of EMG-Based Gestures on a Parallel Ultra-Low Power Platform Using Hyperdimensional Computing," in IEEE Transactions on Biomedical Circuits and Systems, vol. 13, no. 3, pp. 516-528, June 2019, doi: 10.1109/TBCAS.2019.2914476.
- 33. V. Kartsch, G. Tagliavini, M. Guermandi, S. Benatti, D. Rossi and L. Benini, "BioWolf: A Sub-10-mW 8-Channel Advanced Brain–Computer Interface Platform With a Nine-Core Processor and BLE Connectivity," in *IEEE Transactions on Biomedical Circuits and Systems*, vol. 13, no. 5, pp. 893-906, Oct. 2019, doi: 10.1109/TBCAS.2019.2927551.
- 34. F. Renzini, C. Mucci, D. Rossi, E. F. Scarselli and R. Canegallo, "A Fully Programmable eFPGA-Augmented SoC for Smart Power Applications," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 2, pp. 489-501, Feb. 2020, doi: 10.1109/TCSI.2019.2930412.
- 35. B. W. Denkinger et al., "Impact of Memory Voltage Scaling on Accuracy and Resilience of Deep Learning Based Edge Devices," in IEEE Design & Test, vol. 37, no. 2, pp. 84-92, April 2020, doi: 10.1109/MDAT.2019.2947282.
- 36. Garofalo Angelo, Rusci Manuele, Conti Francesco, Rossi Davide and Benini Luca "PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors", Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2019, http://doi.org/10.1098/rsta.2019.0155.
- 37. P. Meloni, D. Loi, G. Deriu, M. Carreras, F. Conti, A. Capotondi and D. Rossi, "Exploring NEURAghe: A Customizable Template for APSoC-Based CNN Inference at the Edge," in IEEE Embedded Systems Letters, vol. 12, no. 2, pp. 62-65, June 2020, doi: 10.1109/LES.2019.2947312.
- 38. H. Zolfaghari, D. Rossi, J. Nurmi, "A custom processor for protocol-independent packet parsing", Microprocessors and Microsystems, Volume 72, 2020, 102910, ISSN 0141-9331, https://doi.org/10.1016/j.micpro.2019.102910.
- 39. Alfio Di Mauro, Davide Rossi, Antonio Pullini, Philippe Flatresse, and Luca Benini. 2020. Performance-aware predictive-model-based on-chip body-bias regulation strategy for an ULP multi-core cluster in 28 nm UTBB FD-SOI. Integr. VLSI J. 72, C (May 2020), 194–207. DOI:https://doi.org/10.1016/j.vlsi.2019.12.006.

- 40. H. Zolfaghari, D. Rossi, W. Cerroni, H. Okuhara, C. Raffaelli and J. Nurmi, "Flexible Software-defined Packet Processing using Low-area Hardware," in IEEE Access, doi: 10.1109/ACCESS.2020.2996660.
- 41. E. De Giovanni et al., "Modular Design and Optimization of Biomedical Applications for Ultralow Power Heterogeneous Platforms," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 11, pp. 3821-3832, Nov. 2020, doi: 10.1109/TCAD.2020.3012652.
- 42. A. Di Mauro, F. Conti, P. D. Schiavone, D. Rossi and L. Benini, "Always-On 674μW@4GOP/s Error Resilient Binary Neural Networks With Aggressive SRAM Voltage Scaling on a 22-nm IoT End-Node," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 11, pp. 3905-3918, Nov. 2020, doi: 10.1109/TCSI.2020.3012576.
- 43. A. Elnaqib, H. Okuhara, T. Jang, D. Rossi and L. Benini, "A 0.5GHz 0.35mW LDO-Powered Constant-Slope Phase Interpolator With 0.22% INL," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 1, pp. 156-160, Jan. 2021, doi: 10.1109/TCSII.2020.3005246.
- 44. F. Glaser, G. Tagliavini, D. Rossi, G. Haugou, Q. Huang and L. Benini, "Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters," in IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 3, pp. 633-648, 1 March 2021, doi: 10.1109/TPDS.2020.3028691.
- 45. P. D. Schiavone et al., "Arnold: An eFPGA-Augmented RISC-V SoC for Flexible and Low-Power IoT End Nodes," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 4, pp. 677-690, April 2021, doi: 10.1109/TVLSI.2021.3058162.
- 46. A. Burrello, A. Garofalo, N. Bruschi, G. Tagliavini, D. Rossi and F. Conti, "DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs," in IEEE Transactions on Computers, doi: 10.1109/TC.2021.3066883.
- 47. P. Palestri et al., "Analytical Modeling of Jitter in Bang-Bang CDR Circuits Featuring Phase Interpolation," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 7, pp. 1392-1401, July 2021, doi: 10.1109/TVLSI.2021.3068450.
- 48. A. Garofalo, G. Tagliavini, F. Conti, L. Benini and D. Rossi, "XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V based IoT End Nodes," in IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2021.3072337.
- 49. H. Okuhara et al., "A Fully Integrated 5-mW, 0.8-Gbps Energy-Efficient Chip-to-Chip Data Link for Ultralow-Power IoT End-Nodes in 65-nm CMOS," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 10, pp. 1800-1811, Oct. 2021, doi: 10.1109/TVLSI.2021.3108806.
- 50. F. Montagna et al., "A Low-Power Transprecision Floating-Point Cluster for Efficient Near-Sensor Data Analytics," in IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 5, pp. 1038-1053, 1 May 2022, doi: 10.1109/TPDS.2021.3101764.
- 51. D. Rossi et al., "Vega: A Ten-Core SoC for IoT Endnodes With DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode," in IEEE Journal of Solid-State Circuits, vol. 57, no. 1, pp. 127-139, Jan. 2022, doi: 10.1109/JSSC.2021.3114881.
- 52. A. Garofalo et al., "A Heterogeneous In-Memory Computing Cluster for Flexible End-to-End Inference of Real-World Deep Neural Networks," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 12, no. 2, pp. 422-435, June 2022, doi: 10.1109/JETCAS.2022.3170152.
- 53. S. Abadal et al., "Graphene-based Wireless Agile Interconnects for Massive Heterogeneous Multi-chip Processors," in IEEE Wireless Communications, doi: 10.1109/MWC.010.2100561.
- 54. A. Garofalo et al., "DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training," in IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 231-243, 2022, doi: 10.1109/OJSSCS.2022.3210082.
- 55. B. Frankel, E. Sarfati, D. Rossi and S. Wimer, "Energy Efficiency of Opportunistic Refreshing for Gain-Cell Embedded DRAM," in IEEE Transactions on Circuits and Systems I: Regular Papers, doi: 10.1109/TCSI.2022.3231866.

- 56. J. Chen, I. Loi, E. Flamand, G. Tagliavini, L. Benini and D. Rossi, "Scalable Hierarchical Instruction Cache for Ultralow-Power Processors Clusters," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, doi: 10.1109/TVLSI.2022.3228336.
- 57. G. Ottavi et al., "Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode," in IEEE Transactions on Circuits and Systems I: Regular Papers, doi: 10.1109/TCSI.2023.3254810.
- 58. F. Conti et al., "Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC With 2–8 b DNN Acceleration and 30%-Boost Adaptive Body Biasing," in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2023.3318301.
- 59. Michael Rogenmoser, Yvan Tortorella, Davide Rossi, Francesco Conti, and Luca Benini. 2023. Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space. ACM Trans. Cyber-Phys. Syst. Just Accepted (November 2023). https://doi.org/10.1145/3635161
- 60. Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti, "RedMule: A mixed-precision matrix-matrix operation engine for flexible and energy-efficient on-chip linear algebra and TinyML training acceleration", Future Generation Computer Systems, Volume 149, 2023, Pages 122-135, ISSN 0167-739X, https://doi.org/10.1016/j.future.2023.07.002.
- 61. B. Sá, L. Valente, J. Martins, D. Rossi, L. Benini and S. Pinto, "CVA6 RISC-V Virtualization: Architecture, Microarchitecture, and Design Space Exploration," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, doi: 10.1109/TVLSI.2023.3302837.
- 62. Ottaviano, A., Balas, R., Bambini, G. et al. ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation. Int J Parallel Prog (2024). https://doi.org/10.1007/s10766-024-00761-4
- 63. L. Valente et al., "A Heterogeneous RISC-V Based SoC for Secure Nano-UAV Navigation," in IEEE Transactions on Circuits and Systems I: Regular Papers, doi: 10.1109/TCSI.2024.3359044.
- 64. A. S. Prasad et al., "Siracusa: A 16 nm Heterogenous RISC-V SoC for Extended Reality With At-MRAM Neural Engine," in IEEE Journal of Solid-State Circuits, vol. 59, no. 7, pp. 2055-2069, July 2024, doi: 10.1109/JSSC.2024.3385987.

# **Conference Proceedings**

- 65. C. Brunelli, F. Garzia, J. Nurmi, C. Mucci, F. Campi and D. Rossi, "A FPGA Implementation of An Open-Source Floating-Point Computation System," 2005 International Symposium on System-on-Chip, 2005, pp. 29-32, doi: 10.1109/ISSOC.2005.1595636.
- 66. C. Brunelli, F. Cinelli, D. Rossi and J. Nurmi, "A VHDL model and Implementation of a Coarse-Grain Reconfigurable Coprocessor for a RISC Core," 2006 Ph.D. Research in Microelectronics and Electronics, 2006, pp. 229-232, doi: 10.1109/RME.2006.1689938.
- 67. F. Garzia, C. Brunelli, D. Rossi and J. Nurmi, "Implementation of a floating-point matrix-vector multiplication on a reconfigurable architecture," 2008 IEEE International Symposium on Parallel and Distributed Processing, 2008, pp. 1-6, doi: 10.1109/IPDPS.2008.4536538.
- 68. D. Rossi, F. Campi, S. Spolzino, S. Pucillo and R. Guerrieri, "A Heterogeneous Digital Signal Processor for Dynamically Reconfigurable Computing," in IEEE Journal of Solid-State Circuits, vol. 45, no. 8, pp. 1615-1626, Aug. 2010, doi: 10.1109/JSSC.2010.2048149.
- 69. D. Rossi, F. Campi, A. Deledda, C. Mucci, S. Pucillo, S. Whitty, R. Ernst, S. Chevobbe, S. Guyetant, M. Kühnle, M. Hübner, J. Becker and W. Putzke-Roeming, "A multi-core signal processor for heterogeneous reconfigurable computing," 2009 International Symposium on System-on-Chip, 2009, pp. 106-109, doi: 10.1109/SOCC.2009.5335668.

- 70. F. Campi, R. König, M. Dreschmann, M. Neukirchner, D. Picard, M. Jüttner, E. Schüler, A. Deledda, D. Rossi, A. Pasini, M. Hübner, J. Becker, R. Guerrieri, "RTL-to-layout implementation of an embedded coarse grained architecture for dynamically reconfigurable computing in systems-on-chip," 2009 International Symposium on System-on-Chip, 2009, pp. 110-113, doi: 10.1109/SOCC.2009.5335665.
- 71. A. Manuzzato, F. Campi, D. Rossi, V. Liberali and D. Pandini, "Exploiting body biasing for leakage reduction: A case study," 2013 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2013, pp. 133-138, doi: 10.1109/ISVLSI.2013.6654635.
- 72. D. Bortolotti, D. Rossi, A. Bartolini and L. Benini, "A variation tolerant architecture for ultra low power multi-processor cluster," 2013 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013, pp. 32-38, doi: 10.1109/PATMOS.2013.6662152.
- 73. D. Bortolotti, A. Bartolini, C. Weis, D. Rossi and L. Benini, "Hybrid memory architecture for voltage scaling in ultra-low power multi-core biomedical processors," 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, pp. 1-6, doi: 10.7873/DATE.2014.182.
- 74. Michael Gautschi, Davide Rossi, and Luca Benini. 2014. Customizing an open source processor to fit in an ultralow power cluster with a shared L1 memory. In Proceedings of the 24th edition of the great lakes symposium on VLSI (GLSVLSI '14). Association for Computing Machinery, New York, NY, USA, 87–88. DOI:https://doi.org/10.1145/2591513.2591569.
- 75. Davide Rossi, Igor Loi, Germain Haugou, and Luca Benini. 2014. Ultra-low-latency lightweight DMA for tightly coupled multi-core clusters. In Proceedings of the 11th ACM Conference on Computing Frontiers (CF '14). Association for Computing Machinery, New York, NY, USA, Article 15, 1–10. DOI:https://doi.org/10.1145/2597917.2597922.
- 76. F. Conti, D. Rossi, A. Pullini, I. Loi and L. Benini, "Energy-efficient vision on the PULP platform for ultra-low power parallel computing," 2014 IEEE Workshop on Signal Processing Systems (SiPS), 2014, pp. 1-6, doi: 10.1109/SiPS.2014.6986099.
- 77. D. Rossi, I. Loi, F. Conti, G. Tagliavini, A. Pullini and A. Marongiu, "Energy efficient parallel computing on the PULP platform with support for OpenMP," 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI), 2014, pp. 1-5, doi: 10.1109/EEEI.2014.7005803.
- 78. A. Teman, D. Rossi, P. Meinerzhagen, L. Benini and A. Burg, "Controlled placement of standard cell memory arrays for high density and low power in 28nm FD-SOI," The 20th Asia and South Pacific Design Automation Conference, 2015, pp. 81-86, doi: 10.1109/ASPDAC.2015.7058985.
- 79. E. Azarkhish, D. Rossi, I. Loi and L. Benini, "High performance AXI-4.0 based interconnect for extensible smart memory cubes," 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp. 1317-1322, doi: 10.7873/DATE.2015.0054.
- 80. Gomez, C. Pinto, A. Bartolini, D. Rossi, H. Fatemi, J. Pineda de Gyvez, and L. Benini, "Reducing energy consumption in microcontroller-based platforms with low design margin co-processors," 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp. 269-272, doi: 10.7873/DATE.2015.1013.
- 81. Igor Loi, Davide Rossi, Germain Haugou, Michael Gautschi, and Luca Benini. 2015. Exploring multi-banked shared-L1 program cache on ultra-low power, tightly coupled processor clusters. In Proceedings of the 12th ACM International Conference on Computing Frontiers (CF '15). Association for Computing Machinery, New York, NY, USA, Article 64, 1–8. DOI:https://doi.org/10.1145/2742854.2747288.
- 82. G. Tagliavini, D. Rossi, L. Benini and A. Marongiu, "Synergistic Architecture and Programming Model Support for Approximate Micropower Computing," 2015 IEEE Computer Society Annual Symposium on VLSI, 2015, pp. 280-285, doi: 10.1109/ISVLSI.2015.64.
- 83. D. Rossi, F. Conti, A. Marongiu, A. Pullini, I. Loi, M. Gautschi, G. Tagliavini, A. Capotondi, P. Flatresse, L. Benini, "PULP: A parallel ultra low power platform for next generation IoT applications," 2015 IEEE Hot Chips 27 Symposium (HCS), 2015, pp. 1-39, doi: 10.1109/HOTCHIPS.2015.7477325.
- 84. D. Rossi, A. Pullini, M. Gautschi, I. Loi, F. K. Gurkaynak, P. Flatresse, L. Benini, "A –1.8V to 0.9V body bias, 60 GOPS/W 4-core cluster in low-power 28nm UTBB FD-SOI technology," 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2015, pp. 1-3, doi: 10.1109/S3S.2015.7333483.

- 85. F. Conti, D. Palossi, A. Marongiu, D. Rossi and L. Benini, "Enabling the heterogeneous accelerator model on ultralow power microcontroller platforms," 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016, pp. 1201-1206.
- 86. Pahlevan, J. Picorel, A. P. Zarandi, D. Rossi, M. Zapater, A. Bartolini, P. G. Del Valle, D. Atienza, L. Benini, B. Falsafi, "Towards near-threshold server processors," 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016, pp. 7-12.
- 87. Erfan Azarkhish, Davide Rossi, Igor Loi, and Luca Benini. 2016. Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube. In Proceedings of the 29th International Conference on Architecture of Computing Systems -- ARCS 2016 Volume 9637. Springer-Verlag, Berlin, Heidelberg, 19–31. DOI:https://doi.org/10.1007/978-3-319-30695-7\_2
- 88. D. Rossi, A. Pullini, I. Loi, M. Gautschi, F. K. Gurkaynak, A. Teman, J. Constantin, A. Burg, I. M. Panades, E. Beigné, F. Clermidy, F. Abouzeid, P. Flatresse, L. Benini, "193 MOPS/mW @ 162 MOPS, 0.32V to 1.15V voltage range multi-core accelerator for energy efficient parallel and sequential digital processing," 2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX), 2016, pp. 1-3, doi: 10.1109/CoolChips.2016.7503670.
- 89. A. Pullini, F. Conti, D. Rossi, I. Loi, M. Gautschi and L. Benini, "A heterogeneous multi-core system-on-chip for energy efficient brain inspired vision," 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016, pp. 2910-2910, doi: 10.1109/ISCAS.2016.7539213.
- 90. R. Andri, L. Cavigelli, D. Rossi and L. Benini, "YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights," 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016, pp. 236-241, doi: 10.1109/ISVLSI.2016.111.
- 91. M. Crescentini, M. Biondi, M. Bennati, P. Alberti, G. Luciani, C. Tamburini, M. Pizzotti, A. Romani, M. Tartagni, D. Bellasi, D. Rossi, L. Benini, "A 2 MS/s 10A Hall current sensor SoC with digital compressive sensing encoder in 0.16 µm BCD," ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference, 2016, pp. 393-396, doi: 10.1109/ESSCIRC.2016.7598324.
- 92. M. Rusci, D. Rossi, M. Lecca, M. Gottardi, L. Benini, E. Farella, "Energy-efficient design of an always-on smart visual trigger," 2016 IEEE International Smart Cities Conference (ISC2), 2016, pp. 1-6, doi: 10.1109/ISC2.2016.7580824.
- 93. S. Benatti, F. Montagna, D. Rossi and L. Benini, "Scalable EEG seizure detection on an ultra low power multi-core architecture," 2016 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2016, pp. 86-89, doi: 10.1109/BioCAS.2016.7833731.
- 94. D. Rossi, "Sub-pJ per operation scalable computing: The PULP experience," 2016 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2016, pp. 1-3, doi: 10.1109/S3S.2016.7804389.
- 95. G. Tagliavini, A. Marongiu, D. Rossi and L. Benini, "Always-on motion detection with application-level error control on a near-threshold approximate computing platform," 2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2016, pp. 552-555, doi: 10.1109/ICECS.2016.7841261.
- 96. S. Das, K. J. M. Martin, P. Coussy, D. Rossi and L. Benini, "Efficient mapping of CDFG onto coarse-grained reconfigurable array architectures," 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp. 127-132, doi: 10.1109/ASPDAC.2017.7858308.
- 97. V. Kartsch, S. Benatti, D. Rossi and L. Benini, "A wearable EEG-based drowsiness detection system with blink duration and alpha waves analysis," 2017 8th International IEEE/EMBS Conference on Neural Engineering (NER), 2017, pp. 251-254, doi: 10.1109/NER.2017.8008338.
- 98. S. Das, D. Rossi, K. J. M. Martin, P. Coussy and L. Benini, "A 142MOPS/mW integrated programmable array accelerator for smart visual processing," 2017 IEEE International Symposium on Circuits and Systems (ISCAS), 2017, pp. 1-4, doi: 10.1109/ISCAS.2017.8050238.
- 99. A. Pullini, D. Rossi, G. Haugou and L. Benini, "μDMA: An autonomous I/O subsystem for IoT end-nodes," 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2017, pp. 1-8, doi: 10.1109/PATMOS.2017.8106971.

- 100. Pasquale Davide Schiavone, Francesco Conti, Davide Rossi, Michael Gautschi, Antonio Pullini, Eric Flamand and Luca Benini, "Slow and steady wins the race? A comparison of ultra-low-power RISC-V cores for Internet-of-Things applications," 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2017, pp. 1-8, doi: 10.1109/PATMOS.2017.8106976.
- 101. A. Di Mauro, D. Rossi, A. Pullini, P. Flatresse and L. Benini, "Temperature and process-aware performance monitoring and compensation for an ULP multi-core cluster in 28nm UTBB FD-SOI technology," 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2017, pp. 1-8, doi: 10.1109/PATMOS.2017.8106979.
- 102. Pahlevan et al., "Energy proportionality in near-threshold computing servers and cloud data centers: Consolidating or Not?," 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp. 147-152, doi: 10.23919/DATE.2018.8341994.
- 103. G. Tagliavini, S. Mach, D. Rossi, A. Marongiu and L. Benini, "A transprecision floating-point platform for ultralow power computing," 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp. 1051-1056, doi: 10.23919/DATE.2018.8342167.
- 104. A. Di Mauro, D. Rossi, A. Pullini, P. Flatresse and L. Benini, "Live Demonstration: Body-Bias Based Performance Monitoring and Compensation for a Near-Threshold Multi-Core Cluster in 28nm FD-SOI Technology," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1-1, doi: 10.1109/ISCAS.2018.8351586.
- 105. S. Mach, D. Rossi, G. Tagliavini, A. Marongiu and L. Benini, "A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1-5, doi: 10.1109/ISCAS.2018.8351816.
- 106. S. Das, K. J. M. Martin, P. Coussy and D. Rossi, "A Heterogeneous Cluster with Reconfigurable Accelerator for Energy Efficient Near-Sensor Data Analytics," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1-5, doi: 10.1109/ISCAS.2018.8351749.
- 107. M. Dazzi *et al.*, "Sub-mW multi-Gbps chip-to-chip communication Links for Ultra-Low Power IoT end-nodes," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018, pp. 1-5, doi: 10.1109/ISCAS.2018.8351893.
- 108. A. Pullini, D. Rossi, I. Loi, A. Di Mauro and L. Benini, "Mr. Wolf: A 1 GFLOP/s Energy-Proportional Parallel Ultra Low Power SoC for IOT Edge Processing," *ESSCIRC 2018 IEEE 44th European Solid State Circuits Conference (ESSCIRC)*, 2018, pp. 274-277, doi: 10.1109/ESSCIRC.2018.8494247.
- 109. H. Zolfaghari, D. Rossi and J. Nurmi, "An Explicitly Parallel Architecture for Packet Parsing in Software Defined Networks," 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2018, pp. 1-4, doi: 10.1109/ASAP.2018.8445123.
- 110. E. Flamand, D. Rossi, F. Conti, A. Pullini, I. Loi, F. Rotenberg and L. Benini, "GAP-8: A RISC-V SoC for AI at the Edge of the IoT," 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2018, pp. 1-4, doi: 10.1109/ASAP.2018.8445101.
- 111. A. Di Mauro, D. Rossi, A. Pullini, P. Flatresse and L. Benini, "Independent Body-Biasing of P-N Transistors in an 28nm UTBB FD-SOI ULP Near-Threshold Multi-Core Cluster," 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2018, pp. 1-3, doi: 10.1109/S3S.2018.8640136.
- 112. P. D. Schiavone, D. Rossi, A. Pullini, A. Di Mauro, F. Conti and L. Benini, "Quentin: an Ultra-Low-Power PULPissimo SoC in 22nm FDX," 2018 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2018, pp. 1-3, doi: 10.1109/S3S.2018.8640145.
- 113. F. Montagna, A. Rahimi, S. Benatti, D. Rossi and L. Benini, "PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform," 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 2018, pp. 1-6, doi: 10.1109/DAC.2018.8465801.
- 114. R. Andri, L. Cavigelli, D. Rossi and L. Benini, "Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes," *2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, 2018, pp. 509-515, doi: 10.1109/ISVLSI.2018.00099.

- 115. R. Aghazadeh, F. Montagna, S. Benatti, D. Rossi and J. Frounchi, "Compressed Sensing Based Seizure Detection for an Ultra Low Power Multi-core Architecture," 2018 International Conference on High Performance Computing & Simulation (HPCS), 2018, pp. 492-495, doi: 10.1109/HPCS.2018.00083.
- 116. F. Renzini, D. Rossi, E. F. Scarselli, C. Mucci and R. Canegallo, "A Fully Programmable eFPGA-Augmented SoC for Smart-Power Applications," 2018 25th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Bordeaux, 2018, pp. 241-244, doi: 10.1109/ICECS.2018.8617970.
- 117. F. Glaser, G. Haugou, D. Rossi, Q. Huang and L. Benini, "Hardware-Accelerated Energy-Efficient Synchronization and Communication for Ultra-Low-Power Tightly Coupled Clusters," 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2019, pp. 552-557, doi: 10.23919/DATE.2019.8715266.
- 118. G. Tagliavini, S. Mach, D. Rossi, A. Marongiu and L. Benini, "Design and Evaluation of SmallFloat SIMD extensions to the RISC-V ISA," 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2019, pp. 654-657, doi: 10.23919/DATE.2019.8714897.
- 119. A. Burrello, F. Conti, A. Garofalo, D. Rossi, and L. Benini. "DORY: Lightweight memory hierarchy management for deep NN inference on IoT endnodes: work-in-progress". In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis Companion (CODES/ISSS '19). Association for Computing Machinery, New York, NY, USA, Article 17, 1–2. DOI:https://doi.org/10.1145/3349567.3351726
- 120. H. Zolfaghari, D. Rossi and J. Nurmi, "Reducing Crossbar Costs in the Match-Action Pipeline," 2019 IEEE 20th International Conference on High Performance Switching and Routing (HPSR), 2019, pp. 1-6, doi: 10.1109/HPSR.2019.8808105.
- 121. H. Zolfaghari, D. Rossi and J. Nurmi, "An Explicitly Parallel Architecture for Packet Processing in Software Defined Networks," 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC), 2019, pp. 1-7, doi: 10.1109/NORCHIP.2019.8906959.
- 122. A. Bartolini, D. Rossi, A. Mastrandrea, C. Conficoni, S. Benatti, A. Tilli, L. Benini, "A PULP-based Parallel Power Controller for Future Exascale Systems," 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Genoa, Italy, 2019, pp. 771-774, doi: 10.1109/ICECS46596.2019.8964699.
- 123. A. Garofalo, M. Rusci, F. Conti, D. Rossi and L. Benini, "PULP-NN: A Computing Library for Quantized Neural Network inference at the edge on RISC-V Based Parallel Ultra Low Power Clusters," 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Genoa, Italy, 2019, pp. 33-36, doi: 10.1109/ICECS46596.2019.8965067.
- 124. A. Di Mauro, F. Conti, P. D. Schiavone, D. Rossi and L. Benini, "Pushing On-chip Memories Beyond Reliability Boundaries in Micropower Machine Learning Applications," 2019 IEEE International Electron Devices Meeting (IEDM), 2019, pp. 30.4.1-30.4.4, doi: 10.1109/IEDM19573.2019.8993434.
- 125. N. Bruschi, A. Garofalo, F. Conti, G. Tagliavini, and D. Rossi. 2020. "Enabling mixed-precision quantized neural networks in extreme-edge devices". In Proceedings of the 17th ACM International Conference on Computing Frontiers (CF '20). Association for Computing Machinery, New York, NY, USA, 217–220. DOI:https://doi.org/10.1145/3387902.3394038.
- 126. P. D. Schiavone et al., "Neuro-PULP: A Paradigm Shift Towards Fully Programmable Platforms for Neural Interfaces," 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), Genova, Italy, 2020, pp. 50-54, doi: 10.1109/AICAS48895.2020.9073920.
- 127. A. Garofalo, G. Tagliavini, F. Conti, D. Rossi and L. Benini, "XpulpNN: Accelerating Quantized Neural Networks on RISC-V Processors Through ISA Extensions," 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2020, pp. 186-191, doi: 10.23919/DATE48585.2020.9116529.
- 128. Rohit Prasad, Satyajit Das, Kevin J. M. Martin, Giuseppe Tagliavini, Philippe Coussy, Luca Benini, Davide Rossi, "TRANSPIRE: An energy-efficient TRANSprecision floating-point Programmable archItectuRE," 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2020, pp. 1067-1072, doi: 10.23919/DATE48585.2020.9116408.
- 129. C. Jie, I. Loi, L. Benini and D. Rossi, "Energy-Efficient Two-level Instruction Cache Design for an Ultra-Low-Power Multi-core Cluster," 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2020, pp. 1734-1739, doi: 10.23919/DATE48585.2020.9116212.

- 130. H. Okuhara et al., "An Energy-Efficient Low-Voltage Swing Transceiver for mW-Range IoT End-Nodes," 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Sevilla, 2020, pp. 1-5, doi: 10.1109/ISCAS45731.2020.9181081.
- 131. G. Ottavi, A. Garofalo, G. Tagliavini, F. Conti, L. Benini and D. Rossi, "A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference," 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Limassol, Cyprus, 2020, pp. 512-517, doi: 10.1109/ISVLSI49217.2020.000-5.
- 132. D. Rossi et al., "4.4 A 1.3TOPS/W @ 32GOPS Fully Integrated 10-Core SoC for IoT End-Nodes with 1.7μW Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode," 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2021, pp. 60-62, doi: 10.1109/ISSCC42613.2021.9365939.
- 133. Joshua Klein, Alexandre Levisse, Giovanni Ansaloni, David Atienza, Marina Zapater, Martino Dazzi, Geethan Karunaratne, Irem Boybat, Abu Sebastian, Davide Rossi, Francesco Conti, Elana Pereira de Santana, Peter Haring Bolívar, Mohamed Saeed, Renato Negra, Zhenxing Wang, Kun-Ta Wang, Max C. Lemme, Akshay Jain, Robert Guirado, Hamidreza Taghvaee, and Sergi Abadal. 2021. Architecting more than Moore: wireless plasticity for massive heterogeneous computer architectures (WiPLASH). In Proceedings of the 18th ACM International Conference on Computing Frontiers (CF '21). Association for Computing Machinery, New York, NY, USA, 191–193. DOI:https://doi.org/10.1145/3457388.3458859.
- 134. G. Ottavi, G. Karunaratne, F. Conti, I. Boybat, L. Benini and D. Rossi, "End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?," 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2021, pp. 1-4, doi: 10.1109/AICAS51828.2021.9458409.

- 143. A. Garofalo et al., "A 1.15 TOPS/W, 16-Cores Parallel Ultra-Low Power Cluster with 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode," ESSCIRC 2021 IEEE 47th European Solid State Circuits Conference (ESSCIRC), 2021, pp. 267-270, doi: 10.1109/ESSCIRC53450.2021.9567767.
- 144. L. Valente, D. Rossi and L. Benini, "Hardware-In-The Loop Emulation for Agile Co-Design of Parallel Ultra-Low Power IoT Processors," 2021 IFIP/IEEE 29th International Conference on Very Large Scale Integration (VLSI-SoC), 2021, pp. 1-6, doi: 10.1109/VLSI-SoC53125.2021.9607006.
- 145. N. Bruschi, G. Haugou, G. Tagliavini, F. Conti, L. Benini and D. Rossi, "GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors," 2021 IEEE 39th International Conference on Computer Design (ICCD), 2021, pp. 409-416, doi: 10.1109/ICCD53106.2021.00071.
- 147. N. Bruschi et al., "Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference," 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022, pp. 170-173, doi: 10.1109/AICAS54282.2022.9869996.
- 148. A. Di Mauro, M. Scherer, D. Rossi and L. Benini, "Kraken: A Direct Event/Frame-Based Multi-sensor Fusion SoC for Ultra-Efficient Visual Processing in Nano-UAVs," 2022 IEEE Hot Chips 34 Symposium (HCS), 2022, pp. 1-19, doi: 10.1109/HCS55958.2022.9895621.
- 149. Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Corrado Bonfanti, Simone Benatti, Davide Rossi, Luca Benini, and Andrea Bartolini. 2022. ControlPULP: A RISC-V Power Controller for HPC Processors with Parallel Control-Law Computation Acceleration. In Embedded Computer Systems: Architectures, Modeling, and Simulation: 22nd International Conference, SAMOS 2022, Samos, Greece, July 3–7, 2022, Proceedings. Springer-Verlag, Berlin, Heidelberg, 120–135. https://doi.org/10.1007/978-3-031-15074-6\_8 Best HW-SW open source paper.
- 150. A. Garofalo et al., "Darkside: 2.6GFLOPS, 8.7mW Heterogeneous RISC-V Cluster for Extreme-Edge On-Chip DNN Inference and Training," ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference (ESSCIRC), 2022, pp. 273-276, doi: 10.1109/ESSCIRC55480.2022.9911384.
- 151. V. Jain et al., "PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge," 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2023, pp. 1-6, doi: 10.1109/DAC56929.2023.10247800.
- 152. G. Ottavi, F. Zaruba, L. Benini and D. Rossi, "Reducing Load-Use Dependency-Induced Performance Penalty in the Open-Source RISC-V CVA6 CPU," 2023 26th Euromicro Conference on Digital System Design (DSD), Golem, Albania, 2023, pp. 429-435, doi: 10.1109/DSD60849.2023.00066.
- 153. N. Bruschi et al., "End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture," 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 2023, pp. 1-6, doi: 10.23919/DATE56975.2023.10137208.
- 154. L. Valente et al., "HULK-V: a Heterogeneous Ultra-low-power Linux capable RISC-V SoC," 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 2023, pp. 1-6, doi: 10.23919/DATE56975.2023.10137252.
- 155. S. A. Mirsalari, G. Tagliavini, D. Rossi and L. Benini, "TransLib: A Library to Explore Transprecision Floating-Point Arithmetic on Multi-Core IoT End-Nodes," 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 2023, pp. 1-2, doi: 10.23919/DATE56975.2023.10136916.
- 156. M. Sinigaglia et al., "ECHOES: a 200 GOPS/W Frequency Domain SoC with FFT Processor and I2S DSP for Flexible Data Acquisition from Microphone Arrays," 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 2023, pp. 1-5, doi: 10.1109/ISCAS46773.2023.10181862.
- 157. M. Ciani et al., "Cyber Security aboard Micro Aerial Vehicles: An OpenTitan-based Visual Communication Use Case," 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 2023, pp. 1-5, doi: 10.1109/ISCAS46773.2023.10181732.

- 158. L. Valente et al., "Shaheen: An Open, Secure, and Scalable RV64 SoC for Autonomous Nano-UAVs," 2023 IEEE Hot Chips 35 Symposium (HCS), Palo Alto, CA, USA, 2023, pp. 1-12, doi: 10.1109/HCS59251.2023.10254698.
- 159. A. Nadalini et al., "A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks," 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Foz do Iguacu, Brazil, 2023, pp. 1-6, doi: 10.1109/ISVLSI59464.2023.10238679.
- 160. M. Scherer et al., "Siracusa: A Low-Power On-Sensor RISC-V SoC for Extended Reality Visual Processing in 16nm CMOS," ESSCIRC 2023 - IEEE 49th European Solid State Circuits Conference (ESSCIRC), Lisbon, Portugal, 2023, pp. 217-220, doi: 10.1109/ESSCIRC59616.2023.10268718.
- 161. F. Conti et al., "22.1 A 12.4TOPS/W @ 136GOPS AI-IoT System-on-Chip with 16 RISC-V, 2-to-8b Precision-Scalable DNN Acceleration and 30%-Boost Adaptive Body Biasing," 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2023, pp. 21-23, doi: 10.1109/ISSCC42615.2023.10067643.

### **Book Chapters**

- 162. N. Voros et al., "Dynamic System Reconfiguration in Heterogeneous Platforms", Chapter 5: "The DREAM Digital Signal Processor", Springer, 2009.
- 163. N. Voros et al., "Dynamic System Reconfiguration in Heterogeneous Platforms", Chapter 8: "The MORPHEUS Data Communication and Storage Infrastructure", Springer, 2009.
- 164. D. Rossi, I. Loi, A. Pullini, L. Benini, "Chapter 3: Ultra-Low-Power Digital Architectures for the Internet of Things", Enabling the Internet of Things, Springer, pp. 69-93, 26 January 2017.
- 165. Andrea Bartolini, Davide Rossi, "Advances in power management of many-core processors", Many-Core Computing: Hardware and Software, Publication May 2019; Hardback Product Code: PBPC0220; ISBN: 978-1-78561-582-5.

# C) Institutional Activities

### **Boards**

- Member of the academic board of Engineering and Information Technology for Structural and Environmental Monitoring and Risk Management (EIT4SEMM), University of Bologna (since 2020).

### **Other Institutional Activities**

- Member of the Quality Committee for the Master's Degree Program in Electronic Engineering (2023-present)
- Member of the Teaching Committee for the Bachelor's Degree Program in Electronics and Telecommunications Engineering (2024-present)
- Europractice Representative for the Advanced Research Center on Electronic Systems (ARCES) at the University of Bologna (2021-present)
- Representative of the University of Bologna for the Italian Electronic Society (SIE) (2024-present)

## **Committees**

| Date       |
|------------|
| 23/07/2019 |
| 03/10/2019 |
| 24/10/2019 |
| 19/12/2019 |
| 06/02/2020 |
| 11/03/2020 |
| 09/10/2020 |
| 08/02/2021 |
| 13/06/2019 |
| 21/06/2019 |

| Activity                                                     | Date       |
|--------------------------------------------------------------|------------|
| Participation in two Research Fellow Evaluation Committees   | 26/07/2019 |
| Participation in two Research Fellow Evaluation Committees   | 19/09/2019 |
| Participation in one Research Fellow Evaluation Committee    | 26/09/2019 |
| Participation in one Research Fellow Evaluation Committee    | 13/01/2020 |
| Participation in one Research Fellow Evaluation Committee    | 17/01/2020 |
| Participation in two Research Fellow Evaluation Committees   | 24/01/2020 |
| Participation in three Research Fellow Evaluation Committees | 28/02/2020 |
| Participation in one Research Fellow Evaluation Committee    | 20/03/2020 |
| Participation in two Research Fellow Evaluation Committees   | 17/04/2020 |
| Participation in one Research Fellow Evaluation Committee    | 25/06/2020 |
| Participation in two Research Fellow Evaluation Committees   | 12/03/2021 |
| Participation in two Research Fellow Evaluation Committees   | 26/03/2021 |
| Participation in two Research Fellow Evaluation Committees   | 14/05/2021 |
| Participation in two Research Fellow Evaluation Committees   | 09/06/2021 |