Exploring High-Performance and Energy-Efficient Architectures for Edge AI-Enabled Applications

Klein, Joshua Alexander Harrison

doi:10.5075/epfl-thesis-10549

2024

Abstract

The desire and ability to place AI-enabled applications on the edge have grown significantly in recent years. However, the compute, area, and power constraints of edge devices are stressed by the needs of these applications, owing to a general pressure to increase the size, depth, and capability of the underlying neural networks. These applications represent a worst-case scenario for numerous architecture-based problems because they tend to combine a large memory footprint, with millions or billions of parameters, and high compute requirements from matrix operations. To address the architectural issues that arise, such as the memory wall, computer architects and engineers have developed numerous solutions, frameworks, and techniques for modeling architectural solutions for AI-enabled applications. Apparatuses for simulating a variety of specialized systems with in-memory compute architectures, SIMD co-processors, neural network engines, and more have all been proposed and implemented to varying degrees. However, many of these apparatuses suffer from a common limitation: they are designed with a very constrained set of experiments in mind, often only comparing their solutions against "conventional" systems. As a result, given the wide range of architectural choices for AI-enabled applications, it is extremely difficult both to evaluate solutions in isolation against one another and to compare heterogeneous systems that employ one or several of these solutions simultaneously. This lack of traversability in architecture design-space explorations hinders future architecture development, given the complexity of the architectural challenges in modern computing and the growing popularity of heterogeneity in modern compute systems.
Therefore, in this doctoral thesis, I present the ALPINE framework: a full system-level computer architecture framework built atop the gem5-X simulator, together with my accumulated work on building apparatuses, tools, and methodologies for implementing, modeling, and extracting vital statistics from new heterogeneous edge architectures for modern AI-enabled applications. By building a framework for modeling numerous novel accelerators and interfaces in a kernel-capable, full system-level computer architecture simulation of general-purpose systems, I am able to perform numerous architectural explorations for modern neural networks, as well as lay the groundwork for future explorations.