Next-generation AI Accelerators for Spatially Aware Autonomous Naval Systems
PI: Prof. Subhasish Mitra
Department: Electrical Engineering/Computer Science
Sponsor: United States Navy (USN) ONR NEPTUNE Program
Abundant-data applications of the 21st century (e.g., AI applications), which play foundational roles in overcoming global grand challenges, are demanding yet another level of integration for increasingly powerful computing systems. At the very moment when these applications demand the largest gains in computing energy efficiency and throughput, conventional approaches are stalling. Existing computing systems use large off-chip memories and spend enormous time and energy shuttling data back and forth, especially for AI applications (a bottleneck known as the memory wall). The memory wall is exacerbated as AI applications move from cloud-scale datacenters to remote edge devices or end devices (such as autonomous vehicles [1]). It is therefore critical to develop new fast and energy-efficient compute engines (also called edge AI accelerators) that lie at the heart of such end devices and overcome the memory wall to execute AI models quickly and accurately. Such accelerators can have wide-ranging implications for the U.S. Navy (USN). For instance, they would enable the Navy to enhance the capabilities of remote USN platforms that involve video-feed connectivity (e.g., unmanned vehicles, advanced manufacturing equipment, robotics, smart cameras). Off-the-shelf hardware (such as GPUs and FPGAs) has extreme energy requirements, which limits its use within such platforms. Here, we propose to develop new edge AI accelerator architectures that overcome these challenges through computation immersed in memory, leveraging dense on-chip non-volatile memory technologies (e.g., RRAM). Across relevant AI workloads at the edge (e.g., CNNs, Transformers), these architectures are intended to provide 10-100× system-level Energy-Delay Product (EDP, a metric that captures both the energy efficiency and the latency of a system) benefits (>10× TOPS/W benefits) over current state-of-the-art commercial hardware (e.g., NVIDIA A100 GPU, Intel Stratix FPGA).
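To make the EDP metric concrete, the following is a minimal sketch of how an EDP benefit could be computed from energy and latency measurements. The function names and all numbers are hypothetical and chosen only for illustration; they are not measurements of any hardware named above.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-Delay Product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_benefit(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """Ratio of baseline EDP to candidate EDP (higher means the candidate is better)."""
    return edp(*baseline) / edp(*candidate)


# Hypothetical (energy in joules, latency in seconds) for one inference.
baseline = (2.0, 0.010)   # assumed baseline system
candidate = (0.4, 0.005)  # assumed accelerator: 5x less energy, 2x lower latency

ratio = edp_benefit(baseline, candidate)
print(f"EDP benefit: {ratio:.1f}x")
```

Because EDP multiplies energy by delay, the two improvements compound: a 5× energy reduction combined with a 2× latency reduction yields roughly a 10× EDP benefit in this illustrative case.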