Energy Efficiency in AI/ML: From Next-Gen Hardware to Sustainable Networks
Latest 24 papers on energy efficiency: May 16, 2026
The relentless march of AI and Machine Learning is inspiring, but it comes with a growing appetite for energy. As models become larger and applications more ubiquitous, ensuring their efficiency—from the silicon powering them to the networks carrying their data—is paramount. This post dives into recent breakthroughs, synthesized from a collection of cutting-edge research papers, that are paving the way for a more sustainable and performant AI/ML future.
The Big Ideas & Core Innovations
At the heart of these advancements is a common goal: achieving more with less. One prominent theme is the reimagining of computing architectures to bypass traditional bottlenecks. For instance, Purdue University’s “Time Domain Near Memory Computing Engine” (Sarthak Antal et al.) performs low-precision multiply-accumulate (MAC) operations in the time domain, close to memory, side-stepping the exponential power scaling of voltage-domain ADCs/DACs. Targeting 4-bit workloads, the design achieves an impressive 7.62 TOPS/W.
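For a sense of what a 4-bit MAC workload looks like at the algorithm level, here is a minimal NumPy sketch of symmetric 4-bit quantization followed by an integer dot product. It illustrates the arithmetic such an engine accelerates, not the time-domain circuit itself; the quantization scheme and scales are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quantize_4bit(x, scale):
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
    return np.clip(np.round(x / scale), -8, 7).astype(np.int32)

def mac_4bit(activations, weights, a_scale, w_scale):
    """4-bit multiply-accumulate: integer dot product, rescaled once at the end.

    The integer dot product is the part a near-memory engine would execute;
    the single floating-point rescale happens once per output element.
    """
    a_q = quantize_4bit(activations, a_scale)
    w_q = quantize_4bit(weights, w_scale)
    acc = int(np.dot(a_q, w_q))          # low-precision integer accumulation
    return acc * a_scale * w_scale       # dequantize the accumulator

# Toy usage: compare the 4-bit MAC against the full-precision dot product.
rng = np.random.default_rng(0)
a, w = rng.normal(size=64), rng.normal(size=64)
a_scale, w_scale = np.abs(a).max() / 7, np.abs(w).max() / 7
print(mac_4bit(a, w, a_scale, w_scale), float(np.dot(a, w)))
```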
Building on hardware innovation, Monash University’s Kai Sun et al., in “Not All Timesteps Matter Equally: Selective Alignment Knowledge Distillation for Spiking Neural Networks”, tackle the energy efficiency of Spiking Neural Networks (SNNs). They propose Selective Alignment Knowledge Distillation (SeAl-KD), which selectively aligns class-level and temporal knowledge, recognizing that intermediate misclassifications in SNNs do not always affect the final prediction. This selectivity yields more discriminative representations and can reduce firing rates and energy consumption without sacrificing accuracy.
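To make the “selective” idea concrete, below is a toy timestep-weighted distillation loss for an SNN student: timesteps whose accumulated prediction already agrees with the final output are aligned to the teacher at full weight, while transient misclassifications are down-weighted. The gating rule and the 0.25/0.75 weights are simplified stand-ins for illustration, not the paper’s SeAl-KD formulation.

```python
import torch
import torch.nn.functional as F

def selective_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Toy selective distillation loss for a spiking student.

    student_logits: [T, B, C] per-timestep logits from the SNN student.
    teacher_logits: [B, C] logits from a (non-spiking) teacher.
    Timesteps whose cumulative prediction matches the final prediction get
    full weight; the rest get a reduced weight, so transient mistakes are
    not forced to match the teacher as strongly.
    """
    T = student_logits.shape[0]
    final_pred = student_logits.mean(dim=0).argmax(dim=-1)              # [B]
    steps = torch.arange(1, T + 1, device=student_logits.device).view(-1, 1, 1)
    cumulative = torch.cumsum(student_logits, dim=0) / steps            # [T, B, C]
    agrees = (cumulative.argmax(dim=-1) == final_pred).float()          # [T, B]
    weights = 0.25 + 0.75 * agrees                                      # soft gate

    teacher_p = F.softmax(teacher_logits / temperature, dim=-1)         # [B, C]
    loss = 0.0
    for t in range(T):
        student_logp = F.log_softmax(student_logits[t] / temperature, dim=-1)
        kl = F.kl_div(student_logp, teacher_p, reduction="none").sum(-1)  # [B]
        loss = loss + (weights[t] * kl).mean()
    return loss / T
```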
Another significant thrust is the optimization of network infrastructure. Trinity College Dublin researchers, Urooj Tariq et al., in “Energy Consumption in Next Generation Radio Access Networks”, highlight that processing energy now dominates total consumption in Radio Access Networks (RANs). Their work shows that baseband processing (BBP) location is the critical factor for energy efficiency, with centralized architectures achieving up to 75% energy savings compared to distributed RANs (D-RANs) at high densification.
For 6G networks, the University of Oulu and KAIST’s Elaheh Ataeebojd et al. explore “Resource Allocation and AoI-Aware Detection for ISAC with Stacked Intelligent Metasurfaces”. They demonstrate that Stacked Intelligent Metasurfaces (SIM) can achieve a remarkable 230% energy efficiency improvement over baselines, matching conventional base station performance with significantly fewer transmit antennas. Their two-timescale optimization framework elegantly manages heterogeneous services and sensing requirements.
In the realm of specialized hardware for AI tasks, the University of Texas at Austin’s Siddhartha Raman Sundara Raman et al. introduce “A detailed algorithmic study on a reuse-aware, near memory, all-digital Ising machine”. Their SACHI architecture repurposes the CPU’s L1 cache for processing-in-memory to solve NP-complete optimization problems, eliminating ADCs/DACs and achieving up to 4000x reuse factors for certain workloads. Meanwhile, the AGH University of Krakow’s Michał Filipkowski et al. showcase the first FPGA hardware architecture for Contrast Maximization in “FPGA-Based Hardware Architecture for Contrast Maximization in Event-Based Vision”, achieving over 200x speedup compared to a CPU and ~450x compared to a GPU while drawing under 1 W, ideal for real-time embedded vision.
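Contrast maximization itself is conceptually simple: warp events according to a candidate motion, accumulate them into an image, and score the image’s variance; the true motion produces the sharpest image. The FPGA architecture accelerates exactly this kind of inner loop. Here is a minimal NumPy sketch of the objective for a 1-D search over horizontal velocity; the event format and brute-force grid search are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def contrast(events, velocity, height=64, width=64):
    """Contrast-maximization objective for one candidate horizontal velocity.

    events: array of shape [N, 3] with columns (x, y, t). Each event is
    warped back to t = 0 by the candidate velocity, accumulated into an
    image, and the image variance is returned as the sharpness score.
    """
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    x_warped = np.clip(np.round(x - velocity * t), 0, width - 1).astype(int)
    image = np.zeros((height, width))
    np.add.at(image, (y.astype(int), x_warped), 1.0)   # event accumulation
    return image.var()

def best_velocity(events, candidates=np.linspace(-50, 50, 101)):
    """Grid search: pick the velocity that maximizes image contrast."""
    scores = [contrast(events, v) for v in candidates]
    return candidates[int(np.argmax(scores))]
```

A real pipeline would search jointly over more motion parameters and use a gradient-based or hierarchical search rather than this brute-force scan, which is where dedicated hardware pays off.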
Under the Hood: Models, Datasets, & Benchmarks
These papers not only present novel algorithms but also contribute models, datasets, and benchmarks that advance the field:
- Deep Mixture of Experts (MoE) & Transformer-based CP-Net: Proposed by Donggen Li et al. (Chongqing University, Surrey, Southwest Jiaotong, NTU, Houston, Kyung Hee) in “Deep Mixture of Experts Network for Resource Optimization in Aerial-Terrestrial CF-mMIMO Systems under URLLC”, this framework uses specialized expert models and a channel prediction network to optimize resource allocation in dynamic aerial-terrestrial networks, achieving 51.7% NMSE reduction for channel aging mitigation.
- Multi-Block Attention (MBA) Framework: From Mehrdad Momen-Tayefeh et al. (University of Tehran, Sharif University of Technology), this deep learning architecture for “Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO” significantly reduces pilot overhead (up to 87%) while improving estimation accuracy for IRS-assisted mmWave MIMO systems, validated with 3GPP TR 38.901 CDL models.
- ASTDP-GAD (Neuromorphic Graph Anomaly Detection): Abdul Joseph Fofanah et al. (Griffith University) introduce this framework in “Neuromorphic Graph Anomaly Detection via Adaptive STDP and Spiking Graph Neural Networks”, a first-of-its-kind integration of SNNs with STDP for energy-efficient anomaly detection in dynamic graphs, tested on datasets like DBLP and Tmall. It achieves 5.3-12.1% Macro-F1 improvements with a mean spike sparsity of 0.24, showing significant energy efficiency.
- MicroViTv2: Novendra Setyawan et al. (National Formosa University, University of Muhammadiyah Malang, National Taipei University, National Yang Ming Chiao Tung University) detail this edge-optimized Vision Transformer in “MicroViTv2: Beyond the FLOPS for Edge Energy-Friendly Vision Transformers”. Utilizing RepEmbed, RepDW, and Single Depth-Wise Transposed Attention (SDTA), it achieves superior throughput and energy efficiency on NVIDIA Jetson AGX Orin for ImageNet-1K and COCO datasets, demonstrating that FLOPs are not the sole metric for real-world efficiency. Code available at https://github.com/novendrastywn/MicroViT.
- TREA & CARMEN Accelerators: Vijay Pratap Sharma et al. and Sonu Kumar et al. (Indian Institute of Technology Indore, University of Ljubljana, Bar-Ilan University) introduce two distinct hardware accelerators. “TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification” features a dual-precision (4/8-bit) SIMD MAC unit, structured pruning (SHARP), and reconfigurable CORDIC-based activations, achieving 9x latency reduction. “CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning” uses runtime-adaptive CORDIC iteration depth for dynamic accuracy-efficiency trade-offs, achieving 11.67 TOPS/W energy efficiency in 28nm CMOS.
- PoTAcc Pipeline: Rappy Saha et al. (University of Glasgow, NVIDIA) present “PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs”, an open-source pipeline for Power-of-Two (PoT) quantized DNNs, providing shift-PE designs and accelerators for FPGAs (PYNQ-Z2, Kria), yielding 3.6x speedup and 78.0% energy reduction; a minimal sketch of the shift-based MAC idea behind PoT quantization follows this list. Code available at https://github.com/gicLAB/PoTAcc.
- XtraMAC: Feng Yu et al. (National University of Singapore, Huazhong University of Science and Technology) detail “XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA”, a datatype-adaptive MAC architecture for FPGAs that unifies integer, floating-point, and mixed-precision operations, leading to 1.9x greater energy efficiency on LLM workloads. Code available at https://github.com/Xtra-Computing/XtraMAC.
- Scalable Digital Twin Framework: Raphael Hendrigo de Souza Gonçalves and Wendel Marcos dos Santos (Federal University of São João del-Rei, Federal Institute of Education, Science and Technology of São Paulo) propose “A Scalable Digital Twin Framework for Energy Optimization in Data Centers”, integrating IoT, cloud, and LSTM models (outperforming linear regression) for real-time energy management, reducing data center consumption by ~10% and improving PUE from 1.85 to 1.70.
- DICE Architecture: Jiayi Wang et al. (University of Washington) introduce “DICE: Enabling Efficient General-Purpose SIMT Execution with Statically Scheduled Coarse-Grained Reconfigurable Arrays”, a novel GPU architecture that uses CGRAs instead of traditional SIMD units, reducing register file accesses by 68% and improving energy efficiency by 1.77-1.90x while preserving the SIMT programming model.
- Gated Multimodal Learning Framework: Yunfei Bai et al. (King’s College London) present this framework in “Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis”, fusing tabular, textual, and spatial data for interpretable property energy performance prediction. Their work demonstrates that the text modality (assessor reports) is the dominant factor (~60% weight) for energy prediction, and helps evaluate retrofit scenarios.
- Goal-Oriented Scheduling (GoS): Prasoon Raghuwanshi et al. (University of Oulu, Texas A&M, Indian Institute of Technology Indore) develop this DRL-based framework in “Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring” for IoT sensors, reducing transmissions by 77-88% while maintaining query response accuracy for non-linear dynamic systems.
- Pre-training for Diffractive Neural Networks: Xudong Lv et al. (Harbin Institute of Technology, Hangzhou Dianzi University, The Chinese University of Hong Kong) show in “Pre-training Enables Extraordinary All-optical Image Denoising” that pre-training on 3.45 million images significantly boosts diffractive network performance for all-optical image denoising, improving PSNR from <8 dB to >18 dB and enabling real-world applications like face detection.
- Benchmarking LLMs on Edge Devices: Dorian Lamouille et al. (University of Tartu, ECAM LaSalle) conduct an extensive “Benchmarking Local Language Models for Social Robots using Edge Devices” study on Raspberry Pi 4, evaluating 25 LLMs for inference efficiency, knowledge, and teaching effectiveness. They find that models like Granite4 Tiny Hybrid (7B) offer a strong balance (2.5 tokens per second, 0.90 tokens per joule, 54.6% MMLU accuracy), challenging the notion that larger models always require more powerful hardware.
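As referenced in the PoTAcc entry above, power-of-two (PoT) quantization is what makes shift-PEs possible: once each weight is constrained to a signed power of two, every multiplication collapses to a bit shift. The sketch below is a generic illustration of that idea in plain Python, not PoTAcc’s actual quantizer or PE design.

```python
import numpy as np

def quantize_pot(w, min_exp=-6, max_exp=0):
    """Quantize weights to signed powers of two: w ≈ sign(w) * 2**exp."""
    sign = np.sign(w).astype(np.int32)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign, exp.astype(np.int32)

def pot_mac(activations_q, sign, exp, min_exp=-6):
    """Shift-based MAC: each multiply becomes a left shift in a fixed-point frame.

    All shifts are made non-negative by offsetting with min_exp, and the common
    factor 2**min_exp is applied once at the end, mirroring how a shift-PE can
    keep a wide integer accumulator and rescale a single time.
    """
    acc = 0
    for a, s, e in zip(activations_q, sign, exp):
        acc += int(s) * (int(a) << int(e - min_exp))   # shift replaces multiply
    return acc * (2.0 ** min_exp)

# Toy usage: PoT MAC vs. the full-precision dot product.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.3, size=16)
a_q = rng.integers(-128, 128, size=16)               # e.g. 8-bit activations
sign, exp = quantize_pot(w)
print(pot_mac(a_q, sign, exp), float(np.dot(a_q, w)))
```

The approximation quality depends heavily on the chosen exponent range; PoTAcc pairs the shift arithmetic with dedicated PE and accelerator designs, which this sketch makes no attempt to reproduce.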
Impact & The Road Ahead
These advancements herald a future where AI is not just intelligent, but also inherently efficient and sustainable. The potential impact is vast: from greener data centers and 6G networks that intelligently manage resources to ubiquitous, energy-friendly AI on edge devices, and even specialized hardware that redefines computational limits. The push towards neuromorphic computing with SNNs and time-domain architectures signals a paradigm shift away from the conventional von Neumann bottleneck.
Looking ahead, the emphasis will be on co-design: integrating algorithm, hardware, and network architecture early in the development cycle. The challenge lies in bridging the gap between theoretical efficiency gains and practical, deployable systems that generalize across diverse real-world scenarios. We’ll likely see more hybrid approaches, leveraging the best of both traditional and novel computing paradigms. The insights from these papers suggest that energy efficiency is no longer an afterthought but a core design principle, driving the next wave of innovation in AI/ML.