
Energy Efficiency in AI/ML: From Silicon to Networks, Major Leaps Towards Sustainable Computing

Latest 24 papers on energy efficiency: May 23, 2026

The relentless march of AI/ML, particularly with the rise of massive models and ubiquitous IoT, has brought an unprecedented demand for computational power. This surge, while fueling innovation, presents a critical challenge: energy consumption. The ‘memory wall’ bottleneck, in particular, remains a persistent barrier to both performance and efficiency in modern architectures. However, recent breakthroughs across hardware, algorithms, and network design are paving the way for a more sustainable and efficient AI future. This post dives into a collection of cutting-edge research, revealing how innovators are tackling energy challenges from fundamental silicon design to macro-scale network optimization.

The Big Idea(s) & Core Innovations

At the heart of these advancements is a holistic approach: energy efficiency is a multi-faceted problem that requires innovation at every layer. In specialized computing, “A detailed algorithmic study on a reuse-aware, near memory, all-digital Ising machine” by Siddhartha Raman Sundara Raman, Lizy K. John, and Jaydeep P. Kulkarni from The University of Texas at Austin presents SACHI, an all-digital Ising machine that reuses the L1 cache for in-memory compute. It reports large speedups and energy-efficiency gains on NP-complete optimization problems without complex DAC/ADC converters. Similarly, their paper “A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks” introduces NEM-GNN and shows that repurposing the L1 cache for graph- and sparsity-aware near-memory aggregation yields 80-230x higher performance and 850-1134x better energy efficiency.
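To make the Ising formulation concrete, here is a minimal Python sketch. It is not SACHI's implementation: it simply encodes max-cut on a small hypothetical graph as an Ising energy and runs a greedy single-spin-flip descent in software. The function names, the example graph, and the update rule are all illustrative assumptions.

```python
def ising_energy(spins, couplings):
    """H = -sum over edges of J_ab * s_a * s_b, with spins in {-1, +1}."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def greedy_descent(spins, couplings, sweeps=10):
    """Flip any spin whose flip lowers the energy; stop when none does."""
    spins = list(spins)
    for _ in range(sweeps):
        improved = False
        for i in range(len(spins)):
            # Local field on spin i from its neighbours.
            field = 0
            for (a, b), j in couplings.items():
                if a == i:
                    field += j * spins[b]
                elif b == i:
                    field += j * spins[a]
            # Flipping s_i changes the energy by 2 * s_i * field.
            if 2 * spins[i] * field < 0:
                spins[i] = -spins[i]
                improved = True
        if not improved:
            break
    return spins


# Hypothetical 4-node ring. J = -1 (antiferromagnetic) rewards cut edges,
# so a low-energy state corresponds to a large cut.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
couplings = {edge: -1 for edge in edges}
start = [1, 1, 1, 1]
result = greedy_descent(start, couplings)
cut_size = sum(1 for a, b in edges if result[a] != result[b])
```

Greedy descent can get stuck in local minima on harder instances; practical Ising solvers typically add some form of annealing or randomness, which this sketch omits.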

Pushing the boundaries of neuromorphic computing, “ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing” by Kang You et al. from Shanghai Jiao Tong University and Shanghai AI Laboratory introduces ELSA, a fine-grained spike/token-wise pipeline for Spiking Neural Networks (SNNs) that enables elastic inference with a 3.4x speedup and 13.6x better energy efficiency than state-of-the-art accelerators. Complementing this, Ankit Kumar Tenwar et al. from Indian Institute of Technology Indore, in “E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference”, present E-ReCON, a digital compute-in-memory (DCIM) macro that uses 3T1R ReRAM bitcells to reach up to 419 TOPS/W for both CNNs and SNNs at the edge. The synergy between SNNs and energy efficiency is further highlighted by “Neuromorphic Graph Anomaly Detection via Adaptive STDP and Spiking Graph Neural Networks” from Abdul Joseph Fofanah et al. at Griffith University. It introduces ASTDP-GAD, which integrates STDP-based learning into SNNs for dynamic graph anomaly detection and achieves significant Macro-F1 improvements, with high energy efficiency attributed to sparse spiking.
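The efficiency argument for SNNs rests on sparse, event-driven activity. The following minimal leaky integrate-and-fire sketch uses a textbook neuron model, not code from any of the papers above, and its parameter values are arbitrary assumptions. It shows how a constant input produces a spike on only a fraction of timesteps.

```python
def lif_spike_train(inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron.

    At each step the membrane potential is scaled by `leak`, the input is
    added, and a spike (1) is emitted with a reset to 0 once the potential
    reaches `threshold`. Otherwise the neuron stays silent (0).
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0
        else:
            spikes.append(0)
    return spikes


# Constant input of 0.5 for 8 timesteps (illustrative values only).
inputs = [0.5] * 8
spikes = lif_spike_train(inputs, leak=0.5, threshold=0.9)
firing_rate = sum(spikes) / len(spikes)
```

With these values the neuron fires on 2 of 8 steps. Event-driven hardware only needs to do downstream work on the steps that carry a spike, which is the source of the savings that sparse spiking is credited with.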

Optimizing existing hardware for new applications is another key theme. In “CompPow: A Case for Component-level GPU Power Management”, Shaizeen Aga and Mohamed Assem Ibrahim from Advanced Micro Devices, Inc. demonstrate that component-aware power management, which allocates power across individual GPU components, can yield up to 10% higher energy efficiency for ML workloads. Similarly, “PALS: Power-Aware LLM Serving for Mixture-of-Experts Models” by Can Hankendi et al. from Boston University and Harvard University presents PALS, a power-aware runtime for LLM inference that jointly optimizes GPU power caps and software parameters such as batch size. It reports up to 26.3% energy-efficiency gains and 4x-7x fewer QoS violations, and shows that running at maximum power is not always optimal, especially for communication-bound MoE models. On the network front, “Sustainable Real-Time 8K60 HEVC Encoding for V2X: Repurposing Legacy NVENC Hardware at the Vehicular Edge” by Kasidis Arunruangsirilert and Jiro Katto from Waseda University shows that legacy NVIDIA Pascal GPUs can achieve real-time 8K60 HEVC encoding using 2-Way Split Frame Encoding (SFE), a sustainable approach to vehicular edge computing that reuses existing hardware.
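The idea of jointly tuning a power cap and a batch size can be illustrated with a minimal sketch. This is not the PALS algorithm. It is a brute-force search over hypothetical profiled measurements that picks the configuration with the best tokens per joule while meeting a latency target. The numbers, field names, and latency limit are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    power_cap_w: int        # GPU power cap in watts
    batch_size: int
    tokens_per_s: float     # measured throughput
    avg_power_w: float      # measured average power draw
    p99_latency_ms: float   # measured tail latency

    @property
    def tokens_per_joule(self) -> float:
        # (tokens / s) / (joules / s) = tokens / joule
        return self.tokens_per_s / self.avg_power_w


def best_config(profiles, latency_limit_ms):
    """Return the most energy-efficient profile that meets the latency limit."""
    feasible = [p for p in profiles if p.p99_latency_ms <= latency_limit_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.tokens_per_joule)


# Made-up measurements, for illustration only.
profiles = [
    Profile(700, 32, 4000.0, 680.0, 90.0),
    Profile(500, 32, 3600.0, 480.0, 110.0),
    Profile(500, 64, 4500.0, 495.0, 180.0),
    Profile(400, 32, 2800.0, 390.0, 160.0),
]
choice = best_config(profiles, latency_limit_ms=150.0)
```

In this toy data the 500 W cap with batch size 32 wins: the 700 W setting is faster but spends more energy per token, and the larger batch is more efficient but misses the latency limit. A real runtime would have to measure or predict these values online rather than read them from a fixed table.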

Even fundamental computing paradigms are being rethought. In “Accelerating Hybrid XOR–CNF Boolean Satisfiability Problems Natively with In-Memory Computing”, Haesol Im et al. from 1QBit and HPE Labs introduce a memristor-based in-memory computing accelerator that natively solves XOR–CNF SAT problems, achieving a 10x speedup and 1000x better energy efficiency than CPUs. Complementing this, “Time Domain Near Memory Computing Engine” by Sarthak Antal and Steve Enosh from Purdue University explores a time-domain near-memory computing architecture for low-precision MAC operations. It avoids the exponential power scaling of traditional voltage-domain ADCs/DACs and achieves 7.62 TOPS/W.
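For readers unfamiliar with the problem class, a hybrid XOR–CNF formula mixes ordinary OR clauses with parity (XOR) constraints. The sketch below is a plain software checker with brute-force search, not the memristor-based method. The signed-integer literal encoding and the example formula are assumptions chosen for illustration.

```python
from itertools import product


def clause_ok(clause, assignment):
    """OR clause: true if any literal is satisfied (negative int = negated)."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def xor_ok(variables, parity, assignment):
    """XOR constraint: the variables' truth values must XOR to `parity`."""
    return sum(assignment[v] for v in variables) % 2 == int(parity)


def solve(num_vars, cnf_clauses, xor_constraints):
    """Brute-force search over all assignments; only viable for tiny inputs."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if all(clause_ok(c, assignment) for c in cnf_clauses) and all(
            xor_ok(vs, p, assignment) for vs, p in xor_constraints
        ):
            return assignment
    return None


# (x1 OR NOT x2) AND (x2 OR x3), plus the parity constraint
# x1 XOR x2 XOR x3 = True.
cnf = [[1, -2], [2, 3]]
xors = [([1, 2, 3], True)]
model = solve(3, cnf, xors)
```

Expanding a single XOR over n variables into pure CNF takes 2^(n-1) clauses unless auxiliary variables are introduced, which is one reason to handle XOR constraints natively instead of translating them.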

Under the Hood: Models, Datasets, & Benchmarks

These research efforts leverage and introduce a diverse set of models, datasets, and benchmarks to validate their innovations:

  • Hardware/Architectures: AMD Instinct MI300X Platform (CompPow), UPMEM PIM System (Taking Cryptography Out of the Data Path), Nordic nRF54L15 (Enhanced-BLE), 3T1R ReRAM bitcell & interleaved adder tree (E-ReCON), Repurposed L1/L2 cache (NEM-GNN, SACHI), Memristor crossbar arrays (Accelerating Hybrid XOR–CNF), Passive optical encoder with microlens array (Low Latency Gaze Tracking), NVIDIA Pascal GPUs with NVENC (Sustainable Real-Time 8K60 HEVC).
  • AI Models/Frameworks: Spiking Neural Networks (ELSA, E-ReCON, ASTDP-GAD, Not All Timesteps Matter Equally), Mixture-of-Experts (MoE) models and vLLM (PALS, Deep Mixture of Experts Network), Graph Neural Networks (NEM-GNN, ASTDP-GAD), Convolutional Attention Network (CAN) and Complex Multi-Convolutional Network (CMN) (Multi-Block Attention), Transformers (Deep Mixture of Experts Network).
  • Datasets/Benchmarks: CIFAR-10/100, ImageNet, DVS-CIFAR10 (Not All Timesteps Matter Equally), Celeb-DF, DeepSpeak, Google VEO-3 (Scalable, Energy-Efficient Optical-Neural Architecture), ETH-XGaze (Low Latency Gaze Tracking), Netflix Chimera, ITE UHD, Xiph.org Derf’s Test Media (Sustainable Real-Time 8K60 HEVC Encoding), DBLP, Tmall, Patent, Yelp, T-Finance, Weibo, BlogCatalog, Flickr, Amazon (Neuromorphic Graph Anomaly Detection), Flickr30K, MS-COCO, LLaVA-1.5-mix-665K (CAST).

Impact & The Road Ahead

The collective impact of this research is substantial. From 419 TOPS/W for edge AI inference with E-ReCON to a 1000x energy-efficiency gain on SAT problems in “Accelerating Hybrid XOR–CNF Boolean Satisfiability Problems Natively with In-Memory Computing”, these advances promise to make AI more pervasive, efficient, and sustainable. In the same spirit, “Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM” by Nicola Barcarolo et al. from the University of Trento and ETH Zurich demonstrates that multi-rank PIM systems can outperform modern CPUs on cryptographic operations such as AES-128 and SHA-256 by eliminating data movement, pointing toward secure and efficient data processing.

In IoT and communications, “Enhanced-BLE: A Hybrid BLE-ESB Framework for Dynamically Reconfigurable and Energy-Efficient 2.4 GHz IoT Communication” by Ziyao Zhou et al. at Nanyang Technological University presents a hybrid framework that offers a twofold throughput improvement and a 50% energy reduction. Beyond individual devices, the larger network infrastructure is also undergoing a green transformation. Praveen Hegde and Robin Joseph Varughese from Verizon and Marriott International, in “IoT and Massive Connectivity: Massive MIMO Optimization for IoT Connectivity in 5G and Beyond Networks” and “Sustainability in Telecom: Energy-Efficient Networks and Circular Economy Models to Reduce Carbon Footprints and Increase Efficiency”, report that AI-enhanced Massive MIMO can nearly double energy efficiency and that combining AI-optimized energy management with circular economy principles can cut telecom carbon emissions by 18-25%. Urooj Tariq et al. from Trinity College Dublin reinforce this in “Energy Consumption in Next Generation Radio Access Networks”, showing that intelligent placement of baseband processing (BBP) in O-RAN can achieve 75% energy savings. Finally, hardware-orchestrated dynamic power management, as shown by Charalampos S. Kouzinopoulos et al. from Maastricht University in “A Hardware-Based Multi-Stage Dynamic Power Management Architecture for Autonomous Low-Light Operation”, achieves ultra-low quiescent currents of 452 nA, enabling truly autonomous sensor nodes.

The outlook is promising. “Low Latency Gaze Tracking via Latent Optical Sensing” by Yidan Zheng et al. from KAUST demonstrates sub-4 ms latency with 4096x less data by directly acquiring latent features, and “CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection” by Boran Zhao et al. from Xi’an Jiaotong University offers energy-efficient dataset selection for large multimodal models. Even accelerator design itself is becoming autonomous: “A3D: Agentic AI flow for autonomous Accelerator Design” from Abinand Nallathambi et al. at Purdue University uses multi-agent AI to generate accelerators with a 3.5x power advantage. As AI continues its rapid evolution, these concerted efforts toward energy efficiency will be critical in making powerful, intelligent systems both practical and planet-friendly.
