Energy Efficiency in AI/ML: From Edge to Cloud, Photonics to Spikes

Latest 37 papers on energy efficiency: Apr. 25, 2026

The relentless march of AI/ML, accelerated by the rise of massive Large Language Models (LLMs), has pushed energy consumption to the forefront of the field's concerns. Training and deploying these models demands colossal computational resources, with significant energy footprints and environmental costs. This blog post dives into recent breakthroughs from a collection of research papers that are carving new paths toward a more energy-efficient AI future, spanning novel hardware architectures, intelligent software strategies, and network optimizations.

The Big Idea(s) & Core Innovations

Researchers are tackling the energy challenge from multiple angles, often exploiting the inherent characteristics of AI workloads to design more efficient systems. A prominent theme is the repurposing and optimization of hardware for specific AI tasks. For instance, “Enabling AI ASICs for Zero Knowledge Proof” by Georgia Institute of Technology, MIT, and Google introduces MORPH, which remaps high-precision Zero-Knowledge Proof (ZKP) kernels onto AI ASICs such as TPUs. By transforming complex modular arithmetic into dense low-precision matrix operations and using layout-stationary dataflows, MORPH achieves up to 10× higher throughput for the number-theoretic transform (NTT) and comparable multi-scalar multiplication (MSM) throughput relative to GPUs, effectively turning AI ASICs into viable ZKP accelerators.
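To make the remapping concrete, here is a minimal NumPy sketch of the core idea: an NTT over Z_p is just a matrix-vector product, and the wide modular multiplies can be decomposed into the narrow integer matmuls an AI ASIC natively executes. The prime, root of unity, and limb width below are toy values chosen for illustration; this shows the general trick, not MORPH's actual dataflow.

```python
import numpy as np

# Toy NTT parameters: p = 257 is prime and omega = 249 is a primitive
# 16th root of unity mod p (illustrative values, not from the paper).
p, n, omega = 257, 16, 249
assert pow(omega, n, p) == 1 and pow(omega, n // 2, p) == p - 1

# The forward NTT is a plain matrix-vector product: W[i, j] = omega^(i*j) mod p.
W = np.array([[pow(omega, i * j, p) for j in range(n)] for i in range(n)])

def ntt_dense(x):
    """NTT as one dense matmul, the shape of work an AI ASIC runs natively."""
    return (W @ x) % p

def ntt_limbed(x, limb_bits=4):
    """Same transform, with W split into low-precision limbs so every partial
    matmul fits a narrow multiplier; partials are shifted, summed, and
    reduced mod p once at the end."""
    base = 1 << limb_bits
    acc = np.zeros(n, dtype=np.int64)
    Wl, shift = W.copy(), 0
    while Wl.any():
        acc += ((Wl % base) @ x) << shift   # low-bit partial product
        Wl //= base
        shift += limb_bits
    return acc % p

x = np.arange(n)
assert np.array_equal(ntt_dense(x), ntt_limbed(x))
```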

Another significant area is photonic and neuromorphic computing, which promises radical shifts in energy efficiency by moving beyond traditional electron-based computation. The paper “Tensor Processing with Homodyne Photonic Integrated Circuits exceeds 1,000 TOPS” from Opticore Inc. and University of California, Berkeley showcases a homodyne photonic integrated circuit for general matrix multiplication (GEMM) that achieves over 1,000 TOPS of throughput and 330 TOPS/W energy efficiency through massive on-chip optical fanout, parallelism, and time multiplexing, while demonstrating near-digital accuracy on LLMs such as Qwen2.5. Complementing this, “LightMat-HP: A Photonic-Electronic System for Accelerating General Matrix Multiplication With Configurable Precision” by researchers from Australian National University and UNSW develops a hybrid photonic-electronic system using block floating-point (BFP) arithmetic and slicing-based photonic multiplication. It sidesteps photonic precision limits by decomposing high-bit-width operations into multiple low-bit sub-multiplications, significantly improving energy efficiency and latency over conventional systems.

The push for brain-inspired computing is also evident, as the sketch after this paragraph illustrates. “SpikeMLLM: Spike-based Multimodal Large Language Models via Modality-Specific Temporal Scales and Temporal Compression” by authors from the Institute of Automation, Chinese Academy of Sciences introduces SpikeMLLM, the first spike-based framework for multimodal LLMs. It achieves a 25.8x power reduction and 9.06x higher throughput on an RTL accelerator by employing Modality-Specific Temporal Scales and Temporally Compressed LIF mechanisms, proving that spiking neural networks can handle complex multimodal tasks with near-FP16 performance. Similarly, “Adaptive Spiking Neurons for Vision and Language Modeling” from Peking University proposes ASN/NASN, which achieves adaptive firing dynamics with efficient integer training and spike inference, cutting energy consumption by 93% relative to Qwen-1.5B while maintaining accuracy on QA tasks.
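Why do spikes save energy? In a spiking layer, multiply-accumulates collapse into additions that fire only when an input spikes. The sketch below is a generic integer leaky integrate-and-fire (LIF) layer, not SpikeMLLM's temporally compressed variant; the threshold, leak, and sparsity values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic integer LIF layer (illustrative sizes and constants).
T, n_in, n_out = 8, 64, 32                     # timesteps, fan-in, fan-out
W = rng.integers(-8, 8, size=(n_out, n_in))    # low-bit integer weights
x_spikes = rng.random((T, n_in)) < 0.1         # sparse binary input spikes

v = np.zeros(n_out, dtype=np.int64)            # membrane potentials
v_th, tau_shift = 32, 1                        # threshold; leak as a cheap right-shift
out = np.zeros((T, n_out), dtype=bool)
syn_ops = 0

for t in range(T):
    active = np.flatnonzero(x_spikes[t])       # only active inputs cost energy
    v += W[:, active].sum(axis=1)              # binary spikes turn MACs into adds
    syn_ops += active.size * n_out
    out[t] = v >= v_th
    v = np.where(out[t], 0, v - (v >> tau_shift))  # reset fired neurons, then decay

dense_ops = T * n_in * n_out
print(f"synaptic adds: {syn_ops} vs dense MACs: {dense_ops}")
```

With roughly 10% input sparsity, the synaptic work drops to about a tenth of the dense equivalent, which is the basic lever these spiking designs exploit.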

Optimizing memory and dataflow is another crucial strategy. “MemExplorer: Navigating the Heterogeneous Memory Design Space for Agentic Inference NPUs” by University of Cambridge, Imperial College London, and Microsoft introduces a memory system synthesizer for heterogeneous NPU systems. It finds that prefill benefits from large, high-bandwidth 3D-stacked SRAM, while decode prefers larger-capacity, lower-bandwidth LPDDR, achieving up to 2.3× higher energy efficiency than baseline NPUs. “AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator” from Peking University systematically explores dataflows for SRAM-based compute-in-memory (CIM) accelerators, revealing that systolic interconnects are more area-efficient and medium-sized macros offer the best energy-area trade-off. Furthermore, “GEM3D-CIM: General Purpose Matrix Computation Using 3D-Integrated SRAM-eDRAM Hybrid Compute-In-Memory-on-Memory Architecture” by University of Wisconsin Madison demonstrates a 3D-integrated SRAM-eDRAM hybrid CIM architecture capable of general matrix operations (transpose, element-wise multiplication) directly in memory, reaching 436.61 GOPS/W energy efficiency.
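MemExplorer's prefill/decode split falls out of a simple arithmetic-intensity argument, sketched below with illustrative transformer dimensions. These are back-of-envelope numbers, not the paper's cost model: prefill amortizes every weight fetch across the whole prompt, while decode reads each weight for a single token.

```python
def arithmetic_intensity(n_tokens, d_model, n_layers, bytes_per_param=1):
    """Rough FLOPs-per-byte for a transformer forward pass, counting only
    weight-streaming traffic. Illustrative estimate, not MemExplorer's model."""
    params = 12 * n_layers * d_model**2        # standard dense-parameter estimate
    flops = 2 * params * n_tokens              # each weight -> one MAC per token
    bytes_moved = params * bytes_per_param     # weights streamed once per pass
    return flops / bytes_moved

# Prefill amortizes each weight fetch over the whole prompt; decode does not.
print("prefill (2048-token prompt):", arithmetic_intensity(2048, 4096, 32))  # ~4096 FLOPs/B
print("decode (1 token):           ", arithmetic_intensity(1, 4096, 32))     # ~2 FLOPs/B
```

High intensity rewards the bandwidth of 3D-stacked SRAM; intensity near one makes decode a streaming workload where LPDDR capacity matters more than peak on-chip bandwidth.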

For edge and IoT devices, innovations are tailored to tight resource constraints. “Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design” by University of Duisburg-Essen presents an LSTM accelerator for embedded FPGAs that achieves 59.19% better energy efficiency by using 8-bit quantization and replacing complex activation functions. Another work from the same group, “Idle is the New Sleep: Configuration-Aware Alternative to Powering Off FPGA-Based DL Accelerators During Inactivity”, proposes an “Idle-Waiting” strategy for FPGA-based DL accelerators, reducing configuration energy 40.13-fold and extending system lifetime by 12.39× for intermittent IoT workloads (the sketch below shows the underlying break-even logic). In a similar vein, “Towards Auto-Building of Embedded FPGA-based Soft Sensors for Wastewater Flow Estimation”, also from Duisburg-Essen, optimizes hardware for soft sensors on smaller FPGAs, reaching micro-watt static power consumption. For wireless communication, “Generalized Two-Dimensional Index Modulation in the Code–Spatial Domain for LPWAN” by Sun Yat-sen University and Techphant Technologies introduces a generalized code-index modulation (CIM) transceiver for LPWANs that saves up to 57% energy by implicitly conveying index bits through antenna and spreading-sequence selection. On the benchmarking side, “On the Practical Performance of Noise Modulation for Ultra-Low-Power IoT” from National Institute of Telecommunications (Inatel) identifies the “energy crossover distances” beyond which the simpler non-coherent NoiseMod becomes less efficient than coherent schemes, providing crucial design insight for ultra-low-power IoT.
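The “Idle-Waiting” result reduces to a break-even comparison: idling costs a small continuous power draw, while powering off costs a fixed reconfiguration energy on every wake-up. A minimal model of that trade-off follows; the 30 mW and 450 mJ figures are hypothetical, not measurements from the paper.

```python
def breakeven_gap_s(p_idle_w, e_reconfig_j):
    """Inactivity gap above which powering the FPGA off beats idling:
    idle energy (P_idle * gap) grows linearly with the gap, while the
    power-off branch pays a fixed reconfiguration cost E_reconfig on wake-up."""
    return e_reconfig_j / p_idle_w

# Hypothetical numbers for an embedded FPGA accelerator:
P_IDLE = 0.030       # 30 mW idle draw
E_RECONFIG = 0.45    # 450 mJ to reload the bitstream after power-off

gap = breakeven_gap_s(P_IDLE, E_RECONFIG)
print(f"idle-waiting wins for gaps shorter than {gap:.0f} s")   # 15 s here
```

For intermittent workloads whose inactivity gaps mostly fall under the break-even point, idling wins, which is the regime the Idle-Waiting strategy targets.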

Lastly, software- and system-level optimizations prove vital. “Towards Energy Impact on AI-Powered 6G IoT Networks: Centralized vs. Decentralized” from University of Kaiserslautern-Landau and DFKI demonstrates that decentralized (federated) learning can cut overall energy consumption by up to 70% in 6G IoT networks while matching the predictive accuracy of centralized methods; a toy model of why appears after this paragraph. “Are Large Language Models Economically Viable for Industry Deployment?” by researchers from DSEU-Okhla, Macquarie University, and Stanford University argues that compact models (< 2B parameters) form the efficiency frontier for LLM deployment on legacy hardware, and shows that QLoRA can, surprisingly, increase adaptation energy by 7x despite reducing memory footprint. “Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs” from University of Virginia introduces NPUMoE, a runtime inference engine for MoE LLMs on the Apple Neural Engine that delivers 1.81x–7.37x better energy efficiency by offloading dense computation to the NPU while leaving dynamic routing to the CPU/GPU. “ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving” by KAIST and Samsung Electronics proposes ELMoE-3D, a hybrid-bonding framework unifying cache-based acceleration and speculative decoding for MoE LLMs, with 4.4x energy-efficiency gains. For High-Performance Computing (HPC), “Towards Energy Efficient Co-Scheduling in HPC” from University of Illinois Chicago and Argonne National Laboratory presents EcoSched, an online energy-aware co-scheduler for multi-GPU HPC systems that achieves up to 14.8% energy savings by jointly optimizing GPU-count selection and application co-scheduling.
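Why can federated learning save transmission energy? Centralized training streams raw samples continuously, while federated devices upload only periodic model updates. The toy accounting below makes that concrete; every constant is hypothetical, and it deliberately ignores the extra local-training compute that federated learning incurs, so treat it as intuition rather than a reproduction of the paper's 70% figure.

```python
def transmission_energy_j(n_devices, messages, bytes_per_msg, j_per_byte=5e-7):
    """Radio energy for uplink traffic; j_per_byte is an assumed flat cost."""
    return n_devices * messages * bytes_per_msg * j_per_byte

# Hypothetical 6G IoT deployment: 100 sensors over one day of operation.
RAW_SAMPLE_B, SAMPLES = 200, 86_400        # centralized: stream every raw sample
MODEL_UPDATE_B, FL_ROUNDS = 400_000, 20    # federated: send weight deltas per round

centralized = transmission_energy_j(100, SAMPLES, RAW_SAMPLE_B)
federated = transmission_energy_j(100, FL_ROUNDS, MODEL_UPDATE_B)
print(f"centralized: {centralized:.0f} J, federated: {federated:.0f} J")
```

Whether the savings materialize in practice depends on update size, round count, and per-byte radio cost, which is the trade space such studies explore.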

Under the Hood: Models, Datasets, & Benchmarks

These advancements are built upon and validated through a mix of specialized models, hardware platforms, and benchmarks that recur throughout the papers above:

- Qwen2.5 served as the LLM workload for demonstrating the near-digital accuracy of the homodyne photonic GEMM engine, while Qwen-1.5B is the baseline against which ASN/NASN's 93% energy reduction is measured.
- SpikeMLLM's 25.8x power reduction and 9.06x throughput gain were obtained on a dedicated RTL accelerator.
- Hardware platforms span AI ASICs such as TPUs (MORPH), the Apple Neural Engine (NPUMoE), and embedded FPGAs (the Duisburg-Essen line of work).
- EDGE-EVAL, from the industry-deployment study, frames the economic-viability comparison of compact versus large models.

Impact & The Road Ahead

These advancements herald a future where AI/ML is not only powerful but also sustainable. The shift towards neuromorphic and photonic computing promises orders-of-magnitude improvements in energy efficiency, potentially enabling AI directly within sensors and on-device environments, as shown by SpikeMLLM and LightMat-HP. The emphasis on hardware-software co-design is critical, ensuring that algorithms are optimized for the underlying physics of new compute substrates, as demonstrated by MORPH for ZKP on AI ASICs and AccelCIM for CIM dataflows.

For edge AI and IoT, the focus on tailored hardware (FPGA accelerators, specialized LPWAN transceivers), intelligent idle management, and decentralized learning paradigms (federated learning) will unlock unprecedented capabilities for smart cities, industrial automation, and environmental monitoring, making ubiquitous AI a reality without the prohibitive energy cost. The ability to reconstruct 4D human-scene understanding from IMU sensors alone, as presented by “Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs”, opens doors for privacy-preserving and ultra-low-power context awareness in daily life.

In cloud and data center environments, optimizations like heterogeneous memory architectures (MemExplorer), efficient MoE inference (NPUMoE, ELMoE-3D), and energy-aware co-scheduling (EcoSched) are crucial for managing the growing demand for large model inference and HPC workloads. The understanding that economic viability and energy efficiency are intertwined (as highlighted by EDGE-EVAL) will drive the adoption of more compact and specialized models, shifting the paradigm from “bigger is always better” to “smarter and leaner is more sustainable.”

Looking ahead, the integration of AI for optimizing complex systems like integrated energy grids (e.g., “End-to-End Learning-based Operation of Integrated Energy Systems for Buildings and Data Centers”) and 6G wireless networks (e.g., “Aerial Multi-Functional RIS in Fluid Antennas-Aided Full-Duplex Networks: A Self-Optimized Hybrid Deep Reinforcement Learning Approach”, “Generative Learning Enhanced Intelligent Resource Management for Cell-Free Delay Deterministic Communications”, and “WirelessAgent: A Unified Agent Design for General Wireless Resource Allocation Problem without Current Channel State Information”) will be paramount. These papers collectively paint a picture of an AI future that is not only powerful and intelligent but also profoundly energy-conscious and sustainable, a critical step towards mitigating AI’s environmental impact and fostering its broader adoption.
