
O(N) and Beyond: Recent Leaps in Computational Efficiency Across AI/ML

The latest 55 papers on computational complexity: May 16, 2026

The quest for efficiency is a perpetual frontier in AI/ML. As models grow larger and datasets more expansive, the computational demands can quickly become prohibitive, limiting deployment to specialized hardware or making rapid experimentation impossible. This challenge is particularly acute when dealing with quadratic complexity bottlenecks, often found in foundational components like self-attention or matrix operations. Recent research, however, is pushing the boundaries, offering ingenious solutions that achieve linear, near-linear, or even sublinear complexities, making powerful AI more accessible and scalable than ever before.

The Big Idea(s) & Core Innovations

The central theme across these papers is the creative dismantling of computational bottlenecks through algorithmic and architectural innovation. We’re seeing a fundamental shift away from operations with quadratic scaling (like traditional self-attention) towards more efficient, linear-time alternatives, coupled with novel problem reformulations and hardware-aware designs.

For instance, the groundbreaking work in Ister: Linear Transformer for Efficient Multivariate Time Series Forecasting by Cao et al. (Hong Kong University of Science and Technology) introduces Dot-attention, which replaces computationally intensive matrix multiplications with element-wise operations. This radical change reduces Transformer complexity from O(N^2) to O(N), making it viable for long multivariate time series. Similarly, the Toeplitz MLP Mixer (TMM), presented by Badger and Roland (IBM, AE Studio) in Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models, achieves O(N log N) training complexity via FFT-based Toeplitz matrix multiplication, demonstrating superior information retention compared to other sub-quadratic models.
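
To make the O(N log N) claim concrete, the trick TMM builds on is classical: an N×N Toeplitz matrix embeds in a 2N×2N circulant, which the DFT diagonalizes, so a matrix-vector product costs one FFT pair. Here is a minimal NumPy sketch, with the function name and interface as illustrative assumptions rather than the paper's code:

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply the N x N Toeplitz matrix with first column c and first
    row r (c[0] == r[0]) by a vector x in O(N log N) time."""
    n = len(x)
    # First column of a 2N x 2N circulant whose top-left block is the Toeplitz matrix.
    circ = np.concatenate([c, [0.0], r[:0:-1]])   # length 2N
    x_pad = np.concatenate([x, np.zeros(n)])      # zero-pad x to length 2N
    # A circulant diagonalizes under the DFT: C @ x = ifft(fft(col) * fft(x)).
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x_pad))
    return y[:n].real                             # first N entries give T @ x
```

For random inputs the result matches a dense `scipy.linalg.toeplitz(c, r) @ x` up to floating-point error, at a fraction of the cost for large N.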

In the realm of Vision Transformers, Carrigg et al. (Radboud University) in Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation tackle the LayerNorm global reduction bottleneck with genetic programming. They evolve heterogeneous, layer-specific scalar functions that reduce off-chip memory access by half while maintaining accuracy. Further advancing efficient vision, Li et al. (Tianjin University) introduce LSFormer in Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention, a Spiking Neural Network (SNN) that replaces global self-attention with Local Structure-Aware Spiking Self-Attention, cutting complexity from O(N^2D) to O(ND).
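
LSFormer's attention operates on spike trains, so the sketch below is not its actual mechanism; it is only meant to show where the O(N²D) → O(ND) reduction comes from: if each token attends within a fixed local window of size w rather than globally, the cost becomes O(NwD), which is linear in N for constant w. A minimal dense-valued NumPy illustration:

```python
import numpy as np

def windowed_attention(q, k, v, window=16):
    """Self-attention restricted to non-overlapping windows of size w.
    Cost is O(N * w * D) instead of O(N^2 * D) for global attention.
    Assumes the sequence length N is a multiple of `window`."""
    n, d = q.shape
    q = q.reshape(n // window, window, d)
    k = k.reshape(n // window, window, d)
    v = v.reshape(n // window, window, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)        # (N/w, w, w)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)             # softmax per row
    return (weights @ v).reshape(n, d)
```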

State Space Models (SSMs) are emerging as a powerful alternative to Transformers, especially for long sequence modeling with linear complexity. Cheng et al. (Technical University of Munich) pioneer MambaPanoptic in MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation, the first fully Mamba-based architecture for panoptic segmentation, achieving competitive performance with O(N) scaling. Extending this, Zhang et al. (Northwestern Polytechnical University) propose Attention-Mamba in Attention-Mamba: A Mamba-Enhanced Multi-Scale Parallel Inference Network for Medical Image Segmentation, a CNN-Mamba hybrid for medical image segmentation with multi-scale parallel inference. Yu and Qian (Harbin Institute of Technology) present EmambaIR in EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction, employing Top-k Sparse Attention and Gated State Space Modules for event-guided image reconstruction with linear complexity. Finally, Zhang et al. (Xi’an Jiaotong University) bring SSMs to image compression with SAMIC in SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression, using a semantic-aware scanning strategy and SVD-inspired redundancy reduction.
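
What all of these Mamba variants share is the primitive that gives SSMs their linear scaling: a recurrence that updates a fixed-size state once per timestep, so total cost grows as O(N) in the sequence length. A toy sketch with fixed parameters follows; note that Mamba itself makes A, B, and C input-dependent ("selective") and computes the recurrence with a hardware-efficient parallel scan, both omitted here:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Sequential scan of a discrete linear state-space model:
        h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
    One fixed-size state update per step gives O(N) total cost in the
    sequence length N (vs. O(N^2) for pairwise attention)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                    # x: (N, input_dim)
        h = A @ h + B @ x_t          # recurrent state update
        ys.append(C @ h)             # per-step readout
    return np.stack(ys)              # (N, output_dim)
```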

Beyond neural network architectures, efficiency gains are found in diverse areas. Zhu et al. (University of Oxford), in From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction, introduce MedTPE for lossless prompt compression in LLMs by merging frequent medical token pairs, reducing token length by 31% and inference latency by up to 63%.
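
The pair-merging idea is reminiscent of byte-pair encoding: repeatedly fuse the most frequent adjacent token pair into a single token, shortening the sequence each pass. The sketch below performs one such merge step over string tokens; it is an illustrative assumption, not MedTPE's actual algorithm, which must also keep the compression lossless with respect to the LLM's vocabulary and embeddings:

```python
from collections import Counter

def merge_most_frequent_pair(tokens):
    """One BPE-style merge step: find the most frequent adjacent token
    pair and fuse every occurrence into a single (string) token."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)      # fused token pair
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```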

In communication systems, Park et al. (DGIST) deliver low-complexity blind SNR estimators for mmWave MIMO in Low-Complexity Blind SNR Estimator for mmWave Multi-Antenna Communications and a beamspace channel denoiser in Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs. Both exploit beamspace sparsity for efficient, single-sample estimation and near-linear denoising without iterative optimization, achieving real-time FPGA implementation.
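
The papers' estimators are more involved, but the principle they exploit can be sketched simply: mmWave channels are sparse in the beamspace, so a unitary DFT across the antenna array concentrates signal energy in a few beams that can be separated from broadband noise. The function below is a toy illustration (the name, the hard-threshold rule, and the `keep` parameter are assumptions for exposition):

```python
import numpy as np

def beamspace_denoise(y, keep=4):
    """Toy beamspace denoising: transform an antenna-domain snapshot y
    into the beamspace via a unitary DFT, keep only the `keep` strongest
    beams, and transform back. One FFT pair, i.e. O(M log M) in the
    number of antennas M."""
    beams = np.fft.fft(y, norm="ortho")        # antenna -> beamspace
    idx = np.argsort(np.abs(beams))[:-keep]    # indices of weakest beams
    beams[idx] = 0.0                           # hard-threshold the noise
    return np.fft.ifft(beams, norm="ortho")    # back to antenna domain
```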

Under the Hood: Models, Datasets, & Benchmarks

These innovations are often validated against challenging real-world scenarios and public benchmarks, demonstrating their practicality and scalability.

  • Ister: Achieves state-of-the-art on 11 real-world datasets including ETT, Electricity, Traffic, Weather, and PEMS, outperforming iTransformer.
  • LSFormer: Validated on CIFAR-10, CIFAR-100, Tiny-ImageNet, DVS-Gesture, and N-CALTECH101 datasets, showcasing broad applicability for SNNs.
  • MambaPanoptic: Benchmarked on Cityscapes and COCO for panoptic segmentation, surpassing Mask2Former with fewer parameters.
  • Attention-Mamba: Evaluated on Synapse, ACDC, ISIC-2018, and PH2 datasets across various medical imaging modalities.
  • EmambaIR: Tested on GoPro, Adobe240, SDSD, H2D datasets for image deblurring, deraining, and HDR enhancement.
  • SAMIC: Trained on LSDIR and evaluated on Kodak, CLIC2020, and UHD datasets for perceptual image compression.
  • MedTPE: Utilizes MIMIC-IV, EHRSHOT, ARC-Challenge, ECTSum, and CMedQA2 for clinical prediction and general NLP tasks.
  • Low-Complexity Blind SNR Estimator & Beamspace Channel Denoiser: Performance validated with QuaDRiGa mmMAGIC UMi channel model and FPGA implementations on AMD-Xilinx Kintex UltraScale+ KCU116.
  • FunnelNet: Demonstrated on the CirCor DigiScope Phonocardiogram dataset for real-time heart murmur detection with TinyML on Raspberry Pi and Android.
  • TARNet: Benchmarked on VoxCeleb1 and LibriSpeech for speaker identification, outperforming ECAPA-TDNN.
  • ConvRec: Evaluated on four benchmark datasets (Beauty, Games, Fashion, CD) for sequential recommendation.
  • GPU Boruta: Utilizes UCI Machine Learning Repository datasets (CT slices, Online News Popularity, Appliances Energy Prediction, MADELON) for high-dimensional feature selection.
  • HE-PIM: Characterizes homomorphic operations on a real-world UPMEM Processing-in-Memory (PIM) system.
  • YOLOv3-Tiny FPGA: Implemented on ZYNQ-XC7Z035 FPGA for object detection using the WIDER FACE dataset.
  • BatMIL: Evaluated on seven Whole-Slide Image (WSI) datasets (CAMELYON16/17, PANDA, TCGA-BLCA/BRCA/CESC/NSCLC) across six cancer types.
  • SVAkADD: Empirically evaluated across multiple datasets (Titanic, Wine, Adult, ImageNet, Big Five, Breast Cancer) for Shapley value approximation.

Several of these papers also provide open-source code for wider adoption and experimentation.

Impact & The Road Ahead

The impact of this research is profound, touching upon the very foundations of AI accessibility, deployability, and even security. The drive towards O(N) or better complexity isn’t just about speed; it’s about enabling real-time applications on edge devices, democratizing access to powerful models, and reducing the carbon footprint of AI training. We’re seeing:

  • Hardware-Aware AI: Papers like those on mmWave SNR estimation, FPGA-based YOLOv3-Tiny, and FunnelNet demonstrate how co-designing algorithms with hardware constraints in mind can lead to ultra-low latency and power-efficient solutions for edge AI in critical applications like healthcare and autonomous driving. The work on HE-PIM by Gupta et al. (ETH Zürich, Barcelona Supercomputing Center) highlights the potential of Processing-in-Memory for homomorphic encryption if hardware support for 64-bit modular multiplication is integrated, hinting at a future of privacy-preserving computing.
  • Democratization of Large Models: Innovations like MedTPE and compact sign language translation models reduce the overhead of LLMs, making them more practical for domain-specific tasks and resource-constrained environments. The O(N) complexity of Ister and Mamba-based vision models means that high-resolution inputs and long sequences, once computationally prohibitive, are now within reach.
  • Theoretical Foundations & Hardness Boundaries: Papers investigating the computational complexity of the Gallai Vertex problem (The Gallai Vertex Problem is Θ₂^p-Complete by Nikabadi et al.), quantum state isomorphism (Quantum state isomorphism problems for groups by Gheorghiu et al.), and graph similarity (Three Hardness Results for Graph Similarity Problems by Sun and Vagnozzi) provide crucial insights into the limits of efficient computation, guiding where to focus efforts on approximation algorithms versus exact solutions.
  • Enhanced Interpretability & Robustness: SVAkADD offers a more efficient path to Shapley value approximation, a key tool for XAI (a baseline sampling sketch appears after this list). The exploration of “deductive errors” in PAC teaching by Telle et al. (University of Bergen) moves us toward more robust machine teaching, especially for learners like LLMs.
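
For context on what methods like SVAkADD accelerate, the classic baseline they improve upon is permutation sampling: average each feature's marginal contribution over random orderings instead of enumerating all 2^n coalitions. A minimal sketch, where `value_fn` is a hypothetical user-supplied payoff oracle:

```python
import numpy as np

def shapley_mc(value_fn, n_features, n_samples=1000, rng=None):
    """Permutation-sampling estimate of Shapley values.
    `value_fn(subset)` maps a set of feature indices to a real payoff.
    Exact Shapley values need 2^n coalition evaluations; averaging over
    `n_samples` random permutations gives an unbiased estimate instead."""
    rng = rng or np.random.default_rng(0)
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        perm = rng.permutation(n_features)
        coalition, prev = set(), value_fn(set())
        for f in perm:
            coalition.add(f)
            cur = value_fn(coalition)
            phi[f] += cur - prev      # marginal contribution of feature f
            prev = cur
    return phi / n_samples
```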

The road ahead is exciting. We can anticipate further advancements in hybrid architectures that combine the strengths of different modeling paradigms, novel algorithmic primitives that push complexity boundaries even lower, and a deeper integration of theoretical insights into practical, deployable AI systems. The relentless pursuit of computational efficiency is not just an engineering challenge; it’s a driving force toward a more intelligent and sustainable future for AI.
