Sample Efficiency: Unlocking the Future of AI with Less Data, More Impact
Latest 50 papers on sample efficiency: Feb. 14, 2026
The quest for greater sample efficiency remains a central challenge in modern AI/ML. As models grow larger and tasks more complex, the vast amounts of data required for training and generalization become a bottleneck. Researchers are actively pursuing ways to reduce this data dependency, making AI more accessible, sustainable, and capable in real-world, data-scarce settings. This digest covers recent breakthroughs, showing how fields from robotics to molecular design are learning to do more with less.
The Big Ideas & Core Innovations
At the heart of these advancements lies a collective effort to imbue AI systems with smarter learning mechanisms, leveraging structured knowledge, advanced model architectures, and novel optimization techniques. Many papers focus on reinforcement learning (RL), a notoriously data-hungry field. For instance, Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning by Akshay Mete et al. from Texas A&M University introduces Optimistic World Models (OWMs), which integrate an optimistic dynamics loss into world models to dramatically improve exploration in sparse-reward environments. Similarly, ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm by Hanyong Wang and Menglong Yang from Sichuan University combines on-policy and off-policy methods in a new PPO variant that boosts sample efficiency while maintaining stability.
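The optimistic dynamics loss in OWMs is specific to that paper; as a rough illustration of the underlying optimism-in-the-face-of-uncertainty principle, the sketch below adds an ensemble-disagreement bonus to imagined rewards so the agent is drawn toward regions its world model understands poorly. All class and function names here are hypothetical, and this is not the authors' implementation.

```python
# Illustrative sketch only: a generic optimism-under-uncertainty bonus for
# model-based RL, not the exact OWM loss. All names are hypothetical.
import torch
import torch.nn as nn

class DynamicsEnsemble(nn.Module):
    """K independent one-step dynamics heads over a shared latent state."""
    def __init__(self, state_dim: int, action_dim: int, k: int = 5, hidden: int = 128):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim),
            ) for _ in range(k)
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        preds = torch.stack([h(x) for h in self.heads])   # (K, B, state_dim)
        # Mean prediction plus a scalar disagreement score per state-action pair.
        return preds.mean(0), preds.var(0).mean(-1)

def optimistic_reward(extrinsic_reward, disagreement, beta: float = 1.0):
    # Inflate imagined rewards where the world model is uncertain,
    # steering exploration toward under-visited regions.
    return extrinsic_reward + beta * disagreement

# Toy usage on dummy tensors
model = DynamicsEnsemble(state_dim=8, action_dim=2)
s, a, r = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32)
next_mean, dis = model(s, a)
r_optimistic = optimistic_reward(r, dis)
```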
Robotics and control systems are major beneficiaries of these innovations. The paper Accelerating Robotic Reinforcement Learning with Agent Guidance by Yijie Guo et al. (University of Washington, UC Berkeley, ETH Zurich) presents AGPS, which automates the supervision pipeline in robotic RL, bridging the sample-efficiency gap between human-in-the-loop and fully automated methods. Further, JEPA-VLA: Video Predictive Embedding is Needed for VLA Models by Shangchen Miao et al. from Tsinghua University and Huawei Noah’s Ark Lab highlights the crucial role of video-based predictive embeddings such as V-JEPA 2 in enhancing environment understanding and policy priors in vision-language-action (VLA) models, directly addressing poor generalization and low sample efficiency.
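To make the JEPA-VLA pattern concrete, here is a hedged sketch of the general recipe: a frozen, pretrained video predictive encoder supplies perception features to a small trainable vision-language-action head. The encoder below is a toy stand-in rather than the real V-JEPA 2 API, and every dimension and name is illustrative.

```python
# Minimal sketch of reusing a frozen video predictive encoder as a policy prior.
# The encoder class is a placeholder, not the actual V-JEPA 2 interface.
import torch
import torch.nn as nn

class FrozenVideoEncoder(nn.Module):
    """Stand-in for a pretrained predictive video encoder (kept frozen)."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(3 * 16 * 64 * 64, embed_dim)  # toy projection
        for p in self.parameters():
            p.requires_grad = False                          # freeze the prior

    def forward(self, clip):                                 # clip: (B, 3, 16, 64, 64)
        return self.proj(clip.flatten(1))

class VLAPolicyHead(nn.Module):
    """Small trainable head mapping video + language embeddings to actions."""
    def __init__(self, embed_dim: int = 256, lang_dim: int = 128, action_dim: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + lang_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, video_emb, lang_emb):
        return self.mlp(torch.cat([video_emb, lang_emb], dim=-1))

encoder, head = FrozenVideoEncoder(), VLAPolicyHead()
clip, lang = torch.randn(4, 3, 16, 64, 64), torch.randn(4, 128)
actions = head(encoder(clip), lang)                          # (4, 7)
```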
Beyond RL, other domains are pushing the boundaries as well. In molecular optimization, Sample Efficient Generative Molecular Optimization with Joint Self-Improvement by Serra Korkmaz et al. from Helmholtz Zentrum München and the Technical University of Munich introduces JOINT SELF-IMPROVEMENT, a framework that mitigates the distribution shifts and high-variance updates of RL-based methods by pairing a joint generative-predictive model with self-improving sampling, yielding superior performance under limited evaluation budgets. For large language models, Reinforcement Learning with Promising Tokens for Large Language Models by Jing-Cheng Pang et al. from Huawei Technologies introduces RLPT, which improves efficiency and stability by focusing policy optimization on a subset of high-likelihood tokens, effectively reducing gradient variance.
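The "promising tokens" idea lends itself to a compact sketch: restrict the policy-gradient loss to tokens the current policy already assigns high probability, so low-likelihood tokens do not inject gradient noise. The probability-threshold rule below is an illustrative choice of ours, not necessarily RLPT's exact criterion.

```python
# Hedged sketch of masking low-likelihood tokens out of a policy-gradient loss.
# The threshold rule is illustrative, not RLPT's exact selection criterion.
import torch
import torch.nn.functional as F

def promising_token_pg_loss(logits, tokens, advantages, prob_threshold: float = 0.5):
    """
    logits:     (B, T, V) policy logits over the vocabulary
    tokens:     (B, T)    sampled token ids
    advantages: (B,)      per-sequence advantage estimates
    """
    logprobs = F.log_softmax(logits, dim=-1)
    token_lp = logprobs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)   # (B, T)
    # Keep only "promising" tokens whose likelihood exceeds the threshold.
    mask = token_lp.exp() > prob_threshold
    weighted = -(advantages.unsqueeze(1) * token_lp) * mask
    return weighted.sum() / mask.sum().clamp(min=1)

# Toy usage
logits = torch.randn(2, 5, 100, requires_grad=True)
tokens = torch.randint(0, 100, (2, 5))
adv = torch.tensor([0.7, -0.3])
loss = promising_token_pg_loss(logits, tokens, adv)
loss.backward()
```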
Under the Hood: Models, Datasets, & Benchmarks
These research efforts introduce and leverage a variety of models, datasets, and benchmarks to achieve their impressive gains:
- Optimistic World Models (OWMs): A plug-and-play framework built on existing world models (e.g., DreamerV3) and evaluated on challenging sparse-reward environments. Code: https://github.com/weipu-zhang/STORM
- AGPS (Agent-Guided Policy Synthesis): Tested on robotic reinforcement learning tasks, demonstrating improved sample efficiency over traditional Human-in-the-Loop (HIL) methods. Code: https://agps-rl.github.io/agps
- JEPA-VLA: Integrates video-based predictive representations (specifically V-JEPA 2) into existing VLA models, overcoming the limitations of image- or language-image-based visual representations and enhancing performance across robotics benchmarks and real-world tasks.
- JOINT SELF-IMPROVEMENT: Utilizes a joint generative-predictive model for sample-efficient molecular optimization, outperforming state-of-the-art methods on both offline and online molecular optimization benchmarks. Code: https://github.com/schwallergroup/
- RLPT: Enhances policy optimization for Large Language Models (LLMs) like Qwen series models, demonstrating superior performance on mathematical reasoning (GSM8K), coding (HumanEval), and general instruction-following (AlpacaEval) tasks. Code: https://github.com/huggingface/open-r1
- Fun-DDPS: A generative framework combining function-space diffusion models with neural operator surrogates for carbon capture and storage (CCS) modeling, achieving robust forward modeling with limited data. Resources: https://arxiv.org/pdf/2602.12274
- Constrained Initial Representations (CIR): A framework for Temporal Difference Learning that uses the Tanh function to constrain initial representations, achieving strong empirical performance on numerous continuous control tasks. Resources: https://arxiv.org/pdf/2602.11800
- FlowAdapt: A parameter-efficient domain adaptation framework based on optimal transport theory, achieving state-of-the-art performance on collaborative perception benchmarks with only 1% trainable parameters. Resources: https://arxiv.org/pdf/2602.11565
- Exact Posteriors with Normalizing Flows: An oracle framework using class-conditional normalizing flows to decompose neural network error, revealing scaling laws beyond the loss curve on datasets like AFHQ and ImageNet. Code: https://github.com/TarFlow/TarFlow
- Hierarchical Goal-Conditioned RL via Normalizing Flows: A data-efficient framework for hierarchical goal-conditioned reinforcement learning, useful in scenarios with sparse or missing reward/action information. Code: https://github.com/Shaswat2001/heirarchical_RL
- Neuro-symbolic Action Masking (NSAM): Integrates symbolic reasoning using Probabilistic Sentential Decision Diagrams (PSDDs) into deep reinforcement learning, improving sample efficiency and reducing constraint violations; a minimal action-masking sketch follows this list. Code: https://github.com/shan0126/NSRL
- MOBONS (Multi-Objective Bayesian Optimization for Networked Black-Box Systems): Extends Bayesian optimization for networked black-box systems, using graph-based representations for efficient multi-objective optimization. Code: https://github.com/PaulsonLab/MOBONS
- ARO (Adaptively Rotated Optimization): A new matrix optimization paradigm based on gradient rotation, outperforming AdamW by up to 1.35x in large model pretraining. Resources: https://arxiv.org/pdf/2602.09006
- AT-GRPO (Adaptive Tree-based Group Relative Policy Optimization): Part of a dialogue agent framework, addressing short-horizon biases and improving sample efficiency in long-horizon RL for dialogue. Resources: https://arxiv.org/pdf/2602.08533
- Octopus: An RL rollout augmentation framework for Vision-Language Models (VLMs) that teaches self-correction, achieving state-of-the-art performance with reduced training time. Code: https://dripnowhy.github.io/Octopus/
- CBS (Contextual Rollout Bandits): A scheduler for Reinforcement Learning with Verifiable Rewards (RLVR), improving training efficiency and performance through noise-aware intra-group selection and adaptive global reuse of rollouts. Code: https://github.com/buaa-cbs/CBS
- Laplacian Keyboard (LK): A hierarchical framework using graph Laplacian eigenvectors as a reward basis for sample-efficient RL across tasks. Resources: https://arxiv.org/pdf/2602.07730
- Sequence-to-Sequence Models for Log Parsing: Evaluates Transformers and Mamba models, finding Transformers reduce parsing error by up to 23.4%, while Mamba offers competitive accuracy at lower computational cost. Resources: https://arxiv.org/pdf/2602.07698
- Continuous Program Search: Improves genetic programming efficiency by aligning mutation operators with latent behavioral geometry, demonstrating significant reductions in evaluation budget. Resources: https://doi.org/10.1145/nnnnnnn.nnnnnnn
- Language Bottleneck Models (LBMs): Uses natural language as an interpretable intermediate representation for knowledge state modeling, outperforming traditional models on three datasets. Resources: https://arxiv.org/pdf/2506.16982
- Deep Meta Coordination Graphs (DMCG): Leverages dynamic coordination graphs and graph convolutional networks for cooperative multi-agent RL, achieving state-of-the-art performance. Code: https://github.com/Nikunj-Gupta/dmcg-marl
- Aurora: A unified training-serving system integrating speculative decoding with reinforcement learning for LLMs, achieving 1.5x speedups on frontier models. Resources: https://aurora-spec-ai.github.io/
- JBR (Joint Experience Best Response): An efficient modification to PSRO that reuses shared experience across agents in multi-agent RL, enhancing sample efficiency and strategic robustness. Resources: https://arxiv.org/pdf/2602.06599
- Progress Constraints for RL in Behavior Trees: Introduces constraints to guide RL exploration within behavior trees, leading to faster convergence in complex environments. Resources: https://arxiv.org/pdf/2602.06525
- Coupled Local and Global World Models: A framework for efficient first-order RL, demonstrating improved sample efficiency and performance. Code: https://github.com/your-organization/coupled-world-models
- Stochastic Hierarchical Data-Driven Optimization: Applies Sloppy Model theory to efficiently calibrate physical models, outperforming traditional optimization techniques in sample efficiency for plasma-surface kinetics. Code: https://github.com/scikit-optimize/
- Differential RL (dfPO): A novel framework that reformulates RL through continuous-time control, with pointwise convergence guarantees and competitive regret bounds for scientific computing tasks. Code: https://github.com/mpnguyen2/dfPO
- Stochastic Decision Horizons (SDH): A novel approach to constrained RL using survival-weighted objectives for efficient off-policy learning while maintaining constraint compliance. Code: https://github.com/hyfydy/hyfydy
- Pruning for Generalization: A transfer-oriented spatiotemporal graph framework that improves model generalization through novel pruning techniques for spatiotemporal data. Resources: https://arxiv.org/pdf/2602.04153
- Off-Policy Log-Dispersion Regularization (LDR): A regularization framework for training Boltzmann generators, improving data efficiency by up to one order of magnitude in sampling from unnormalized probability densities. Resources: https://arxiv.org/pdf/2602.03729
- Variance-Reduced MPPI: Leverages quadratic approximations of system dynamics to improve sample efficiency and stability in trajectory optimization for robotic systems. Code: https://github.com/MarcToussaint/robotic
- Reparameterization Flow Policy Optimization (RFO): Integrates flow-based policies with reparameterization policy gradients for high sample efficiency in robotic control tasks. Code: https://github.com/rewarped/rewarped
- GFlowPO: A probabilistic framework combining off-policy GFlowNet training with dynamic meta-prompt updates for sample-efficient prompt optimization in LLMs. Resources: https://arxiv.org/pdf/2602.03358
- Information-Theoretic Multi-Model Fusion: An adaptive sampling framework for materials design, redefining optimization as trajectory discovery and leveraging multi-model fusion for target-oriented search. Resources: https://arxiv.org/pdf/2602.03319
- SLOPE (Shaping Landscapes with Optimistic Potential Estimates): A framework for model-based RL that transforms sparse reward modeling into informative potential landscapes, enabling efficient planning and exploration. Resources: https://arxiv.org/pdf/2602.03201
- GCP (Graph of Concept Predictors): An active distillation framework that externalizes LLM reasoning into a graph, enabling interpretable and efficient training for discriminative models. Code: https://github.com/Ziyang-Yu/GCP
- GCR-RL: A reinforcement learning framework that enforces geometric coherence in value functions using order theory, improving sample efficiency and stability. Resources: https://arxiv.org/pdf/2602.02978
- MADT (Multi-Agent Decision Transformer): Reformulates traffic signal control as a sequence modeling problem, achieving state-of-the-art performance in traffic coordination. Resources: https://arxiv.org/pdf/2602.02903
- IDM-based Policies: Utilizes Inverse Dynamics Models (IDMs) to significantly improve sample efficiency in semi-supervised imitation learning compared to behavior cloning. Code: https://github.com/zuoxingdong/mazelab
- EchoJEPA: A latent predictive foundation model for echocardiography, trained on 18 million videos, outperforming existing methods in LVEF estimation and RVSP prediction. Code: https://github.com/bowang-lab/EchoJEPA
- CADENT: A hybrid distillation framework that unifies strategic and tactical knowledge for sample-efficient transfer in reinforcement learning, achieving 40-60% better performance than baselines. Resources: https://arxiv.org/pdf/2602.02532
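Several of the items above rest on simple, reusable mechanisms. As one example, here is a minimal sketch of the action-masking pattern behind NSAM: actions a symbolic layer rejects are removed from the policy's distribution before sampling. The `allowed` mask below is a placeholder for the PSDD-based check, and the code is an illustration rather than the authors' implementation.

```python
# Minimal action-masking sketch: constraint-violating actions are given -inf
# logits so the policy can never sample them. The `allowed` mask is a stand-in
# for a symbolic (e.g. PSDD-based) feasibility check.
import torch

def masked_action_sample(logits, allowed_mask):
    """
    logits:       (B, A) policy logits over discrete actions
    allowed_mask: (B, A) boolean, True where the symbolic layer permits the action
    """
    masked = logits.masked_fill(~allowed_mask, float("-inf"))
    dist = torch.distributions.Categorical(logits=masked)
    return dist.sample()

# Toy usage: forbid action 0 everywhere and confirm it is never sampled.
logits = torch.randn(4, 6)
allowed = torch.ones(4, 6, dtype=torch.bool)
allowed[:, 0] = False
actions = masked_action_sample(logits, allowed)
assert (actions != 0).all()
```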
Impact & The Road Ahead
These advancements in sample efficiency are not merely incremental improvements; they are foundational shifts that promise to unlock new frontiers for AI. By reducing the data burden, we can accelerate research in data-scarce domains like materials science (as shown by Technical University of Darmstadt’s work on Information-Theoretic Multi-Model Fusion for Target-Oriented Adaptive Sampling in Materials Design) and expand the deployment of complex AI systems in real-world scenarios, such as robotic manipulation of deformable objects (e.g., Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows from Stanford University et al.) and sustainable industrial processes (Multi-Objective Bayesian Optimization for Networked Black-Box Systems by Akshay Kudva et al. from The Ohio State University).
Learning from fewer samples also has significant implications for ethical AI: it reduces the carbon footprint of training large models and fosters more equitable access to powerful AI technologies. The work on Language Bottleneck Models (Language Bottleneck Models for Qualitative Knowledge State Modeling by Antonin Berthon and Mihaela van der Schaar from the University of Cambridge) offers a glimpse into how sample-efficient, interpretable models can revolutionize personalized education. Furthermore, the theoretical insights from papers like When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems by Junwei Su and Chuan Wu from the University of Hong Kong are critical for guiding future development, ensuring that new architectures and algorithms are designed with efficiency in mind.
The road ahead involves continuous exploration of hybrid approaches, combining model-based and model-free methods, leveraging symbolic reasoning with deep learning, and integrating advanced optimization techniques across diverse problem spaces. As we continue to refine these methods, the future of AI will be characterized by intelligence that is not only powerful but also remarkably efficient and adaptable. The era of “more with less” is truly upon us, and the breakthroughs highlighted here are paving the way.