Few-Shot Learning: Navigating Data Scarcity with Smarter Models and Data Strategies
Latest 53 papers on few-shot learning: Aug. 17, 2025
Few-shot learning (FSL) stands at the forefront of AI innovation, promising to unlock powerful capabilities when labeled data is scarce – a common constraint in real-world applications, from medical diagnostics to industrial defect detection. Imagine training a robust AI model with just a handful of examples rather than thousands. That goal is edging closer as researchers push the boundaries of how models learn and generalize from minimal data. Recent breakthroughs, highlighted by a collection of cutting-edge papers, reveal exciting new directions in FSL, spanning novel architectural designs, advanced prompt engineering, and ingenious data augmentation techniques.
The Big Idea(s) & Core Innovations
The overarching theme in recent FSL research is to build more adaptable and efficient models that can generalize effectively from limited examples. One significant thrust involves leveraging pre-trained models and adapting them cleverly. For instance, Semantic Prompt Tuning for Vision-Language Models by Xiao Shi, Yangjun Ou, and Zhenzhong Chen from Wuhan Textile University and Wuhan University introduces SemPT, which enhances vision-language models’ transferability by using shared visual attributes as ‘semantic bridges’ for knowledge transfer between seen and unseen categories. This moves beyond simple class tokens, an idea further explored in Beyond Class Tokens: LLM-guided Dominant Property Mining for Few-shot Classification, which uses large language models (LLMs) to identify dominant class properties for improved generalization.
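To make the attribute-bridge idea concrete, here is a minimal, hypothetical sketch of building attribute-conditioned text prompts for a CLIP-style model. The class names, attribute lists, and prompt template are illustrative assumptions, not SemPT's actual implementation; the point is simply that unseen classes can reuse attribute phrases shared with seen classes.

```python
# Hypothetical sketch: attribute-based text prompts for a CLIP-style model,
# in the spirit of semantic prompt tuning. Attributes and templates are
# placeholders, not SemPT's actual method.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Shared visual attributes act as "semantic bridges" between seen and unseen classes.
class_attributes = {
    "zebra": ["striped coat", "four legs", "hooved feet"],
    "okapi": ["striped legs", "four legs", "brown body"],  # unseen class reuses attributes
}

def class_text_embedding(name, attributes):
    """Average the text embeddings of attribute-conditioned prompts for one class."""
    prompts = [f"a photo of a {name}, which has {attr}" for attr in attributes]
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

# One prototype per class; image embeddings are scored against these by cosine similarity.
classifier_weights = torch.stack(
    [class_text_embedding(name, attrs) for name, attrs in class_attributes.items()]
)
```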
Another innovative avenue focuses on feature refinement and noise reduction. Stochastic-based Patch Filtering for Few-Shot Learning and Slot Attention-based Feature Filtering for Few-Shot Learning, both by Javier Ródenas, Eduardo Aguilar, and Petia Radeva from the Universitat de Barcelona, introduce SPFF and SAFF, respectively. SPFF uses a stochastic mechanism to select class-specific patches, filtering irrelevant features in food image classification, while SAFF applies slot attention to focus on discriminative features in both support and query images, reducing noise for general classification tasks. Further improving feature learning, Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning by Tianjiao Jiang et al. from the Australian Institute for Machine Learning proposes the Causal CLIP Adapter (CCA). CCA leverages Independent Component Analysis (ICA) to disentangle features from CLIP, combined with bidirectional cross-modal alignment, making models more robust to distribution shifts.
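As a rough illustration of the ICA step (not the authors' code), the sketch below disentangles pre-extracted CLIP image features with scikit-learn's FastICA and fits a lightweight few-shot classifier on top. The random features, episode sizes, and logistic-regression head are stand-ins for whatever CCA actually uses.

```python
# Rough sketch of ICA-based feature disentanglement for few-shot classification,
# loosely in the spirit of the Causal CLIP Adapter (not the authors' implementation).
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-ins for pre-extracted CLIP image features (e.g. 512-d ViT-B/32 embeddings).
support_feats = rng.normal(size=(32, 512))    # 8-way 4-shot support set
support_labels = np.repeat(np.arange(8), 4)
query_feats = rng.normal(size=(64, 512))      # unlabeled query images

# Disentangle features into (approximately) independent components.
ica = FastICA(n_components=32, whiten="unit-variance", random_state=0)
ica.fit(np.concatenate([support_feats, query_feats]))
support_ica = ica.transform(support_feats)
query_ica = ica.transform(query_feats)

# A simple linear head on the disentangled features stands in for the adapter.
clf = LogisticRegression(max_iter=1000).fit(support_ica, support_labels)
print(clf.predict(query_ica)[:10])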
Data augmentation and synthesis also play a crucial role. PQ-DAF: Pose-driven Quality-controlled Data Augmentation for Data-scarce Driver Distraction Detection by X. Han et al. introduces a pose-driven framework to generate high-quality synthetic images, tackling data scarcity in critical applications like driver distraction detection. In the realm of foundation models, GraphProp: Training the Graph Foundation Models using Graph Properties by Ziheng Sun et al. from The Chinese University of Hong Kong, Shenzhen, captures cross-domain structural information through graph invariants, leading to better generalization even with unlabeled or synthetic graphs.
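GraphProp's core idea, supervising a graph encoder with domain-agnostic structural properties, can be approximated with off-the-shelf graph invariants. The sketch below uses NetworkX to compute a few such properties as regression targets; the specific invariants chosen here are assumptions, not the paper's exact list.

```python
# Illustrative only: simple graph-level invariants that could serve as
# self-supervised regression targets when training a graph encoder.
import networkx as nx
import numpy as np

def structural_targets(g: nx.Graph) -> np.ndarray:
    """Return a vector of simple, domain-agnostic structural properties."""
    degrees = [d for _, d in g.degree()]
    return np.array([
        g.number_of_nodes(),
        g.number_of_edges(),
        np.mean(degrees) if degrees else 0.0,
        nx.average_clustering(g),
        nx.density(g),
    ], dtype=np.float32)

# Works for labeled, unlabeled, or purely synthetic graphs alike.
g = nx.erdos_renyi_graph(n=50, p=0.1, seed=0)
print(structural_targets(g))
```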
For LLMs, effective FSL involves more than just data. Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning by Kwesi Cobbina and Tianyi Zhou from the University of Maryland reveals a significant positional bias in in-context learning, showing that placing demonstrations at the start of a prompt leads to more stable and accurate outputs. Similarly, P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs by Dongjun Jang et al. from Seoul National University enhances LLMs’ phonological reasoning by integrating pedagogical strategies like scaffolding into Chain-of-Thought prompting.
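A toy prompt-assembly helper makes the positional-bias finding tangible. The templates below are illustrative assumptions; the only claim carried over from the paper is that demonstrations placed at the start of the prompt tend to give more stable, accurate outputs.

```python
# Toy illustration of demo placement in an in-context-learning prompt.
# Templates are placeholders, not the paper's exact prompts.

def build_prompt(instruction, demos, query, demos_first=True):
    demo_block = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    if demos_first:
        # Demos at the start (reported as the more stable layout).
        return f"{demo_block}\n\n{instruction}\n\nInput: {query}\nOutput:"
    # Demos after the instruction, closer to the query.
    return f"{instruction}\n\n{demo_block}\n\nInput: {query}\nOutput:"

demos = [("The movie was great", "positive"), ("Terrible service", "negative")]
print(build_prompt("Classify the sentiment.", demos, "I loved the soundtrack"))
```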
Even in challenging domains like robotics, FSL is making strides. H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation by Hongzhe Bi et al. from Tsinghua University leverages human manipulation data and diffusion transformers to improve robot policy learning efficiency, especially in few-shot settings. For time series, T3Time: Tri-Modal Time Series Forecasting via Adaptive Multi-Head Alignment and Residual Fusion by Abdul Monaf Chowdhury et al. from the University of Dhaka integrates temporal, spectral, and prompt-based representations for superior forecasting, demonstrating strong generalization with minimal training data.
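For intuition on T3Time's multi-view representation, here is a minimal PyTorch sketch of the temporal and spectral views of a forecasting window; the naive concatenation at the end is a stand-in for the paper's adaptive multi-head alignment and residual fusion, and the window sizes are arbitrary.

```python
# Minimal sketch of temporal and spectral views of a time-series window,
# the kind of dual representation T3Time fuses (its actual architecture adds
# prompt-based features, multi-head alignment, and residual fusion).
import torch

def temporal_spectral_views(x: torch.Tensor):
    """x: (batch, length) univariate windows -> temporal and spectral features."""
    temporal = x                                # raw window as the temporal view
    spectral = torch.fft.rfft(x, dim=-1).abs()  # magnitude spectrum as the spectral view
    return temporal, spectral

batch = torch.randn(8, 96)                      # 8 windows of length 96
temporal, spectral = temporal_spectral_views(batch)
fused = torch.cat([temporal, spectral], dim=-1) # naive concat in place of learned fusion
print(fused.shape)                              # torch.Size([8, 145]): 96 temporal + 49 spectral bins
```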
Under the Hood: Models, Datasets, & Benchmarks
Recent FSL advancements are often underpinned by novel architectures, specially curated datasets, and robust benchmarks. Here’s a look at some key resources:
- Architectural Innovations:
- LCN-4 (Location-aware Constellation Network): Introduced in Shallow Deep Learning Can Still Excel in Fine-Grained Few-Shot Learning by Chaofei Qi et al. (Harbin Institute of Technology), this shallow network challenges the depth-over-breadth paradigm in fine-grained FSL, using grid position encoding and frequency domain location embedding. Code: https://github.com/ChaofeiQI/LCN-4.
- PointKAN: From KAN or MLP? Point Cloud Shows the Way Forward by Yan Shi et al. (Harbin Institute of Technology, Tencent, University of Pennsylvania), PointKAN leverages Kolmogorov-Arnold Networks for efficient point cloud analysis, offering significant parameter and computational reductions. Code: https://github.com/Shiyan-cps/PointKAN-pytorch.
- GLiClass: A lightweight, generalist model for sequence classification from GLiClass: Generalist Lightweight Model for Sequence Classification Tasks by Ihor Stepanov et al. (Knowledgator Engineering, Kyiv), which bridges embedding-based methods and generative LLMs with a label-conditioned encoder. Code: https://github.com/Knowledgator/GLiClass.
- MP1: Featured in MP1: MeanFlow Tames Policy Learning in 1-step for Robotic Manipulation by Juyi Sheng et al. (Peking University, Zhejiang University), MP1 is a MeanFlow-based framework enabling one-step policy learning with millisecond latency and improved few-shot generalization. Code: https://github.com/LogSSim/MP1.git.
- Foundation Models & Adaptation Frameworks:
- MIST (Multiple Stochastic Prompt Tuning): Presented in Multiple Stochastic Prompt Tuning for Few-shot Adaptation under Extreme Domain Shift by Debarshi Brahma and Soma Biswas (Indian Institute of Science, Bangalore), MIST adapts CLIP to extreme domain shifts using multiple learnable prompts modeled as Gaussian distributions.
- FedVLM: A federated learning framework from FedVLM: Scalable Personalized Vision-Language Models through Federated Learning (University of Texas at Arlington), which uses personalized LoRA (pLoRA) for privacy-preserving and efficient VLM adaptation in decentralized environments (see the generic LoRA sketch after this list).
- GLAD: From GLAD: Generalizable Tuning for Vision-Language Models by Yuqi Peng et al. (Shenzhen Institutes of Advanced Technology), GLAD is a parameter-efficient fine-tuning framework for VLMs, leveraging LoRA with gradient-based regularization for improved generalization.
- MultiADS: Introduced in MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning by Ylli Sadikaj et al. (University of Vienna, Bosch Corporate Research), MultiADS is a zero-shot learning approach for multi-type anomaly detection and segmentation, utilizing a Knowledge Base for Anomalies (KBA). Code: https://github.com/boschresearch/MultiADS.
- GOOD (Generalized Few-shot OOD Detection): A novel framework for generalized few-shot out-of-distribution detection based on Generality-Specificity Balance (GS-Balance) from Generalized Few-Shot Out-of-Distribution Detection by Pinxuan Li et al. (Tianjin University). Paper: https://arxiv.org/pdf/2508.05732.
- UoMo: The first universal foundation model for mobile traffic forecasting that combines diffusion models and transformers, presented in UoMo: A Foundation Model for Mobile Traffic Forecasting with Diffusion Model by Haoye Chai et al. (Tsinghua University, China Mobile). Code: https://github.com/tsinghua-fib-lab/UoMo.
- Datasets & Benchmarks:
- M3FD (Multi-Modal Model Few-shot Dataset): Introduced in A Foundational Multi-Modal Model for Few-Shot Learning by Pengtao Dang et al. (Oregon Health & Science University), M3FD contains over 10K samples across vision, tables, and time-course data. Code: https://github.com/ptdang1001/M3F.
- FRB (Font Recognition Benchmark): Used in Texture or Semantics? Vision-Language Models Get Lost in Font Recognition by Zhecheng Li et al. (University of California, San Diego), FRB evaluates VLM capabilities in font recognition, revealing challenges with the Stroop effect. Code: https://github.com/Lizhecheng02/VLM4Font.
- LoopDB: A new benchmarking dataset for loop closure detection in SLAM systems, introduced alongside LoopNet in LoopNet: A Multitasking Few-Shot Learning Approach for Loop Closure in Large Scale SLAM. Code: https://github.com/RovisLab/LoopDB.
- CodeMixEval: Presented in Evaluating Code-Mixing in LLMs Across 18 Languages by Yilun Yang and Yekun Chai (NUC, Baidu), CodeMixEval is an extensive framework for evaluating LLMs on code-mixed data across 18 languages.
- FMC (Formalization of Natural Language Mathematical Competition Problems): From FMC: Formalization of Natural Language Mathematical Competition Problems by Jiaxuan Xie et al. (Peking University, University of Washington), this dataset comprises 3,922 natural language-Lean pairs for autoformalization.
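Several of the adaptation frameworks above (FedVLM's pLoRA, GLAD) build on low-rank adaptation. For reference, here is a generic LoRA-augmented linear layer in PyTorch; it is a textbook sketch, not any of these papers' specific variants (personalization, gradient-based regularization, and so on), but it shows why only a tiny fraction of parameters needs to be trained.

```python
# Generic low-rank adaptation (LoRA) layer in PyTorch.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # frozen pre-trained weight
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen base projection plus the trainable low-rank update.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(512, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")           # only the two low-rank factors
```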
Impact & The Road Ahead
These advancements in few-shot learning are poised to democratize AI, making sophisticated models accessible even when extensive data collection is impractical or impossible. The ability of models like SemPT and CCA to generalize from limited examples means AI can be deployed faster in new domains, from niche medical image analysis to specialized industrial automation. The work on prompt engineering (e.g., in Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning and P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs) underscores that how we interact with and guide large models is as crucial as their underlying architecture.
Moreover, the push towards multi-modal FSL with frameworks like M3F (A Foundational Multi-Modal Model for Few-Shot Learning) and multi-operator FSL for PDEs (Multi-Operator Few-Shot Learning for Generalization Across PDE Families) hints at a future where AI can tackle highly complex, interdisciplinary problems with minimal bespoke training. The developments in efficient models like GLiClass and PointKAN-elite showcase a critical path towards deploying powerful AI on resource-constrained devices, broadening the scope of real-world applications.
However, challenges remain. As shown in Large Language Models are Unreliable for Cyber Threat Intelligence, LLMs still struggle with consistency and confidence calibration on complex, real-world data like CTI reports, even with few-shot learning. Similarly, Texture or Semantics? Vision-Language Models Get Lost in Font Recognition highlights that VLMs can be easily misled by superficial features, demonstrating that true semantic understanding with limited data is still an elusive goal. The vulnerability of LLMs to vertically aligned text manipulations (Vulnerability of LLMs to Vertically Aligned Text Manipulations) points to the need for more robust input handling.
Future research will likely delve deeper into causal inference for robust generalization, hybrid architectures combining symbolic and neural approaches, and self-supervised techniques that extract maximum information from unlabeled data. The goal is AI that is not just data-efficient but also profoundly adaptive and capable of genuine understanding, moving us closer to intelligent systems that can learn and operate in the messy, data-scarce real world. The journey of few-shot learning is an exciting one, promising to transform how we approach AI development and deployment.