
Class Imbalance: Bridging the Gap in AI/ML from Healthcare to Finance

Latest 33 papers on class imbalance: May 23, 2026

Class imbalance remains one of the most persistent challenges in AI and machine learning, particularly in real-world applications where rare events such as fraud, disease, or system failures are often the most critical to detect. When one class significantly outnumbers the others, models tend to become biased toward the majority, underperform on minority classes, and ultimately prove unreliable. Recent research offers a range of strategies for tackling this problem, from data augmentation techniques to new model architectures and optimization approaches.

The Big Idea(s) & Core Innovations

At the heart of many recent breakthroughs is the recognition that addressing class imbalance requires a multi-faceted approach, often integrating data-level, algorithm-level, and optimization-level solutions. For instance, in sensitive domains like medical image analysis, explicit supervision is proving crucial. The German Research Center for Artificial Intelligence (DFKI), in their paper “SegGuidedNet: Sub-Region-Aware Attention Supervision for Interpretable Brain Tumour Segmentation”, introduces a lightweight SegAttentionGate module. This innovation directly supervises the model’s attention for specific tumor sub-regions, even severely imbalanced ones, leading to both higher accuracy and built-in interpretability without heavy post-hoc methods.

Similarly, in embodied AI and robotics, preemptive action verification is gaining traction. Researchers from Beihang University and Tsinghua University propose “Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts”. This system preemptively assesses action validity, filtering low-quality actions before execution. A key insight here is that an imbalance-aware loss combined with probability threshold calibration drastically improves action validity discrimination, significantly mitigating error propagation in complex robotic tasks.
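
As a rough illustration of how an imbalance-aware loss and threshold calibration fit together, here is a minimal NumPy sketch. It is not the Pre-VLA implementation: the class-weighted binary cross-entropy, the F1-based threshold search, and the names `weighted_bce` and `calibrate_threshold` are all assumptions chosen for illustration.

```python
import numpy as np

def weighted_bce(y_true, p, pos_weight):
    """Binary cross-entropy with the positive (rare) class up-weighted (assumed loss form)."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1 - 1e-7)
    loss = -(pos_weight * y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(loss.mean())

def calibrate_threshold(y_true, p, grid=None):
    """Pick the decision threshold that maximizes F1 on held-out data (assumed criterion)."""
    y_true = np.asarray(y_true).astype(int)
    p = np.asarray(p, dtype=float)
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = (p >= t).astype(int)
        tp = int(np.sum((pred == 1) & (y_true == 1)))
        fp = int(np.sum((pred == 1) & (y_true == 0)))
        fn = int(np.sum((pred == 0) & (y_true == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # strict comparison keeps the lowest threshold among ties
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy validation set: 2 positives out of 10, with scores that all sit below 0.5.
y_val = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
p_val = np.array([0.03, 0.12, 0.12, 0.22, 0.22, 0.31, 0.31, 0.37, 0.42, 0.47])
best_t, best_f1 = calibrate_threshold(y_val, p_val)
```

In this toy example a default threshold of 0.5 would flag nothing, while the calibrated threshold separates the two rare positives from the rest.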

Several papers highlight how problem re-framing or novel architectural designs can turn the tide against imbalance. For video capsule endoscopy, Kyungpook National University’s “GALAR-TemporalNet v2: Anatomy-Guided Dual-Branch Temporal Classification with Bidirectional Mamba and Dual-Graph GCN for Video Capsule Endoscopy—after competition results” proposes a dual-branch architecture that models pathology as a deviation from healthy anatomy prototypes. This disentanglement offers a more principled way to detect subtle lesions, especially rare ones, by isolating their signals from the dominant normal textures. This aligns with the understanding that pathology is inherently contextual.

For tabular data, a significant finding comes from Arizona State University in their work “Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification”. They demonstrate that Prior-Data Fitted Networks (PFNs), while powerful, become majority-biased under imbalance. Their simple yet effective solution is to set the decision threshold on the predicted probability to the minority-class prior rather than the default 0.5, which dramatically boosts minority-class performance. This underscores that understanding a model’s calibration characteristics is key to effective imbalance correction.
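
The thresholding idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors’ code: the function name `predict_with_prior_threshold` is hypothetical, the prior is estimated as the minority fraction of the training labels, and the probabilities stand in for the output of any probabilistic classifier.

```python
import numpy as np

def predict_with_prior_threshold(minority_proba, y_train, minority_label=1):
    """Predict the minority class when its probability reaches the training-set minority prior."""
    y_train = np.asarray(y_train)
    # Assumption: the prior is estimated as the minority fraction of the training labels.
    prior = float(np.mean(y_train == minority_label))
    preds = (np.asarray(minority_proba) >= prior).astype(int)
    return preds, prior

# Toy data: 10% minority class, and four predicted minority probabilities.
y_train = np.array([0] * 90 + [1] * 10)
proba = np.array([0.05, 0.15, 0.40, 0.60])

preds, prior = predict_with_prior_threshold(proba, y_train)
default_preds = (proba >= 0.5).astype(int)  # conventional 0.5 cut-off, for comparison
```

With a 10% minority prior, samples scored 0.15 and 0.40 are assigned to the minority class, whereas the conventional 0.5 cut-off would assign both to the majority.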

Beyond data balancing, the nature of model learning itself is being re-evaluated. University of Athens and NCSR Demokritos, in “Neural Collapse by Design: Learning Class Prototypes on the Hypersphere”, unify classifier learning and supervised contrastive learning (SCL) as prototype contrast on the unit hypersphere. Their proposed Normalized Temperature-scaled Cross Entropy (NTCE) and Negatives Only Normalization Loss (NONL) directly optimize for the Neural Collapse geometry. The authors report significant gains in transfer learning and under severe class imbalance, and show that linear probing in SCL is often redundant.
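
To make “prototype contrast on the unit hypersphere” concrete, the sketch below builds a simplex equiangular tight frame (the class-mean geometry associated with Neural Collapse) and scores features against those prototypes with a temperature-scaled cosine cross-entropy. It is only in the spirit of the paper: the exact NTCE and NONL formulations are not reproduced here, and the function names, the temperature of 0.1, and the use of fixed prototypes are assumptions.

```python
import numpy as np

def simplex_etf(num_classes):
    """Unit-norm prototypes whose pairwise cosine similarity is -1/(K-1)."""
    k = num_classes
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

def prototype_cross_entropy(features, labels, prototypes, temperature=0.1):
    """Cross-entropy over temperature-scaled cosine similarities to class prototypes."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = z @ w.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

protos = simplex_etf(4)
gram = protos @ protos.T  # diagonal is 1, off-diagonal is -1/3 for K = 4

labels = np.array([0, 1, 2, 3])
# Features that sit exactly on their class prototype versus random features.
aligned_loss = prototype_cross_entropy(protos, labels, protos)
rng = np.random.default_rng(0)
random_loss = prototype_cross_entropy(rng.normal(size=(4, 4)), labels, protos)
```

Features that coincide with their class prototype attain the lowest loss under this objective, which is the sense in which such a loss pulls representations toward the collapsed geometry.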

Under the Hood: Models, Datasets, & Benchmarks

These advancements aren’t just theoretical; they’re powered by new or strategically utilized models, carefully curated datasets, and robust benchmarks. Many papers emphasize the importance of data-level solutions, including synthetic data generation and intelligent sampling:

  • SegGuidedNet leverages standard 3D residual encoder-decoder networks and is rigorously benchmarked on BraTS 2021 and BraTS 2023 GLI, demonstrating competitive single-model performance against ensemble methods.
  • Pre-VLA employs an efficient multimodal backbone and dual-branch prediction head, evaluated on the LIBERO robotic manipulation benchmark.
  • GALAR-TemporalNet v2 makes extensive use of DINOv3 features and integrates Bidirectional Mamba and Dual-Graph GCNs, trained and evaluated on the large Galar dataset for video capsule endoscopy.
  • Prior-Data Fitted Networks (PFNs) are the core model in “Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification”, evaluated on 11 binary classification datasets from the OpenML-CC18 benchmark.
  • Q-SYNTH, from Hassan II University of Casablanca, introduces a hybrid classical-quantum GAN where a parameterized quantum circuit serves as the generator for synthesizing minority-class fraud samples, evaluated on the Credit Card Fraud Detection dataset from the IEEE UCI repository.
  • UOTIP (Unbalanced Optimal Transport Map for Unpaired Inverse Problems) uses neural optimal transport techniques to learn a mapping between noisy and clean image distributions, showcasing its robustness to noise and class imbalance in tasks like deblurring and super-resolution.
  • yvsoucom-iterkit (from Hangzhou Normal University) is a log-driven AutoML framework, demonstrating its utility on healthcare datasets like Pima Indians Diabetes and Healthcare Stroke Dataset. Its code is available at https://github.com/yvsoucom/itekit-examples.
  • SEABAD, a groundbreaking dataset of 50,000 3-second audio clips from tropical soundscapes, developed by Universiti Malaya, addresses the critical need for region-specific bird activity detection data. It can be accessed at https://zenodo.org/records/18290494 with code at https://github.com/mun3im/seabad.
  • Episodic Sampling for medical image segmentation, explored by Amsterdam UMC, works with various models on the SAROS dataset (https://www.nature.com/articles/s41597-024-03337-6). Their code is at https://github.com/iasonsky/episodic-sampling.
  • Neural Collapse by Design evaluates its NTCE and NONL losses on ImageNet-1K, CIFAR-10/100, and ImageNet-100, including their long-tailed variants. Code: https://github.com/pakoromilas/nc_by_design.
  • CELM (Data-Free Client Contribution Estimation via Logit Maximization) for federated learning, from MBZUAI, is validated on FashionMNIST, CIFAR-10, FedISIC, and EMNIST, demonstrating its ability to handle label skew.
  • Tabular Foundation Models (TFMs) for credit risk prediction are benchmarked on Home Credit and Lending Club datasets. The work “Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models” highlights that context construction strategy for in-context learning is more impactful than TFM architecture choice itself.
  • Domain Incremental Learning for chest X-ray analysis, by Korea International School, uses CNNs and MobileNetV2, tested on the domain-shifted PneumoniaMNIST dataset.
  • Synthetic Data for Facial Expression Recognition (FER), researched by Istanbul Technical University, leverages models like Llama-3-8B-Instruct for context-aware generation, evaluated on AffectNet, RAF-DB, and FER2013 datasets. Code: https://github.com/AliAZ98/SyntFER.
  • The Learnability Gap in Medical Latent Diffusion (FAU Erlangen-Nürnberg, Imperial College London) analyzes autoencoder families on MIMIC-CXR, ISIC 2019, CT-RATE, and Cardium datasets.
  • Privacy-Preserving Generation Fraud Detection in federated learning for PV systems, from Monash University, fuses PV and weather data, using prototype-based regularization for imbalance. It uses Ausgrid’s Solar Home Electricity Data.
  • Modulation Feature Enhancement with Multi-Stage Attention Network (Chinese Academy of Sciences) uses 1-D CNNs and various attention mechanisms on the ShipsEar dataset (https://doi.org/10.1016/j.apacoust.2016.06.008) for underwater acoustic target recognition.
  • LLM-Driven Data Augmentation for Cognitive Score Prediction (Télécom SudParis, Nara Institute of Science and Technology) uses GPT-5 conditioned on written narratives to generate synthetic oral-style speech, tested on the small, imbalanced GSK2018-A Japanese elderly speech corpus.
  • ToxiAlert-Bench (Zhejiang University) is a new, large-scale audio dataset (32,561 clips) for toxic speech detection distinguishing textual vs. paralinguistic origins, used with a dual-head neural network built on wav2vec 2.0. Code: https://github.com/yiliang-la/ToxiAlert.
  • DBS-Adam (Dynamic Batch-Sensitive Adam) is a novel optimizer (University of Cape Coast) for Bi-LSTM networks, combined with SMOTE-ENN and Focal Loss for vehicular accident injury severity prediction on a dataset from Addis Ababa, Ethiopia.
  • Mitigating Data Scarcity in Psychological Defense Classification (VinUniversity) uses Llama-3-8B-Instruct for context-aware synthetic augmentation, with a hybrid model combining MentalRoBERTa and clinical features from the DMRS framework. Code: https://github.com/htdgv/CASA-PDC.
  • Graph-Based Financial Fraud Detection (Brandeis University) employs graph neural networks on the IEEE CIS Fraud Detection dataset.
  • SEMIR (Semantic Minor-Induced Representation Learning on Graphs), by University of Missouri-Kansas City, is a graph minor-based framework for medical image segmentation, tested on BraTS 2021, KiTS23, and LiTS benchmarks.
  • Few-Shot Synthetic Data Generation (National University of Science and Technology MISIS) fine-tunes a LoRA adapter on FLUX.2-dev diffusion models, applied to NIH ChestX-ray14 and Magnetic Tile Surface Defect datasets.
  • OverNaN (Australian National University) extends SMOTE, ADASYN, and ROSE to handle NaNs directly, evaluated on the Neutral Graphene Oxide Data Set and several OpenML datasets. Code: https://github.com/amaxiom/OverNaN.
  • Optimal Representations for Generalized Contrastive Learning provides theoretical proofs related to InfoNCE and Neural Collapse, with experimental verification using the CVX modeling system.
  • Multilingual Foundation Model for reclamation detection (Kitami Institute of Technology) uses XLM-RoBERTa with GPT-4o-mini back-translation, evaluated on the MultiPRIDE dataset (https://multipride-evalita.github.io/). Code: https://github.com/rbg-research/MultiPRIDE-Evalita-2026.
  • RobustLT (Sichuan University) is a plug-and-play framework for long-tail adversarial training, compatible with various adversarial training algorithms, tested on CIFAR10-LT, CIFAR100-LT, and TinyImageNet-LT datasets. Code: https://github.com/zhang-lilin/RobustLT.
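
Several of the entries above build on SMOTE-style oversampling, which creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal NumPy sketch of that core idea only. It is not the imbalanced-learn implementation, it does not include OverNaN’s NaN handling or the ENN cleaning step, and the function name `smote_like_oversample` and its parameters are hypothetical.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other points
    # Pairwise Euclidean distances between minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbors = np.argsort(d, axis=1)[:, :k]
    # Pick a random base point and one of its k nearest neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy minority class: the four corners of the unit square.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_oversample(X_minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies.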

Impact & The Road Ahead

The collective efforts in these papers paint a promising picture for tackling class imbalance. From enabling more reliable medical diagnoses and safer robotic systems to more accurate financial fraud detection and robust natural language understanding in diverse languages, the implications are far-reaching. The emphasis on interpretability, privacy preservation (as seen in federated learning frameworks like CELM and the PV generation fraud detection work, PVG-FD), and reproducibility (as championed by yvsoucom-iterkit) highlights a maturation in the field, moving beyond raw accuracy to build trustworthy AI systems.

Several themes suggest future directions:

  1. Hybrid Approaches: The synergy between classical techniques (like thresholding and resampling) and advanced deep learning, as well as quantum-classical integration (Q-SYNTH), hints at a future where the best of all worlds is combined.
  2. Theory-Driven Design: A deeper theoretical understanding of model behavior under imbalance, particularly in contrastive learning and neural collapse, is leading to fundamentally more robust architectures and optimization strategies.
  3. Data-Centric AI: The realization that how data is presented and augmented (context construction for TFMs, NaN-aware oversampling, LLM-driven synthesis) often matters more than model architecture alone will drive innovation in data engineering and curation.
  4. Specialized Adaptations: Generic solutions often fall short. The success of anatomy-guided models for VCE, region-specific tropical soundscape data for bird detection, and domain-incremental learning for medical X-rays underscores the need for domain-specific adaptations.

The journey to truly robust and fair AI systems in the face of class imbalance is ongoing, but these recent advancements provide compelling evidence that we’re on the right track, moving towards a future where AI can reliably serve all, not just the majority.
