Robustness in AI/ML: From Adversarial Defenses to Trustworthy Systems
Latest 50 papers on robustness: Dec. 21, 2025
The quest for intelligent systems that operate reliably in dynamic, unpredictable environments has made robustness a paramount concern in AI/ML. As models grow more complex and are deployed in critical applications, ensuring their resilience to perturbations, from adversarial attacks to real-world noise and non-stationarity, is no longer just an academic pursuit; it is a practical necessity. Recent research shows significant strides in fortifying AI systems, offering innovative solutions across diverse domains.
The Big Idea(s) & Core Innovations
At the heart of these advancements lies a common thread: building systems that don’t just perform well, but perform reliably. A standout theme is the proactive defense against adversarial attacks, a critical challenge for deploying AI in sensitive areas. For instance, DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack by Hao Li and Yubing Ren from the Institute of Information Engineering, Chinese Academy of Sciences, introduces a novel dual-stream watermarking algorithm. This innovation aims to provide reliable detection and traceability of LLM-generated content against both paraphrase and spoofing attacks, moving beyond the limitations of existing methods that often inadvertently facilitate misleading attribution. Similarly, ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples, developed by Yunfei Yang and collaborators from the Chinese Academy of Sciences and Nankai University, takes a frequency-domain approach to create compressed, covert, and attack-resistant watermarks, enhancing robustness through simulated attacks during training.
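To give a flavor of what frequency-domain embedding looks like in general, here is a minimal, hypothetical sketch that hides a few bits in mid-frequency Fourier coefficients of a grayscale image via parity quantization. It is not ComMark’s construction (which learns compressed, covert marks and simulates attacks during training); the coefficient positions, quantization step, and absence of any robustness training are arbitrary simplifications for illustration.

```python
import numpy as np

def embed_watermark(image: np.ndarray, bits: list[int], strength: float = 8.0) -> np.ndarray:
    """Embed a short bit string into mid-frequency Fourier coefficients of a
    grayscale image via parity quantization. Illustrative sketch only; not ComMark."""
    spectrum = np.fft.fft2(image.astype(np.float64))
    h, w = image.shape
    for i, bit in enumerate(bits):
        r, c = h // 4 + i, w // 4 + i                 # arbitrary mid-frequency band
        mag, phase = np.abs(spectrum[r, c]), np.angle(spectrum[r, c])
        q = int(np.round(mag / strength))
        if q % 2 != bit:                              # force the parity to encode the bit
            q += 1
        spectrum[r, c] = q * strength * np.exp(1j * phase)
        # Mirror the change so the inverse transform stays real-valued.
        spectrum[(h - r) % h, (w - c) % w] = np.conj(spectrum[r, c])
    return np.real(np.fft.ifft2(spectrum))

def extract_watermark(image: np.ndarray, n_bits: int, strength: float = 8.0) -> list[int]:
    """Read the bits back from the parity of the quantized coefficient magnitudes."""
    spectrum = np.fft.fft2(image.astype(np.float64))
    h, w = image.shape
    return [int(np.round(np.abs(spectrum[h // 4 + i, w // 4 + i]) / strength)) % 2
            for i in range(n_bits)]

# Round-trip check on a random 64x64 "image":
img = np.random.default_rng(0).uniform(0.0, 255.0, (64, 64))
assert extract_watermark(embed_watermark(img, [1, 0, 1, 1, 0]), 5) == [1, 0, 1, 1, 0]
```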
Beyond watermarking, other works focus on direct adversarial defense and resilience. TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models by Zhiwei Li and colleagues from the Chinese Academy of Sciences and Tsinghua University proposes a lightweight, retraining-free defense mechanism: trainable padding that restores attention patterns in Vision-Language Models (VLMs) at inference time, significantly improving robustness against attacks without altering the model’s architecture. Furthering VLM robustness, MoAPT: Mixture of Adversarial Prompt Tuning for Vision-Language Models, from researchers at Beihang University and A*STAR, presents a prompt tuning method that uses multiple learnable prompts and a conditional weight router to generalize across diverse adversarial examples. Complementing these, DeContext as Defense: Safe Image Editing in Diffusion Transformers by Linghui Shen and team from The Hong Kong Polytechnic University tackles unauthorized image editing in diffusion models by disrupting contextual information flow with attention-based perturbations, preserving visual quality while blocking malicious edits. Together, these innovations mark a shift toward more sophisticated, context-aware defense mechanisms.
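The test-time padding idea is simple enough to sketch. In the toy version below, only a border of learnable pixels around the frozen input is optimized at inference; the objective shown, prediction-entropy minimization, is a generic test-time-adaptation placeholder rather than TTP’s attention-restoration loss, and handling of positional embeddings for the enlarged input is omitted.

```python
import torch
import torch.nn.functional as F

def predict_with_test_time_padding(model: torch.nn.Module, image: torch.Tensor,
                                   pad: int = 8, steps: int = 20, lr: float = 0.05) -> torch.Tensor:
    """Optimize a learnable pixel border around `image` at inference time.

    Simplified sketch of the test-time-padding idea: only the border is trained,
    while the model weights and the original pixels stay frozen. The entropy
    objective here is a placeholder, not the TTP paper's loss.
    """
    model.eval()
    _, _, h, w = image.shape
    padded_image = F.pad(image, (pad, pad, pad, pad))        # zeros around the original
    frame_mask = torch.ones_like(padded_image)
    frame_mask[:, :, pad:pad + h, pad:pad + w] = 0.0          # 1 on the border, 0 inside
    delta = torch.zeros_like(padded_image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        logits = model(padded_image + delta * frame_mask)     # original pixels untouched
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()

    with torch.no_grad():
        return model(padded_image + delta * frame_mask)
```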
Another critical area is ensuring the reliability and generalizability of AI systems in complex real-world scenarios. KOSS: Kalman-Optimal Selective State Spaces for Long-Term Sequence Modeling by Lei Wang et al. from Shaanxi University of Science and Technology introduces a Kalman-optimal selective state space model for robust long-term sequence forecasting, demonstrating significant accuracy gains in noisy and sparse data environments. In autonomous driving, the survey Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future highlights how integrating language via VLMs improves reasoning and interpretability, moving beyond black-box operation toward safer systems. The framework presented in TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries, by Jiayang Yang, Zhixing Cao, and colleagues, lets LLMs interpret complex time-series data from lithium-ion batteries, improving prediction and anomaly detection without retraining and enabling more adaptive, robust battery management.
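For readers unfamiliar with what Kalman-optimal gating buys, the textbook scalar Kalman filter below shows the core mechanism KOSS generalizes: the gain automatically down-weights noisy observations, which is why such filters stay stable on noisy or sparse sequences. This is the standard filter, not KOSS’s selective state-space formulation.

```python
import numpy as np

def scalar_kalman_filter(observations: np.ndarray, process_var: float = 1e-3,
                         obs_var: float = 0.25) -> np.ndarray:
    """Textbook scalar Kalman filter with a random-walk state model.

    The gain k = p / (p + obs_var) blends the prediction with each observation:
    noisier measurements (larger obs_var) get smaller gains, which is the
    noise-robustness property that Kalman-optimal gating builds on.
    """
    x, p = float(observations[0]), 1.0      # initial state estimate and variance
    estimates = np.empty_like(observations, dtype=np.float64)
    for t, z in enumerate(observations):
        p = p + process_var                  # predict: variance grows with process noise
        k = p / (p + obs_var)                # Kalman gain
        x = x + k * (z - x)                  # correct: move toward the observation
        p = (1.0 - k) * p                    # posterior variance shrinks after the update
        estimates[t] = x
    return estimates
```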
Furthermore, improving the trustworthiness of AI systems extends to their evaluation and governance. Are We on the Right Way to Assessing LLM-as-a-Judge? by Yao Wan and Dongping Chen introduces Sage, a framework that assesses LLM-as-a-Judge robustness by measuring logical consistency and, in the process, reveals biases in human annotations. For data governance, Smart Data Portfolios: A Quantitative Framework for Input Governance in AI by A. Talha Yalta and A. Yasemin Yalta proposes treating data categories as risk-bearing assets, formalizing input governance to enable transparent and auditable deployment. Ev-Trust: A Strategy Equilibrium Trust Mechanism for Evolutionary Games in LLM-Based Multi-Agent Services by Jiye Wang et al. introduces a trust mechanism that addresses deception and fraud in LLM-based multi-agent systems, using evolutionary game theory to dynamically guide agents toward stable cooperation. Together, these approaches underline a broad push toward more secure, reliable, and trustworthy AI ecosystems.
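As a concrete illustration of the kind of logical-consistency probe that Sage formalizes, the sketch below measures how often a pairwise judge prefers a different underlying answer once the two candidates are presented in the opposite order. The judge interface and the metric are hypothetical simplifications, not Sage’s IPI or TOV definitions.

```python
from typing import Callable, Sequence, Tuple

# A pairwise judge returns "A" or "B" given (question, first_answer, second_answer).
PairwiseJudge = Callable[[str, str, str], str]

def order_inconsistency_rate(judge: PairwiseJudge,
                             cases: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of cases where swapping the candidates changes which answer wins.

    A consistent judge should answer "A" then "B" (or "B" then "A"), i.e. the
    same underlying answer wins regardless of position. Hypothetical probe;
    Sage's IPI and TOV metrics are defined differently.
    """
    inconsistent = 0
    for question, ans_1, ans_2 in cases:
        first = judge(question, ans_1, ans_2)
        second = judge(question, ans_2, ans_1)   # same pair, swapped positions
        if {first, second} != {"A", "B"}:        # same letter twice => position bias
            inconsistent += 1
    return inconsistent / max(len(cases), 1)
```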
Under the Hood: Models, Datasets, & Benchmarks
The research heavily leverages and often introduces specialized tools and datasets to achieve and evaluate robustness:
- DAP: A foundation model for panoramic depth estimation, trained on a large-scale panoramic dataset of over 2 million synthetic and real samples, bridging domain gaps (as seen in Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation).
- Pixel Seal: An image watermarking model, trained with an adversarial-only paradigm using a three-stage schedule and Just-Noticeable Difference (JND) attenuation, for state-of-the-art robustness and imperceptibility (Pixel Seal: Adversarial-only training for invisible image and video watermarking). Code is available at https://github.com/facebookresearch/videoseal.
- TOGGLE: A framework for LLM compression on edge devices that integrates temporal logic constraints to preserve the model’s temporal behavior after compression (TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge).
- TOP-Bench: A benchmark for evaluating privacy leakage and reasoning robustness in autonomous agents using multiple tools, proposing the Counterfactual Cue for rigorous evaluation (Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation). Code available at https://github.com/1Ponder/TOP-R.
- Sage Framework: An evaluation suite for LLM-as-a-Judge systems, introducing metrics like Intra-Pair Instability (IPI) and Weak Total Order Violation (TOV) to measure logical consistency without human annotations (Are We on the Right Way to Assessing LLM-as-a-Judge?). Code available at https://github.com/plafle/LLMasaJudgeAssessment.
- The Perceptual Observatory: A framework for characterizing perceptual robustness and vision-language grounding in MLLMs using in-distribution (ID) corruptions and out-of-distribution (OOD) stylized illusions (The Perceptual Observatory: Characterizing Robustness and Grounding in MLLMs).
- TimeSeries2Report (TS2R): A prompting framework for LLMs to manage lithium-ion battery data by converting time series into structured reports, validated on lab-scale and real-world datasets (TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries); a rough sketch of the report-building step appears after this list. Code is available at https://github.com/zju-cs-ic/TimeSeries2Report.
- TABREX-BENCH: A large-scale benchmark across six domains and twelve perturbation types for tabular generation evaluation, using graph-based reasoning and LLM-guided matching (TabReX : Tabular Referenceless eXplainable Evaluation). Code is available at https://github.com/tejasanvekar/TABREX.
- DyG-Mamba: A state space model for dynamic graph modeling, translating irregular time spans into control signals and leveraging memory mechanisms inspired by Ebbinghaus’ forgetting curve (DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs). Code is available at https://github.com/Clearloveyuan/DyG-Mamba.
- AcTOL: A pre-training method for vision-language representations in embodied agents that introduces a local Brownian bridge constraint for smoother transitions between frames (Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents). Code is available at https://actol-pretrain.github.io/.
- CSA-TTA: A framework using cross-sample augmentation and test-time adaptation for personalized intraoperative hypotension prediction, leveraging the VitalDB dataset and a real-world in-hospital dataset (Cross-Sample Augmented Test-Time Adaptation for Personalized Intraoperative Hypotension Prediction). Code available at https://github.com/kanxueli/CSA-TTA.
- SDP Framework: Utilizes quantitative financial principles to govern AI training data, ensuring explainability and regulatory alignment (Smart Data Portfolios: A Quantitative Framework for Input Governance in AI).
- KOSS: Demonstrates superior performance in long-term forecasting, showing 2.92–36.23% accuracy improvements across nine benchmarks through its Kalman-optimal selective state space model (KOSS: Kalman-Optimal Selective State Spaces for Long-Term Sequence Modeling).
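As noted in the TimeSeries2Report entry above, the core move is converting raw sensor streams into compact textual reports an LLM can reason over without retraining. The sketch below shows one way such a report could be built; the fields, statistics, and threshold are invented for illustration and are not the paper’s prompt template.

```python
import numpy as np

def battery_cycle_report(voltage: np.ndarray, current: np.ndarray,
                         temperature: np.ndarray, cycle_id: int) -> str:
    """Summarize one charge/discharge cycle as a short textual report for an LLM.

    Illustrative only: the fields, statistics, and temperature threshold are made
    up for this sketch and are not the TimeSeries2Report prompt template.
    """
    temp_flag = "elevated" if float(temperature.max()) > 45.0 else "normal"
    lines = [
        f"Cycle {cycle_id} summary:",
        f"- Voltage: min {voltage.min():.3f} V, max {voltage.max():.3f} V, mean {voltage.mean():.3f} V",
        f"- Current: mean |I| {np.abs(current).mean():.3f} A over {current.size} samples",
        f"- Temperature: max {temperature.max():.1f} C ({temp_flag})",
    ]
    return "\n".join(lines)

# The report then becomes part of a prompt for an off-the-shelf LLM, for example:
# prompt = battery_cycle_report(v, i, t, 128) + "\n\nIs this cycle anomalous? Explain briefly."
```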
Impact & The Road Ahead
The implications of this research are profound, touching virtually every domain where AI is deployed. From enhancing the safety of autonomous vehicles through real-world adversarial testing (Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving) and robust localization systems (A2VISR: An Active and Adaptive Ground-Aerial Localization System Using Visual Inertial and Single-Range Fusion, Ridge Estimation-Based Vision and Laser Ranging Fusion Localization Method for UAVs), to improving early medical diagnosis (A Multimodal Approach to Alzheimer’s Diagnosis: Geometric Insights from Cube Copying and Cognitive Assessments) and ensuring the security of LLM-based services, the drive for robustness is ubiquitous. The insights into scaling laws for black-box adversarial attacks (Scaling Laws for Black-box Adversarial Attacks) highlight the continuous arms race between attackers and defenders, calling for more resilient architectures. Advances in model interpretability and control, such as SALVE (SALVE: Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks), also pave the way for more auditable and trustworthy AI systems.
The future of AI robustness will likely involve a multi-pronged approach: developing more inherently robust models, creating sophisticated defense mechanisms, establishing rigorous and unbiased evaluation benchmarks, and formalizing governance frameworks that ensure accountability. Addressing non-stationarity in fields like Brain-Computer Interfaces (Non-Stationarity in Brain-Computer Interfaces: An Analytical Perspective) remains a crucial challenge. As AI systems become more autonomous and integrate into our daily lives, the innovations highlighted here are vital steps toward a future where AI is not only intelligent but also reliably safe, secure, and trustworthy.