
Data Privacy in AI/ML: From Unlearning to Unseen Domains and Efficient Edge Intelligence

Latest 21 papers on data privacy: May 9, 2026

Data privacy remains a paramount concern in the rapidly evolving landscape of AI and Machine Learning. As models grow in complexity and data becomes increasingly distributed, ensuring confidentiality, robustness against attacks, and compliance with regulations like GDPR presents multifaceted challenges. Fortunately, recent research offers promising breakthroughs, pushing the boundaries of what’s possible in privacy-preserving AI. This post dives into a selection of these advancements, highlighting innovations that span unlearning, federated learning, secure inference, and domain adaptation.

The Big Idea(s) & Core Innovations

At the heart of recent developments lies a drive to build more resilient and ethical AI systems. One critical area is machine unlearning, the ability to remove specific data’s influence from a trained model. However, “Revisiting Privacy Leakage in Machine Unlearning: Membership Inference Beyond the Forgotten Set” by Stevens Institute of Technology and University of Connecticut researchers reveals a nuanced challenge: unlearning can paradoxically increase privacy risks for retained data. Their novel Tri-Class Unlearning Membership Inference Attack (TC-UMIA) effectively distinguishes forgotten, retained, and unseen data by leveraging model predictions before and after unlearning, achieving up to 95.6% accuracy. This highlights that true privacy requires holistic strategies, not just data deletion.
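The intuition behind TC-UMIA can be illustrated with a toy threshold rule. This is a deliberate simplification — the actual attack trains a classifier on richer pre/post-unlearning features, and the function name and thresholds below are invented for illustration: forgotten samples were predicted confidently before unlearning but lose confidence afterwards, retained members stay confident throughout, and unseen data was never confident.

```python
def tri_class_attack(conf_before, conf_after, high=0.8, drop=0.3):
    """Toy tri-class membership inference from a sample's prediction
    confidence before and after unlearning (illustrative heuristic only).

    - forgotten: confident before unlearning, large confidence drop after
    - retained:  confident both before and after unlearning
    - unseen:    never confident in the first place
    """
    if conf_before >= high and (conf_before - conf_after) >= drop:
        return "forgotten"
    if conf_before >= high and conf_after >= high:
        return "retained"
    return "unseen"
```

The key point the sketch captures is that the *pair* of observations (before, after) leaks strictly more than either snapshot alone — which is exactly why unlearning can create new risk for retained data.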

Complementing this, Brac University researchers in “Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures” propose a modified SISA (Sharded, Isolated, Sliced, and Aggregated) framework for efficient class-level unlearning in CNNs. By using sequential class-level slicing, a reinforced replay mechanism, and a gating network, they achieve exact unlearning for specific classes while preserving overall model performance and drastically reducing retraining costs. This provides a practical path towards GDPR compliance.
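The SISA idea can be sketched in a few lines, assuming a toy “learner” that simply memorises its shard (the class and method names below are hypothetical, not from the paper): data is partitioned into shards, each shard trains an independent model, predictions are aggregated by voting, and removing a class only retrains the shards that actually contained it.

```python
from collections import Counter

class ToySISA:
    """Minimal SISA-style ensemble over data shards (illustrative sketch).

    Each shard trains an independent 'model' (here: a dict memorising its
    samples, a stand-in for a real learner); unlearning a class retrains
    only the shards holding that class, not the whole ensemble.
    """
    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]
        self.models = [None] * num_shards

    def add(self, x, y):
        # Deterministically route each sample to one shard.
        self.shards[hash(x) % len(self.shards)].append((x, y))

    def _train_shard(self, i):
        self.models[i] = {x: y for x, y in self.shards[i]}

    def fit(self):
        for i in range(len(self.shards)):
            self._train_shard(i)

    def unlearn_class(self, cls):
        # Exact unlearning: drop the class, retrain only affected shards.
        for i, shard in enumerate(self.shards):
            if any(y == cls for _, y in shard):
                self.shards[i] = [(x, y) for x, y in shard if y != cls]
                self._train_shard(i)

    def predict(self, x):
        votes = [m.get(x) for m in self.models if m and m.get(x) is not None]
        return Counter(votes).most_common(1)[0][0] if votes else None
```

The cost saving is structural: because shards are isolated, unlearning touches only the shards (and, in the full framework, only the slices) that saw the removed class.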

Federated Learning (FL) continues to be a cornerstone for privacy-preserving distributed AI. The University of Texas at Arlington and Mississippi State University introduce “PERFECT: Personalized Federated Learning for CBRS Radar Detection”, a framework enabling geographically dispersed Environmental Sensing Capability (ESC) sensors to collaboratively train radar detection models without sharing raw spectral data. Their innovation lies in federated personalization—shared base layers with private, personalized heads—to maintain an FCC-mandated 99% recall even with non-IID (not independent and identically distributed) data. This concept of hybrid global-local models is further explored by Avignon University in “FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training”. FedPLT allows clients to train only partial layers based on their resources, reducing trainable parameters by 71-82% while achieving comparable performance to full-model training, crucial for resource-constrained IoT environments.
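The shared-base/private-head personalization used by PERFECT can be sketched as a single federated round over flat weight lists. This is a simplification — real implementations average per-layer tensors, and `fed_personalized_round` is a made-up helper name:

```python
def fed_personalized_round(clients, base_len):
    """One round of personalized federated averaging (illustrative sketch).

    Each client's weight list = shared base layers (first `base_len`
    entries) + a private personalized head (the rest). Only the base is
    averaged across clients; heads never leave the device.
    """
    n = len(clients)
    avg_base = [sum(c[i] for c in clients) / n for i in range(base_len)]
    # Every client receives the averaged base but keeps its own head.
    return [avg_base + c[base_len:] for c in clients]
```

Note how the head weights pass through untouched: personalization and privacy come from the same mechanism, since the server only ever sees base-layer updates. FedPLT pushes the same idea further by letting each client choose *which* layers it trains at all.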

Addressing data heterogeneity and security in FL, “Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data” by Gebze Technical University proposes multi-task autoencoders combined with unsupervised outlier detection (like One-Class SVM) to filter noisy or malicious samples on client devices, boosting accuracy by up to 7.02% under 40% noise. Meanwhile, Southeast University introduces “CO-EVO: Co-evolving Semantic Anchoring and Style Diversification for Federated DG-ReID”, a novel federated framework for domain generalization in person re-identification that tackles semantic-style conflict. CO-EVO combines camera-invariant semantic anchoring with global style diversification to learn robust features across diverse camera styles, outperforming prior methods by +2.0% mAP.
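The client-side filtering step can be approximated as follows; here a simple percentile cutoff on autoencoder reconstruction errors stands in for the One-Class SVM detector described in the paper (the function name and `keep_fraction` default are assumptions for illustration):

```python
def filter_noisy_samples(errors, keep_fraction=0.6):
    """Client-side sample filtering by reconstruction error (sketch).

    In the paper, a multi-task autoencoder scores each local sample and an
    unsupervised outlier detector (e.g. One-Class SVM) flags anomalies;
    here, keeping the lowest-error fraction stands in for that detector.
    Returns the indices of samples retained for federated training.
    """
    ranked = sorted(range(len(errors)), key=lambda i: errors[i])
    keep = int(len(errors) * keep_fraction)
    return sorted(ranked[:keep])
```

The privacy-relevant detail is that scoring and filtering both happen on the client, so noisy or malicious samples are dropped before they ever influence the shared model.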

Beyond federated training, privacy in inference and data utility is also evolving. Seoul National University presents “A Target-Free Harmonization Method for MRI”, TgtFreeHarmony, which harmonizes MRI data across different scanners without requiring access to target domain data, a critical breakthrough for multi-center medical studies with stringent privacy rules. It uses disentanglement-based generators and Bayesian optimization to estimate and synthesize target styles. For clinical decision support, Anurag University’s “Federated Semi-Supervised Graph Neural Networks with Prototype-Guided Pseudo-Labeling for Privacy-Preserving Gestational Diabetes Mellitus Prediction” (FedTGNN-SS) offers a framework for GDM prediction, combining prototype-guided pseudo-labeling with privacy-safe prototype sharing. This allows hospitals to collaborate and improve models even with limited labeled data, without sharing sensitive patient EHRs.
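Prototype-guided pseudo-labeling, as in FedTGNN-SS, reduces to assigning each unlabeled sample the class of its nearest shared prototype (a class-mean embedding) — prototypes being aggregate statistics that hospitals can exchange instead of raw EHRs. A minimal sketch, with a hypothetical function name and squared-Euclidean distance assumed:

```python
def nearest_prototype_label(x, prototypes):
    """Pseudo-label an unlabeled embedding by its nearest class prototype
    (sketch). `prototypes` maps class name -> class-mean embedding; only
    these means, never patient records, need to be shared across sites.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(prototypes, key=lambda c: sq_dist(x, prototypes[c]))
```

Because a prototype averages over many patients, sharing it leaks far less than sharing embeddings of individual records, while still giving label-poor sites a usable supervision signal.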

Finally, the survey “Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey” by Qianzhou Chen et al. provides a holistic view of Split Learning (SL) and Aggregation Learning (AL) for large foundation models in 6G networks. It highlights how SL enhances privacy by transmitting intermediate activations instead of raw data, while AL optimizes model updates, paving the way for ubiquitous, privacy-preserving AI.
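The privacy argument for split learning rests on transmitting only intermediate activations across the cut layer. A toy sketch, where element-wise scaling and a dot product stand in for real client-side and server-side network layers:

```python
def client_forward(x, client_weights):
    """Client-side layers up to the cut point (sketch): only these
    activations -- never the raw input x -- cross the network."""
    return [xi * w for xi, w in zip(x, client_weights)]

def server_forward(activations, server_weights):
    """Server-side layers finish the forward pass on the received
    activations (here: a simple dot product as a stand-in)."""
    return sum(a * w for a, w in zip(activations, server_weights))
```

For a foundation model, the cut layer would sit after the first few transformer blocks; the server sees a learned representation whose mapping back to the raw input is non-trivial, which is the basis (though not a formal guarantee) of SL's privacy benefit.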

Under the Hood: Models, Datasets, & Benchmarks

These innovations are often built upon robust models, diverse datasets, and rigorous benchmarks.

Impact & The Road Ahead

These advancements herald a new era for privacy-preserving AI. The nuanced understanding of unlearning’s impact on retained data, as revealed by TC-UMIA, is crucial for developing truly secure deletion mechanisms. Practical frameworks like modified SISA and FedPLT demonstrate that privacy and efficiency don’t have to be mutually exclusive, enabling real-world deployments in areas like autonomous systems, surveillance, and healthcare.

The rise of personalized federated learning (PERFECT), adaptive split learning for LLMs (SplitFT), and target-free domain adaptation (TgtFreeHarmony) addresses the persistent challenges of data heterogeneity and multi-institutional collaboration. These solutions promise more robust, equitable, and widely adoptable AI systems. Furthermore, initiatives like DeRelayL (https://arxiv.org/pdf/2605.02935), a blockchain-based decentralized relay learning paradigm from Shenzhen MSU-BIT University, aim to democratize AI training, allowing common users to contribute to and even own parts of large models. This vision, alongside the integration of distributed AI with emerging 6G technologies, promises a future where powerful AI is accessible, collaborative, and inherently privacy-aware.
