Data Privacy in the Age of AI: From Formal Verification to Unlearning and Secure Hardware
Latest 9 papers on data privacy: Jun. 20, 2026
The rapid advancement of AI and Machine Learning has brought unprecedented capabilities, yet it concurrently amplifies critical questions around data privacy. As models become more complex and data becomes more pervasive, ensuring the confidentiality and integrity of personal information is not just a regulatory requirement but a foundational challenge. Recent research explores this multifaceted problem from various angles, from formalizing privacy guarantees and securing hardware to enabling model forgetting and safeguarding agentic AI. Let’s dive into some of the latest breakthroughs.
The Big Idea(s) & Core Innovations:
One central theme is the quest for robust, verifiable privacy. Achieving anonymity in data has traditionally involved trade-offs or complex preprocessing. The paper “A-COMPASS: Formal Foundations for Anonymity Analysis in Microdata” by Tamara Tagliavia and Silvia Ghilezan (Mathematical Institute of the Serbian Academy of Sciences and Arts, and University of Novi Sad) introduces an extension to the COMPASS language that allows direct verification and enforcement of anonymity conditions such as k-anonymity and l-diversity on standard microdata tables. Because it bypasses the need for pre-processed, grouped data, formal privacy guarantees become more accessible and practical. Key to this are the COUNT DISTINCT operation for l-diversity and a versatile REPLACE action for both suppression and generalization, all underpinned by formally proven, deterministic semantics.
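To make the two anonymity conditions concrete, here is a minimal Python sketch of what checking them on an ungrouped microdata table amounts to. It is not A-COMPASS syntax or the authors' implementation: the function names `anonymity_report` and `generalize`, the column names, and the toy rows are all assumptions made for illustration. The group size plays the role of k, the number of distinct sensitive values per group mirrors a COUNT DISTINCT, and `generalize` stands in for a REPLACE-style action.

```python
from collections import defaultdict

def anonymity_report(rows, quasi_identifiers, sensitive):
    """Return (k, l) for a microdata table given as a list of dicts.

    k is the size of the smallest group of rows sharing the same
    quasi-identifier values; l is the smallest number of distinct
    sensitive values found in any such group.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l

def generalize(rows, column, mapping):
    """Hypothetical REPLACE-like action: rewrite one column of every row."""
    return [{**row, column: mapping(row[column])} for row in rows]

# Toy table (invented values, for illustration only).
rows = [
    {"age": 34, "zip": "21000", "diagnosis": "flu"},
    {"age": 36, "zip": "21000", "diagnosis": "asthma"},
    {"age": 41, "zip": "21000", "diagnosis": "flu"},
    {"age": 47, "zip": "21000", "diagnosis": "diabetes"},
]

raw = anonymity_report(rows, ["age", "zip"], "diagnosis")
# Generalize exact ages to decades, then check again.
by_decade = generalize(rows, "age", lambda age: f"{age // 10 * 10}s")
generalized = anonymity_report(by_decade, ["age", "zip"], "diagnosis")
```

On this toy table every raw row is unique, so the table is only 1-anonymous; after generalizing ages to decades, each group holds two rows with two distinct diagnoses.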
Another innovative thrust tackles the intersection of privacy and hardware acceleration. Fully Homomorphic Encryption (FHE) promises computation on encrypted data, but its high-precision requirements clash with the low-precision optimization of AI hardware like TPUs. “Low-Cost Multi-Precision Systolic Arrays for Accelerating FHE NTTs on AI ASICs” by George Alexakis, Dimitrios Schoinianakis, and Giorgos Dimitrakopoulos (Democritus University of Thrace and Nokia Bell Labs) presents a clever solution. They propose a minimally modified systolic array architecture that performs full-precision reconstruction natively within the matrix engine. This small hardware tweak (less than 1% overhead) delivers substantial speedups (1.33x to 4.49x) for FHE’s Number Theoretic Transform (NTT) operations, effectively bridging the precision gap and making FHE acceleration on commodity AI hardware a more tangible reality.
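The precision gap can be illustrated with a small Python sketch. It assumes the NTT is written as a matrix-vector product modulo a prime, splits every operand into 8-bit limbs, runs one low-precision product per limb pair, and then reconstructs the full-precision result by shifting and adding. The modulus, transform size, limb width, and all function names are assumptions chosen for illustration; the paper performs the reconstruction inside the systolic array, whereas this sketch does it in plain software.

```python
Q = 12289        # NTT-friendly prime, chosen here only for illustration
N = 8            # transform length (a power of two dividing Q - 1)
LIMB_BITS = 8    # assumed operand width of the low-precision multipliers
LIMBS = 2        # residues below 2**14 fit in two 8-bit limbs

def root_of_unity(n, q):
    """Find a primitive n-th root of unity mod q by search (n a power of two)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) == q - 1:   # w**(n/2) == -1 implies order exactly n
            return w
    raise ValueError("no primitive root of unity found")

def split(value):
    """Split a residue into LIMBS little-endian limbs of LIMB_BITS bits each."""
    mask = (1 << LIMB_BITS) - 1
    return [(value >> (LIMB_BITS * i)) & mask for i in range(LIMBS)]

def matvec(matrix, vector):
    """Plain integer matrix-vector product (stand-in for the matrix engine)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def twiddle_matrix(w):
    return [[pow(w, i * j, Q) for j in range(N)] for i in range(N)]

def ntt_reference(x, w):
    """Full-precision NTT as a single matrix-vector product mod Q."""
    return [y % Q for y in matvec(twiddle_matrix(w), x)]

def ntt_limbed(x, w):
    """Same NTT using only products of 8-bit limbs, then reconstruction."""
    twiddles = twiddle_matrix(w)
    t_limbs = [[[split(t)[a] for t in row] for row in twiddles] for a in range(LIMBS)]
    x_limbs = [[split(v)[b] for v in x] for b in range(LIMBS)]
    acc = [0] * N
    for a in range(LIMBS):
        for b in range(LIMBS):
            partial = matvec(t_limbs[a], x_limbs[b])   # low-precision operands
            shift = LIMB_BITS * (a + b)                # weight of this limb pair
            acc = [s + (p << shift) for s, p in zip(acc, partial)]
    return [s % Q for s in acc]

w = root_of_unity(N, Q)
x = [1, 2, 3, 4, 12288, 9000, 255, 256]
same = ntt_limbed(x, w) == ntt_reference(x, w)
```

The four partial products are the extra work that limb decomposition introduces; doing the shift-and-add step where the products are computed is the part the paper moves into hardware.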
Beyond technical mechanisms, governance and ethical considerations are paramount, especially for autonomous AI. The paper “Deontic Policies for Runtime Governance of Agentic AI Systems” by Anupam Joshi, Tim Finin, Karuna Joshi, and Lalana Kagal (UMBC and MIT CSAIL) introduces AgenticRei, a framework leveraging deontic logic and OWL/RDF semantics for runtime policy enforcement in LLM-driven agentic AI systems. Unlike traditional flat policy engines, AgenticRei handles complex concepts like obligations, dispensations, and meta-policy conflict resolution. This is crucial for robust, auditable governance, ensuring agents act not just within permissions, but also fulfill necessary duties, preventing “authority creep” where agents incrementally gain capabilities without re-authorization.
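The difference between a flat allow/deny list and a deontic policy can be sketched in a few lines of Python. This is a toy model, not AgenticRei, Rei, or OWL/RDF: the `Rule` class, the modality names, and the meta-policy (highest priority wins, prohibition beats permission on a tie, deny by default) are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    modality: str      # "permit", "prohibit", "oblige", or "dispense"
    agent: str
    action: str
    priority: int = 0  # used only by the assumed meta-policy below

def is_allowed(rules, agent, action):
    """Assumed meta-policy: highest priority wins, ties go to prohibition,
    and anything without a matching rule is denied."""
    relevant = [r for r in rules
                if r.agent == agent and r.action == action
                and r.modality in ("permit", "prohibit")]
    if not relevant:
        return False
    top = max(r.priority for r in relevant)
    return all(r.modality == "permit" for r in relevant if r.priority == top)

def pending_obligations(rules, agent, done):
    """Obligations not yet fulfilled and not waived by a dispensation."""
    waived = {r.action for r in rules
              if r.agent == agent and r.modality == "dispense"}
    return sorted(r.action for r in rules
                  if r.agent == agent and r.modality == "oblige"
                  and r.action not in waived and r.action not in done)

# Hypothetical policy for a hypothetical agent.
rules = [
    Rule("permit", "triage_agent", "read_record"),
    Rule("permit", "triage_agent", "export_record"),
    Rule("prohibit", "triage_agent", "export_record", priority=1),
    Rule("oblige", "triage_agent", "log_access"),
    Rule("oblige", "triage_agent", "notify_patient"),
    Rule("dispense", "triage_agent", "notify_patient"),
]

can_read = is_allowed(rules, "triage_agent", "read_record")
can_export = is_allowed(rules, "triage_agent", "export_record")
can_delete = is_allowed(rules, "triage_agent", "delete_record")
todo = pending_obligations(rules, "triage_agent", done=set())
```

Here the agent may read but not export, an unlisted action is denied rather than silently allowed, and one obligation remains outstanding because the other was dispensed.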
Fairness and trust are critical in distributed AI settings like Federated Learning (FL). “SCOPE-FL: A Strategy-proof Chain-based Optimal Pareto Efficient Federated Learning System” by Seyed Salar Ghazi et al. (École de Technologie Supérieure, Ferdowsi University of Mashhad, and University of Toronto) proposes a hierarchical FL framework that achieves both Pareto efficiency and strategy-proofness in client selection using the Top Trading Cycle (TTC) algorithm. This ensures clients have no incentive to misreport their preferences, a significant step towards trustworthy and fair FL. Coupled with blockchain smart contracts for tamper-proof execution and a scalable Shapley value approximation for fair contribution evaluation, SCOPE-FL represents a robust solution for incentive-aligned federated learning. For code, check out their GitHub repository.
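For readers unfamiliar with Top Trading Cycle, the following Python sketch shows the classic version of the algorithm: each agent starts with one item, points at the owner of its favourite remaining item, and every cycle of pointers trades and leaves. How SCOPE-FL maps clients and servers onto agents and items is not described here, so the names and the example preferences are purely illustrative assumptions.

```python
def favourite(agent, preferences, owner, remaining):
    """Most preferred item whose current owner is still in the market."""
    for item in preferences[agent]:
        if owner[item] in remaining:
            return item
    raise ValueError(f"{agent} has no acceptable item left")

def top_trading_cycles(preferences, endowment):
    """Classic TTC with strict preferences.

    preferences: agent -> list of all items, most preferred first
    endowment:   agent -> the single item that agent initially owns
    Returns agent -> item received.
    """
    owner = {item: agent for agent, item in endowment.items()}
    remaining = set(endowment)
    assignment = {}
    while remaining:
        # Follow pointers from one agent until some agent repeats.
        position = {}
        path = []
        agent = min(remaining)   # deterministic starting point
        while agent not in position:
            position[agent] = len(path)
            path.append(agent)
            agent = owner[favourite(agent, preferences, owner, remaining)]
        cycle = path[position[agent]:]
        # Everyone in the cycle receives the item they point at, then leaves.
        for member in cycle:
            assignment[member] = favourite(member, preferences, owner, remaining)
        remaining -= set(cycle)
    return assignment

# Hypothetical three-agent market.
endowment = {"A": "a", "B": "b", "C": "c"}
preferences = {
    "A": ["b", "a", "c"],
    "B": ["a", "b", "c"],
    "C": ["a", "c", "b"],
}
result = top_trading_cycles(preferences, endowment)
```

In this example A and B swap, while C keeps its own item because its first choice has already been traded away. The strategy-proofness claimed for the mechanism means that no agent could have obtained a better item by submitting a different preference list.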
Data removal and the ‘right to be forgotten’ are addressed by “To forget is to preserve: Machine Unlearning for 3D medical image segmentation” by Nitesh Kumar Singh et al. (UPES, University of Tomorrow). This work provides the first systematic benchmark of approximate machine unlearning strategies for 3D medical image segmentation, a high-stakes domain. The authors find that the Noisy Label strategy offers the best trade-off, achieving a 93% performance reduction on forgotten data while maintaining 84% accuracy on retained data, which makes it a practical route to GDPR compliance without expensive retraining.
Even with these advances, new vulnerabilities emerge. “VLALeaks: Membership Inference Attacks against Vision-Language-Action Models” by Xukun Luan et al. (Beijing Institute of Technology, Northeastern University, and King Abdullah University of Science and Technology) introduces the first membership inference attack (MIA) framework for Vision-Language-Action (VLA) models in robotics. VLALeaks exploits attention discrepancies across modalities, achieving near-perfect AUC (>0.99), and the authors report that even sophisticated defenses such as differential privacy are ineffective against it. This points to a fundamental privacy leakage in VLA models and opens a new frontier for privacy research in embodied AI. You can explore their code on GitHub.
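The AUC figure quoted above measures how well an attacker's score separates training samples from unseen ones. The Python sketch below computes it for a generic score-based membership inference attack. The scores are invented toy numbers, and the scalar "membership score" is a placeholder assumption; it is not the attention-discrepancy signal that VLALeaks uses.

```python
def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member scores above a random non-member.

    This pairwise (Mann-Whitney) form equals the area under the ROC curve;
    ties count as half a win.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Invented scores: higher means "looks more like training data".
members = [0.9, 0.8, 0.7, 0.4]
nonmembers = [0.6, 0.3, 0.2, 0.1]
auc = attack_auc(members, nonmembers)
```

An AUC of 0.5 means the attacker does no better than guessing, while values approaching 1.0 mean that membership is almost always identifiable.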
In data-constrained federated environments, privacy-preserving knowledge distillation is essential. “Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments” by Junming Liu et al. (Tongji University, Shanghai Artificial Intelligence Laboratory, The City University of New York, and Shenzhen University of Advanced Technology) proposes Mosaic. This framework uses a lightweight generator ensemble and a Mixture-of-Experts (MoE) teacher model to address both model and data heterogeneity without sharing real data. The generator ensemble preserves client-specific knowledge, and the MoE teacher robustly distills this into a global model using synthetic data, significantly outperforming prior art. Their code is available on GitHub.
Finally, the human element of AI governance cannot be overlooked. “Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation” by Sitong Lyu et al. (University of Sheffield and University of Oxford) examines the gap between national AI policies and local implementation in the UK public sector, using Special Educational Needs and Disabilities (SEND) services as a case study. They identify five critical challenges: shadow AI usage, market-government asymmetry, insufficient workforce readiness, lack of standardized measurements, and gaps in human accountability. This qualitative work underscores that responsible AI demands structural reforms and local guardrails, not just top-down compliance.
Under the Hood: Models, Datasets, & Benchmarks:
- A-COMPASS: Operates directly on standard microdata tables, so it applies to existing data formats, unlike previous methods that required pre-processed tables. The formal semantics are inspired by SQL semantics.
- FHE Acceleration: Relies on cycle-accurate SCALE-Sim simulations and 7nm OpenROAD synthesis for hardware evaluation. It targets FHE’s Number Theoretic Transform (NTT) operations, a core component of many FHE schemes.
- AgenticRei: Builds on the Rei framework and OWL/RDF semantics, and uses the RDFox knowledge graph system. It leverages industry standards like A2AS (Agentic AI Runtime Security) and aligns with the W3C Verifiable Credentials Data Model 2.0. It integrates with healthcare ontologies (HL7 FHIR, SNOMED CT) and finance ontologies (FIBO, FinRegOnt).
- SCOPE-FL: Evaluated on diverse datasets including MNIST, Fashion-MNIST, and CIFAR-10. It uses Polygon blockchain for decentralized execution and can be deployed on a private Ethereum network via Kurtosis. Code: https://github.com/scope-fl/scope-fl
- Machine Unlearning for 3D Medical Images: Benchmarked on the MRBrainS18 dataset, a volumetric medical imaging dataset, using a Med3D ResNet-50 backbone. This establishes a reproducible protocol for patient-level unlearning in this critical domain.
- VLALeaks: Attacks the OpenVLA, π0, and RT-2 VLA models. Tested on LIBERO (simulated) and on Open X-Embodiment and RoboCOIN (real-world robotic platforms). Code: https://github.com/Zili1000/VLALeaks
- Mosaic: Experimented with 6 image datasets, 2 text datasets, and 1 multimodal dataset, demonstrating broad applicability. The framework leverages a lightweight generator ensemble and a Mixture-of-Experts teacher model. Code: https://github.com/Junming-Liu-Mosaic/Mosaic
- Fault Lines: A qualitative study involving thematic analysis of 17 semi-structured interviews with UK policymakers and practitioners, supplemented by documentary analysis, providing real-world insights into public sector AI adoption.
Impact & The Road Ahead:
These advancements collectively push the boundaries of data privacy in AI. The formal verification capabilities of A-COMPASS offer stronger, provable guarantees for data anonymization, critical for sensitive datasets. The FHE acceleration breakthroughs could make privacy-preserving computation practically viable on existing AI hardware, unlocking secure data analysis across industries. AgenticRei’s deontic policies provide a robust framework for governing increasingly autonomous AI systems, ensuring accountability and ethical behavior beyond simple permissions.
SCOPE-FL’s contributions to fair and strategy-proof federated learning address a crucial incentive alignment problem, fostering greater participation and trust in collaborative AI. The systematic benchmark in machine unlearning provides a tangible path for organizations to comply with “right to be forgotten” regulations, especially vital in medical AI. However, the revelation of VLALeaks on VLA models highlights that privacy challenges are ever-evolving, requiring continuous innovation in attack and defense mechanisms for embodied AI. Mosaic offers a powerful, privacy-preserving method for knowledge distillation in heterogeneous FL, making collaborative model training more efficient and secure without raw data sharing.
The “Fault Lines” study serves as a crucial reminder that technological solutions must be accompanied by robust governance, institutional capacity building, and a deep understanding of local contexts. The road ahead involves not just developing more secure algorithms and hardware, but also creating holistic ecosystems where technical prowess, ethical frameworks, and practical implementation converge to build truly responsible and privacy-preserving AI systems. The interplay between these diverse research areas promises a future where AI’s transformative power can be harnessed safely and ethically, respecting individual privacy at every turn.