Differential Privacy: Beyond the Epsilon, Towards Practical, Scalable, and Interpretable AI

Latest 32 papers on differential privacy: Jun. 6, 2026

Differential Privacy (DP) has emerged as the gold standard for quantifying and enforcing privacy in machine learning, but applying it in practice means trading off privacy, utility, and computational overhead. Recent research moves beyond naive noise addition toward context-aware and computationally efficient mechanisms. This digest surveys recent work that makes DP more robust, more flexible, and easier to integrate into complex AI systems.

The Big Idea(s) & Core Innovations

The central theme across these papers is the pursuit of more effective privacy-utility trade-offs, often by leveraging structural insights about data or models. For instance, the traditional assumption of independent noise in Local Differential Privacy (LDP) is challenged by Madhura Pathegama et al. from Georgia Institute of Technology in their work, “Local Differential Privacy with Correlated Noise Achieves Central-DP Optimal Cost”. They demonstrate that carefully designed correlated noise can bridge the utility gap between local and centralized DP, achieving optimal costs that were previously thought impossible for LDP. This fundamental insight reshapes our understanding of locality’s cost.
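To make the intuition concrete, here is a toy sketch of why correlation can help utility. It is not the construction from the paper: it assumes users can coordinate so that their noise terms sum to zero, and it makes no claim about the resulting privacy guarantee. The function names and the choice of Gaussian noise are illustrative assumptions.

```python
import random

def independent_reports(values, sigma, rng):
    """Each user adds independent Gaussian noise to their own value."""
    return [v + rng.gauss(0.0, sigma) for v in values]

def zero_sum_reports(values, sigma, rng):
    """Each user adds noise that is correlated across users so it sums to zero.

    Assumption: users can coordinate (e.g. via shared randomness) to subtract
    the mean of the raw noise. This is a toy device, not the paper's mechanism.
    """
    raw = [rng.gauss(0.0, sigma) for _ in values]
    mean = sum(raw) / len(raw)
    return [v + (z - mean) for v, z in zip(values, raw)]

rng = random.Random(0)
values = [rng.random() for _ in range(1000)]
true_sum = sum(values)

# Error of the aggregate under each noise model.
err_independent = abs(sum(independent_reports(values, 1.0, rng)) - true_sum)
err_correlated = abs(sum(zero_sum_reports(values, 1.0, rng)) - true_sum)
```

With independent noise the aggregate error grows with the number of users, while the zero-sum noise cancels in the sum even though every individual report is still perturbed.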

Similarly, Huikang Liu et al. from Shanghai Jiao Tong University and UCL School of Management reveal in “Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy” that optimal DP noise distributions are often multimodal, not unimodal. By using mixtures of Gaussians, they close up to 99% of the optimality gap compared to traditional Gaussian mechanisms, significantly improving utility in moderate and low-privacy regimes prevalent in real-world deployments.
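As a rough illustration of what a multimodal noise distribution looks like, the sketch below samples from a symmetric two-component Gaussian mixture. The `offset` and `sigma` values are arbitrary assumptions, not parameters from the paper, and they are not calibrated to any (ε, δ) guarantee; the paper's optimization over mixture weights and locations is not reproduced here.

```python
import random

def mixture_noise(offset, sigma, rng):
    """Symmetric two-component Gaussian mixture with modes at -offset and +offset."""
    center = offset if rng.random() < 0.5 else -offset
    return rng.gauss(center, sigma)

def noisy_release(true_value, offset, sigma, rng):
    """Release a value perturbed by mixture noise (illustrative, uncalibrated)."""
    return true_value + mixture_noise(offset, sigma, rng)

rng = random.Random(0)
samples = [mixture_noise(offset=1.0, sigma=0.3, rng=rng) for _ in range(20000)]

# The noise stays zero-mean, but most of its mass sits near the two modes
# rather than near zero, unlike a single Gaussian.
mean = sum(samples) / len(samples)
mass_near_zero = sum(abs(s) < 0.2 for s in samples) / len(samples)
mass_near_modes = sum(abs(abs(s) - 1.0) < 0.2 for s in samples) / len(samples)
```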

Addressing a critical vulnerability in heterogeneous DP federated learning, Farhin Farhad Riya et al. from the University of Tennessee, Knoxville and Oak Ridge National Laboratory introduce “IntraShuffler: A Privacy Preserving Framework for Heterogeneous DP Federated Learning”. This framework thwarts privacy inference attacks that exploit persistent client-specific gradient structures under ε-aware aggregation by employing privacy-aware bucketing and parameter-level shuffling, reducing attack success rates by over 60%.
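The two ingredients named above, bucketing by privacy budget and parameter-level shuffling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the IntraShuffler implementation: the bucket boundaries in `edges`, the client data layout, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

def bucket_by_epsilon(clients, edges):
    """Group client ids by privacy budget; `edges` are ascending upper bounds."""
    buckets = defaultdict(list)
    for cid, (eps, _update) in clients.items():
        idx = next((i for i, e in enumerate(edges) if eps <= e), len(edges))
        buckets[idx].append(cid)
    return dict(buckets)

def shuffle_within_bucket(updates, rng):
    """Independently permute each parameter coordinate across a bucket's clients."""
    n, dim = len(updates), len(updates[0])
    shuffled = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        perm = list(range(n))
        rng.shuffle(perm)
        for i, src in enumerate(perm):
            shuffled[i][j] = updates[src][j]
    return shuffled

def column_sums(rows):
    """Per-coordinate sum over all rows."""
    return [sum(col) for col in zip(*rows)]

# Hypothetical clients: id -> (epsilon, gradient update).
clients = {
    "c0": (0.5, [1.0, 2.0, 3.0]),
    "c1": (0.8, [4.0, 5.0, 6.0]),
    "c2": (1.0, [7.0, 8.0, 9.0]),
    "c3": (3.0, [10.0, 11.0, 12.0]),
    "c4": (8.0, [13.0, 14.0, 15.0]),
}
buckets = bucket_by_epsilon(clients, edges=[1.0, 4.0])
rng = random.Random(0)
bucket0 = [clients[cid][1] for cid in buckets[0]]
shuffled0 = shuffle_within_bucket(bucket0, rng)
```

Because each coordinate is only permuted among clients in the same bucket, the per-bucket sum (and hence any per-bucket average) is unchanged, while no output row is guaranteed to correspond to a single client's update.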

For complex models like LLMs, privacy-preserving techniques are vital. Peihua Mai et al. from the National University of Singapore propose “SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models”, a batch-level paradigm that mixes original queries with noisy variants to protect user prompts. This approach is model-agnostic and achieves over 20% utility improvement with up to 5.6x cost reduction compared to DP baselines. Complementing this, Fengyu Gao and Jing Yang from the University of Virginia develop “Differentially Private Preference Data Synthesis for Large Language Model Alignment”, an algorithm to generate DP synthetic preference data for LLM alignment that often outperforms fine-tuning on real private data under strong DP constraints.

In specialized applications, Zhiyu Sun et al. from East China Normal University and Stevens Institute of Technology tackle location privacy in “Protecting K-Nearest Neighbor Queries from Location Inference Attacks”. They propose DPRS, a differential privacy framework that uses rejection sampling within constrained perturbation intervals to reduce location inference attack success rates to below 3%. Meanwhile, Atsu Kokuvi Ang’elo Passah et al. from ETIS Laboratory introduce “Channel Chart Location Privacy Based on Geo-Indistinguishability”, defining Chart Location Indistinguishability (CLI) and proposing a Mahalanobis norm planar Laplace (MNPL) mechanism that preserves neighborhood structure in channel charting representations, a significant improvement over standard DP.
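For readers unfamiliar with geo-indistinguishability, the standard planar Laplace mechanism adds 2-D noise with density proportional to exp(-ε‖u‖), which can be sampled with a uniform angle and a Gamma(2, 1/ε) radius. The sketch below also shows one plausible Mahalanobis-norm variant obtained by passing that noise through a Cholesky factor of a covariance matrix. Whether the paper's MNPL mechanism is defined exactly this way, and how it chooses or normalizes the matrix, is an assumption; `cov` here is an arbitrary example.

```python
import math
import random

def planar_laplace(epsilon, rng):
    """Sample 2-D noise with density proportional to exp(-epsilon * ||u||)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)  # shape 2, scale 1/epsilon
    return radius * math.cos(theta), radius * math.sin(theta)

def cholesky_2x2(cov):
    """Lower-triangular factor L of a 2x2 positive-definite matrix (cov = L L^T)."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return (l11, 0.0), (l21, l22)

def mahalanobis_planar_laplace(point, epsilon, cov, rng):
    """Perturb `point` with noise whose density decays in the Mahalanobis norm.

    If u is planar Laplace noise and z = L u, then the density of z is
    proportional to exp(-epsilon * sqrt(z^T cov^{-1} z)).
    """
    ux, uy = planar_laplace(epsilon, rng)
    (l11, _), (l21, l22) = cholesky_2x2(cov)
    return point[0] + l11 * ux, point[1] + l21 * ux + l22 * uy

rng = random.Random(0)
cov = ((4.0, 0.0), (0.0, 1.0))  # illustrative: wider spread along the x axis
noisy = [mahalanobis_planar_laplace((0.0, 0.0), 1.0, cov, rng) for _ in range(20000)]
mean_abs_x = sum(abs(x) for x, _ in noisy) / len(noisy)
mean_abs_y = sum(abs(y) for _, y in noisy) / len(noisy)

radii = [math.hypot(*planar_laplace(1.0, rng)) for _ in range(20000)]
mean_radius = sum(radii) / len(radii)  # theoretical mean is 2 / epsilon
```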

Under the Hood: Models, Datasets, & Benchmarks

The papers in this collection use and introduce a range of models, datasets, and benchmarks to validate their contributions:

DPDL (for decentralized learning): Yunsheng Yuan et al. evaluate on MNIST and CIFAR-10 with non-IID data.
DPPrefSyn (for LLM preference alignment): Utilizes OpenAssistant, Anthropic-HH, and TL;DR summarization datasets, leveraging models like Pythia-2.8B, Llama-7B-chat, and Qwen-3-4B-Instruct. Code available at https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis.
SharedRequest (for LLM inference): Uses Legal-QA, Medical-QA, and MMLU-Biz datasets with Qwen2.5 discriminator models. Code available at https://github.com/NusIoraPrivacy/SharedRequest.
PATE-TabTransGAN (for synthetic tabular data): Benchmarked on Adult Income, Breast Cancer, Cardio, and Cervical datasets. Code available at https://anonymous.4open.science/r/PATE-TabTransGAN-B467/README.md.
DP-OLS with FastMix (for private linear regression): Omri Lev et al. introduce the FastMix mechanism for ordinary least squares, with code at https://github.com/omrilev1/FastMix.
DP-TTA (for private test-time adaptation): Zefeng Li et al. evaluate on ImageNet-C and ImageNet-R using ViT-Base/16 and ConvNeXt Tiny models, integrated with the Opacus library.
FHE for Causal Structure Learning: Jian Yang et al. utilize the Microsoft SEAL FHE library and EVA FHE compiler for privacy-preserving causal inference. Code at https://github.com/microsoft/SEAL and https://github.com/microsoft/EVA.
DP Datastore Generation: Abdelrahman Abouelenein and Marwan Torki use SimHash and various text classification datasets like MR, TREC, and AGNews with Phi-4-Mini as a backbone.
IntraShuffler: Uses London Household Electricity, Pecan Street Electricity, ComStock, and CIFAR-10 datasets.
ScanTwin: Donghyun Sohn and Jennie Rogers benchmark on TPC-H and SSB datasets for performance regression.
DPML Training with FHE: Yvonne Zhou et al. train on Adult, COMPAS, Credit Card Default, and MNIST datasets. Code at https://github.com/dpfhe096-design/dp_fhe.

Impact & The Road Ahead

The collective impact of this research is profound, making differentially private AI not just a theoretical ideal but a practical reality across diverse domains. From securing sensitive medical data in speech analysis (“InfoShield: Privacy-Preserving Speech Representations for Mental Health Screening via Information-Theoretic Optimization” by Xueyang Wu et al.) to protecting user prompts in LLMs and even debugging database performance without risking tenant data, DP is being refined and extended.

Looking ahead, several directions emerge. The convergence of DP with homomorphic encryption, as seen in “Preserving Data Privacy in Learning Causal Structure with Fully Homomorphic Encryption” by Jian Yang et al. and “Revisiting ML Training under Fully Homomorphic Encryption: Convergence Guarantees, Differential Privacy, and Efficient Algorithms” by Yvonne Zhou et al., promises truly secure outsourced computation. The exploration of “adversary-aware” DP bounds by Marika Swanberg et al. in “A Unified Framework for Adversary-Aware Differential Privacy Bounds” is crucial for calibrating privacy parameters to real-world threats, moving beyond conservative worst-case estimates.

New theoretical insights, such as optimal rates for private hypothesis testing with e-values (“Optimal Rates for Differentially Private Hypothesis Testing with E-values” by Ben Jacobsen et al.) and the role of fairness in mitigating inference attacks (“Fair Finetuning Mitigates Distribution Inference Attacks” by Rakshit Naidu), highlight the growing maturity and interdisciplinary nature of DP research. The advent of quantum LDP mechanisms (“Optimal quantum locally differentially private mechanisms in the high-privacy regime” by Yuuya Yoshida) even hints at a future where privacy is enhanced by quantum advantages. These advancements are not just incremental; they represent a fundamental shift towards building an AI ecosystem that is not only powerful but also inherently private and trustworthy. The journey to fully realize this vision is ongoing, but the path is becoming increasingly clear and exciting.
