Gaussian Splatting: Bridging Realism, Efficiency, and Intelligence in 3D AI
Latest 57 papers on Gaussian Splatting: Apr. 4, 2026
Gaussian Splatting (3DGS) has rapidly emerged as a game-changer in 3D AI, offering unprecedented real-time rendering capabilities and high-fidelity scene representations. This vibrant field is at the forefront of tackling complex challenges, from recreating dynamic worlds to enabling intelligent robotic interaction. Recent breakthroughs, as showcased in a flurry of innovative research papers, are pushing the boundaries of what’s possible, addressing efficiency, quality, and new application domains.
The Big Idea(s) & Core Innovations
The core innovation across these papers is a relentless pursuit of enhancing 3DGS from multiple angles: improving rendering quality, boosting efficiency, enabling new capabilities like physics-aware dynamics, and extending applications to diverse fields. A key theme is overcoming the limitations of vanilla 3DGS, which often struggles with fine details, dynamic scenes, or resource intensity.
For instance, the paper “Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction” by Condor, Hermann, Yurtsever, and Didyk (CVPR) introduces Neural Harmonic Textures. This novel mechanism enhances primitive-based neural reconstruction by encoding high-frequency details that traditional Gaussians miss, achieving superior rendering quality and structural fidelity. Similarly, “HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars” by Gent Serifi and Marcel C. Buehler (ETH Zurich) extends 3DGS to capture intricate details such as specular reflections and thin structures in animatable face avatars, leveraging a probabilistic interpretation and an efficient “inverse covariance trick.” Both works highlight a general trend toward enriching the Gaussian representation itself.
Efficiency and scalability are also major drivers. “GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending” by Haomin Li et al. from Shanghai Jiao Tong University and United Imaging Intelligence tackles the underutilization of GPU Tensor Cores by reformulating 3DGS blending into a GEMM-compatible form, yielding significant real-time speedups. Further addressing efficiency, “GS^2: Graph-based Spatial Distribution Optimization for Compact 3D Gaussian Splatting” by Xianben Yang et al. (Beijing Jiaotong University) shrinks the memory footprint by optimizing the spatial distribution of Gaussians through graph-based feature encoding and adaptive densification, achieving high quality with only 12.5% of the original points. This points to smarter data management as a path to efficiency.
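To see why a GEMM-compatible formulation matters, recall that standard 3DGS composites each pixel front-to-back with a running transmittance. Because the transmittance is just a prefix product over depth-sorted Gaussians, the whole blend can be folded into one weight matrix and expressed as a single dense matrix product. The sketch below is a toy NumPy illustration of that equivalence only; the function names and shapes are my own, and it is not GEMM-GS's actual CUDA kernel.

```python
import numpy as np

def blend_loops(alphas, colors):
    """Reference 3DGS alpha compositing: per pixel, front-to-back.

    alphas: (P, G) per-pixel opacity of each depth-sorted Gaussian
    colors: (G, 3) per-Gaussian RGB
    """
    P, G = alphas.shape
    out = np.zeros((P, 3))
    for p in range(P):
        T = 1.0  # transmittance so far
        for g in range(G):
            a = alphas[p, g]
            out[p] += T * a * colors[g]
            T *= 1.0 - a
    return out

def blend_gemm(alphas, colors):
    """Same result expressed as one matrix product W @ colors.

    The transmittance prefix product is a cumulative product along the
    Gaussian axis; folding it into a weight matrix W turns blending
    into a single dense GEMM, the kind of op Tensor Cores accelerate.
    """
    T = np.cumprod(1.0 - alphas, axis=1)
    # Shift right: Gaussian g sees the product over j < g, with T_0 = 1.
    T = np.concatenate([np.ones((alphas.shape[0], 1)), T[:, :-1]], axis=1)
    W = T * alphas          # (P, G) blending weights
    return W @ colors       # one GEMM per image tile
```

Both functions agree to floating-point precision; the payoff of the second form is that a batched `W @ colors` maps onto hardware matrix units instead of a divergent per-pixel loop.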
Dynamic scene reconstruction sees major advancements. “MotionScale: Reconstructing Appearance, Geometry, and Motion of Dynamic Scenes with Scalable 4D Gaussian Splatting” by Haoran Zhou and Gim Hee Lee (National University of Singapore) introduces scalable cluster-centric motion fields and a progressive optimization strategy to capture photorealistic appearance, accurate geometry, and coherent motion in large-scale dynamic environments from monocular videos. Building on this, “4DSurf: High-Fidelity Dynamic Scene Surface Reconstruction” by Renjie Wu et al. (Australian National University, NVIDIA, Amazon) presents a prior-free framework for temporally consistent 3D surfaces from sparse-view dynamic videos, uniquely handling large deformations with Gaussian Deformations induced Signed Distance Function Flow Regularization.
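The "cluster-centric motion field" idea can be pictured simply: instead of learning an independent trajectory per Gaussian, Gaussians are grouped into clusters, and each cluster carries one rigid (SE(3)) transform per timestep. The sketch below shows that grouping mechanic in NumPy; it is a minimal illustration under my own assumptions, not MotionScale's actual parameterization or optimization.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def apply_cluster_motion(means, cluster_ids, rotations, translations):
    """Move Gaussian centers by their cluster's rigid transform.

    means:        (N, 3)    Gaussian centers at the canonical timestep
    cluster_ids:  (N,)      index of the cluster each Gaussian belongs to
    rotations:    (K, 3, 3) per-cluster rotation matrices
    translations: (K, 3)    per-cluster translations
    """
    R = rotations[cluster_ids]      # (N, 3, 3) gather per-Gaussian rotation
    t = translations[cluster_ids]   # (N, 3)    gather per-Gaussian translation
    return np.einsum('nij,nj->ni', R, means) + t
```

The appeal of this factorization is that K (clusters) is far smaller than N (Gaussians), so the motion model stays compact and coherent even for large scenes.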
Another exciting direction is enabling precise control and interaction. “SVGS: Single-View to 3D Object Editing via Gaussian Splatting” offers text-driven 3D object editing from a single image by combining diffusion models with 3DGS, using “Relevance-Aware Editing” and “Structural Prior Initialization” to maintain geometric consistency. Similarly, “ObjectMorpher: 3D-Aware Image Editing via Deformable 3DGS Models” by Yuhuan Xie et al. (The University of Hong Kong) provides a unified framework for real-time, non-rigid 3D-aware image editing, leveraging deformable 3DGS models with ARAP constraints for physically plausible manipulations.
Applications are also broadening significantly. “FaCT-GS: Fast and Scalable CT Reconstruction with Gaussian Splatting” by Pawel Tomasz Pieta et al. (RENNER) leverages 3DGS for fast, high-quality sparse-view CT reconstruction, demonstrating its utility in medical imaging by significantly reducing execution time. “Satellite-Free Training for Drone-View Geo-Localization” by Tao Liu et al. (Nanjing University of Science and Technology) uses 3DGS to generate pseudo-orthophotos from drone views, enabling geo-localization without satellite imagery during training, a crucial step for GPS-denied environments. And “F3DGS: Federated 3D Gaussian Splatting for Decentralized Multi-Agent World Modeling” from Morui Zhu et al. (University of North Texas, Budapest University of Technology and Economics) enables decentralized 3D reconstruction in multi-agent systems by decoupling geometry from appearance, a significant leap for privacy-preserving robotics.
Under the Hood: Models, Datasets, & Benchmarks
These advancements are powered by innovative models, novel datasets, and rigorous benchmarks:
- GEMM-GS: Proposes a GEMM-compatible blending transformation for 3DGS, implemented with a specialized CUDA kernel featuring a three-stage double-buffered pipeline. Code: https://github.com/shieldforever/GEMM-GS
- ProDiG: Utilizes Causal Attention Mixing and a Distance-Adaptive Gaussian Module for progressive altitude refinement from aerial imagery. Evaluated on the Aerial MegaDepth Dataset.
- Resonance4D: Employs Dual-domain Motion Supervision (DMS) and zero-shot text-prompted segmentation for preset-free physical parameter learning in 4D dynamic simulations.
- GS^2: Introduces an adaptive densification strategy based on ELBO and a graph-based feature encoding module for compact 3DGS. Evaluated on Mip-NeRF 360 and Tanks & Temples datasets. Code: https://github.com/BJTU-KD3D/GS-2
- FaCT-GS: A Gaussian Splatting-based CT reconstruction pipeline with novel initialization schemes based on FDK gradients and volumetric priors. Code: https://github.com/PaPieta/fact-gs
- F3DGS: A federated 3DGS framework decoupling geometry and appearance, using a LiDAR SLAM anchor and validated on the MeanGreen dataset. Provides a development kit.
- Satellite-Free Training for Drone-View Geo-Localization: Integrates 3D Gaussian Splatting with geometry-guided pseudo-orthophoto generation and Fisher vector aggregation, evaluated on University-1652 and SUES-200 datasets.
- Better Rigs, Not Bigger Networks: Advocates for the more expressive Momentum Human Rig (MHR), estimated via SAM-3D-Body, over SMPL for Gaussian avatars. Evaluated on PeopleSnapshot and ZJU-MoCap. Code: https://github.com/dcaustin33/better_rigs_not_bigger_networks
- LESV: Replaces probabilistic 3DGS with Sparse Voxel Rasterization (SVRaster) and leverages the AM-RADIO foundation model for open-vocabulary 3D scene understanding. Evaluated on LERF Benchmark, ScanNet, and KITTI-360.
- PhysGaia: A physics-aware benchmark for Dynamic Novel View Synthesis, featuring multi-body interactions with liquids, gases, and textiles, providing ground-truth physical parameters. Resources: https://cv.snu.ac.kr/research/PhysGaia/
- Learning Fine-Grained Geometry: Introduces a Cascade Depth Loss mechanism for sparse-view 3DGS, iteratively refining depth predictions.
- Neural Harmonic Textures: Integrates harmonic functions to encode high-frequency details within Lagrangian primitives for primitive-based neural reconstruction.
- Autoregressive Appearance Prediction: A spatial-MLP-conditioned 3D Gaussian avatar model with an autoregressive transformer-based appearance predictor. Resources: https://steimich96.github.io/AAP-3DGA/
- Coko-SLAM: A multi-agent RGB-D Gaussian Splatting SLAM framework with an optimization-sparsification process and adapted keyframe selection. Code: https://github.com/lemonci/coko-slam
- DirectFisheye-GS: Embeds the Kannala-Brandt projection model and a cross-view joint optimization strategy for native fisheye input.
- RT-GS: Integrates reflection and transmittance primitives directly into 3DGS, evaluated on Ref-Real and NU-NeRF datasets.
- ARGS: An auto-regressive paradigm that predicts the Level-of-Detail hierarchy of Gaussian Splatting fields in parallel using a tree-based transformer.
- GRVS: A recurrent loop architecture and plane sweep volumes for monocular dynamic view synthesis. Introduces Kubric-4D-dyn dataset. Resources: https://thomas-tanay.github.io/grvs
- AA-Splat: The first feed-forward 3DGS model with Opacity-Balanced Band-Limiting (OBBL) for alias-free rendering. Evaluated on RE10K, DL3DV, and ACID datasets. Code: https://kaist-viclab.github.io/aasplat-site
- MotionScale: A scalable 4D Gaussian Splatting framework with a cluster-centric motion field and a progressive optimization strategy. Resources: https://hrzhou2.github.io/motion-scale-web/
- LightHarmony3D: Utilizes GenEnvLighting (a generative diffusion model) and PBR-Guided Shadow Compositing for physically consistent object insertion.
- GenSplat: A feed-forward 3DGS framework for robotic policy learning with a 3D-prior distillation strategy. Code: https://github.com/SanMumumu/GenSplat
- SplatHLoc: A hierarchical visual relocalization framework based on Feature Gaussian Splatting with adaptive viewpoint retrieval. Resources: https://hqitao.github.io/SplatHLoc
- TUGS: A physics-based, compact representation for underwater scenes using tensor decomposition. Resources: https://liamlian0727.github.io/TUGS
- GeoHCC: A compression framework for 3DGS featuring Neighborhood-Aware Anchor Pruning and Geometry-Guided Convolution for hierarchical entropy coding.
- ObjectMorpher: Lifts objects into editable 3DGS representations with ARAP constraints and a composite diffusion module for image editing.
- SVGS: Combines diffusion models with 3DGS for single-view 3D object editing, using Relevance-Aware Editing and Structural Prior Initialization. Resources: https://amateurc.github.io/svgs.github.io/
- 4DSurf: A prior-free framework for dynamic scene surface reconstruction using Gaussian Deformations induced Signed Distance Function Flow Regularization and Overlapping Segment Partitioning.
- GS3LAM: A framework for dense semantic SLAM using a Semantic Gaussian Field with Depth-adaptive Scale Regularization and Random Sampling-based Keyframe Mapping. Code: https://github.com/lif314/GS3LAM
- SGS-Intrinsic: A two-stage framework for sparse-view indoor inverse rendering using semantic-invariant Gaussian fields and a hybrid illumination model. Code: https://github.com/GrumpySloths/SGS_Intrinsic.github.io
- DiffSoup: A radiance field representation using a small set of unstructured triangles with neural textures and stochastic opacity masking. Code: https://github.com/kenji-tojo/diffsoup
- arg-VU: A framework for affordance reasoning in robotic surgery, integrating physics-aware 3D geometry analysis. Code: https://github.com/placeholder-argvu-code
- MeshSplats: Converts Gaussian Splatting representations into disjoint mesh-like structures compatible with ray-tracing engines like Blender and Nvdiffrast.
- Drive-Through 3D Vehicle Exterior Reconstruction: Uses Dynamic-Scene SfM and Distortion-Aware Gaussian Splatting for vehicle reconstruction.
- Scene Grounding In the Wild: Aligns partial 3D reconstructions to a complete reference model using pseudo-synthetic renderings from Google Earth Studio and semantic features. Introduces the WikiEarth dataset.
- GLINT: Decomposes radiance into interface, transmission, and reflection components for scene-scale transparency, introducing 3D-FRONT-T dataset. Code: https://youngju-na.github.io/GLINT
- R-PGA: Generates robust physical adversarial camouflage using relightable 3D Gaussian Splatting with physically disentangled attributes and Hard Physical Configuration Mining. Code: https://github.com/TRLou/R-PGA
- Less Gaussians, Texture More: Introduces LGTM, a feed-forward framework for 4K novel view synthesis with compact textured Gaussians. Resources: https://yxlao.github.io/lgtm/
- ViewSplat: Improves feed-forward 3DGS with view-adaptive dynamic refinement using scene-conditioned dynamic MLPs. Resources: https://cvlab-uos.github.io/ViewSplat
- Learning Explicit Continuous Motion Representation: Models continuous motion using adaptive SE(3) B-spline bases for dynamic 3DGS. Code: https://github.com/hhhddddddd/se3bsplinegs
- GaussFusion: A geometry-informed video-to-video generation model for artifact removal in 3DGS. Resources: https://arxiv.org/pdf/2603.25053
- MoRGS: An efficient online framework for explicit per-Gaussian motion reasoning using sparse motion cues and motion confidence. Code: https://github.com/yonsei-cv/MoRGS
- π, But Make It Fly: Fine-tunes VLA models for aerial manipulation using teleoperated and 3D Gaussian Splatting synthetic data. Resources: https://airvla.github.io
- Relaxed Rigidity with Ray-based Grouping: Enforces physically plausible motion through relaxed rigidity constraints and ray-based grouping for dynamic 3DGS.
- Confidence-Based Mesh Extraction: A self-supervised confidence framework for 3DGS surface reconstruction, using color and normal variance losses. Code: https://github.com/r4dl/CoMe/
- Accurate Point Measurement in 3DGS: A web-based multi-ray spatial intersection method for accurate 3D point measurement. Code: https://github.com/GDAOSU/3dgs_measurement_tool
- SpectralSplats: Introduces Spectral Moment Loss and Principled Frequency Annealing for robust differentiable tracking via spectral moment supervision.
- FilterGS: A traversal-free parallel filtering and adaptive shrinking method for large-scale LoD 3DGS, using a novel GTC metric. Code: https://github.com/xenon-w/FilterGS
- AdvSplat: Proposes black-box adversarial attack algorithms for feed-forward 3DGS models, operating in the frequency domain.
- Stochastic Ray Tracing: Introduces a differentiable stochastic formulation for ray-traced 3DGS using an unbiased Monte Carlo estimator.
- Pose-Free Omnidirectional Gaussian Splatting: Proposes PFGS360, a pose-free omnidirectional 3DGS method with spherical consistency-aware pose estimation and depth-inlier-aware densification. Code: https://github.com/zcq15/PFGS360
- Drop-In Perceptual Optimization: A perceptual optimization framework for 3DGS that improves visual quality with minimal additional cost. Code: https://github.com/apple/ml-perceptual-3dgs
- GTLR-GS: A Geometry-Texture Aware LiDAR-Regularized 3D Gaussian Splatting (GTLR-GS) framework. Code: https://github.com/your-repo-name/gt-r-gs
- PhotoAgent: A robotic photographer system integrating spatial reasoning and aesthetic understanding for autonomous image capturing. Resources: https://developer.nvidia.com/isaac-sim
- Instrument-Splatting++: Creates controllable surgical instrument digital twins using Gaussian splatting for enhanced visualization. Code: https://github.com/Instrument-Splattingplusplus
- Predictive Photometric Uncertainty: Introduces an efficient, plug-and-play system for pixel-wise predictive photometric uncertainty estimation in 3DGS.
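Several entries above rest on classic multi-view geometry; for instance, the multi-ray spatial intersection behind the point-measurement tool is, in its textbook form, a linear least-squares problem: find the point minimizing summed squared distances to a bundle of rays. The sketch below implements that standard formulation as an illustration; it is not claimed to be the tool's actual code.

```python
import numpy as np

def intersect_rays(origins, directions):
    """Least-squares point closest to a bundle of 3D rays.

    For a ray (o_i, d_i) with unit direction d_i, the projector
    P_i = I - d_i d_i^T maps a point to its offset from the ray, so the
    point minimizing the summed squared distances solves the 3x3 system
        (sum_i P_i) p = sum_i P_i o_i.

    origins:    (R, 3) ray origins (e.g. camera centers)
    directions: (R, 3) ray directions (normalized internally)
    """
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    I = np.eye(3)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, di in zip(origins, d):
        P = I - np.outer(di, di)  # projects onto the plane normal to di
        A += P
        b += P @ o
    return np.linalg.solve(A, b)
```

With two or more non-parallel rays the system is well-posed; clicking the same scene point from several rendered viewpoints yields one ray per click, and the solve returns the measured 3D position.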
Impact & The Road Ahead
The impact of these advancements is far-reaching. From making real-time 3D rendering accessible on consumer hardware, as seen with GEMM-GS and DiffSoup, to enabling robust navigation in GPS-denied environments through Satellite-Free Training for Drone-View Geo-Localization, 3DGS is becoming a cornerstone technology. The ability to model complex dynamic scenes with unprecedented fidelity (MotionScale, 4DSurf) opens doors for virtual reality, advanced robotics, and immersive simulations. Applications in medical imaging (FaCT-GS, Instrument-Splatting++) promise to revolutionize diagnostics and surgical training. The integration of semantic understanding (GS3LAM, LESV) and physics-aware reasoning (Resonance4D, arg-VU) is transforming 3D models from mere representations into intelligent, interactive agents.
The road ahead is exciting. Researchers are actively tackling remaining challenges such as handling transparency more realistically (GLINT, RT-GS), improving robustness against adversarial attacks (AdvSplat), and pushing the boundaries of generalization to unseen views and dynamic interactions. The synergy between geometric explicit representations (Gaussians, meshes) and neural implicit fields will continue to evolve, likely leading to more hybrid, optimized solutions. Expect to see 3DGS become an even more pervasive technology, powering the next generation of AI-driven experiences, from realistic digital twins to truly intelligent embodied agents.