Gaussian Splatting: From Billion-Scale Worlds to Intelligent Robot Hands

Latest 57 papers on gaussian splatting: May. 23, 2026

Get ready, AI and ML enthusiasts! Gaussian Splatting (GS), the revolutionary real-time 3D representation, continues its meteoric rise, pushing the boundaries across an astonishing array of applications. From crafting photorealistic digital twins of entire cities to enabling robots to navigate complex, dynamic environments, GS is no longer just about rendering pretty pictures; it's becoming a cornerstone for intelligent systems. Recent research is doubling down on scalability, robustness, and semantic understanding, unlocking capabilities previously thought impossible.

The Big Idea(s) & Core Innovations:

The core innovation across these papers revolves around making Gaussian Splatting more versatile, robust, and intelligent. A significant theme is scalability and efficiency, exemplified by TideGS: Scalable Training of Over One Billion 3D Gaussian Splatting Primitives via Out-of-Core Optimization from Hong Kong University of Science and Technology. This work allows training of over a billion Gaussians on a single GPU by virtualizing parameters, so that VRAM acts as a working-set cache rather than holding the full model. Complementing this, TensorGS: Accelerating 3D Gaussian Splatting using Tensor Cores, by researchers from the University of Pittsburgh and Microsoft AI, identifies and utilizes idle Tensor Cores, achieving a 1.65× speedup by tensorizing the power computation, a key bottleneck in GS rasterization.
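To make "tensorizing the power computation" concrete: in standard 3DGS rasterization, the exponent ("power") of each projected Gaussian at a pixel is a quadratic form in the pixel offset. The sketch below is an illustrative NumPy reformulation, not TensorGS's actual kernel; the function names and the six-term feature layout are assumptions made here. It shows that the exponent for every pixel/Gaussian pair can be written as a single matrix product, which is the kind of workload Tensor Cores are designed for.

```python
import numpy as np

def power_direct(pixels, means, conics):
    """Reference: evaluate -0.5 * d^T Q d for every pixel/Gaussian pair.

    pixels: (P, 2) pixel coordinates
    means:  (G, 2) projected Gaussian centers
    conics: (G, 3) entries (a, b, c) of the inverse 2D covariance [[a, b], [b, c]]
    """
    d = pixels[:, None, :] - means[None, :, :]          # (P, G, 2)
    a, b, c = conics[:, 0], conics[:, 1], conics[:, 2]
    return -0.5 * (a * d[..., 0] ** 2 + c * d[..., 1] ** 2) - b * d[..., 0] * d[..., 1]

def power_matmul(pixels, means, conics):
    """Same quantity, expanded into a (P, 6) @ (6, G) matrix product."""
    px, py = pixels[:, 0], pixels[:, 1]
    mx, my = means[:, 0], means[:, 1]
    a, b, c = conics[:, 0], conics[:, 1], conics[:, 2]
    # Per-pixel monomials: [x^2, xy, y^2, x, y, 1]
    pixel_feats = np.stack([px * px, px * py, py * py, px, py, np.ones_like(px)], axis=1)
    # Per-Gaussian coefficients of those monomials in -0.5 * d^T Q d
    gauss_coeffs = -0.5 * np.stack(
        [
            a,
            2.0 * b,
            c,
            -2.0 * (a * mx + b * my),
            -2.0 * (b * mx + c * my),
            a * mx * mx + 2.0 * b * mx * my + c * my * my,
        ],
        axis=1,
    )
    return pixel_feats @ gauss_coeffs.T                  # (P, G)
```

Both functions return the same (P, G) array up to floating-point error; only the second exposes the work as a dense matrix multiply. A production kernel would also have to manage reduced-precision arithmetic, which this sketch ignores.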

Robustness in challenging conditions is another major thrust. HarmoGS: Robust 3D Gaussian Splatting in the Wild via Conflict-Aware Gradient Harmonization from Sun Yat-sen University tackles noisy, in-the-wild scenes by explicitly harmonizing conflicting gradients, leading to significantly cleaner reconstructions. For underwater scenes, 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling from Nankai University disentangles object appearance from water effects, allowing for physically accurate restoration. Addressing visual degradations, SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation from Shanghai Jiao Tong University leverages 3DGS to synthesize realistic degradations, revealing MLLM robustness gaps and showing how fine-tuning can surpass human performance.
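The summary above does not spell out how HarmoGS harmonizes conflicting gradients. As a generic illustration of the underlying idea, the sketch below uses PCGrad-style gradient projection, a separate and well-known technique from multi-task learning, not the HarmoGS method. The function names are hypothetical. When two gradients point in opposing directions (negative dot product), each is projected onto the plane orthogonal to the other before they are summed.

```python
import numpy as np

def project_conflict(g_a, g_b):
    """Remove from g_a the component that opposes g_b (PCGrad-style).

    If the gradients do not conflict (dot product >= 0), g_a is returned unchanged.
    """
    dot = float(np.dot(g_a, g_b))
    if dot >= 0.0:
        return g_a.copy()
    # dot < 0 implies g_b is nonzero, so the division is safe.
    return g_a - (dot / float(np.dot(g_b, g_b))) * g_b

def harmonize(g_a, g_b):
    """Combine two gradients after removing their mutually conflicting parts."""
    return project_conflict(g_a, g_b) + project_conflict(g_b, g_a)

# Example: gradients for the same parameters from two supervising views.
g_view_1 = np.array([1.0, 0.0])
g_view_2 = np.array([-1.0, 1.0])
combined = harmonize(g_view_1, g_view_2)
```

After projection, each adjusted gradient has zero dot product with the gradient it conflicted with, so neither update undoes the other.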

Enhanced semantic understanding and editability are also advancing rapidly. OP2GS: Object-Aware 3D Gaussian Splatting with Dual-Opacity Primitives by the University of Oulu introduces dual opacity values, allowing mislabeled Gaussians to be suppressed in object masks without affecting visual appearance. This is a fundamental rethinking of the Gaussian primitive for better object segmentation. FaceParts: Segmentation and Editing of Gaussian Splatting Avatars from Wrocław University of Science and Technology achieves unsupervised facial part segmentation and swapping directly in the GS domain. Furthermore, SCOUP: Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting from the University of Zagreb and University of Toronto enables 400× faster training for language-driven 3D understanding by learning sparse 2D codebook representations and uplifting them to 3D Gaussians.
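The dual-opacity idea can be illustrated with ordinary front-to-back alpha compositing along a single ray. This is a minimal sketch under assumptions made here, not the OP2GS formulation: each Gaussian carries one opacity for appearance and a second, hypothetical `alpha_obj` for the object mask, and the same compositing routine is run once with each.

```python
import numpy as np

def composite(values, alphas):
    """Front-to-back alpha compositing of per-Gaussian values along one ray.

    values: (N, C) array, one row per Gaussian, ordered near to far
    alphas: length-N sequence of opacities in [0, 1]
    """
    transmittance = 1.0
    out = np.zeros_like(values[0], dtype=float)
    for value, alpha in zip(values, alphas):
        out += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return out

# Three Gaussians along a ray; the second is (wrongly) labeled as the object.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
labels = np.array([[1.0], [1.0], [0.0]])

alpha_vis = [0.5, 0.5, 0.5]   # appearance opacity, used for color
alpha_obj = [0.5, 0.0, 0.5]   # object opacity, mislabeled Gaussian suppressed

pixel_color = composite(colors, alpha_vis)   # unaffected by alpha_obj
pixel_mask = composite(labels, alpha_obj)    # mislabeled Gaussian no longer contributes
```

Because color and mask are composited with separate opacities, zeroing a Gaussian's object opacity changes the mask value while leaving the rendered color untouched.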

Under the Hood: Models, Datasets, & Benchmarks:

This wave of innovation is powered by novel architectural designs, training strategies, and robust benchmarking. Here are some key resources and methodologies:

  • 4D Gaussian Splatting: A core component in dynamic scene understanding, featured in Sensor2Sensor (Waymo, JHU) for cross-embodiment sensor conversion, 4D-GSW (Southeast University) for kinematic-aware watermarking, and NoPo4D (Politecnico di Milano, ETH Zürich) for feed-forward, pose-free dynamic reconstruction from unposed multi-view videos.
  • Generative Models & Diffusion Priors: Increasingly integrated for scene completion and enhancement. FlowGS (Beijing Foreign Studies University) uses flow matching for continuous-scale super-resolution, while VidSplat (Tsinghua University) leverages video diffusion priors for training-free sparse-view reconstruction. PanoPlane (University of Maryland) employs diffusion models with layout-anchored attention for panoramic scene completion.
  • Physics-Aware Modeling: A rising trend for realism and functionality. EndoGSim (CUHK) combines 4DGS with MLLM-guided material estimation and differentiable MPM for surgical scene simulation. Real2Sim (RPI, University of Delaware) unifies 4DGS with MPM for editable, physics-aware autonomous driving simulations. PG-3DGS (Purdue University) even generates 3D structures that satisfy physics functionalities, like pouring teapots and lift-generating airplanes.
  • Optimization Strategies: Innovations abound for faster, more stable training. CAdam (Kyung Hee University) introduces momentum-based signal verification for generative densification, significantly reducing Gaussian primitives. Learn2Splat (University of Tübingen, Meta) proposes a meta-learned optimizer for long-horizon stability. ForeSplat (ShanghaiTech University) uses optimization-aware training for rapid refinement, while SparseOIT (Zhejiang University, Westlake University) uses an active set method for faster Order-Independent Transparency. Denoising-GS (Fudan University) reformulates GS optimization as a denoising process, enhancing robustness to noisy initializations.
  • Compact Representations & Acceleration: Efforts to reduce memory footprint and boost speed. MMGS: 10× Compressed 3DGS through Optimal Transport Aggregation (HKUST) achieves 10× compression using Optimal Transport theory. Compact 3D Gaussian Splatting For Dense Visual SLAM (SJTU, NTU) introduces voxel-anchored representations for 2.21× memory compression and 226% rendering speedup in SLAM. 3DGS3 (USTC) offers a post-rendering framework for joint super sampling and frame interpolation for 4K 96 FPS rendering.
  • Architectural Enhancements: 3D Skew Gaussian Splatting and 3D Skew-Normal Splatting (HKUST, Fudan University) replace symmetric Gaussians with skew normal distributions to capture asymmetric features more effectively. Z-Order Transformer for Feed-Forward Gaussian Splatting (University of Hong Kong) leverages Z-order curves for efficient context modeling and compression.
  • Specialized Datasets & Benchmarks: SpaceDG provides a large-scale dataset for MLLM spatial intelligence under visual degradation. FlyMirage (Zhejiang University) generates diverse and scalable UAV flight data using LLM-driven generative world models. The paper 3D Gaussian Splatting for Efficient Retrospective Dynamic Scene Novel View Synthesis with a Standardized Benchmark (Texas A&M University) introduces a Blender-based API for generating standardized dynamic multi-view datasets. PoseCompass (University of Sydney) tackles intelligent synthetic pose selection for visual localization.
  • Code Repositories: Many works promise public code, with some already available: SpaceDG, AIR, TWINGS, TideGS, MIF, PointGS, VCGS-SLAM, and AmbiSuR.
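For readers unfamiliar with the Z-order curves mentioned under Architectural Enhancements, the sketch below shows the general technique: quantize each 3D position, interleave the coordinate bits into a Morton code, and sort by that code so that points close in space tend to end up close in the sequence. This is a generic illustration, not the Z-Order Transformer's implementation; the function names and the 10-bit quantization are assumptions made here.

```python
import numpy as np

def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def z_order_indices(points, bits=10):
    """Return indices that sort (N, 3) points along a Z-order curve."""
    pts = np.asarray(points, dtype=float)
    lo = pts.min(axis=0)
    span = np.maximum(pts.max(axis=0) - lo, 1e-12)   # avoid division by zero
    # Quantize each axis to integers in [0, 2**bits - 1].
    q = np.floor((pts - lo) / span * (2 ** bits - 1)).astype(np.int64)
    codes = [morton3d(int(x), int(y), int(z), bits) for x, y, z in q]
    return np.argsort(codes, kind="stable")

# Example: order hypothetical Gaussian centers before feeding them to a sequence model.
centers = np.random.default_rng(0).uniform(-1.0, 1.0, size=(8, 3))
ordered_centers = centers[z_order_indices(centers)]
```

Serializing Gaussians this way turns an unordered 3D set into a 1D sequence with reasonable spatial locality, which is what makes it attractive for context modeling and compression.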

Impact & The Road Ahead:

The implications of these advancements are profound. We're seeing GS evolve from a novel rendering technique into a cornerstone technology for embodied AI, enabling robots to build high-fidelity, semantic, and even physics-aware world models, as demonstrated by MIF: Multi-modal Interactive Fields for Robust Humanoid Navigation (Peking University, Oxford Robotics Institute) and Forecast-GS: Predictive 3D Representation in Language-Guided Pick-and-Place Manipulation (KTH Royal Institute of Technology). The ability to generate large-scale, physically consistent data through frameworks like FlyMirage and Real2Sim promises to accelerate autonomous driving and robotics research by overcoming data scarcity challenges.

Furthermore, the focus on efficiency and compactness (TideGS, TensorGS, MMGS, Compact 3DGS for SLAM) is making real-time, high-fidelity 3D reconstruction and rendering viable on consumer hardware and edge devices, opening doors for pervasive AR/VR, robotics, and interactive digital twins. The push for safety and copyright protection (3DEditSafe from Tufts University, GuardMarkGS from Korea University, and 4D-GSW) highlights a growing maturity in the field, addressing ethical and practical concerns for deploying this powerful technology.

As GS continues to integrate with other powerful AI paradigms like MLLMs and diffusion models, we can expect even more intelligent, adaptable, and real-time 3D perception and generation capabilities. The era of truly interactive, photorealistic, and semantically rich digital worlds, powered by Gaussian Splatting, is not just on the horizon; it's here.
