Segment Anything Model: Pioneering the Next Wave of Intelligent Segmentation
Latest 50 papers on the Segment Anything Model: Nov. 16, 2025
The Segment Anything Model (SAM), and its successor SAM2, have rapidly become cornerstone technologies in AI/ML, revolutionizing how we approach image and video segmentation. Their unprecedented ability to generalize to unseen objects and domains with minimal prompting has unleashed a torrent of innovation, addressing long-standing challenges from medical imaging to autonomous driving and environmental monitoring. This digest dives into the latest breakthroughs, showcasing how researchers are pushing the boundaries of what SAM can segment and understand.
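To make "minimal prompting" concrete, the snippet below shows the standard promptable interface of the public segment_anything package; the checkpoint file, image path, and click coordinate are placeholders you would swap for your own.

```python
# Minimal point-prompt segmentation with the public `segment_anything` package.
# The checkpoint file, image path, and click coordinate are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once per image

# A single foreground click (label 1) is often enough for a usable mask.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # three candidate masks at different granularities
)
best_mask = masks[scores.argmax()]  # boolean HxW array
```

Most of the work discussed below builds on this promptable interface, either by generating better prompts automatically or by adapting the frozen backbone to a new domain.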
The Big Idea(s) & Core Innovations
At its heart, the recent research coalesces around three major themes: domain adaptation, efficiency and prompt engineering, and multimodal fusion. Researchers are consistently finding novel ways to adapt SAM to specialized, often challenging, domains. For instance, in medical imaging, the challenge lies in anatomical complexity and data scarcity. SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images from Zhejiang University introduces hierarchical self-supervised learning with an HL-Attn module to capture multi-level features, drastically improving medical image segmentation performance with 90% fewer fine-tuning epochs. Similarly, UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets from the University of Strasbourg leverages the massive US-43d dataset to train a specialized SAM for ultrasound, even proposing “prompted classification” as a new use case for structural analysis.
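These adaptation efforts differ in their pre-training strategies, but most share a common recipe: keep the bulk of SAM frozen and update only a small set of weights on in-domain masks. The sketch below illustrates that general recipe under simplifying assumptions (single-image batches, images already preprocessed to SAM's 1024x1024 input, a hypothetical `medical_loader`); it is not the SAMora or UltraSam training procedure.

```python
# Generic parameter-efficient adaptation of SAM to a new imaging domain.
# Illustrative sketch only; not the SAMora or UltraSam training code.
import torch
import torch.nn.functional as F
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Freeze the large image encoder and prompt encoder; train only the mask decoder.
for module in (sam.image_encoder, sam.prompt_encoder):
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.AdamW(sam.mask_decoder.parameters(), lr=1e-4)

for image, gt_mask in medical_loader:  # hypothetical loader: 1x3x1024x1024, 1x1xHxW
    with torch.no_grad():
        embedding = sam.image_encoder(image)  # 1x256x64x64
        sparse, dense = sam.prompt_encoder(points=None, boxes=None, masks=None)
    low_res_logits, _ = sam.mask_decoder(
        image_embeddings=embedding,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse,
        dense_prompt_embeddings=dense,
        multimask_output=False,
    )
    logits = F.interpolate(low_res_logits, size=gt_mask.shape[-2:], mode="bilinear")
    loss = F.binary_cross_entropy_with_logits(logits, gt_mask.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because only the lightweight mask decoder is updated, this kind of adaptation converges in a fraction of the epochs a full fine-tune would need, which is exactly the efficiency argument these papers make.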
Efficiency and intelligent prompt engineering are crucial for practical deployment. SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection by researchers from Hangzhou Dianzi University and Shandong University tackles prompt dependency and memory consumption in RGB-D video segmentation, introducing PAMIE (a Parallel Adapter-based Multi-modal Image Encoder) for prompt-free fine-tuning and QTM (Query-driven Temporal Memory) for learnable query pipelines. For edge devices, PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications from Sony, Stanford, and UC Berkeley demonstrates in-sensor processing for real-time, low-latency segmentation, bypassing heavy cloud-based processing.
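The underlying idea behind such "prompt-free" pipelines can be sketched very simply: instead of user clicks or boxes, a handful of learnable query embeddings stand in for SAM's sparse prompt tokens and are optimized during fine-tuning. The module below is a generic illustration of that idea, not the actual PAMIE/QTM design.

```python
# Learnable "virtual prompt" queries that replace user-supplied clicks or boxes.
# A generic illustration of prompt-free adaptation, not the SAM-DAQ architecture.
import torch
import torch.nn as nn

class LearnableQueries(nn.Module):
    def __init__(self, num_queries: int = 4, embed_dim: int = 256):
        super().__init__()
        # Each query acts as one virtual prompt token in the mask decoder.
        self.queries = nn.Parameter(torch.randn(1, num_queries, embed_dim) * 0.02)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Matches the shape of SAM's sparse prompt embeddings: (B, N_tokens, 256).
        return self.queries.expand(batch_size, -1, -1)

# During fine-tuning these queries are fed to the mask decoder in place of its
# sparse prompt embeddings and trained jointly with any adapters, so no human
# prompt is needed at inference time.
queries = LearnableQueries()(batch_size=2)  # torch.Size([2, 4, 256])
```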
Multimodal fusion is another potent avenue. HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object Detection leverages dynamic convolution and prompt engineering to combine RGB and thermal data, significantly boosting salient object detection in complex environments. Addressing the complexities of surgical scenes, Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review from MBZ University of AI highlights SAM’s role in tool detection, workflow recognition, and training simulations, often through prompt tuning and adapter layers.
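Whatever the specific architecture, the fusion step usually comes down to weighting one modality against the other at each spatial location before the features reach the decoder. The module below is a hypothetical gated-fusion sketch of that idea, not HyPSAM's actual dynamic-convolution modules.

```python
# Hypothetical gated fusion of RGB and thermal feature maps, shown only to
# illustrate the general idea; not HyPSAM's dynamic-convolution design.
import torch
import torch.nn as nn

class GatedRGBTFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Predict a per-pixel gate from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb_feat, thermal_feat], dim=1))
        # Where the gate is low (e.g. poorly lit RGB), the thermal features dominate.
        return g * rgb_feat + (1 - g) * thermal_feat

fused = GatedRGBTFusion()(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64))
```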
Under the Hood: Models, Datasets, & Benchmarks
This wave of innovation is fueled by new models, specialized datasets, and rigorous benchmarks:
- UltraSam and US-43d Dataset: Introduced in UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets, this is the largest public ultrasound segmentation dataset (282,321 image-mask pairs across 43 datasets), crucial for robust medical imaging. Code: https://github.com/CAMMA-public/UltraSam
- SAM-DAQ (PAMIE, QTM): A novel architecture in SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection featuring a Parallel Adapter-based Multi-modal Image Encoder and Query-driven Temporal Memory for efficient RGB-D video segmentation. Code: https://github.com/LinJ0866/SAM-DAQ
- SpinalSAM-R1 (CBAM, LoRA, DeepSeek-R1): An integrated system from Nanjing University of Aeronautics and Astronautics, detailed in SpinalSAM-R1: A Vision-Language Multimodal Interactive System for Spine CT Segmentation, which combines SAM with CBAM for feature refinement and LoRA for efficient fine-tuning (a generic LoRA sketch follows this list), guided by the DeepSeek-R1 language model. Code: https://github.com/6jm233333/spinalsam-r1
- KG-SAM: A knowledge-guided extension of SAM from The George Washington University, presented in KG-SAM: Injecting Anatomical Knowledge into Segment Anything Models via Conditional Random Fields, using Conditional Random Fields (CRF) and medical knowledge graphs for anatomical consistency. No public code provided in the summary.
- UCIS4K Dataset & UCIS-SAM: Introduced by Shanghai Jiao Tong University in Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset, this is a novel dataset for underwater camouflaged instance segmentation, released alongside UCIS-SAM, a model the authors report outperforms existing approaches on this task. Code: https://github.com/wchchw/UCIS4K
- LM-EEC: From the University of Chinese Academy of Sciences and University of North Texas, detailed in Robust Ego-Exo Correspondence with Long-Term Memory, this framework enhances SAM2 with a dual-memory architecture and Memory-View MoE module for robust ego-exo correspondence. Code: https://github.com/juneyeeHu/LM-EEC
- PolSAM (MVD, FFP, SFP): Presented by Northwestern Polytechnical University in PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model, this framework integrates physical scattering characteristics from PolSAR data using Microwave Vision Data (MVD) and specialized fusion modules. Code: https://github.com/XAI4SAR/PolSAM
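As noted in the SpinalSAM-R1 entry above, LoRA is the workhorse for keeping fine-tuning cheap: the pretrained weights stay frozen and only small low-rank matrices are trained. The wrapper below is a generic sketch of that technique applied to a linear layer such as a SAM attention projection; it is not the SpinalSAM-R1 code.

```python
# Generic LoRA wrapper for a frozen linear layer (e.g. a SAM attention projection).
# Illustrative sketch of the technique referenced above, not the SpinalSAM-R1 code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():    # keep the pretrained weights frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # the low-rank update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Only lora_a and lora_b are trainable, so the updated parameters are a tiny
# fraction of the frozen backbone.
layer = LoRALinear(nn.Linear(256, 256))
out = layer(torch.randn(2, 16, 256))
```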
Impact & The Road Ahead
The impact of SAM and SAM2’s continuous evolution is profound, driving advancements across diverse fields. In healthcare, models like SAMRI: Segment Anything Model for MRI and SAM2-3dMed: Empowering SAM2 for 3D Medical Image Segmentation promise faster, more accurate diagnostics and surgical planning, while Foam Segmentation in Wastewater Treatment Plants: A Federated Learning Approach with Segment Anything Model 2 demonstrates crucial applications in industrial monitoring and environmental management.
For robotics and automation, the zero-shot capabilities of SAM are critical. Zero-Shot Multi-Animal Tracking in the Wild and Object-Centric 3D Gaussian Splatting for Strawberry Plant Reconstruction and Phenotyping highlight its utility in wildlife monitoring and precision agriculture, respectively. However, challenges remain, as explored in How Universal Are SAM2 Features?, which identifies limitations in feature generalizability and underscores the ongoing need for task-specific fine-tuning or domain adaptation.
The future is bright, with research pushing towards increasingly autonomous and robust segmentation. From novel prompting mechanisms such as the explicit selection between point and text prompts in BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts, to self-supervised open-world segmentation with SOHES: Self-supervised Open-world Hierarchical Entity Segmentation, these advancements pave the way for a future where AI understands and interacts with our visual world with unparalleled precision and adaptability.