Segment Anything Model: The Latest Frontiers in Universal Segmentation
Latest 45 papers on the Segment Anything Model: Aug. 11, 2025
The Segment Anything Model (SAM) and its successor, SAM2, have revolutionized computer vision by offering powerful, prompt-driven object segmentation capabilities across a vast array of visual data. But what happens when you push these foundational models into specialized, complex, and often challenging real-world scenarios? Recent research unveils an exciting landscape of advancements, demonstrating how SAM and SAM2 are being tamed, augmented, and reimagined to tackle everything from medical diagnostics to industrial inspection, and even outer space.
The Big Idea(s) & Core Innovations
The overarching theme across these breakthroughs is the strategic adaptation and enhancement of SAM’s inherent power. Researchers are tackling the ‘intent gap’ and context limitations of general-purpose models, making them more precise, robust, and efficient for niche applications. For instance, MAUP: Training-free Multi-center Adaptive Uncertainty-aware Prompting for Cross-domain Few-shot Medical Image Segmentation leverages prompting strategies such as spatially diverse point placement and uncertainty guidance to enable precise medical image segmentation without any additional training, outperforming conventional models in few-shot, cross-domain settings. Similarly, SAMPO: Visual Preference Optimization for Intent-Aware Segmentation with Vision Foundation Models by Fudan University introduces visual preference optimization to infer high-level intent from sparse visual prompts, significantly improving multi-object segmentation, particularly in medical tasks, with minimal training data.
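To make the prompting idea concrete, here is a minimal, self-contained sketch of uncertainty-aware, spatially diverse point selection. The `prob_map` input, the confidence band, and the greedy confidence-times-distance rule are illustrative assumptions rather than MAUP’s or SAMPO’s actual algorithms; the selected points would simply be fed to SAM as positive and negative clicks.

```python
# A hedged sketch: pick click prompts that are confident AND spatially spread out.
import numpy as np

def select_point_prompts(prob_map, n_pos=3, n_neg=2, band=0.15):
    """Pick confident, spatially diverse positive/negative click prompts."""
    h, w = prob_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    probs = prob_map.ravel()

    confident_fg = probs > 0.5 + band   # ignore uncertain pixels near the decision boundary
    confident_bg = probs < 0.5 - band

    def diverse_pick(mask, scores, k):
        cand, sc = coords[mask], scores[mask]
        picked = []
        for _ in range(min(k, len(cand))):
            if not picked:
                idx = int(np.argmax(sc))        # most confident point first
            else:
                d = np.linalg.norm(
                    cand[:, None, :] - np.array(picked)[None, :, :], axis=-1
                ).min(axis=1)
                idx = int(np.argmax(sc * d))    # trade off confidence vs. spread
            picked.append(cand[idx])
        return np.array(picked)

    pos = diverse_pick(confident_fg, probs, n_pos)        # positive clicks
    neg = diverse_pick(confident_bg, 1.0 - probs, n_neg)  # negative clicks
    return pos, neg

# Toy usage: a blob-shaped foreground probability map.
yy, xx = np.mgrid[0:64, 0:64]
prob = np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / 300.0)
pos_pts, neg_pts = select_point_prompts(prob)
print("positive points:\n", pos_pts, "\nnegative points:\n", neg_pts)
```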
Bridging the gap between general models and domain-specific challenges is another key innovation. Anhui University’s Segment Any Vehicle: Semantic and Visual Context Driven SAM and A Benchmark integrates semantic and visual context with SAM to significantly improve vehicle segmentation accuracy in complex environments. In a similar vein, Taming SAM for Underwater Instance Segmentation and Beyond demonstrates that SAM can be effectively adapted to challenging underwater environments through knowledge distillation, achieving a notable performance boost in marine instance segmentation.
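As a rough illustration of the distillation recipe behind such adaptations, the sketch below trains a small student to match a frozen teacher’s mask logits alongside a supervised loss. The `TinyStudent` module, the dummy tensors, and the loss weighting are placeholder assumptions, not the paper’s architecture or training setup.

```python
# A hedged sketch of distilling a large "teacher" segmenter (stand-in for SAM)
# into a compact student; all module names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStudent(nn.Module):
    """A deliberately small network that predicts one mask logit per pixel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )
    def forward(self, x):
        return self.net(x)

teacher = TinyStudent().eval()      # pretend this is a frozen SAM-like teacher
for p in teacher.parameters():
    p.requires_grad_(False)

student = TinyStudent()
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

images = torch.rand(2, 3, 64, 64)                     # dummy underwater crops
gt_masks = (torch.rand(2, 1, 64, 64) > 0.5).float()   # dummy labels

for step in range(3):
    with torch.no_grad():
        t_logits = teacher(images)                    # soft targets from the teacher
    s_logits = student(images)
    distill = F.mse_loss(s_logits, t_logits)          # match teacher logits
    supervised = F.binary_cross_entropy_with_logits(s_logits, gt_masks)
    loss = supervised + 0.5 * distill
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```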
For more dynamic scenarios, SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree from The Chinese University of Hong Kong addresses the critical problem of error accumulation in long video sequences, using a training-free memory tree to maintain segmentation diversity and reliability through occlusions and object reappearances. This aligns with the comprehensive review presented in Segment Anything for Video: A Comprehensive Review of Video Object Segmentation and Tracking from Past to Future, which highlights how motion-aware memory selection and trajectory-guided prompting are crucial for improving accuracy and efficiency in video object segmentation and tracking (VOST) tasks.
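The core mechanism, keeping several scored segmentation hypotheses alive instead of committing to a single memory per frame, can be sketched in a few lines. Everything below (the `Branch` container, the random confidence scores, the beam width) is a simplified stand-in for illustration, not SAM2Long’s implementation.

```python
# A hedged sketch of multi-hypothesis tracking: expand every branch with
# candidate masks each frame, then prune to the top-K by cumulative confidence.
from dataclasses import dataclass, field
from typing import List
import random

@dataclass
class Branch:
    masks: List[int] = field(default_factory=list)  # stand-in for per-frame masks/memories
    score: float = 0.0                              # cumulative confidence

def step(branches: List[Branch], candidates_per_branch: int = 3, keep: int = 4) -> List[Branch]:
    """Expand every branch with candidate predictions, then keep the best `keep`."""
    expanded = []
    for b in branches:
        for c in range(candidates_per_branch):
            conf = random.random()                  # stand-in for predicted mask IoU/confidence
            expanded.append(Branch(masks=b.masks + [c], score=b.score + conf))
    expanded.sort(key=lambda b: b.score, reverse=True)
    return expanded[:keep]

branches = [Branch()]
for frame in range(5):                              # pretend 5 video frames
    branches = step(branches)
print("best hypothesis:", branches[0].masks, "score:", round(branches[0].score, 3))
```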
Under the Hood: Models, Datasets, & Benchmarks
These advancements aren’t just theoretical; they’re built on and contribute to a robust ecosystem of models, datasets, and practical tools:
- SGDFuse (by Zhang, Wang, Chen from Institute of Advanced Technology, University of Science and Technology) introduces a novel SAM-guided diffusion model for high-fidelity infrared and visible image fusion, with code available at https://github.com/boshizhang123/SGDFuse.
- Decoupling Continual Semantic Segmentation (by Guo, Lu, Zhang et al. from Sun Yat-sen University) proposes DecoupleCSS, a two-stage framework for Continual Semantic Segmentation (CSS) that leverages pre-trained encoders and SAM with LoRA adaptation (a minimal LoRA sketch follows this list). Code is at https://github.com/euyis1019/Decoupling-Continual-Semantic-Segmentation.
- MLLMSeg (by Wang, Wu, Huang et al. from East China Normal University) introduces a lightweight mask decoder and detail-enhanced feature fusion for Referring Expression Segmentation (RES) using MLLMs, with code at https://github.com/jcwang0602/MLLMSeg.
- SAM2-UNeXT (by Xiong, Wu, Zhang et al. from Sun Yat-sen University) combines SAM2 and DINOv2 with dual-resolution processing for improved segmentation across diverse benchmarks. Code is at https://github.com/WZH0120/SAM2-UNeXT.
- MAUP (by Y. Zhu et al.) offers a training-free adaptive prompting strategy for medical image segmentation, with code at https://github.com/YazhouZhu19/MAUP.
- Zero-shot Shape Classification of Nanoparticles in SEM Images using Vision Foundation Models by Barnatan et al. explores SAM and DINOv2 for nanoparticle classification without extensive training, with code at https://github.com/freida20git/nanoparticle-classification/.
- Rein++ (by Liao, Guo, Liu from Fudan University) enables efficient generalization and adaptation of VFMs for semantic segmentation, providing code at https://github.com/wloves/Rein.
- UncertainSAM (by Kaiser, Norrenbrock, Rosenhahn from Leibniz University Hannover) offers a lightweight framework for uncertainty quantification in SAM, with resources at https://arxiv.org/pdf/2505.05049.
- Taming SAM for Underwater Instance Segmentation and Beyond (by Liam Lian) uses the UIIS10K dataset for underwater segmentation, with code at https://github.com/LiamLian0727/UIIS10K.
- MergeSAM (by Hu, Lu, Han, Liu from Sun Yat-sen University) provides an unsupervised change detection method for remote sensing using SAM, detailed at https://arxiv.org/pdf/2507.22675.
- RS2-SAM2 (by Rong, Lan, Zhang, Zhang from Wuhan University) customizes SAM2 for referring remote sensing image segmentation with code at https://github.com/whu-cs/rs2-sam2.
- SAM2-Aug (by Xu, Dai, Zhao et al. from University of Texas Southwestern Medical Center) leverages prior knowledge for tumor auto-segmentation in radiation therapy, providing trained models at https://github.com/apple1986/SAM2-Aug.
- TextSAM-EUS (by Spiegler, Koleilat, Harirpoush et al. from Concordia University) enables text-driven pancreatic tumor segmentation in ultrasound, described at https://arxiv.org/pdf/2507.18082.
- ScSAM fuses SAM and MAE features for subcellular semantic segmentation, enhancing accuracy in electron microscopy. Details at https://arxiv.org/pdf/2507.17149.
- CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation (by Chen, Meng, Yang et al. from University of Electronic Science and Technology of China) introduces an automated prompt generation mechanism for cross-domain few-shot segmentation, available at https://arxiv.org/pdf/2507.16753.
- PlantSAM (by Sklab, Castanet, Ariouat et al.) combines YOLOv10 and SAM2 for plant segmentation in herbarium images, found at https://arxiv.org/pdf/2507.16506.
- OP-SAM: One Polyp Identifies All: One-Shot Polyp Segmentation with SAM via Cascaded Priors and Iterative Prompt Evolution (by Mao, Xing, Meng et al.) introduces a one-shot polyp segmentation framework with code at https://github.com/Hectormxy/OP-SAM.
- MA-SAM2: Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation (by M.Yin et al.) enhances SAM2 for surgical video segmentation with memory modules. Code available at https://github.com/Fawke108/MA-SAM2.
- FastSmoothSAM: A Fast Smooth Method For Segment Anything Model (by Xu, Chen from Huaqiao University) improves FastSAM’s edge quality using B-Spline curve fitting (a B-spline smoothing sketch follows this list). Code at https://github.com/XFastDataLab/FastSmoothSAM.
- Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2 (by Xu, Kabat, Zhang from University of Texas Southwestern Medical Center) introduces DD-SAM2 for efficient fine-tuning of SAM2 in medical videos. Code at https://github.com/apple1986/DD-SAM2.
- ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection (by Guo, Shehata, Du from University of British Columbia) integrates optical flow and vision-language models with SAM for zero-shot video camouflage segmentation, available at https://arxiv.org/pdf/2505.01431.
- OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation (by Zhong, Zheng, Liao et al.) adapts SAM2 for panoramic semantic segmentation under unsupervised domain adaptation, detailed at https://arxiv.org/pdf/2503.07098.
- RegCL: Continual Adaptation of Segment Anything Model via Model Merging (by Shu, Lin, Wang from Peking University) enables SAM to adapt across domains via model merging. Paper at https://arxiv.org/pdf/2507.12297.
- SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation uses SAM for pseudo-label generation and filtering in semi-supervised training, available at https://arxiv.org/pdf/2507.11994.
- Landmark Detection for Medical Images using a General-purpose Segmentation Model integrates YOLO11 and SAM for anatomical landmark detection, with code at https://github.com/Schobs/MediMarker.
- CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation (by Zhang, Liu, Tang from South China University of Technology) improves CLIP’s segmentation performance by addressing patch correlations, with code at https://github.com/zdk258/CorrCLIP.
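To illustrate the LoRA-style adaptation mentioned for DecoupleCSS above, here is a minimal sketch that wraps a frozen linear projection, like those inside SAM’s attention blocks, with a trainable low-rank residual. The rank, scaling factor, and layer choice are assumptions for illustration, not the paper’s exact configuration.

```python
# A hedged LoRA sketch: freeze the pre-trained projection, learn a low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # keep the pre-trained weights frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)              # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Toy usage: adapt a frozen 256-dim projection with only ~2k trainable parameters.
frozen_proj = nn.Linear(256, 256)
adapted = LoRALinear(frozen_proj, rank=4)
out = adapted(torch.rand(2, 196, 256))              # (batch, tokens, dim)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(out.shape, "trainable params:", trainable)
```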
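And for the B-spline edge refinement used by FastSmoothSAM, the sketch below smooths a jagged boundary (a noisy circle standing in for a coarse mask contour) with a periodic spline via SciPy. The smoothing factor and sampling density are illustrative, not the paper’s settings.

```python
# A hedged sketch of B-spline smoothing of a jagged segmentation boundary.
import numpy as np
from scipy.interpolate import splprep, splev

# Stand-in for a jagged mask contour: a noisy circle sampled at 100 points.
noise = 1.5
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
x = 50 + 30 * np.cos(t) + np.random.normal(0, noise, t.size)
y = 50 + 30 * np.sin(t) + np.random.normal(0, noise, t.size)
x, y = np.r_[x, x[0]], np.r_[y, y[0]]   # close the contour for a periodic fit

# Fit a closed B-spline; `s` controls how aggressively the jaggies are smoothed.
tck, u = splprep([x, y], s=2 * x.size * noise**2, per=True)

# Resample the smooth curve densely; this polygon can be rasterized back to a mask.
u_fine = np.linspace(0, 1, 400)
x_smooth, y_smooth = splev(u_fine, tck)
print(x_smooth.shape, y_smooth.shape)   # (400,), (400,)
```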
Impact & The Road Ahead
The impact of these advancements is profound, pushing the boundaries of what universal segmentation models can achieve. From making complex medical diagnostics more precise and less labor-intensive, as seen in the advancements for tumor and polyp segmentation, to enabling robust autonomous systems in challenging environments like underwater or space, SAM and SAM2 are proving incredibly versatile. The move towards training-free, parameter-efficient adaptation, often leveraging lightweight adapters and innovative prompting strategies, means these powerful models can be deployed more broadly without prohibitive computational costs or vast new datasets.
Future work will undoubtedly focus on further enhancing the robustness of these adapted models to noise, improving their ability to generalize to truly unseen domains, and developing more intuitive and less labor-intensive prompting mechanisms. The continued integration of multimodal inputs—be it text, thermal data, or motion cues—will unlock even more sophisticated perception capabilities. As these papers collectively demonstrate, the Segment Anything Model is not just a breakthrough; it’s a dynamic platform for ongoing innovation, continually redefining the landscape of computer vision and its real-world applications.