
Advancing 3D Segmentation: Deep Learning Techniques for Video and Medical Imaging

Abstract: In the realm of computer vision, segmentation techniques have long been a cornerstone for understanding and interpreting complex visual data. While 2D segmentation has been extensively explored and has seen significant advancements, extending these successes to 3D segmentation presents a unique set of challenges. The complexity inherent in 3D data, whether the third dimension is temporal, as in video analysis, or spatial, as in medical imaging, demands innovative approaches that can accurately capture and interpret the intricacies of three-dimensional spaces. Despite the critical need, the leap from 2D to 3D segmentation remains a significant challenge, requiring not only more sophisticated computational models but also a deeper understanding of the underlying patterns and structures within 3D datasets. To address these challenges, this thesis establishes a new foundation for 3D deep-learning-based segmentation, incorporating novel priors and constraints that extend across dynamic video analysis and medical image processing. It brings together five significant contributions, each addressing distinct challenges within these domains through innovative computational approaches.

The first part of the thesis, comprising three chapters, covers video shadow segmentation, object detection and segmentation in traffic videos, and multi-object tracking in traffic videos. Chapter 2 introduces a novel framework, SCOTCH and SODA, which leverages Transformer models for video shadow segmentation. Chapter 3 proposes a motion-prior-based method for enhancing object detection in traffic videos. Chapter 4 presents TrafficMOT, a challenging dataset aimed at advancing multi-object tracking in complex traffic scenarios. Through these studies, the thesis underscores the importance of temporal and spatial analysis in handling 3D data with a temporal dimension, i.e., videos.

Transitioning to 3D medical imaging data, where all three dimensions contain spatial information, the final two chapters of the thesis focus on unsupervised approaches to medical image registration and segmentation. They introduce two groundbreaking methodologies: CLMorph and PC-SwinMorph. The former (Chapter 5) is a contrastive learning framework for unsupervised medical image segmentation, while the latter (Chapter 6) is a patch representation technique for enhancing medical image registration and segmentation. These contributions not only showcase the application of 3D segmentation principles to data that is three-dimensional in space but also highlight the overarching theme of the thesis: the versatility and efficacy of 3D segmentation techniques across varied applications.

Overall, by weaving together findings from video analysis and medical imaging, this thesis not only bridges methodological gaps between these fields but also demonstrates the universal applicability and potential of 3D segmentation techniques. Through comprehensive methodological frameworks and diverse application domains, it contributes significantly to the advancement of deep-learning-based computational techniques for understanding complex 3D visual data in both video and medical contexts. The synergy among the presented works exemplifies a multidisciplinary approach to tackling segmentation challenges, setting a foundation for future research in both video processing and medical image analysis.