Tracking-Aware Deformation Field Estimation for Non-rigid 3D Reconstruction in Robotic Surgeries

¹Shanghai Jiao Tong University  ²Valeo.ai, Paris, France
*Indicates Equal Contribution

Overview of TADF. Given input video frames, we first sample key points and track them with the CoTracker foundation model. Taking the key point displacements as an additional input, our neural deformation field predicts accurate tissue deformation.
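As a rough illustration of the first two steps in the overview, the sketch below samples key points on a regular pixel grid and turns tracked positions into per-frame displacements relative to a reference frame. Grid sampling, using the first frame as the reference, and the array layout (a `(T, N, 2)` track array plus a `(T, N)` visibility mask, similar in spirit to what point trackers such as CoTracker output) are assumptions for illustration. The tracks here are synthetic rather than produced by CoTracker, and both function names are hypothetical.

```python
import numpy as np

def sample_grid_keypoints(height: int, width: int, step: int) -> np.ndarray:
    """Hypothetical helper: (N, 2) array of (x, y) pixel coords on a regular grid."""
    ys, xs = np.mgrid[step // 2:height:step, step // 2:width:step]
    return np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float32)

def keypoint_displacement(tracks: np.ndarray, visible: np.ndarray, ref: int = 0) -> np.ndarray:
    """Hypothetical helper: displacement of each tracked point w.r.t. frame `ref`.

    tracks:  (T, N, 2) tracked (x, y) positions per frame (assumed layout)
    visible: (T, N) boolean visibility flags
    Returns a (T, N, 2) float array; occluded entries are NaN so later steps can skip them.
    """
    tracks = tracks.astype(np.float64)
    disp = tracks - tracks[ref][None, :, :]
    disp[~visible] = np.nan
    return disp

# Usage with synthetic tracks standing in for tracker output.
pts = sample_grid_keypoints(8, 8, 4)                  # 4 points on an 8x8 image
tracks = np.stack([pts, pts + np.array([1.0, 0.5])])  # frame 1: every point shifts by (1, 0.5)
visible = np.ones((2, len(pts)), dtype=bool)
visible[1, 3] = False                                 # pretend one point is hidden by a tool
disp = keypoint_displacement(tracks, visible)
```

Marking occluded points with NaN rather than dropping them keeps the `(T, N, 2)` shape fixed, so every key point keeps the same index across frames.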

Abstract

Robotic laparoscopic surgery has rapidly advanced minimally invasive procedures, greatly assisting surgeons in sophisticated and precise operations with reduced invasiveness. Nevertheless, it remains safety-critical to be aware of even the slightest tissue deformation during instrument-tissue interactions, especially in 3D space. To address this, recent works rely on NeRF to render 2D videos from different perspectives and eliminate occlusions. However, most of these methods fail to robustly predict accurate 3D shapes and the associated deformation. In contrast, we propose Tracking-Aware Deformation Field (TADF), a novel framework that reconstructs the 3D mesh and the 3D tissue deformation simultaneously. It first tracks key points on the soft tissue with a foundation vision model, providing an accurate 2D deformation field. The 2D deformation field is then smoothly incorporated into a neural implicit reconstruction network to obtain the tissue deformation in 3D space. Finally, we experimentally demonstrate that the proposed method provides more accurate deformation estimation than other 3D neural reconstruction methods on two public datasets.
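One way to picture incorporating a 2D deformation signal into a neural deformation field is to feed it to the network alongside the encoded 3D point and time. The sketch below is a hypothetical, forward-pass-only MLP in NumPy, loosely in the style of NeRF-based deformation fields. The layer sizes, the sinusoidal positional encoding, and concatenating the raw 2D displacement as an extra input are all assumptions, not TADF's published architecture; training (for example, supervising rendered color and depth) is omitted, and `TrackingConditionedDeformation` is an illustrative name.

```python
import numpy as np

def positional_encoding(x: np.ndarray, n_freqs: int) -> np.ndarray:
    """NeRF-style encoding: [x, sin(2^k pi x), cos(2^k pi x)] for k < n_freqs."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi                      # (F,)
    scaled = x[..., None] * freqs                                    # (..., D, F)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (..., D, 2F)
    return np.concatenate([x, enc.reshape(*x.shape[:-1], -1)], axis=-1)

class TrackingConditionedDeformation:
    """Hypothetical MLP: (point xyz, time t, 2D displacement) -> deformed xyz."""

    def __init__(self, n_freqs: int = 4, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        in_dim = 4 * (1 + 2 * n_freqs) + 2  # encoded (x, y, z, t) + raw 2D displacement
        self.n_freqs = n_freqs
        self.w1 = rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = np.zeros((hidden, 3))     # zero-init output: the field starts as the identity warp
        self.b2 = np.zeros(3)

    def __call__(self, xyz: np.ndarray, t: np.ndarray, disp2d: np.ndarray) -> np.ndarray:
        """xyz: (N, 3), t: (N, 1), disp2d: (N, 2) -> deformed points (N, 3)."""
        feat = positional_encoding(np.concatenate([xyz, t], axis=-1), self.n_freqs)
        h = np.maximum(0.0, np.concatenate([feat, disp2d], axis=-1) @ self.w1 + self.b1)
        offset = h @ self.w2 + self.b2      # predicted 3D offset per point
        return xyz + offset
```

In such a design, `disp2d` for a 3D sample point would come from projecting the point into the image and reading the dense 2D deformation field at that pixel; that lookup is likewise an assumption here.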

Experimental Results of All Compared Methods on the EndoNeRF and SCARED Datasets.

Visualization Results of TADF on the EndoNeRF Dataset.


Intermediate results of our pipeline. "Cutting" denotes the EndoNeRF-cutting dataset and "Pulling" denotes the EndoNeRF-pulling dataset. The subfigures show a) reference: the input video frame; b) key point displacement: the displacement of the sampled key points tracked by the CoTracker foundation model; c) 2D deformation field: a 2D visualization of the deformation; d) 2D rendering: the rendered 2D video after removing the surgical tools; e) reconstructed mesh: the 3D reconstruction generated with the estimated deformation; and f) 3D deformation field: a 3D visualization of the deformation.
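Panel (c) turns the sparse key point displacements of panel (b) into a dense 2D deformation field. This page does not say how that densification is done, so the sketch below swaps in simple inverse-distance weighting (IDW) as one plausible, hypothetical choice, skipping occluded points marked with NaN. The function name and the `power`/`eps` parameters are illustrative assumptions.

```python
import numpy as np

def densify_displacement(keypoints: np.ndarray, disp: np.ndarray,
                         height: int, width: int,
                         power: float = 2.0, eps: float = 1e-8) -> np.ndarray:
    """Inverse-distance-weighted interpolation of sparse displacements (hypothetical).

    keypoints: (N, 2) (x, y) positions in the reference frame
    disp:      (N, 2) displacement of each point; NaN rows (occluded) are skipped
    Returns a (height, width, 2) dense displacement field indexed as field[y, x].
    """
    valid = ~np.isnan(disp).any(axis=1)
    kp, dv = keypoints[valid], disp[valid]
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)  # (H*W, 2)
    d2 = ((pix[:, None, :] - kp[None, :, :]) ** 2).sum(axis=-1)          # (H*W, N) squared distances
    w = 1.0 / (d2 ** (power / 2.0) + eps)   # eps avoids division by zero at key point pixels
    field = (w @ dv) / w.sum(axis=1, keepdims=True)                       # weighted average
    return field.reshape(height, width, 2)
```

IDW is easy to reason about, but its cost grows with pixels times key points; a scattered-data interpolator such as SciPy's `scipy.interpolate.griddata` or a coarser evaluation grid would be natural alternatives for full-resolution frames.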

A short video showing the intermediate results of our pipeline on the EndoNeRF-cutting and EndoNeRF-pulling datasets.

BibTeX

@misc{wang2025trackingawaredeformationfieldestimation,
      title={Tracking-Aware Deformation Field Estimation for Non-rigid 3D Reconstruction in Robotic Surgeries}, 
      author={Zeqing Wang and Han Fang and Yihong Xu and Yutong Ban},
      year={2025},
      eprint={2503.02558},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.02558}, 
      }