Main Session
Sep 28
PQA 01 - Radiation and Cancer Physics, Sarcoma and Cutaneous Tumors

2137 - One-Shot Tuning-Based Patient-Specific Projection-to-Volume Translation Model for Real-Time 3D Imaging during Radiotherapy: A Feasibility Study

02:30pm - 04:00pm PT
Hall F
Screen: 26
POSTER

Presenter(s)

Hugh Lee, PhD - Washington University School of Medicine, St. Louis, MO

E. Kim1,2, Y. Chung1, and H. Lee2; 1Department of Nuclear Engineering, Hanyang University, Seoul, Korea, Republic of (South), 2Washington University School of Medicine, St. Louis, MO

Purpose/Objective(s): Managing patient motion during radiotherapy is critical to ensuring treatment accuracy. In stereotactic body radiation therapy (SBRT), intra-treatment kV projections are commonly used for patient monitoring. However, these 2D images provide limited information and cannot fully capture anatomical changes during treatment. Real-time 3D imaging has long been a goal in radiotherapy, but reconstructing a volume from a single projection is ill-posed for conventional analytical approaches. In this study, we investigated the feasibility of a clinically applicable real-time 3D imaging approach based on a one-shot tuning method: a model is pretrained on diverse patient data and then rapidly fine-tuned for an individual patient using only a single projection-volume pair.
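As a concrete illustration of this workflow, the following is a minimal, hypothetical PyTorch sketch of the one-shot tuning step: a copy of the pretrained model is briefly fine-tuned on the single projection-volume pair from a new patient. The model, tensors, learning rate, and the L1-only objective here are placeholder assumptions (the actual training uses the combined loss described in Materials/Methods); only the step count is taken from the abstract.

```python
import copy
import torch
import torch.nn.functional as F

def one_shot_tune(pretrained_model, projection, volume, steps=10_100, lr=1e-4):
    """Fine-tune a copy of a pretrained projection-to-volume model on a
    single projection-volume pair (hypothetical sketch)."""
    model = copy.deepcopy(pretrained_model)  # keep the pretrained weights intact
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # L1-only objective for brevity; the study combines L1, perceptual,
        # gradient difference error, and adversarial losses.
        loss = F.l1_loss(model(projection), volume)
        loss.backward()
        opt.step()
    return model  # patient-specific model for real-time inference
```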

Materials/Methods: Cone-beam CT (CBCT) scans from 80 spine SBRT patients (five fractions each) were divided into 60 for training, 10 for validation, and 10 for testing. Projection data were generated from the scans using the ASTRA toolkit in Python. An Attention U-Net was pretrained on projection-volume pairs from the training set (fractions 1–5) and subsequently fine-tuned for each validation and test patient using only that patient's first fraction. Data augmentation was performed by applying deformation vector fields generated via thin plate spline interpolation, introducing random voxel displacements of 0–5 mm within each patient's anatomy. The model was trained using a combination of L1, perceptual, gradient difference error, and adversarial losses. Performance was assessed using the Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Mean Absolute Error (MAE), calculated over the body, and a spine Dice coefficient, calculated over the spinal column.
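As a sketch of the projection-generation step, the snippet below forward-projects a CBCT volume to a single cone-beam kV view with the ASTRA toolkit. The detector size, pixel pitch, and source/detector distances are illustrative assumptions, not the study's actual geometry, and a CUDA-capable GPU is required for ASTRA's 3D forward projector.

```python
import astra
import numpy as np

volume = np.random.rand(256, 256, 256).astype(np.float32)  # placeholder CBCT volume

vol_geom = astra.create_vol_geom(256, 256, 256)
angles = np.array([0.0])  # a single intra-treatment kV view
proj_geom = astra.create_proj_geom(
    'cone',
    1.0, 1.0,        # detector pixel spacing in u and v (assumed)
    384, 384,        # detector rows and columns (assumed)
    angles,
    1000.0, 500.0,   # source-to-origin and origin-to-detector distances (assumed)
)

# Forward-project the volume; ASTRA returns (det_rows, n_angles, det_cols).
proj_id, projections = astra.create_sino3d_gpu(volume, proj_geom, vol_geom)
astra.data3d.delete(proj_id)  # free GPU-side data
kv_projection = projections[:, 0, :]  # the projection paired with the volume
```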
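The thin plate spline augmentation can be sketched as follows: random control-point displacements are interpolated into a dense deformation field, which is then used to warp the volume. The control-point count, the coarse evaluation grid, and the assumption of roughly 1 mm voxels (so the 0–5 mm displacements map to voxel units) are all illustrative choices.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates, zoom

def tps_augment(volume, n_ctrl=8, max_disp=5.0, coarse=16, rng=None):
    """Warp `volume` with a random thin plate spline deformation field.
    Displacements are in voxels (~mm if the voxel spacing is ~1 mm)."""
    rng = np.random.default_rng(rng)
    shape = np.asarray(volume.shape, dtype=float)
    # Random control points with displacements of up to `max_disp` per axis.
    ctrl = rng.uniform(0, shape - 1, size=(n_ctrl, 3))
    disp = rng.uniform(-max_disp, max_disp, size=(n_ctrl, 3))
    tps = RBFInterpolator(ctrl, disp, kernel='thin_plate_spline')
    # Evaluate the spline on a coarse grid (dense evaluation is costly),
    # then linearly upsample each displacement component to full resolution.
    axes = [np.linspace(0, s - 1, coarse) for s in shape]
    pts = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1).reshape(-1, 3)
    field = tps(pts).reshape(coarse, coarse, coarse, 3)
    full = np.stack(
        [zoom(field[..., k], shape / coarse, order=1) for k in range(3)], axis=0
    )
    # Resample the volume at the displaced coordinates.
    coords = np.indices(volume.shape).astype(float) + full
    return map_coordinates(volume, coords, order=1, mode='nearest')
```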
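The combined training objective could look like the sketch below; the loss weights, the non-saturating adversarial term, and the external `perceptual_fn` (e.g., a VGG feature distance) are assumptions, since the abstract specifies only which loss components are combined.

```python
import torch
import torch.nn.functional as F

def gradient_difference_loss(pred, target):
    """Penalize mismatched intensity gradients along each spatial axis
    of 5D tensors shaped (N, C, D, H, W)."""
    loss = 0.0
    for dim in (2, 3, 4):
        loss = loss + (torch.diff(pred, dim=dim).abs()
                       - torch.diff(target, dim=dim).abs()).abs().mean()
    return loss

def generator_loss(pred, target, disc_logits, perceptual_fn,
                   w_l1=1.0, w_perc=0.1, w_gde=1.0, w_adv=0.01):
    """Combine L1, perceptual, gradient difference, and adversarial terms.
    Weights are illustrative; `perceptual_fn` is a hypothetical feature-space
    distance supplied by the caller."""
    adv = F.binary_cross_entropy_with_logits(
        disc_logits, torch.ones_like(disc_logits))
    return (w_l1 * F.l1_loss(pred, target)
            + w_perc * perceptual_fn(pred, target)
            + w_gde * gradient_difference_loss(pred, target)
            + w_adv * adv)
```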
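Finally, the reported metrics can be computed as in this sketch; the HU data range and the boolean spine masks are assumptions, and restriction of the image metrics to the body contour is omitted for brevity.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(pred_hu, gt_hu, spine_pred, spine_gt, data_range=2000.0):
    """SSIM/PSNR/MAE on the HU volumes and Dice on boolean spine masks."""
    ssim = structural_similarity(gt_hu, pred_hu, data_range=data_range)
    psnr = peak_signal_noise_ratio(gt_hu, pred_hu, data_range=data_range)
    mae = float(np.abs(gt_hu - pred_hu).mean())
    dice = 2.0 * np.logical_and(spine_pred, spine_gt).sum() / (
        spine_pred.sum() + spine_gt.sum())
    return ssim, psnr, mae, dice
```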

Results: For the 10 test patients, our model achieved an average SSIM of 0.7914, PSNR of 29.30 dB, MAE of 52.51 HU, and spine Dice of 0.5755 with one-shot tuning. Without fine-tuning (pretraining only), the model yielded an SSIM of 0.6761, PSNR of 24.31 dB, MAE of 106.84 HU, and spine Dice of 0.3812 on the same test patients. Pretraining and tuning required approximately 34 hours (459,000 steps) and 0.91 hours (10,100 steps), respectively, while inference completed in under 0.05 seconds per volume, corresponding to more than 20 volumes per second. This processing speed suggests a potential real-time application at ≥2 frames per second (fps), in line with the clinical workflow of our institution's SBRT routine. Although our SSIM is slightly lower than values reported in some studies, our approach demonstrates feasibility for real-time 3D imaging, particularly in preserving critical structures such as the spine.

Conclusion: This study demonstrates the feasibility of a clinically applicable real-time 3D imaging approach for SBRT by leveraging a deep learning model that rapidly adapts to individual patients from a single projection-volume pair. While the approach shows significant potential, further research will focus on improving structural accuracy to broaden its clinical applicability.