Main Session
Sep 28
PQA 01 - Radiation and Cancer Physics, Sarcoma and Cutaneous Tumors

2220 - Comparative Evaluation of Auto-Contouring Software and Physician Review for Brain Metastasis Detection in Stereotactic Radiosurgery

02:30pm - 04:00pm PT
Hall F
Screen: 13
POSTER

Presenter(s)

Sophia Shah, BS - Thomas Jefferson University, Philadelphia, PA

S. Shah1, H. Liu2, Z. Xiao3, Y. Chen4, W. Wang5, L. J. Wilson6, K. Judy7, K. Talekar8, J. Evans9, P. Hiepe10, S. P. Giner10, and W. Shi1; 1Thomas Jefferson University, Philadelphia, PA, 2Department of Radiation Oncology, Sidney Kimmel Medical College & Cancer Center at Thomas Jefferson University, Philadelphia, PA, 3Department of Radiation Oncology, Philadelphia, PA, 4University of Pennsylvania Health System, Philadelphia, PA, 5Duke University Medical Center, Durham, NC, 6St. Jude Children's Research Hospital, Memphis, TN, 7Dept of Neurosurgery, Thomas Jefferson University Hospital, Philadelphia, PA, 8Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, PA, 9Department of Neurosurgery, Sidney Kimmel Medical College of Thomas Jefferson University, Philadelphia, PA, 10BrainLAB AG, Munich, Germany

Purpose/Objective(s): Advancements in artificial intelligence (AI) and automation have led to the development of new tools for automated contouring and treatment planning in radiation therapy. Auto-contouring software aims to enhance accuracy, increase efficiency, and minimize variability among providers. This study evaluates the effectiveness of auto-contouring software in stereotactic radiosurgery (SRS) for brain metastases by comparing its performance with physician-generated contours.

Materials/Methods: Following IRB approval, we conducted a retrospective cohort study analyzing medical records of patients who underwent radiosurgery for brain metastases at our institution between 2016 and 2022. Physician-drawn contours were compared with those generated by auto-contouring software. For each pair of matched contours, the Dice coefficient, Hausdorff distance, and centroid distance were calculated and reported separately for volumes >0.1 cc and ≤0.1 cc. Three independent physicians reviewed the discrepant lesions, that is, those identified solely by physicians or solely by the auto-contouring software. Each object was assessed on a 3D T1 contrast-enhanced MRI, with physicians rating their confidence in its classification as a true brain metastasis on a 0–100 scale. The subsequent progression of these discrepant lesions was then evaluated to determine whether they represented true metastases, based on identification and treatment in later SRS sessions.
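The three agreement metrics named above can be illustrated with a minimal sketch. It assumes each contour is available as a set of integer voxel indices and that the Hausdorff distance is taken over all voxel centres; the abstract does not state the software, voxel spacing, or Hausdorff variant actually used, so the function names, the `spacing` parameter, and the toy cubes below are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def dice(a, b):
    """Dice coefficient between two voxel sets: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def to_mm(voxels, spacing):
    """Scale integer voxel indices to millimetres (spacing is assumed)."""
    return [tuple(i * s for i, s in zip(v, spacing)) for v in voxels]


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def centroid_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    """Distance in mm between the centroids of two voxel sets."""
    return dist(centroid(to_mm(a, spacing)), centroid(to_mm(b, spacing)))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance in mm (brute force, small lesions only)."""
    pa, pb = to_mm(a, spacing), to_mm(b, spacing)

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(pa, pb), directed(pb, pa))


def cube(origin, size):
    """Toy lesion: a solid cube of voxels, for illustration only."""
    ox, oy, oz = origin
    return {(ox + i, oy + j, oz + k)
            for i, j, k in product(range(size), repeat=3)}


physician = cube((0, 0, 0), 4)  # hypothetical physician-drawn contour
auto = cube((1, 0, 0), 4)       # same shape shifted by one voxel

print(dice(physician, auto))               # 0.75
print(centroid_distance(physician, auto))  # 1.0
print(hausdorff(physician, auto))          # 1.0
```

The brute-force Hausdorff search is quadratic in the number of voxels, which is tolerable for lesions of this size but would need a spatial index or distance transform for larger structures.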

Results: The study included 42 patients with multiple brain metastases. Physicians contoured 223 metastases clinically, whereas the auto-contouring software identified 236. A total of 203 were matched between both methods and 53 were discrepant: 33 were identified solely by the auto-contouring algorithm and 20 only through clinical contouring. For lesions greater than and less than 0.1 cc, respectively, the Dice coefficient was 0.84 ± 0.09 (mean ± standard deviation) and 0.63 ± 0.17, the Hausdorff distance was 1.97 ± 1.13 mm and 1.36 ± 0.6 mm, and the centroid distance was 0.41 ± 0.26 mm and 0.44 ± 0.31 mm. Among the discrepant objects, the reviewing physicians agreed that 69.7% of the auto-contour-only objects and 55% of the physician-only contours were lesions. Among the 33 lesions detected exclusively by the algorithm, 18 (54.5%) were confirmed as true positives, while 15 (45.5%) were false positives. The algorithm’s performance was assessed against evaluations by three independent physicians. In the five cases where physician ratings were inconsistent, the lesions were notably small, with a mean volume of 0.024 cc.
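The lesion counts reported above can be cross-checked with simple arithmetic. The sketch below only restates figures from the abstract; the variable names and the `pct` helper are illustrative and not part of the study.

```python
# Counts taken directly from the Results paragraph.
matched = 203          # lesions found by both methods
auto_only = 33         # found only by the auto-contouring algorithm
physician_only = 20    # found only through clinical contouring
auto_only_true = 18    # algorithm-only lesions confirmed as true positives

# Totals implied by the matched and discrepant counts.
clinical_total = matched + physician_only     # 223 clinically contoured
auto_total = matched + auto_only              # 236 identified by software
discrepant_total = auto_only + physician_only # 53 discrepant lesions
auto_only_false = auto_only - auto_only_true  # 15 false positives


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(clinical_total, auto_total, discrepant_total)
print(pct(auto_only_true, auto_only), pct(auto_only_false, auto_only))
```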

Conclusion: The auto-lesion detection algorithm demonstrates promise in enhancing the speed and consistency of brain metastasis identification, potentially expediting treatment planning. Initial findings indicate its potential utility in detecting metastases; however, false positives and missed lesions underscore the importance of careful interpretation and physician oversight. Further algorithm refinement is necessary to improve accuracy and evaluate its long-term impact on clinical outcomes.