2165 - Using a Convolutional Neural Network to Predict Physician Likert Score Assignments to Synthetic Medical Images
Presenter(s)
J. Luce1,2, A. Yunker3, A. Keeler1, H. Nguyen1, J. Dingillo2, S. A. A. Gros1, A. M. Block1, N. H. Darwish1, M. Bhandari1, H. Kang1, S. Beer1, R. Kettimuthu3, G. K. Thiruvathukal2, and J. C. Roeske1; 1Department of Radiation Oncology, Stritch School of Medicine, Cardinal Bernardin Cancer Center, Loyola University Chicago, Maywood, IL, 2Loyola University, Chicago, IL, 3Data Science and Learning Division, Argonne National Laboratory, Lemont, IL
Purpose/Objective(s): Synthetic medical images generated by artificial intelligence are seeing increased use in radiation oncology. However, a challenge in synthetic image generation is the blurring introduced by the pixelwise loss functions (L1, L2, Huber) used during network training. Alternatively, loss functions designed to mimic human vision (SSIM, perceptual) can be used to improve visual quality. The quality of synthetic medical images is conventionally evaluated using metrics such as the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and root-mean-square error (RMSE). However, these metrics do not always align with human perceptions of image quality. Physician observer studies are an ideal way to evaluate synthetic medical images but are labor intensive. As such, the purpose of this study is to evaluate a convolutional neural network (CNN), trained as a physician model observer, as a method for assessing the quality of synthetic medical images.
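For readers less familiar with the pixelwise losses and conventional metrics named above, the following is a minimal NumPy sketch of their standard textbook definitions. It is not the authors' implementation: the function names, the Huber threshold `delta`, and the assumption that intensities are normalized to a range of 1.0 are illustrative choices that the abstract does not specify.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error over all pixels."""
    return float(np.mean(np.abs(pred - target)))


def l2_loss(pred, target):
    """Mean squared error over all pixels."""
    return float(np.mean((pred - target) ** 2))


def huber_loss(pred, target, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it (delta is an assumed default)."""
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


def rmse(pred, target):
    """Root-mean-square error."""
    return float(np.sqrt(l2_loss(pred, target)))


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities span data_range."""
    mse = l2_loss(pred, target)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

All three losses average a per-pixel error independently of neighboring pixels, which is one common explanation for the blurring noted above: a slightly smoothed prediction is penalized less than a sharp but slightly misplaced edge.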
Materials/Methods: Twenty-three head-and-neck cone-beam CT (CBCT) patient scans were used to generate clinical-dose images from the full projection data, as well as simulated low-dose images reconstructed from 1/8 of the projection data. A U-net neural network was trained to transform the low-dose images into synthetic clinical-dose images. Three radiation oncologists rated the quality of the clinical-dose and synthetic clinical-dose images, with respect to soft tissue delineation, on a 1-5 Likert scale. These scores were used as labels to perform transfer learning with a pretrained image classifier. The resulting model observer predicted physician Likert scores with ~95% accuracy. Separately, a U-net was trained to transform low-dose images into synthetic clinical-dose images using each of the Huber, L1, L2, SSIM, and perceptual loss functions. The model observer was then used to predict physician Likert score assignments for these images.
Results: There were minimal differences in the conventional image quality metrics (SSIM, RMSE, PSNR) across the different loss functions. However, there was a notable increase in model-observer predicted Likert scores for the SSIM and perceptual loss functions. The specific values for these metrics are shown in Table 1. A paired t-test analysis indicated that the Likert score distributions associated with the SSIM and perceptual loss functions were significantly different from those of the Huber, L1, and L2 loss functions, with p < 0.05 for all comparisons.
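As an illustration of the paired comparison described above, the sketch below computes a paired t statistic using only the Python standard library. The pairing assumed here (the same test image scored under two different loss functions) and the example scores are hypothetical; they are not the study's data.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    # Each difference compares the two conditions on the same image.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical predicted Likert scores for the same five images
perceptual_scores = [3, 4, 3, 4, 3]
huber_scores = [2, 3, 3, 3, 2]
t_stat, dof = paired_t_statistic(perceptual_scores, huber_scores)
```

The p-value follows from comparing `t_stat` against a Student t distribution with `dof` degrees of freedom; `scipy.stats.ttest_rel` returns both the statistic and the two-sided p-value in one call.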
Conclusion: The model observer predicted a higher mean Likert score for synthetic images generated from the SSIM and perceptual loss functions. This suggests the model observer was able to identify perceptual improvements in these images that were not captured by conventional imaging metrics.
Abstract 2165 - Table 1
| Loss function | Mean SSIM | Mean RMSE | Mean PSNR (dB) | Mean Likert score |
| --- | --- | --- | --- | --- |
| Huber | 0.94 +/- 0.05 | 0.015 +/- 0.007 | 37.0 +/- 3.6 | 2.78 +/- 0.63 |
| L1 | 0.94 +/- 0.05 | 0.016 +/- 0.008 | 37.0 +/- 3.8 | 2.79 +/- 0.62 |
| L2 | 0.94 +/- 0.05 | 0.016 +/- 0.009 | 37.0 +/- 4.0 | 2.79 +/- 0.65 |
| SSIM | 0.94 +/- 0.05 | 0.016 +/- 0.008 | 37.0 +/- 3.8 | 2.97 +/- 0.80 |
| Perceptual | 0.93 +/- 0.05 | 0.017 +/- 0.009 | 36.0 +/- 3.7 | 3.11 +/- 0.90 |