Outcome Prediction for Patients with Oropharyngeal Cancer

MSc Thesis: Contrastive Self-supervised Learning for Outcome Prediction of Patients with Oropharyngeal Cancer.

Abstract


With the emergence of new cancer subtypes and treatment options, there is a growing need for personalized treatment of patients with oropharyngeal squamous cell carcinoma (OPSCC). Developing robust outcome prediction models capable of identifying low- and high-risk patients prior to treatment is a non-trivial task that may ultimately assist in stratifying patients toward the intensified or de-escalated treatment strategies most suitable for them, without compromising their survival.

Deep learning methods have demonstrated remarkable potential for predicting prognostic outcomes in head and neck cancers. The standard approach entails fully-supervised learning on volumetric medical images. However, annotating 3D medical images is an immensely time-consuming and expensive process, and in some cases, it may even be infeasible due to stringent privacy regulations. Consequently, the availability of labeled data in this domain is often limited, rendering the training process extremely challenging.

Inspired by recent advances in self-supervised learning, this study delves into various contrastive learning frameworks for learning visual representations from medical images without relying on manual annotations. Along the way, we explore a diverse range of medical imaging modalities as model inputs to determine the optimal configuration. We also compare different architectural choices, including convolutional neural networks and vision transformers. Finally, we investigate extracting features from multiple intermediate layers of these architectures to gain insight into how lower-level representations contribute to the models' predictive performance.
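
To make the contrastive objective concrete, the following is a minimal PyTorch sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss used by SimCLR-style frameworks. The batch size, embedding dimension, and `temperature` value are illustrative assumptions, not the exact configuration studied in the thesis.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i]).

    z1, z2: (N, D) projections of two augmented views of the same N scans.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.T / temperature                         # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # The positive for sample i is its other view: index i+N (or i-N in the second half).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage with random projections (illustrative shapes only).
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2))
```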

The ultimate goal is to improve long-term survival rates by accurately identifying potentially high-risk patients prior to treatment, based solely on their diagnostic imaging. To this end, two datasets sourced from the publicly accessible Cancer Imaging Archive (TCIA), namely the Head-Neck-PET-CT and HNSCC collections, are employed for pre-training the models. Subsequently, the OPC-Radiomics dataset from the same repository is utilized for fine-tuning. Finally, we assess the generalization ability of our models on an independent external set of 400 OPSCC patients provided by the University Medical Center Groningen (UMCG), the Netherlands.
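
At a high level, the training regime follows the standard pretrain-then-fine-tune pattern described above. The sketch below illustrates only the supervised fine-tuning stage with a toy 3D encoder, dummy volumes, and dummy outcome labels; every name, shape, and hyperparameter is an illustrative placeholder rather than the thesis implementation, and in the actual pipeline the encoder weights would first be learned with the contrastive objective on the unlabeled pre-training collections.

```python
import torch
import torch.nn as nn

# Toy 3D encoder standing in for the actual pre-trained backbone (hypothetical).
encoder = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),  # -> (N, 16) feature vectors
)
# In practice, encoder weights would be loaded from contrastive pre-training here.

# Fine-tuning: attach a small outcome-prediction head and train with labels.
head = nn.Linear(16, 1)                           # binary outcome logit
model = nn.Sequential(encoder, head)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

scans = torch.randn(4, 1, 32, 32, 32)             # dummy CT/PET volumes
labels = torch.randint(0, 2, (4, 1)).float()      # dummy outcome labels

optimizer.zero_grad()
loss = criterion(model(scans), labels)
loss.backward()
optimizer.step()
print(loss.item())
```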

📈 Our best model achieves a 15% increase in accuracy compared to state-of-the-art methods.


Research Questions


We framed our investigation of contrastive representation learning for medical image analysis in terms of the following research questions:

RQ1. What is the optimal contrastive learning protocol?

RQ2. What is the optimal medical imaging modality?

RQ3. What is the optimal encoder for learning meaningful representations?

RQ4. Does multi-level feature extraction improve contrastive representation learning?

RQ5. Does ensemble learning boost performance over individual models?