Article
Open access
Published: 19 August 2023

Predicting HPV association using deep learning and regular H&E stains allows granular stratification of oropharyngeal cancer patients

Sebastian Klein ORCID: orcid.org/0000-0002-2188-9377^1,2,
Nora Wuerdemann^3,4,
Imke Demers ORCID: orcid.org/0000-0001-9531-8414⁵,
Christopher Kopp³,
Jennifer Quantius²,
Arthur Charpentier^3,4,
Yuri Tolkach²,
Klaus Brinker⁶,
Shachi Jenny Sharma^3,4,
Julie George^3,7,
Jochen Hess⁸,
Fabian Stögbauer ORCID: orcid.org/0000-0002-8897-5165⁹^nAff16,
Martin Lacko ORCID: orcid.org/0000-0003-2868-4822¹⁰,
Marijn Struijlaart¹⁰,
Mari F.C.M. van den Hout¹¹,
Steffen Wagner ORCID: orcid.org/0000-0003-0873-1601¹²,
Claus Wittekindt^12,13,
Christine Langer¹²,
Christoph Arens ORCID: orcid.org/0000-0001-8072-1438¹²,
Reinhard Buettner ORCID: orcid.org/0000-0001-8806-4786²,
Alexander Quaas²,
Hans Christian Reinhardt^1,14,15,
Ernst-Jan Speel⁵ &
…
Jens Peter Klussmann^3,4

npj Digital Medicine volume 6, Article number: 152 (2023) Cite this article

3152 Accesses
7 Altmetric
Metrics details

Subjects

Abstract

Human Papilloma Virus (HPV)-associated oropharyngeal squamous cell cancer (OPSCC) represents an OPSCC subgroup with an overall good prognosis with a rising incidence in Western countries. Multiple lines of evidence suggest that HPV-associated tumors are not a homogeneous tumor entity, underlining the need for accurate prognostic biomarkers. In this retrospective, multi-institutional study involving 906 patients from four centers and one database, we developed a deep learning algorithm (OPSCCnet), to analyze standard H&E stains for the calculation of a patient-level score associated with prognosis, comparing it to combined HPV-DNA and p16-status. When comparing OPSCCnet to HPV-status, the algorithm showed a good overall performance with a mean area under the receiver operator curve (AUROC) = 0.83 (95% CI = 0.77-0.9) for the test cohort (n = 639), which could be increased to AUROC = 0.88 by filtering cases using a fixed threshold on the variance of the probability of the HPV-positive class - a potential surrogate marker of HPV-heterogeneity. OPSCCnet could be used as a screening tool, outperforming gold standard HPV testing (OPSCCnet: five-year survival rate: 96% [95% CI = 90–100%]; HPV testing: five-year survival rate: 80% [95% CI = 71–90%]). This could be confirmed using a multivariate analysis of a three-tier threshold (OPSCCnet: high HR = 0.15 [95% CI = 0.05–0.44], intermediate HR = 0.58 [95% CI = 0.34–0.98] p = 0.043, Cox proportional hazards model, n = 211; HPV testing: HR = 0.29 [95% CI = 0.15–0.54] p < 0.001, Cox proportional hazards model, n = 211). Collectively, our findings indicate that by analyzing standard gigapixel hematoxylin and eosin (H&E) histological whole-slide images, OPSCCnet demonstrated superior performance over p16/HPV-DNA testing in various clinical scenarios, particularly in accurately stratifying these patients.

Explainable prediction model for the human papillomavirus status in patients with oropharyngeal squamous cell carcinoma using CNN on CT images

Article Open access 20 June 2024

Integrative analysis of H&E and IHC identifies prognostic immune subtypes in HPV related oropharyngeal cancer

Article Open access 03 October 2024

Deep learning for predicting prognostic consensus molecular subtypes in cervical cancer from histology images

Article Open access 11 January 2025

Introduction

The incidence of human papillomavirus (HPV)-associated oropharyngeal squamous cell carcinomas (OPSCC), a cancer localized primarily at the tonsils or base of tongue, and to a lesser extent at the soft palate and the uvula – is rising in Western countries¹. The distinct biology of this disease is appreciated by incorporating p16 status, as a surrogate marker for HPV-infection, to the most recent (8th) staging system of the American Joint Committee on Cancer – highlighting that HPV-associated OPSCC are distinct from their HPV-negative counterparts².

Due to the better prognosis of HPV-positive tumors and the considerable side effects of multimodal therapies, attempts have been made to de-intensify therapy. Here, various strategies have been pursued, such as trans-oral surgery, reduction of radiotherapy or omission of chemotherapy in several ongoing clinical trials¹. Decisive for de-escalation strategies is exact patient selection. Most often, only p16 is used as an inclusive biomarker. However, there are approximately 10–15% of cases which are discordant by p16 and HPV DNA- or RNA-status. These patients have an increased risk of distant metastases^3,4. Therefore, the combination of p16 immunohistology and HPV DNA detection (or HPV RNA, which is technically more difficult) should be considered as the gold standard. Clinically, the heterogeneity of dichotomous testing for both HPV high-risk DNA and p16 should be considered, in addition to the inherent heterogeneity of HPV-viral load^5,6,7,8,9.

A more complex picture emerges as recent studies indicate that HPV association in OPSCC seems to be more heterogeneous¹⁰. For instance, single-nucleotide polymorphisms in HPV-16 subtypes may be associated with a worse outcome in HPV-associated OPSCC¹¹. Other studies focusing on the immune microenvironment showed that highly inflamed HPV-associated OPSCC cases have a favorable outcome^12,13. In line with these findings, a retrospective analysis of tissue specimens from the TROG12.01 and de-ESCALaTE trials found that CD103 abundance identified patients with improved outcomes¹⁴. Previous studies have shown that within subpopulations of OPSCC, using p16 as surrogate for HPV association, information from regular H&E stains could be used to stratify OPSCC patients^15,16.

Recently, we developed a deep learning-based approach to predict the HPV association using scans of regular H&E stains¹⁷. The previous approach included a smaller dataset with a focus on prediction of HPV-status and comparison to human observers. However, it lacked information on treatment modalities and different disease stages, as well as different tumor locations (primary/lymph node metastases). In this study, our objective is to develop a modular and computationally efficient algorithm designed to stratify OPSCC patients with higher accuracy compared to conventional HPV testing. To achieve this, we include multiple clinical cohorts, which encompass a wide range of disease stages, primary tumors, and metastases. Subsequently, we evaluate the algorithm’s performance against the established gold standard in various clinical scenarios.

Results

Patient characteristics

The study population is derived from four centers and one database (Fig. 1a, Supplementary Table 1, n = 906). 192 (21.2%) patients were female and 714 (78.8%) were male, the mean age was 59.4 years (interquartile range, IQR = 12.9 years). There were 477 (52.7%) patients with stage I and II disease, compared to 396 with stage III-IV disease (43.7%) and 33 (3.6%) with incomplete data on disease stage (UICC 8^th). 364 patients (40.2%) received adjuvant radio-/chemotherapy and surgical treatment (S(C)RT), compared to 348 (38.4%) receiving definitive radio-/chemotherapy ((C)RT), as well as 129 (14.2%) patients receiving surgery alone (ST). There were 65 patients with incomplete treatment data (7.2%). All patients were analyzed for HPV-status, using either p16, HPV high-risk DNA or both (referred to as dichotomous p16/HPV-DNA testing). In total, there were 798 patients where dichotomous HPV-status was available (88.1%). Overall, there were 378 HPV-positive cases (41.7%), and 528 HPV-negative cases (58.3%).

**Fig. 1: Study design and overall concept.**

Building a deep learning model to predict HPV association and evaluating the performance

In continuation of our previous methodology for predicting the association of Human Papillomavirus (HPV) using histological samples¹⁷, we adopted a comparable approach. Specifically, we constructed a Feature Pyramid Network (FPN) with a ResNet-18 encoder to perform semantic segmentation of viable tumor regions. Subsequently, we classified the extracted tumor tiles based on their HPV association using an additional ResNet-18 network. To be data-efficient, we used a training dataset of 267 patients from two centers and one database (Fig. 1b).

The algorithm achieved an overall performance of area under the receiver operator curve (AUROC) of 0.83 (95% CI = 0.77–0.9; n = 639) for predicting HPV association on samples that were completely independent of the training dataset (Fig. 1c, test set). The sensitivity was 0.78 (95% CI = 0.67–0.89) and the specificity was 0.8 (95% CI = 0.62–0.99) with an accuracy of 0.77 (95% CI = 0.69–0.85). The performance on the training dataset was AUROC = 0.93 (sensitivity = 0.9, specificity = 0.83 and accuracy = 0.86; n = 267) (Supplementary Table 2, Fig. 2b).

**Fig. 2: Performance of the model to predict HPV association for six different patient populations.**

Next, we evaluated whether we could increase the performance of the model to maximize its sensitivity by applying a fixed threshold on the variance of the probability of the HPV-positive class (Supplementary Figs. 2a, b; graphically explained). We therefore randomly sampled three times 30 cases out of a collection of 536 cases of primary tumors, originating from the test set (Fig. 1c; sampling), to avoid overfitting our data. These cases were filtered by a given variance of the probability of the HPV-positive class (Supplementary Fig. 2d). By filtering cases with a threshold below $7\,{\rm{x}}\,{10}^{-2}$, the best tradeoff between sensitivity (0.89 ± 0.04), specificity (0.8 ± 0.06), accuracy (0.83 ± 0.05) and the AUROC (0.86 ± 0.04) could be observed (Supplementary Table 3, Supplementary Fig. 2d). By applying this fixed threshold, we extracted a total of 258 cases (48.1%) from a collection of 536 patients (Fig. 2c). Here, an AUROC of 0.88 could be achieved (Fig. 2D; sensitivity = 0.85, specificity = 0.84 and accuracy = 0.85, n = 258).

Prognostic relevance of determining HPV association using deep learning and a binary classification

We then investigated the prognostic role of predicting HPV association within this cohort and compared it to regular testing of HPV status. By setting a threshold of 50% of the tile class prevalence (Supplementary Figs. 2a, b), and mimicking the binary classification of HPV-positive/negative, a five-year overall survival rate of 80% [95% CI = 71–90%] for patients classified as HPV-positive was observed (Supplementary Fig. 3a), which was similar to the five-year overall survival rate of standard HPV testing of 80% [95% CI = 71–90%] (Supplementary Fig. 3b). Misclassification of the HPV-status was not to the disadvantage of patients when stratified for survival (Supplementary Figs. 3c, d).

Granular patient prognostication by predicting HPV association

To resolve the prognostic value of predicting HPV association, we used a Cox proportional hazards model. Here, the tile-class prevalence as a measure of HPV association (Supplementary Figs. 2a, b) inversely correlated with survival (Likelihood-ratio test (LRT) = 49.23, p < 0.001, employing a chi-square distribution; concordance index (c-index) = 0.71; Fig. 2e). Having shown that the prediction of HPV association strongly correlated to survival, we applied a three-tier threshold (high, referred to highly HPV-associated; low and intermediate above/below 20% and in between). By following this, we could identify patients with a five-year survival rate of 96% [95% CI = 90-100] (Fig. 2f, high) outperforming the previously mentioned 80% five-year survival rate of the gold standard of HPV testing [95% CI = 71-90%]. For patients classified as intermediate, the five-year survival rate was 54% [95% CI = 45–65%], and for patients classified as low 34% [95% CI = 23–52%]. A multivariate analysis including several clinical variables (therapy, tumor size, nodal status, smoking history, and sex) confirmed the prognostic superiority of predicting HPV association using OPSCCnet (high: HR = 0.15 [95% CI = 0.05–0.44], p < 0.001, Cox proportional hazards model; intermediate: HR = 0.43 [95% CI = 0.26–0.7], p = 0.043, Cox proportional hazards model; low: reference; n = 211; Supplementary Fig. 4a).

The gold standard of HPV testing showed less prognostic relevance of HPV testing when compared to the algorithm (LRT = 39.72, p < 0.0001, employing a chi-square distribution; c-index = 0.65), which was underlined by the results of a multivariate analysis (HPV testing: HPV-positive HR = 0.29 [95% CI = 0.15–0.54], p < 0.001, Cox proportional hazards model; n = 211; Supplementary Fig. 4b).

Developing a combined scoring of the variance of the tile class probability and the tile class prevalence for patient prognostication

By using the variance of the tile class probability (Supplementary Fig. 2a, b) we could increase the sensitivity and specificity of determining HPV association in about 50% of the study population in the external test set (n = 258), outperforming regular HPV-status in selecting patients with an improved prognosis. We therefore reasoned to investigate the prognostic relevance of combing variance of the tile class probability and the tile class prevalence together (Supplementary Figs. 2a, b), leading to a combined HPV association score (referred to as combined score; Supplementary Fig. 2c; Eq. (3)).

For the external test set, filtered for primary tumors, there was a strong association between the combined score and overall survival, by again using a Cox proportional hazards model (LRT = 62.26, p < 0.001, employing a chi-square distribution, n = 531; Fig. 3a). We then divided the patients into three separate groups (high HR = 0.17 [95% CI = 0.10–0.29]; intermediate HR = 0.49 [95% CI = 0.37–0.66]; low: reference) with distinct five-year survival rates (high: 85% [95% CI = 77–93%], intermediate: 56% [95% CI = 50–63%], low: 34% [95% CI = 25–45%]; Fig. 3b). A similar association, given the overall lower five-year overall survival rate of the training cohort, was observed in the training cohort (LRT = 24.86, p < 0.001, employing a chi-square distribution, n = 263; Fig. 3c). The five-year survival rates were 71% [95% CI = 56–90%] for the high group, 45% [95% CI = 36–56%] for the intermediate group and 31% [95% CI = 20–49] for the low group (high: HR = 0.24 [95% CI = 0.12–0.48]; intermediate HR = 0.68 [95% CI = 0.45–1]; low: reference; Fig. 3d).

**Fig. 3: Prognostic relevance of predicted HPV association.**

In addition, the generalizability of the prognostic relevance was confirmed by analyzing tissue of lymph node metastases, although the intermediate group did not reach a level of significance (LRT: 13.39, p = 0.004, employing a chi-square distribution, n = 102, Fig. 3e) high HR = 0.28 [95% CI = 0.12–0.67]; intermediate HR = 0.65 [95% CI = 0.37–1.14], p = 0.12, employing a chi-square distribution; low: reference (Fig. 3f).

Prognostication of patient subpopulations using the combined score of HPV association

Having shown that predicting HPV association allowed accurate stratification of OPSCC patients, we next evaluated the prognostic relevance of the combined score in a subgroup of patients with early-stage disease (stage I/II, UICC 8th, n = 294; Fig. 4a). Particularly this group of patients may qualify for potential treatment de-escalation strategies. Again, there was a strong inverse correlation of the combined score and survival (LRT = 29.3, p < 0.001, employing a chi-square distribution, n = 294; Fig. 4b).

**Fig. 4: Prognostic significance of the predicted HPV association for stage I/II patients compared to gold-standard.**

Patients with a high score had a HR = 0.15 [95% CI = 0.06–0.38], p < 0.001, derived by a Cox proportional hazards model, compared to patients in the intermediate group with a HR of 0.46 [95% CI = 0.29–0.73], p < 0.001, Cox proportional hazards model (Fig. 4c). The five-year survival rates were 90% [95% CI = 82–99%] for patients classified as high, 73% [95% CI = 65–82%] for patients classified as intermediate and 48% [95% CI = 35–65%] classified as low. A multivariate analysis showed that the combined score was an independent predictor among several variables (high, HR = 0.22 [95% CI = 0.03–0.59], p = 0.003, Cox proportional hazards model; intermediate, HR = 0.53 [95% CI = 0.30–95], p = 0.033, Cox proportional hazards model, n = 253; Supplementary Fig. 5a). Within the same patient population, the performance of the gold standard of HPV-DNA/p16 combination (Fig. 4d) achieved a HR = 0.31 [95% CI = 0.2–0.48], Fig. 4e, with a five-year survival rate of 83% [95% CI = 78–90] for HPV-positive cases and 48% [95% CI = 37–61] for HPV-negative cases. The LRT was 28.1, p < 0.001, employing a chi-square distribution, and a multivariate analysis revealed a HR of 0.26 for HPV-positive cases [95% CI = 0.15–0.45], p < 0.001, Cox proportional hazards model, n = 253 (Supplementary Fig. 5b).

Within the training cohort, adjusted for cases with stage I/II disease (n = 151), the combined score could stratify patients with a HR = 0.13 [95% CI = 0.04–0.5], p = 0.002, (Cox proportional hazards model) in the high group and HR = 0.54 for intermediate group [95% CI = 0.29–1.1]; p = 0.072, Cox proportional hazards model (Supplementary Fig. 6a). The five-year survival rates were 78% [95% CI = 58–100%] for patients classified as high, and 65% [95% CI = 53–79%] for patients classified as intermediate. For the low group, the five-year survival rate was 48% [95% CI = 31–75%]. A multivariate analysis yielded a HR of 0.10 [95% CI = 0.02–0.50], p = 0.005, derived by a Cox proportional hazards model for the high group and HR = 0.43 [95% CI = 0.16–1.14], p = 0.091, derived by a Cox proportional hazards model for the intermediate group (Supplementary Fig. 6b; n = 136). The HR of HPV testing was 0.38 [95% CI = 0.15–0.95], p < 0.001, derived by a Cox proportional hazards model for HPV-positive cases Supplementary Fig. 6c. A multivariate analysis of regular HPV testing showed a HR for HPV-positive cases of 0.22 [95% CI = 0.09–0.53], p < 0.001, Cox proportional hazards model, n = 136 (Supplementary Fig. 6d).

To draw a more complete picture of the performance to stratify patients using the combined score, we next selected patients of the test set for advanced stage disease (stage III/IV, UICC8^th; Supplementary Fig. 7a; combined score, LRT = 9.5, p = 0.002, employing a chi-square distribution, n = 219, c-index = 0.57; Supplementary Fig. 7b). Patients with a high combined score had a HR of 0.48 [95% CI = 0.27–0.86], p = 0.013, n = 219 (Supplementary Fig. 7c), while a level of significance could not be reached for the intermediate group (HR = 0.81, [95% CI = 0.54–1.21], p = 0.3). For regular HPV testing a level of significance could not be reached (HPV-positive, LRT = 2.77, p = 0.1, employing a chi-square distribution, c-index = 0.51, HR = 0.53 [95% CI = 0.23–1.2], p = 0.12, Cox proportional hazards model; Supplementary Fig. 7d). A multivariate analysis showed a significant prognostic association of the high combined score, HR = 0.42 [95% CI = 0.22–0.77], p = 0.006, derived by a Cox proportional hazards model and no significant association for the intermediate group HR = 0.81 [95% CI = 0.52–1.26], p = 0.35, Cox proportional hazards model (Supplementary Fig. 7e; n = 195).

In addition, we selected only HPV-positive cases (Fig. 4f) and applied the combined score to stratify these patients. Here, the combined score showed an LRT = 9.3 (p = 0.002, employing a chi-square distribution, n = 230, c-index = 0.63; Fig. 4g). By again dividing patients using the same three-tier threshold, patients with a high score had a HR = 0.18 [95% CI = 0.06–0.56], p = 0.003, derived by a Cox proportional hazards model and patients with an intermediate score HR = 0.46 [95% CI = 0.24–0.90], p = 0.022, Cox proportional hazards model (Fig. 4h). This resulted in a five-year overall survival rate of 92% [95% CI = 84–100%] for patients within the high group, 83% [95% CI = 76–90%] for patients in the intermediate group and 63% [95% CI = 47–84%] for the low group. Consistently, by following a multivariate approach, the analysis showed a significant prognostic association of the high combined score with HR = 0.21 [95% CI = 0.05–0.87], p = 0.032, derived by a Cox proportional hazards model, and no significant association for the intermediate group HR = 0.54 [95% CI = 0.24–1.24], p = 0.128, Cox proportional hazards model (Supplementary Fig. 7e; n = 196).

To this end, we explored visual correlates of misclassified cases and provided their clinical characteristics. We chose four cases, namely with the highest tile-class prevalence, but which were falsely classified as HPV-positive (false-positive) or that were falsely classified as HPV-negative (false-negative). Interestingly, two cases being falsely classified as HPV-positive were found to be p16 positive but HPV-DNA negative (Supplementary Fig. 9a) with an overall survival of 6.6 and 12.3 years, respectively. Two cases that had been falsely classified as HPV-negative were both found to be HPV-DNA/p16 positive but had an overall survival of 0.16 years and 0.42 years, respectively (Supplementary Fig. 9b).

Discussion

Here, we present an approach of segmenting viable tumor areas following a subsequent classification of tumor patches. The main advantage of this approach is that it allows visual inspection of tissue areas that are subsequently used for classification. As a result, we have better control over the input data used in the subsequent classification of the Human Papillomavirus (HPV) association. To demonstrate the generalizability of our approach, we include lymph node metastases and primary tumors from different centers and treatment modalities in our external test set. The strong association of predicting HPV association and prognosis is confirmed in early- and late-stage disease, and it could also be used to stratify patients who are annotated as HPV-positive – identified by regular HPV testing.

Throughout this study, we are consequently using the same threshold for each cohort (high, referred to highly HPV-associated; low and intermediate above/below 20% and in between). Moreover, we provide evidence that both the tile class prevalence and the combined score are strongly correlating (inversely) to prognosis – irrespective of a specific threshold. However, by using this three-tier threshold, we aim to make the results comparable, allowing subsequent multivariate analysis. In this context, it is noteworthy to acknowledge that when analyzing patients with advanced stage disease, the intermediate group does not exhibit a statistically significant different outcome. However, unlike the conventional testing for HPV, which does not show a significant prognostic value within this subgroup of advanced stage disease, the group identified as having a high combined score demonstrates a statistically significant improvement in prognosis. From a clinical standpoint, patients with advanced stage disease would not typically be considered for treatment de-escalation, and it appears that conventional HPV testing also fails to differentiate these patients for prognostication. Therefore, we propose that by employing OPSCCnet, a certain percentage of patients could be identified with a slightly decreased or beneficial prognosis, which could subsequently be utilized for ongoing clinical monitoring of these individuals. Importantly, we believe that by utilizing a three-tier classification model, as opposed to the conventional binary HPV-testing, this would be considered a more detailed biomarker that enhances the resolution of patients’ prognosis, enabling the consideration of potential treatment de-escalation as well as surveillance.

Having built an algorithm to predict HPV association using 267 cases (training/validation dataset) that are predominantly determined by both p16 and HPV-DNA (246 cases / 92.1% where dichotomous HPV-status is available for 21 cases / 7.9% from the TCGA database), we believe that this ground truth is most versatile, given the discrepancies of p16/HPV-DNA testing and the associated distinct prognostic association with discordant p16/HPV-DNA status^3,18,19,20. By employing a random subsampling approach to determine the variance of the tile class probability in our dataset and to improve its test-metrics, we sought to address generalizability and robustness. Random subsampling helps in obtaining an estimate of the threshold that is not overly influenced by specific characteristics of the entire cohort. Using the entire dataset to define the threshold could potentially lead to overfitting, where the threshold is tailored too closely to the peculiarities of the dataset itself and may not accurately generalize to other similar datasets or populations. By randomly subsampling the data, we aim to create multiple representative subsets that capture the inherent variability within the dataset. Our approach of filtering cases with a certain threshold of the variance of the tile class probability would limit the application of declaring HPV association accurately to about 50% of patients. In comparison, other techniques, such as molecular assessment of oncogenic alterations require a level of tissue quality and recent efforts have made the necessity of quality control in computational pathology evident²¹. Arguably, we incorporate a threshold reflecting a range where the predictions of the algorithm are most accurate. At the same time, recent studies by Lang Kuhs et. al.¹¹ underline the heterogeneity of HPV-positive OPSCC, which is also in line with previous findings showing that HPV viral load is association with prognosis in OPSCC⁵. We therefore argue that our approach of using the tile class prevalence, as well as the variance of the class probability (combined score; Eq. (3)) as prognostic marker has a biological correlate of HPV heterogeneity. A very recent study by Sid. et. al. defines a subpopulation of OPSCC HPV positive tumors where malignant cells retained viral gene expression (“HPVon”), as well as cells that phenotypically suppress viral gene expression (“HPVoff”)²². Although we currently have no evidence whether these results correlate to the results of our analysis, future studies may analyze these patterns and compare whether the results of OPSCCnet might recapture this biological phenomenon.

However, it should be considered that once larger, well annotated datasets of OPSCC emerge, this would warrant further investigation of declaring HPV subgroups based on HPV genotypes¹¹. Potentially, these may then be further supplemented with other deep learning techniques, including multiple-instance-learning – a method of weakly supervised learning^23,24. Currently, the presented method of using a supervised learning approach and filtering criteria appear versatile in identifying patients with improved prognosis. Notably, a simple comparison of the performance of declaring HPV association might not be the ultimate goal, given the heterogeneity of HPV association shown by recent studies^10,22. Instead, machine learning applications should be deployed to identify OPSCC patients with a distinct clinical course accurately. Surprisingly, we found that two cases initially misclassified as HPV positive by the OPSCCnet were p16 positive and HPV-DNA negative (classified as HPV negative). This indicates a false-positive result from applying the OPSCCnet algorithm. Additionally, two cases that were determined to be HPV positive (p16 positive and HPV-DNA positive) were classified as HPV negative by OPSCCnet. These cases had a significantly shorter overall survival of 0.16 and 0.42 years compared to the overall prognosis of HPV-positive patients. It is important to note that all these tumors were located at the tonsils. This highlights the need to accurately distinguish HPV status and stratify patients when evaluating algorithm performance. Therefore, we believe that by introducing our combined score, we can more accurately identify HPV-associated OPC and improve prognosis compared to standard HPV testing methods.

Nevertheless, it should be considered that our study is not without limitations. The first limitation is that the data were primarily collected from European centers and patient characteristics may vary from other geographical regions. This may indeed apply to any predictive biomarker and hence needs to be investigated in additional cohorts of various ethnicities and geographical regions. Another limitation is that we collected the samples retrospectively. This results in confounders which would need to be validated by a randomized trial – including de-escalation strategies for patients stratified using HPV prediction.

Given the strong prognostic association of predicting HPV association and overall survival, future studies exploring treatment de-escalations or prognostic biomarkers within OPSCC can make use of the freely available algorithm, which we release as “OPSCCnet”, a combination of tumor detection and tumor classification of HPV association. As there is an ongoing effort to explore the exact value of p16 as single biomarker of HPV association, in comparison to both p16 and HPV-DNA status (dichotomous HPV testing) to declare HPV association, these studies may benefit from incorporating results from the OPSCCnet.

In conclusion, we develop and validate a fully automated modular algorithm – referred to as OPSCCnet – which allows patient stratification using regular H&E virtual-whole-slide images. The algorithm can be run using regular clients outperforming the gold standard of p16/HPV-DNA testing. Together, this tool may allow the community to stratify OPSCC patients at a highly granular level.

Methods

Study design and patients

The study design is shown in Fig. 1. 906 patients with OPSCCs from four centers and one database were enrolled in this retrospective study. 263 patients were used as training cohort (Fig. 2A; Supplementary Table 1). Patients from Cologne, Giessen, Heidelberg, and Maastricht were diagnosed with primary squamous cell carcinoma of the oropharynx (ICD code C10, International Classification of Diseases for Oncology) and treated at the given local center between 2005 and 2019. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the regional ethics committees (Giessen: AZ 95/15, dated October 19, 2015; Cologne: AZ 19–1288_1, dated February 3, 2020). Informed written consent was obtained from each subject. Patient characteristics were recorded prospectively by the Giessen cancer registry database (GTDS), as well as from the cancer registry database of the Center for Integrated Oncology (CIO), Cologne. H&E stained tumor samples were provided by the Tissue Bank of the National Center for Tumor Diseases (NCT) Heidelberg, Germany in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (S-207/2005 and S-786/2021)²⁵. Ethical approval for use of the Maastricht samples was granted by the local ethical committee under the study number METC-2021 2658. All patients were treated in accordance with approved guidelines by either surgery alone (ST) or upfront surgery and concomitant (chemo)radiotherapy (S(C)RT) or definitive chemoradiotherapy ((C)RT). Overall survival was defined as the time after initial diagnosis to death from any cause.

HPV-status

HPV-status was determined by presence of HPV high-risk DNA, p16 or combination of both (dichotomous HPV-status), if both markers were available. Corresponding p16 status on tumor tissue was assessed by staining and scoring for p16, according to EORTC guidelines, based on cytoplasmatic and nucleic p16 expression in 70% of tumor cells^26,27. Isolation of tumor DNA and HPV genotyping was performed as described previously²⁸.

Evaluating the best model for accurate prediction of HPV association

To choose the best model, we tested five different network architectures. First, we build a training dataset (500 cases) and a test dataset (216 cases) from a total of 716 patients in total (Supplementary Fig. 1A). Then, we evaluated the performance of ResNet-50, ResNet-18, densenet-201, mobilenetv2, inceptionv3 on these samples (Supplementary Fig. 1B). All networks were pretrained models trained on ImageNet²⁹. Together, there were 1.000.000 224 × 224 image tiles within this training set.

Building and applying a deep learning-based HPV prediction-algorithm

Semantic segmentation

We built a Feature Pyramid Network (FPN) with a ResNet-18 encoder for semantic segmentation of viable tumor areas using segmentation models (PyTorch, https://github.com/qubvel/segmentation_models.pytorch, author: Pavel Yakubovskiy). Image tiles of 1024 × 1024 with a pixel size of 2.2 μm/pixel were extracted, exclusively from tissue carrying parts of the whole-slide-image (WSI), which were classified using a simple tissue classification network built within QuPath³⁰. We trained the FPN model using 3924 image tiles originating from 205 cases from the training dataset (Fig. 1b; segmentation). The performance metrics of the segmentation were intersection over union (iou) iou = 0.94 / 0.9 for the test dataset (dataset / image based) and iou = 0.87 / 0.86 for a holdout test set (dataset / image based). Training was performed with albumentations for augmentation and in particular staintools for color augmentation using Reinhard, Vahadane und Macenko stain normalization methods^31,32,33.

Image classification, hyperparameters

The ResNet-18 model for classification was trained using a dataset with a class prevalence of 1.03 for HPV-positive and 0.97 for HPV-negative, given that we built a balanced dataset by randomly selecting image tiles of the 267 cases of the training/validation data to a total of 1,000,000 images (224 × 224 pixels). The batch size was 32 and the dataset was split into training and validation sets (80/20), resulting in 800,000 files for training and 200,000 files for validation. Data augmentation techniques such as random horizontal and vertical flipping and rotation were applied to the training data. Moreover, we used a dedicated color augmentation strategy by applying H&E stain normalization. We applied a non-linear activation function (sigmoid activation function) to the output of the final dense layer of the ResNet-18 given the binary nature of the classification task. Adam was used to update the model’s parameters as optimizer and a cross entropy was used as loss function. The learning rate was scheduled to decrease over time using an initial learning rate of 0.01, decay steps of 10,000 and a decay rate of 0.9. The model’s performance on the validation set was monitored using accuracy and the training process was stopped early if the validation accuracy would not improve for eight consecutive epochs. The total number of training steps in each epoch was 25000.

Screening for the best suited deep learning network architecture

To choose an architecture of a deep-convolutional neural network that would yield the best results, we randomly selected 500 cases from the whole cohort and evaluated the performance on 216 independent cases (Supplementary Fig. 1a). For the screening of the most appropriate architecture, we have optimized the hyperparameters for ResNet-50 using a smaller fraction of the total cases. ResNet-18 archieved best results (AUROC 0.84; Supplementary Fig. 1b). The ResNet-18 model that was trained for screening purpose was exclusively used in the screening setting and not applied during the study – where we trained an independent network using a training dataset with fewer cases in total, but were we extracted a higher number of image tiles per case. Generally, ResNet-18 is an architecture that provides a cost-effective computational solution while exhibiting a high degree of generalizability to unseen datasets, archiving good performance metrics when trained and applied on biomedical data^34,35.

Classification of tumor tiles

For classification of tumor tiles, patches with a centroid within tumor areas were extracted (224 × 224 pixels), followed by stain normalization³¹. For training, but not for inference, overlapping tiles were extracted. The resulting image tiles were then used to train a ResNet-18 using Tensorflow. We chose ResNet-18 model that was pre-trained on the ImageNet dataset. The top layers of the ResNet18 model, originally designed for multi-class classification, were excluded to enable the integration of customized layers. A global average pooling layer was added to the output of the base model. Subsequently, a dense layer with a sigmoid activation function was appended to the global average pooling layer to generate binary class probabilities. For training, all layers have been optimized. All code, including requirements and detailed information on the software is available at https://github.com/OPSCCnet/OPSCCnet.

Whole-slide-image scanning, computing structure

Slides from Cologne, Giessen, Heidelberg and Maastricht were scanned using a NanoZoomer S360 Digital slide scanner with a pixel size of 0.23 μm/pixel (n = 884). Cases from the TCGA database were scanned at 20X magnification using Aperio slidescanners (n = 22; https://portal.gdc.cancer.gov/).

Combined score and variance of the tile class probability

The tile class prevalence (TP) and the variance of the probability of the HPV-positive class (s²) are explained schematically in Supplementary Figs. 2a, b. Given the increasing evidence of heterogeneity within HPV-associated OPSCC tumors, we calculated a combined score (Eq. (3)), which is explained schematically in Supplementary Fig. 2c.

Variance; Eq. (1):

$${s}^{2}=\frac{{{\sum }_{i=1}^{n}\left({x}_{i}-\bar{X}\right)}^{2}}{n-1}$$

(1)

Tile class prevalence (TP) for HPV-positive (Hpos); Eq. (2):

$${TP}=\left(\frac{{\rm{Hpos}}\left(n\right)}{n}\right)* 100$$

(2)

Combined score; Eq. (3):

$${combined}\,{score}={TP}* \log 2\left(\frac{1}{{s}^{2}}\right)$$

(3)

Performance metrics and threshold

To evaluate the performance of the algorithm to determine HPV-status, we used either dichotomous HPV-status (p16/HPV-DNA) - where available - or either p16 or HPV-high risk DNA as ground truth. We have calculated the Youden index for the threshold of the classification of HPV-positive cases using the subset of validation cases as part of the training dataset to avoid overfitting our data. This has been calculated separately for the screening of architectures (methods) and the final ResNet-18 model. For the three-tier threshold of the combined score we applied the same criteria (high, referred to highly HPV associated; low and intermediate above/below 20% and in between).

Statistical analysis and computation

Univariate and multivariate analysis were conducted using a Cox proportional hazards model. Likelihood-ratio tests were performed as part of a Cox proportional hazards model (also referred to as Cox-regression). A p-value of less than 0.05 was considered statistically significant. All tests were two-sided. Statistical analysis was done with R and Python. All computation was performed using dual NVIDIA RTX 3090 GPUs.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The datasets from the four centers cannot be made publicly available because of privacy requirements of the participating medical centers. All code for inference, including weights and software specifications/installation guides, is available online (https://github.com/OPSCCnet/OPSCCnet). Users can download OPSCCnet and run the analysis using their own data.

References

Lechner, M., Liu, J., Masterson, L. & Fenton, T. R. HPV-associated oropharyngeal cancer: epidemiology, molecular biology and clinical management. Nat. Rev. Clin. Oncol. 19, 306–327 (2022).
Article PubMed PubMed Central Google Scholar
Craig, S. G. et al. Recommendations for determining HPV status in patients with oropharyngeal cancers under TNM8 guidelines: a two-tier approach. Br J Cancer 120, 827–833 (2019).
Article PubMed PubMed Central Google Scholar
Rasmussen, J. H. et al. Risk profiling based on p16 and HPV DNA more accurately predicts location of disease relapse in patients with oropharyngeal squamous cell carcinoma. Ann. Oncol. 30, 629–636 (2019).
Article CAS PubMed Google Scholar
Wagner, S. et al. Evaluation of p16INK4a expression as a single marker to select patients with HPV-driven oropharyngeal cancers for treatment de-escalation. Br J Cancer 123, 1114–1122 (2020).
Article CAS PubMed PubMed Central Google Scholar
Hashida, Y. et al. Prognostic significance of human papillomavirus 16 viral load level in patients with oropharyngeal cancer. Cancer Sci. 112, 4404–4417 (2021).
Article CAS PubMed PubMed Central Google Scholar
Klussmann, J. P. et al. Expression of p16 protein identifies a distinct entity of tonsillar carcinomas associated with human papillomavirus. Am. J. Pathol. 162, 747–753 (2003).
Article CAS PubMed PubMed Central Google Scholar
Wittekindt, C. et al. Expression of p16 protein is associated with human papillomavirus status in tonsillar carcinomas and has implications on survival. Adv. Otorhinolaryngol. 62, 72–80 (2005).
CAS PubMed Google Scholar
Gültekin, S. E., Sengüven, B., Klussmann, J. P. & Dienes, H. P. P16(INK 4a) and Ki-67 expression in human papilloma virus-related head and neck mucosal lesions. Invest Clin. 56, 47–59 (2015).
PubMed Google Scholar
Klussmann, J. P. et al. Prevalence, distribution, and viral load of human papillomavirus 16 DNA in tonsillar carcinomas. Cancer 92, 2875–2884 (2001).
Article CAS PubMed Google Scholar
Mehanna, H. et al. Prognostic implications of p16 and HPV discordance in oropharyngeal cancer (HNCIG-EPIC-OPC): a multicentre, multinational, individual patient data analysis. Lancet Oncol. 24, 239–251 (2023).
Article CAS PubMed Google Scholar
Lang Kuhs, K. A. et al. Genetic variation within the human papillomavirus type 16 genome is associated with oropharyngeal cancer prognosis. Ann. Oncol. 33, 638–648 (2022).
Article CAS PubMed Google Scholar
Corredor, G. et al. An imaging biomarker of tumor-infiltrating lymphocytes to risk-stratify patients with HPV-associated oropharyngeal cancer. J. Natl Cancer Inst. 114, 609–617 (2022).
Article PubMed Google Scholar
Abdulrahman, Z. et al. Tumor-specific T cells support chemokine-driven spatial organization of intratumoral immune microaggregates needed for long survival. J. ImmunoTher. Cancer 10, e004346 (2022).
Article PubMed PubMed Central Google Scholar
Rischin, D. et al. Prognostic stratification of HPV associated oropharyngeal cancer based on CD103+ immune cell abundance in patients treated on TROG 12.01 and De-ESCALaTE randomised trials. Ann. Oncol. 33, 804–813 (2022).
Article CAS PubMed Google Scholar
Koyuncu, C. F. et al. Computerized tumor multinucleation index (MuNI) is prognostic in p16+ oropharyngeal carcinoma. J. Clin. Invest. 131, e145488 (2021).
Article PubMed PubMed Central Google Scholar
Wilde, D. C. et al. Oropharyngeal cancer outcomes correlate with p16 status, multinucleation and immune infiltration. Mod. Pathol. 35, 1045–1054 (2022).
Article CAS PubMed PubMed Central Google Scholar
Klein, S. et al. Deep learning predicts HPV-association in oropharyngeal squamous cell carcinomas and identifies patients with a favorable prognosis using regular H&E stains. Clin. Cancer Res. 27, 1131–1138 (2021).
Article CAS PubMed Google Scholar
Perrone, F. et al. Isolating p16-positive/HPV-negative oropharyngeal cancer: an effort worth making. Am. J. Surg. Pathol. 35, 774–777 (2011). author reply 777-778.
Article PubMed Google Scholar
Shinn, J. R. et al. Oropharyngeal squamous cell carcinoma with discordant p16 and HPV mRNA results: incidence and characterization in a large, contemporary United States cohort. Am. J. Surg. Pathol. 45, 951–961 (2021).
Article PubMed PubMed Central Google Scholar
Rietbergen, M. M. et al. Human papillomavirus detection and comorbidity: critical issues in selection of patients with oropharyngeal cancer for treatment De-escalation trials. Ann. Oncol. 24, 2740–2745 (2013).
Article CAS PubMed Google Scholar
Janowczyk, A. et al. HistoQC: an open-source quality control tool for digital pathology slides. JCO Clin. Cancer Inform. 3, 1–7 (2019).
Article PubMed Google Scholar
Puram, S. V. et al. Cellular states are coupled to genomic and viral heterogeneity in HPV-related oropharyngeal carcinoma. Nat. Genet. 55, 640–650 (2023).
Article CAS PubMed Google Scholar
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. Proc. Mach. Learn. Res. 80, 2127–2136 (2018).
Google Scholar
Mock, A. et al. EGFR and PI3K pathway activities might guide drug repurposing in HPV-negative head and neck cancers. Front Oncol. 11, 678966 (2021).
Article CAS PubMed PubMed Central Google Scholar
Gillison, M. L. et al. Radiotherapy plus cetuximab or cisplatin in human papillomavirus-positive oropharyngeal cancer (NRG Oncology RTOG 1016): a randomised, multicentre, non-inferiority trial. Lancet 393, 40–50 (2019).
Article CAS PubMed Google Scholar
Wagner, S. et al. Human papillomavirus association is the most important predictor for surgically treated patients with oropharyngeal cancer. Br J Cancer 116, 1604–1611 (2017).
Article PubMed PubMed Central Google Scholar
Lewis, J. S. Jr et al. p16 positive oropharyngeal squamous cell carcinoma: an entity with a favorable prognosis regardless of tumor HPV status. Am. J. Surg. Pathol. 34, 1088–1096 (2010).
Article PubMed Google Scholar
Deng, J. et al. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848.
Bankhead, P. et al. QuPath: open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
Article PubMed PubMed Central Google Scholar
Reinhard, E., Adhikhmin, M., Gooch, B. & Shirley, P. Color transfer between images. IEEE Comp. Graph. App. 21, 34–41 (2001).
Article Google Scholar
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (Boston, MA, USA, 2009) https://doi.org/10.1109/ISBI.2009.5193250.
Vahadane, A. et al. Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans. Med. Imaging 35, 1962–1971 (2016).
Article PubMed Google Scholar
Kaissis, G. et al. End-to-end privacy preserving deep learning on multi-institutional medical imaging. Nat. Mach. Intel. 3, 473–484 (2021).
Article Google Scholar
Yang, J. et al. MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification. Sci. Data 10, 41 (2023).
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

Figures were created with https://biorender.com/ and Adobe illustrator version 26.4. Digital H&E stained tumor samples were provided by the Tissue Bank of the National Center for Tumor Diseases (NCT) Heidelberg, Germany. We acknowledge support by the Open Access Publication Fund of the University of Duisburg-Essen.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Fabian Stögbauer
Present address: Institute of Pathology, School of Medicine, Technical University of Munich (TUM), Munich, Germany

Authors and Affiliations

Department of Hematology and Stem Cell Transplantation, University Duisburg-Essen, University Hospital Essen, Essen, Germany
Sebastian Klein & Hans Christian Reinhardt
Institute of Pathology, Medical Faculty, University Hospital Cologne, Cologne, Germany
Sebastian Klein, Jennifer Quantius, Yuri Tolkach, Reinhard Buettner & Alexander Quaas
Department of Otorhinolaryngology, Head and Neck Surgery, Medical Faculty, University Hospital Cologne, Cologne, Germany
Nora Wuerdemann, Christopher Kopp, Arthur Charpentier, Shachi Jenny Sharma, Julie George & Jens Peter Klussmann
Center for Molecular Medicine Cologne (CMMC), University of Cologne, Medical Faculty, Cologne, Germany
Nora Wuerdemann, Arthur Charpentier, Shachi Jenny Sharma & Jens Peter Klussmann
Department of Pathology, GROW – School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, The Netherlands
Imke Demers & Ernst-Jan Speel
Hamm-Lippstadt University of Applied Sciences, Hamm, Germany
Klaus Brinker
Department of Translational Genomics, University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
Julie George
Department of Otolaryngology, Head and Neck Surgery, University Hospital Heidelberg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
Jochen Hess
Tissue Bank of the National Center for Tumor Diseases (NCT) Heidelberg, Germany, and Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
Fabian Stögbauer
Department of Otorhinolaryngology and Head and Neck Surgery, GROW-School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, The Netherlands
Martin Lacko & Marijn Struijlaart
Department of Pathology, GROW-School for Oncology and Reproduction, Maastricht University Medical Center, Maastricht, The Netherlands
Mari F.C.M. van den Hout
Department of Otorhinolaryngology, Head and Neck Surgery, University of Giessen, Giessen, Germany
Steffen Wagner, Claus Wittekindt, Christine Langer & Christoph Arens
Clinic of Otorhinolaryngology, Head and Neck Surgery, Klinikum Dortmund, University of Witten/Herdecke, Faculty for Health, Department of Human Medicine, Witten, Germany
Claus Wittekindt
West German Cancer Center, University Hospital Essen, Essen, Germany
Hans Christian Reinhardt
German Cancer Consortium (DKTK), Heidelberg, Germany
Hans Christian Reinhardt

Authors

Sebastian Klein
View author publications
You can also search for this author in PubMed Google Scholar
Nora Wuerdemann
View author publications
You can also search for this author in PubMed Google Scholar
Imke Demers
View author publications
You can also search for this author in PubMed Google Scholar
Christopher Kopp
View author publications
You can also search for this author in PubMed Google Scholar
Jennifer Quantius
View author publications
You can also search for this author in PubMed Google Scholar
Arthur Charpentier
View author publications
You can also search for this author in PubMed Google Scholar
Yuri Tolkach
View author publications
You can also search for this author in PubMed Google Scholar
Klaus Brinker
View author publications
You can also search for this author in PubMed Google Scholar
Shachi Jenny Sharma
View author publications
You can also search for this author in PubMed Google Scholar
Julie George
View author publications
You can also search for this author in PubMed Google Scholar
Jochen Hess
View author publications
You can also search for this author in PubMed Google Scholar
Fabian Stögbauer
View author publications
You can also search for this author in PubMed Google Scholar
Martin Lacko
View author publications
You can also search for this author in PubMed Google Scholar
Marijn Struijlaart
View author publications
You can also search for this author in PubMed Google Scholar
Mari F.C.M. van den Hout
View author publications
You can also search for this author in PubMed Google Scholar
Steffen Wagner
View author publications
You can also search for this author in PubMed Google Scholar
Claus Wittekindt
View author publications
You can also search for this author in PubMed Google Scholar
Christine Langer
View author publications
You can also search for this author in PubMed Google Scholar
Christoph Arens
View author publications
You can also search for this author in PubMed Google Scholar
Reinhard Buettner
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Quaas
View author publications
You can also search for this author in PubMed Google Scholar
Hans Christian Reinhardt
View author publications
You can also search for this author in PubMed Google Scholar
Ernst-Jan Speel
View author publications
You can also search for this author in PubMed Google Scholar
Jens Peter Klussmann
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

S.K. conceived and designed the study, developed the software and visualizations throughout the study and conducted the statistical analysis. N.W., I.D., C.K., A.C., S.J.S., J.G., J.H., F.S., M.L., M.S., M.v.H., S.W., C.W., C.L., C.A., E.J.S., J.P.K. provided samples, clinical data, and helped with data curation. S.K., N.W., C.K. and J.P.K. had access to all raw data of the study. J.Q. performed digitalization of whole-slide-images and data curation. Y.T., K.B., A.Q., R.B., H.C.R. advised on conceptional design and methodology of the study. S.K., R.B., H.C.R. and J.P.K. drafted the manuscript. All authors red and approved the manuscript and participated in reviewing and editing of the manuscript.

Corresponding author

Correspondence to Sebastian Klein.

Ethics declarations

Competing interests

J.H. received consulting and lecture fees from BMS, also received research funding from CureVac. R.B. received consulting and lecture fees from AbbVie, Amgen, AstraZeneca, Bayer, BMS, Boehringer-Ingelheim, Illumina, Lilly, Merck-Serono, MSD, Novartis, Qiagen, Pfizer, Roche, Targos MP Inc. R.B. is Co-Founder and Scientific Advisor for Targos Mol. Pathology Inc and member of the Directory Board Gnothis Inc. H.C.R. received consulting and lecture fees from Novartis, Abbvie, AstraZeneca, Vertex and Merck. H.C.R. received research funding from AstraZeneca and Gilead Pharmaceuticals. HCR is a co-founder of CDL Therapeutics GmbH. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Klein, S., Wuerdemann, N., Demers, I. et al. Predicting HPV association using deep learning and regular H&E stains allows granular stratification of oropharyngeal cancer patients. npj Digit. Med. 6, 152 (2023). https://doi.org/10.1038/s41746-023-00901-z

Download citation

Received: 31 August 2022
Accepted: 10 August 2023
Published: 19 August 2023
DOI: https://doi.org/10.1038/s41746-023-00901-z

Subjects

Abstract

Similar content being viewed by others

Explainable prediction model for the human papillomavirus status in patients with oropharyngeal squamous cell carcinoma using CNN on CT images

Integrative analysis of H&E and IHC identifies prognostic immune subtypes in HPV related oropharyngeal cancer

Deep learning for predicting prognostic consensus molecular subtypes in cervical cancer from histology images

Introduction

Results

Patient characteristics

Building a deep learning model to predict HPV association and evaluating the performance

Prognostic relevance of determining HPV association using deep learning and a binary classification

Granular patient prognostication by predicting HPV association

Developing a combined scoring of the variance of the tile class probability and the tile class prevalence for patient prognostication

Prognostication of patient subpopulations using the combined score of HPV association

Discussion

Methods

Study design and patients

HPV-status

Evaluating the best model for accurate prediction of HPV association

Building and applying a deep learning-based HPV prediction-algorithm

Semantic segmentation

Image classification, hyperparameters

Screening for the best suited deep learning network architecture

Classification of tumor tiles

Whole-slide-image scanning, computing structure

Combined score and variance of the tile class probability

Performance metrics and threshold

Statistical analysis and computation

Reporting summary

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary information

Reporting Summary

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links