Main

The ENSEMBLE trial (NCT04505722, https://clinicaltrials.gov/ct2/show/NCT04505722) was carried out in Argentina, Brazil, Chile, Colombia, Mexico, Peru, South Africa and the United States to test the efficacy of a single dose of the replication-incompetent human adenovirus type 26 (Ad26)-vectored Ad26.COV2.S vaccine vs placebo to prevent moderate to severe-critical COVID-191,2. Estimated vaccine efficacy against COVID-19 with onset at least 28 d post injection was 66.1% (95% confidence interval (CI): 55.0% to 74.8%) in the primary analysis (median follow-up 2 months)1. The US Food and Drug Administration (FDA) granted an Emergency Use Authorization to the Ad26.COV2.S vaccine as a single primary vaccination dose for individuals aged ≥18 years and, more recently, as a single homologous or heterologous booster dose for individuals aged ≥18 years3. The Ad26.COV2.S vaccine has also been issued an Emergency Use Listing by the World Health Organization4, authorized by the European Commission5, and approved or authorized in more than 100 countries6.

A validated immune biomarker that correlates with protection7,8,9 (a ‘correlate of protection’ or CoP) has several applications, including providing evidence for approval of demonstrated-effective vaccines for populations underrepresented in the phase 3 trials (for example, young children10,11), aiding approval of refined versions of demonstrated-effective vaccines (for example, strain or schedule changes), aiding approval of candidate vaccines to test efficacy in phase 3 trials and providing a study endpoint in early-phase trials for comparison and down-selection of candidate next-generation vaccines. A CoP also has population-level applications, including estimating the level of immunity of a population using sero-survey data12.

For most licensed vaccines against viral diseases where a CoP has been established, the CoP is either binding antibodies (bAbs) or neutralizing antibodies (nAbs)8. A growing body of evidence supports these immune markers as CoPs for COVID-19 vaccines. First, both bAbs13 and nAbs14 acquired through infection have been shown to correlate with protection from reinfection, and adoptive transfer of purified convalescent immunoglobulin G (IgG) protected rhesus macaques from SARS-CoV-2 challenge15. Second, nAb titres elicited by DNA16, mRNA17 and adenovirus-vectored18 COVID-19 vaccines all correlated with protection of rhesus macaques from SARS-CoV-2 challenge. Third, passive immunization with nAbs had protective efficacy in a phase 3 trial of high-risk individuals19. Fourth, bAbs and nAbs correlated with vaccine efficacy in meta-analyses of phase 3 randomized placebo-controlled clinical trials20,21. The evidence provided by correlates analyses of randomized phase 3 trials carries extra weight in the evaluation of CoPs and is the gold standard for obtaining reliable unbiased evidence22.

The US Government COVID-19 Response Team, in public-private partnerships with vaccine developers, designed and implemented five harmonized phase 3 COVID-19 vaccine efficacy trials with a major objective to develop a CoP based on an IgG bAb or nAb assay23. The first correlates analysis in this programme evaluated the mRNA-1273 COVID-19 vaccine in the COVE trial24, which showed that both IgG bAb and nAb markers measured 4 weeks post second dose were strongly correlated with the level of mRNA-1273 vaccine efficacy against symptomatic COVID-19, with the nAb titre mediating about two-thirds of the vaccine efficacy25. These findings were consistent with those of the phase 3 COV002-UK trial of the AZD1222 (ChAdOx1 nCoV-19) vaccine, where vaccine efficacy against symptomatic COVID-19 increased with post-injection bAb and nAb markers26.

The ENSEMBLE trial was included in this US Government-coordinated effort to identify CoPs. Using the same approach as that used for COVE25, we assessed IgG bAb and nAb markers measured 4 weeks after one dose of the Ad26.COV2.S vaccine in ENSEMBLE as correlates of risk of COVID-19 and as correlates of protection against COVID-19. (We use ‘correlate of risk’ to indicate a post-vaccination immune marker associated with the rate of COVID-19, and ‘correlate of protection’ to indicate that a correlate of risk is also predictive of vaccine efficacy against COVID-19, which is quantified by estimating a causal parameter that links the marker in some fashion to vaccine efficacy (ref. 9 and the Statistical Analysis Plan (SAP) in ref. 27).) Three markers were studied: IgG bAbs against SARS-CoV-2 spike protein (spike IgG), IgG bAbs against the spike protein receptor binding domain (RBD IgG) and nAbs measured by a pseudovirus neutralization assay (50% inhibitory dilution, ID50). We report spike IgG and RBD IgG readouts in World Health Organization (WHO) international units (IU) and ID50 titres calibrated to a WHO international standard, which enables comparison of the results to those of the COVE and the COV002-UK trials.

Results

Immunogenicity subcohort and case-cohort set

The assessment of immune correlates was based on measurement of the antibody markers at D29 (hereafter, ‘D29’ denotes the day 29 study visit, with an allowable visit window of ±3 d around 28 d post injection) in the case-cohort set, comprising a stratified random sample of the study cohort (the ‘immunogenicity subcohort’) plus all vaccine recipients with the COVID-19 primary endpoint after D29 (‘breakthrough cases’) (Extended Data Fig. 1a). (The sampling design is further detailed in the SAP.) Extended Data Fig. 1b–d describe the case-cohort set overall and by the three geographic regions Latin America (Argentina, Brazil, Chile, Colombia, Mexico and Peru), South Africa and the United States, with antibody data available from 48, 15 and 29 breakthrough cases, respectively, and from 212, 200 and 409 non-cases, respectively. All analyses of D29 antibody markers were restricted to per-protocol baseline SARS-CoV-2 seronegative participants in the case-cohort set (Supplementary Table 1 and Extended Data Fig. 2).

Participant demographics

The demographics and clinical characteristics of the immunogenicity subcohort (N = 826 in the vaccine group, N = 90 in the placebo group) are reported in Supplementary Table 2. Of all participants in the immunogenicity subcohort, 50.4% were ≥60 years old, 51.7% were considered at-risk for severe COVID-19 (defined as having one or more comorbidities associated with elevated risk of severe COVID-191) and 44.8% had been assigned female sex at birth. At US sites, 49.3% had minority status (defined as other than White Non-Hispanic). The immunogenicity subcohort was 26.0% from Latin America, 23.9% from South Africa and 50.0% from the United States. Supplementary Tables 3–5 provide demographics and clinical characteristics of the immunogenicity subcohort by geographic region.

COVID-19 endpoint

Correlates analyses were performed on the basis of adjudicated moderate to severe-critical COVID-19. Onset was required to be ≥28 d post vaccination (the day of vaccination defines the D1 study visit) as well as ≥1 d post D29 (the D29 study visit was not always 28 d post vaccination due to allowable study visit windows, as discussed above) through to 22 January 2021 (the data cut-off date of the primary analysis)1. This COVID-19 endpoint was selected to be as close as possible to the COVID-19 endpoint used in the primary analysis1 (efficacies against the primary1 vs correlates analysis ‘moderate to severe-critical COVID-19’ endpoints were very similar), while also seeking inclusiveness of endpoints to aid statistical precision. See Online Methods for details on the analysis databases and exact differences between the two endpoints. The last COVID-19 endpoint included in the correlates analysis occurred 48 d post D29 (Extended Data Fig. 1e). Of the 92 breakthrough cases with antibody data, 7 were severe-critical (using the same definition as in ref. 1), precluding correlates analyses restricted to severe-critical endpoints. Non-cases were defined as baseline seronegative per-protocol participants sampled into the immunogenicity subcohort with no evidence of SARS-CoV-2 infection up to the end of the correlates study period, which is up to 54 d post D29, the last day such that at least 15 such vaccine recipients were still at risk in the immunogenicity subcohort, but not later than the data cut-off of 22 January 2021.
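The timing rules for the correlates endpoint can be sketched as a small predicate. This is a minimal illustration, not the trial's analysis code: the function name and arguments are hypothetical, and only the thresholds (≥28 d post vaccination, ≥1 d post D29, at most 54 d post D29, data cut-off of 22 January 2021) are taken from the text.

```python
from datetime import date, timedelta

DATA_CUTOFF = date(2021, 1, 22)   # data cut-off of the primary analysis (from the text)
MAX_DAYS_POST_D29 = 54            # end of the correlates study period (from the text)

def is_correlates_endpoint(vaccination: date, d29_visit: date, onset: date) -> bool:
    """Return True if a COVID-19 onset date satisfies the timing rules described above.

    Hypothetical helper: adjudication of disease severity is not modelled here.
    """
    days_post_vaccination = (onset - vaccination).days
    days_post_d29 = (onset - d29_visit).days
    return (
        days_post_vaccination >= 28
        and 1 <= days_post_d29 <= MAX_DAYS_POST_D29
        and onset <= DATA_CUTOFF
    )

# Example: a D29 visit held 31 d post vaccination (within the +/-3 d window)
vaccination = date(2020, 11, 1)
d29_visit = vaccination + timedelta(days=31)
onset_same_day = d29_visit                        # 0 d post D29: not counted
onset_next_day = d29_visit + timedelta(days=1)    # 1 d post D29, 32 d post vaccination
```

The two conditions are both needed because the D29 visit can fall before day 28 (early in the visit window), in which case an onset 1 d post D29 would still be fewer than 28 d post vaccination.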

SARS-CoV-2 lineages causing COVID-19 endpoints

Figure 1 in ref. 2 (which reports the results of the final efficacy analysis) shows the distribution of SARS-CoV-2 lineages among COVID-19 endpoint cases for each country in the trial over time during the double-blind period of the trial (21 September 2020 through 9 July 2021). Data in this figure through 22 January 2021 are relevant for the current work. With ‘reference’ referring to the Wuhan-Hu-1 strain harbouring the D614G point mutation and ‘other’ referring to sequences with substitutions departing from reference not resulting in another SARS-CoV-2 lineage or variant, the results show two lineages in the United States at approximately equal prevalence (reference, other); almost all lineages being Beta in South Africa; and lineages reference, Zeta and other in Latin America having similar proportions. For the US, most ‘other’ lineages were genetically close to the reference. These data are consistent with the preliminary sequencing data provided in ref. 1.

Fig. 1: D29 antibody marker level by COVID-19 outcome status.
a–c, Violin plots of anti-spike IgG concentration (a), anti-RBD IgG concentration (b) and PsV neutralization ID50 titre (c). Data points are from baseline SARS-CoV-2 seronegative per-protocol vaccine recipients in the case-cohort set (a–c: N = 92 cases, 821 non-cases). In interior box plots, the lower and upper horizontal edges show the 25th and 75th percentiles of antibody level, the middle line indicates the 50th percentile, and the vertical bars extend from the 25th (or 75th) percentile to the minimum (or maximum) antibody level lying within 1.5 times the interquartile range below the 25th (or above the 75th) percentile. At both sides of the box, a rotated probability density curve estimated by a kernel density estimator with a default Gaussian kernel is plotted. Frequencies of participants with detectable responses were computed using inverse probability of sampling weighting. Pos.cut, detectability/positivity cut-off. Detectable responses for spike IgG and RBD IgG were defined by IgG > 10.8424 BAU ml−1 and IgG > 14.0858 BAU ml−1, respectively. ULoQ, upper limit of quantitation. ULoQ = 238.1165 BAU ml−1 for spike IgG and 172.5755 BAU ml−1 for RBD IgG. LLoQ, lower limit of quantitation. Seroresponse for ID50 was defined by a quantifiable value >LLoQ (2.7426 IU50 ml−1). ULoQ = 619.3052 IU50 ml−1 for ID50. Cases are baseline SARS-CoV-2 seronegative per-protocol vaccine recipients with the primary COVID-19 endpoint (moderate to severe-critical COVID-19 with onset both ≥1 d post D29 and ≥28 d post vaccination) up to 54 d post D29 but not later than 22 January 2021.

Lower D29 antibody marker levels in cases vs non-cases

At D29, 85.3% (95% CI: 82.0%, 88.0%) and 81.2% (77.7%, 84.3%) of vaccine recipient non-cases had a detectable spike IgG response (defined by IgG > 10.8424 binding antibody units (BAU) per ml) or detectable RBD IgG response (defined by IgG > 14.0858 BAU ml−1), respectively, whereas 56.4% (52.1%, 60.6%) had quantifiable ID50 nAb titre (Fig. 1 and Table 1). For each D29 marker, the response rate was lower in cases than in non-cases; this difference was largest for ID50 (response rate difference: −19.5% (95% CI: −29.7%, −8.2%)) (Table 1). For each D29 marker, the geometric mean value was also lower in cases than in non-cases, with ID50 again having the greatest difference (3.22 international units per ml (IU50 ml−1) (95% CI: 2.50, 4.15) in cases vs 4.95 (4.42, 5.55) in non-cases, ratio = 0.65 (0.52, 0.81)). The bAb markers had slightly higher case/non-case geometric mean ratios, with 95% CI upper bounds close to 1. Similar results were seen in each ENSEMBLE geographic region (Supplementary Table 6 and Extended Data Figs. 3–5), with D29 ID50 nAb titre in US participants having the greatest response rate difference (cases minus non-cases; −26.8% (−41.6%, −6.3%)) and the lowest geometric mean ratio (cases/non-cases; 0.55 (0.41, 0.72)) across all markers and geographic regions.
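The two summary statistics used here, a weighted response rate and a geometric mean ratio, can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the weights stand in for inverse probability of sampling weights, and confidence intervals are not computed. Only the two geometric means (3.22 and 4.95 IU50 ml−1) are taken from the text.

```python
import math

def weighted_response_rate(values, weights, cutoff):
    """Weighted fraction of participants whose marker exceeds the positivity cut-off."""
    total = sum(weights)
    positive = sum(w for v, w in zip(values, weights) if v > cutoff)
    return positive / total

def weighted_geometric_mean(values, weights):
    """Weighted geometric mean: exponential of the weighted mean of the log values."""
    total = sum(weights)
    log_mean = sum(w * math.log(v) for v, w in zip(values, weights)) / total
    return math.exp(log_mean)

# Case/non-case ratio of the ID50 geometric means reported in the text
gm_cases = 3.22
gm_non_cases = 4.95
gm_ratio = gm_cases / gm_non_cases
print(round(gm_ratio, 2))
```

A ratio below 1 means that the geometric mean titre was lower in cases than in non-cases, which is the direction expected of an inverse correlate of risk.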

Table 1 D29 antibody marker response rates and geometric means by COVID-19 outcome status

The D29 bAb markers were highly correlated with each other (Spearman rank r = 0.91) but were only moderately correlated with ID50 (r = 0.55 for spike IgG and ID50; r = 0.54 for RBD IgG and ID50) (Extended Data Fig. 6). For each D29 marker, the reverse cumulative distribution function curve in the context of the overall vaccine efficacy estimate is shown in Supplementary Fig. 1.
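For readers unfamiliar with the statistic, a Spearman rank correlation is the Pearson correlation of the ranks of the two variables. The sketch below is a plain, unweighted implementation with tie handling by average ranks; the function names are hypothetical, and the analysis reported here may have used sampling weights that this sketch does not apply.

```python
def average_ranks(xs):
    """Return 1-based ranks of xs, assigning tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the result is unchanged by monotone transformations such as taking log10 of the antibody readouts.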

As expected because the population is baseline seronegative, frequencies of placebo recipients with detectable or quantifiable responses at D29 were near zero (for example, for ID50, 0.6% and 0% for cases and non-cases, respectively) (Supplementary Fig. 2).

D29 antibody marker levels correlate with risk

The cumulative incidence of COVID-19 for vaccine recipient subgroups defined by D29 antibody marker tertile (Fig. 2a–c) shows that COVID-19 risk decreased with increasing tertile. The hazard ratio (High vs Low tertile) was significantly less than one for ID50 (estimate of 0.41; 95% CI: 0.22, 0.75), and there were weak trends towards inverse correlates for the two IgG markers (estimates of 0.75 (0.42, 1.32) for spike IgG and 0.61 (0.34, 1.09) for RBD IgG). Only ID50 passed the pre-specified family-wise error rate (FWER) multiplicity-adjusted P value threshold for testing whether the hazard rate of COVID-19 differed across the Low, Medium and High tertiles (Table 2A; P = 0.003, FWER-adjusted P = 0.011) (multiplicity adjustment was performed over the six categorical and quantitative markers). Evidence for the spike and RBD bAb markers as inverse correlates of risk across tertiles was weaker, with unadjusted P values of 0.50 and 0.16, respectively (Table 2A).

Fig. 2: COVID-19 risk by D29 antibody marker level.
a–c, Plots showing covariate-adjusted cumulative incidence of COVID-19 by Low, Medium or High tertile of D29 antibody marker level in baseline SARS-CoV-2 seronegative per-protocol participants. a, Anti-spike IgG concentration. b, Anti-RBD IgG concentration. c, PsV neutralization ID50 titre. Baseline covariates adjusted for were baseline risk score and geographic region.

Table 2 Covariate-adjusted hazard ratio of COVID-19 across D29 antibody marker tertiles (A) or per 10-fold increase in D29 quantitative marker (B)

Similar results were observed for the D29 quantitative markers, with estimated hazard ratio per 10-fold increase in antibody marker level clearly indicating an inverse correlate of risk for ID50 (estimate of 0.49 (95% CI: 0.29, 0.81)), with estimates less than one for each IgG marker, yet with 95% CIs including 1.0 (estimates of 0.69 (0.41, 1.16) for spike IgG and 0.59 (0.33, 1.06) for RBD IgG) (Table 2B). Again, only ID50 passed the multiple testing correction (FWER-adjusted P = 0.016). (Supplementary Table 7 shows the hazard ratios per standard deviation-increase in each D29 marker.) An additional post-hoc analysis was done reporting Cox model fits for each antibody marker with a set of demographic factors also in the model (Supplementary Table 8). The results are similar; for example, the estimated hazard ratio per 10-fold increase in ID50 is 0.49 (0.30, 0.80). Extended Data Fig. 7 shows analogous results across pre-specified subgroups of vaccine recipients for RBD IgG and ID50, respectively. The point estimates indicate stronger correlates of risk for participants assigned female vs male sex at birth and for communities of colour vs White Non-Hispanics in the United States, generating potential hypotheses about the role of sex and race/ethnicity on vaccine-induced immunity. However, because the 95% CIs overlap, these apparent differences could be false positives.
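A hazard ratio expressed per 10-fold increase can be rescaled to other fold changes. The sketch below assumes the marker enters the Cox model linearly on the log10 scale, which is what a "per 10-fold increase" estimate implies; the function name is hypothetical, and the rescaling applies to point estimates only (confidence limits rescale in the same way, but no new inference is implied).

```python
import math

def hazard_ratio_for_fold_change(hr_per_10_fold: float, fold: float) -> float:
    """Rescale a per-10-fold hazard ratio to an arbitrary fold change in the marker.

    Assumes a log-linear model: log(HR) = beta * log10(marker), so that
    HR(fold) = exp(beta * log10(fold)) = hr_per_10_fold ** log10(fold).
    """
    if hr_per_10_fold <= 0 or fold <= 0:
        raise ValueError("hazard ratio and fold change must be positive")
    return hr_per_10_fold ** math.log10(fold)

# Point estimate for ID50 from Table 2B, as quoted in the text
hr_id50 = 0.49
hr_doubling = hazard_ratio_for_fold_change(hr_id50, 2)      # per 2-fold increase
hr_100_fold = hazard_ratio_for_fold_change(hr_id50, 100)    # per 100-fold increase
```

Under this assumption, a 100-fold increase corresponds to the per-10-fold hazard ratio squared, and a doubling to a hazard ratio much closer to 1.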

When vaccine recipients were divided into subgroups defined by having an antibody marker level above a specific threshold and varying the threshold over the range of values, nonparametric regression showed that cumulative incidence of COVID-19 (from 1 to 54 d post D29) decreased as the ID50 threshold increased (Fig. 3a). This decrease in risk was steepest across increasing thresholds closer to the assay lower limit of quantitation (LLoQ = 2.74 IU50 ml−1) and was more gradual across higher increasing thresholds. The risk estimate for COVID-19 was 0.009 (95% CI: 0.007, 0.012) for all vaccine recipients and decreased to 0.006 (0.004, 0.009) for vaccine recipients with any quantifiable ID50 titre, whereas at the highest threshold examined (>185 IU50 ml−1), the risk estimate was 0.004 (0.000, 0.009). The bAb markers also showed decreases in risk (although less pronounced) with increasing threshold value (Extended Data Fig. 8a,b).
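The idea of a threshold-based risk estimate can be illustrated with a deliberately simplified sketch: for each threshold, take the weighted proportion of cases among vaccine recipients whose marker is at or above that threshold. This is not the nonparametric estimator of the SAP; it ignores censoring, covariate adjustment and the monotonicity correction, uses "at or above" as an assumed convention, and all names and toy data are hypothetical.

```python
def risk_above_threshold(markers, is_case, weights, threshold):
    """Weighted proportion of cases among participants with marker >= threshold.

    Simplified illustration: no censoring, no covariate adjustment.
    Returns NaN when no participant is at or above the threshold.
    """
    numerator = sum(
        w for m, c, w in zip(markers, is_case, weights) if m >= threshold and c
    )
    denominator = sum(w for m, w in zip(markers, weights) if m >= threshold)
    return numerator / denominator if denominator > 0 else float("nan")

# Toy data (hypothetical): marker level, case indicator, sampling weight
toy_markers = [1.0, 1.0, 5.0, 5.0, 50.0, 50.0]
toy_is_case = [True, False, True, False, False, False]
toy_weights = [1.0] * 6

toy_curve = [
    (t, risk_above_threshold(toy_markers, toy_is_case, toy_weights, t))
    for t in (0.0, 5.0, 50.0)
]
```

Evaluating the function over a grid of thresholds yields a curve of the kind shown in Fig. 3a, where the subgroup shrinks as the threshold rises and the estimates become less precise.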

Fig. 3: Analyses of D29 ID50 titre as a correlate of risk and as a correlate of protection.
Analyses were performed in baseline SARS-CoV-2 seronegative per-protocol vaccine recipients. a, Covariate-adjusted cumulative incidence of COVID-19 by 54 d post D29 by D29 ID50 titre above a threshold. Blue dots are point estimates at each COVID-19 primary endpoint linearly interpolated as shown by solid black lines; the grey shaded area indicates pointwise 95% CIs. The estimates and CIs were adjusted using the assumption that the true threshold-response is non-increasing. The upper boundary of the green shaded area is the estimate of the reverse cumulative distribution function (CDF) of D29 ID50 titre. The vertical red dashed line is the D29 ID50 threshold above which no COVID-19 endpoints occurred (in the time frame of 1–54 d post D29). b, Covariate-adjusted cumulative incidence of COVID-19 by 54 d post D29 by D29 ID50 titre, estimated using a Cox model (solid purple line) or a nonparametric method (solid blue line). Each point on the curve represents the covariate-adjusted cumulative COVID-19 incidence at the given D29 ID50 titre value. The dotted black lines indicate bootstrap pointwise 95% CIs. The upper and lower horizontal grey lines are the overall cumulative incidence of COVID-19 from 1 to 54 d post D29 in placebo and vaccine recipients, respectively. c, Vaccine efficacy (solid purple line) by D29 ID50 titre, estimated using the Cox proportional hazards implementation of ref. 44. Each point on the curve represents the vaccine efficacy at the given D29 ID50 titre value. The dashed black lines indicate bootstrap pointwise 95% CIs. Vaccine efficacy (solid blue line) by D29 ID50 titre, estimated using a nonparametric implementation of ref. 44 (described in the SAP). The blue shaded area represents the 95% CIs. In b and c, the green histograms are estimates of the densities of D29 ID50 titre and the horizontal grey line is the overall vaccine efficacy from 1 to 54 d post D29, with the dotted grey lines indicating the 95% CIs. 
Baseline covariates adjusted for were baseline risk score and geographic region. In b and c, curves are plotted over the range from LLoQ/2 to the 97.5th percentile =96.3 IU50 ml−1.

Figure 3b and Extended Data Fig. 8c,d show the Cox modelling results in terms of estimated cumulative incidence of COVID-19 (from 1 to 54 d post D29) across D29 marker levels. For each antibody marker, COVID-19 risk decreased as antibody marker level increased. Across the full range of D29 ID50 values examined (non-quantifiable ID50 < 2.74 IU50 ml−1 to 96.3 IU50 ml−1, the 97.5th percentile value), estimated risk decreased from 0.016 (0.011, 0.021) to 0.004 (0.002, 0.008), a 4-fold reduction in risk (Fig. 3b). For D29 RBD IgG, estimated risk also decreased across the range of values examined, from 0.016 (0.010, 0.025) at negative response (7 BAU ml−1) to 0.008 (0.004, 0.013) at 173 BAU ml−1 (the 97.5th percentile), a 2-fold reduction in risk (Extended Data Fig. 8d). Results for D29 spike IgG were similar (Extended Data Fig. 8c).

Vaccine efficacy increases with D29 antibody marker level

Figure 3c and Extended Data Fig. 8e,f show estimated vaccine efficacy (VE) against COVID-19 (from 1 to 54 d post D29) across a range of levels of a given D29 antibody marker. For each marker, estimated vaccine efficacy rose with increasing marker level. This increase was greatest for ID50 titre: at non-quantifiable D29 ID50, estimated vaccine efficacy was 60% (43%, 72%); this increased to 78% (69%, 86%) at 9.9 IU50 ml−1 and to 89% (78%, 96%) at 96.3 IU50 ml−1 (purple curve, Fig. 3c). Nonparametric estimation of the vaccine-efficacy-by-D29 ID50 curve suggests that vaccine recipients with non-quantifiable ID50 titre had an estimated vaccine efficacy of 60%, with a jump in vaccine efficacy just above the LLoQ to 79% (blue curve, Fig. 3c).
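The link between marker-specific risk and marker-specific vaccine efficacy can be sketched as one minus a risk ratio. The sketch below is illustrative only: the function name is hypothetical, the two vaccine-arm risks are the Cox model estimates quoted earlier for Fig. 3b, and the placebo risk is an assumed value chosen for illustration rather than one reported in this section. The estimator actually used (ref. 44 and the SAP) is more involved, so the output should not be expected to reproduce the reported estimates exactly.

```python
def vaccine_efficacy(risk_vaccine: float, risk_placebo: float) -> float:
    """VE = 1 - (risk among vaccine recipients at a marker level) / (risk among placebo)."""
    if risk_placebo <= 0:
        raise ValueError("placebo risk must be positive")
    return 1.0 - risk_vaccine / risk_placebo

# Assumption for illustration only; not a value reported in this section
HYPOTHETICAL_PLACEBO_RISK = 0.040

# Vaccine-arm cumulative incidence estimates quoted in the text for Fig. 3b
risk_by_marker_level = {
    "non-quantifiable ID50": 0.016,
    "ID50 = 96.3 IU50/ml": 0.004,
}

ve_by_marker_level = {
    label: vaccine_efficacy(risk, HYPOTHETICAL_PLACEBO_RISK)
    for label, risk in risk_by_marker_level.items()
}
```

The sketch makes one point explicit: with the placebo risk held fixed, a 4-fold drop in vaccine-arm risk across the titre range translates into a rise in vaccine efficacy rather than a 4-fold change in it.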

Fig. 4: Vaccine efficacy by D29 ID50 titre for each ENSEMBLE geographic region and for United States participants in ENSEMBLE compared to two other efficacy trials.
a–c, Vaccine efficacy (solid lines) in baseline SARS-CoV-2 seronegative per-protocol vaccine recipients by D29 PsV-nAb ID50 titre to D614G in ENSEMBLE by geographic region (US, United States; Lat Am, Latin America; S Afr, South Africa) (a); D29 predicted geometric mean PsV-nAb ID50 titre to strains that circulated during follow-up in each designated geographic region (see Supplementary Note 2) (b); and D57 ID50 titre to D614G in COVE, D29 ID50 titre to D614G in ENSEMBLE (US), D56 ID50 titre to D614G in COV002, all estimated using the Cox proportional hazards implementation of ref. 44 (c). The dotted lines indicate bootstrap pointwise 95% CIs. The follow-up periods for the VE assessment were: a, ENSEMBLE-US, 1–53 d post D29; ENSEMBLE-Lat Am, 1–48 d post D29; ENSEMBLE-S Afr, 1–40 d post D29; c, COVE (doses D1, D29), 7–10 d post D57; ENSEMBLE-US, 1–53 d post D29; COV002 (doses D0, D28; VE defined as 1 − relative risk of whether or not an event occurred ≥28 d post D28 until the end of the study period). The histograms are an estimate of the density of D29 ID50 titre in ENSEMBLE (including by geographic region in a and b). The blue histograms are estimates of the densities of ID50 titres in baseline SARS-CoV-2 negative per-protocol vaccine recipients in COVE. Curves are plotted over the range from 10 IU50 ml−1 to the 97.5th percentile of the marker for COVE and from the 2.5th percentile to the 97.5th percentile for ENSEMBLE. Baseline covariates adjusted for were: ENSEMBLE: baseline risk score and geographic region; COVE: baseline risk score, comorbidity status and community of colour status; COV002: baseline risk score.

Two sensitivity analyses (see SAP for details) were performed to evaluate how strong unmeasured confounding would have to be to overturn an inference that D29 antibody marker impacted vaccine efficacy. The first sensitivity analysis, based on E-values28, assessed the robustness of the inference that vaccine efficacy is greater at High vs Low ID50 tertile. The results indicated some robustness to confounding of this inference for ID50 but not for the bAb markers (Supplementary Table 9). The second sensitivity analysis ‘flattened’ the estimated vaccine-efficacy-by-D29 antibody marker level curve by assuming a certain amount of unmeasured confounding. Estimated vaccine efficacy still increased with D29 ID50 titre (Extended Data Fig. 9).
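For orientation, the standard E-value formula for a risk ratio can be sketched as below. This assumes the usual definition, E = RR + sqrt(RR × (RR − 1)) for RR ≥ 1, with protective ratios inverted first; the function name is hypothetical, and the exact quantity to which the SAP applies the E-value may differ from a plain risk ratio.

```python
import math

def e_value(risk_ratio: float) -> float:
    """E-value for a risk ratio on the standard definition.

    Ratios below 1 (protective associations) are inverted before applying
    E = RR + sqrt(RR * (RR - 1)). An E-value of 1 means no robustness at all.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))
```

The E-value is read as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both the marker and the outcome to fully explain away the observed association; larger values indicate greater robustness.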

Vaccine efficacy rises with D29 ID50 titre in each region

Vaccine efficacy increased with D29 ID50 titre in each geographic region (Fig. 4a). The US curve was shifted upwards compared with the South African curve, which was in turn shifted upwards compared with the Latin American curve. The curves also indicated higher vaccine efficacy at non-quantifiable ID50 in the United States (69%; 95% CI: 43%, 83%) compared with South Africa (60%; 16%, 82%) and Latin America (43%; 5%, 64%); however, the confidence intervals overlapped. Extended Data Fig. 10 shows similar results for spike IgG and RBD IgG, where vaccine efficacy also increased with D29 bAb marker level (except that vaccine efficacy appeared to remain constant in South Africa with increasing D29 RBD IgG concentration) and the lowest bAb levels were needed in the United States out of the three regions to mark a given level of vaccine efficacy. (Participant demographic characteristics of geographic region subgroups of the immunogenicity subcohort are shown in Supplementary Tables 3–5; response rates and magnitudes are shown by case/non-case status for each geographic region in Supplementary Table 6 and Extended Data Figs. 3–5. Supplementary Fig. 3 shows the distribution of the number of days from D29 until COVID-19 endpoint occurrence or until right censoring, stratified by case/non-case status and by geographic region.)

Vaccine efficacy by circulating-matched D29 ID50 titre

In the United States, the circulating strains during follow-up were Wuhan-like, being genetically and antigenically similar to the D614G strain against which neutralizing antibodies were measured. In contrast, in South Africa, Beta predominantly circulated and in Latin America, several variants circulated, such that for these regions the correlates analyses had a mismatch where antibodies were measured to D614G and vaccine efficacy was measured against circulating strains different from D614G. One model for a correlate of protection, the ‘variant-invariant CoP model’, states that the level of ID50 against a circulating strain required to achieve a certain vaccine efficacy value against that strain is constant across strains. To evaluate this model, we repeated the analysis of Fig. 4a using a new D29 ID50 marker for each of the three geographic regions, defined as the predicted geometric mean ID50 to the strains that circulated during follow-up in the given geographic region, with the prediction based on measurement of neutralization titres of Ad26.COV2.S vaccine recipients to a panel of variants (see Supplementary Note 2). The vaccine efficacy curves for the United States and South Africa become closer when creating this greater match of the ID50 measurements to circulating strains, providing some support for the model (Fig. 4b). For example, VE for South Africa is 81% (57%, 98%) at ID50 = 10 IU50 ml−1 averaged to the South African circulating strains (Beta variant), compared to the United States where VE is 86% (75%, 95%) at ID50 = 10 IU50 ml−1 to D614G that circulated in the United States. In contrast, for Latin America, the VE curve based on ID50 to circulating strains did not change noticeably compared to the curve based on ID50 to D614G. This is explained by the fact that more than 90% of the placebo arm COVID-19 endpoints in Latin America through 22 January 2021 were of the ancestral lineage.

Cross-trial cross-platform comparison of ID50 titre as a CoP

We next compared the vaccine efficacy-by-ID50 titre curves for three double-blind, placebo-controlled COVID-19 vaccine efficacy trials: ENSEMBLE (one dose: D1; VE curve by D29 ID50 titre), COVE (two doses: D1, D29; VE curve by D57 ID50 titre) and the COV002 (United Kingdom) trial29 of the AZD1222 (ChAdOx1 nCoV-19) chimpanzee adenoviral-vectored COVID-19 vaccine (two doses: D0, D28; VE curve by D56 ID50 titre). In this comparison for ENSEMBLE, we restricted to the United States (ENSEMBLE-US) to match COVE in its restriction to the United States.

In each trial, vaccine efficacy rose with increasing ID50 titre (Fig. 4c). Comparison at high and at low ID50 titres is hindered by the limited overlap of adenovirus-vectored and mRNA vaccine-elicited ID50 titres, with span of values (IU50 ml−1) from 2.5th to 97.5th percentile of 1.4 to 96.3 in ENSEMBLE (the span in ENSEMBLE-US is 1.4–98) vs 32 to 1,308 in COVE. In the intersection of these ID50 titre spans (32–96.3 IU50 ml−1) (the only titre spans where vaccine efficacy levels can be directly compared), the point estimates of vaccine efficacy are similar and the confidence bands show large overlap. While the confidence intervals of the curves in ENSEMBLE-US are wide, the lower overall vaccine efficacy in ENSEMBLE-US compared with COVE could be explained by the lower ID50 titres, consistent with results of meta-analyses21,30.
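The titre range over which the two trials can be compared directly is simply the intersection of their 2.5th to 97.5th percentile spans. A minimal sketch, using the spans quoted above and a hypothetical helper name:

```python
def overlap(span_a, span_b):
    """Return the intersection of two closed intervals, or None if they are disjoint."""
    low = max(span_a[0], span_b[0])
    high = min(span_a[1], span_b[1])
    return (low, high) if low <= high else None

# 2.5th to 97.5th percentile spans of ID50 titre (IU50/ml), as quoted in the text
ensemble_span = (1.4, 96.3)
cove_span = (32.0, 1308.0)

shared_span = overlap(ensemble_span, cove_span)
```

Outside the shared span, any comparison of the two vaccine efficacy curves relies on extrapolating one curve beyond the titres observed in that trial.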

Discussion

We report that each D29 antibody marker evaluated was an inverse correlate of risk of moderate to severe-critical COVID-19 over 83 d post Ad26.COV2.S vaccination, with strongest evidence for ID50 titre, passing the pre-specified multiple testing correction bar. We found that vaccine efficacy increased with higher D29 antibody marker levels, with results supporting the importance of achieving quantifiable antibodies; negative binding antibody response and non-quantifiable neutralization corresponded to moderate vaccine efficacy of about 60%. We found that the risk of COVID-19 decreases incrementally with D29 neutralization titres (Fig. 3) and that non-zero risk remains at highest titres, and estimated vaccine efficacy increases incrementally from 60% at non-quantifiable titres to 90% at highest titres, which supports a relative, not an absolute, correlate of protection. The moderate vaccine efficacy in vaccine recipients with non-quantifiable neutralizing antibodies indicates that this marker did not fully mediate vaccine efficacy: other immune responses or immune markers at other time points or not quantifiable in serum must have contributed to vaccine efficacy. Memory B cells, Fc effector functions, CD4+ and CD8+ T cells (at least for severe disease) all probably contribute to protection31. Overall, our findings are a step towards establishing an immune marker surrogate endpoint for adenovirus-vectored COVID-19 vaccines, and potentially a surrogate endpoint that might prove useful across vaccine platforms.

Strengths of our study include the fact that analyses were pre-specified; the fact that the data come from the double-blind follow-up period of a randomized placebo-controlled phase 3 vaccine efficacy trial; and the restriction to SARS-CoV-2-naïve individuals, ensuring that only vaccine-elicited immune responses are studied as correlates. (The latter restriction could also be viewed as a limitation, as a correlate of protection may be altered by previous infection and/or vaccination and the global proportion of SARS-CoV-2-naïve individuals is declining32.) In the continuing follow-up of ENSEMBLE, participants who experienced the COVID-19 endpoint have been receiving vaccinations, and future analyses are planned to assess the same antibody markers as immune correlates in these individuals. The degree to which each evaluated D29 antibody marker predicts vaccine efficacy against SARS-CoV-2 strains other than those circulating during the trial period, as well as over longer follow-up periods will be important for informing the use of any of these biomarkers as a surrogate endpoint in practice.

The estimated relationship between ID50 titre and vaccine efficacy differed among the United States, Latin America and South Africa, which might be explained by the greater match of the vaccine strain to the reference strain in the United States compared with the different strains that circulated in Latin America and South Africa. In support of this hypothesis, Ad26.COV2.S efficacy against moderate to severe-critical COVID-19 with onset ≥28 d post vaccination was reported to be higher against the reference strain (58.2% (95% CI: 35.0%, 73.7%)) than against non-reference lineages (44.4% (34.6%, 52.8%)), particularly against Gamma (36.5% (14.1%, 53.3%)), over a median follow-up of 121 d post vaccination2. Another potential explanation for the apparent difference in the estimated relationship between ID50 titre and vaccine efficacy by geographic region is that COVID-19 cases tended to occur earlier in South Africa than in the other two geographic regions, whereas follow-up was longer in the United States. This longer follow-up may have allowed expansion of neutralizing antibody breadth, which is associated with improved coverage of SARS-CoV-2 variants over time33. An additional potential explanation may be a lower placebo arm attack rate in the United States (as greater antibody levels may be needed to protect against greater exposure8). However, a post-hoc interaction test in a marginalized Cox model for whether the association of quantitative D29 ID50 titre with COVID-19 differed across the three geographic regions yielded P = 0.83, indicating no statistical evidence for a differential correlate by region.

We found that in ENSEMBLE, the pseudovirus neutralization assay readout (D29 ID50 titre) had stronger evidence as a correlate than either of the binding antibody assay readouts. However, given that the hazard ratio estimates per 10-fold increase of each of the D29 binding antibody markers were less than 1.0, that the binding and pseudovirus neutralization assay readouts were substantially correlated, and that both binding antibody assays were strong inverse correlates of risk (of similar strength as ID50 nAb titre) in the COVE25 and COV002 (Ad-vectored)26 trials, we believe it is likely that both binding antibody markers are also correlates (albeit weaker ones) for the Ad26.COV2.S vaccine. That said, even the two Ad-vectored vaccines (Ad26.COV2.S and AZD1222) differ (one vs two doses, with one implication being potentially increased avidity of post-dose two antibodies; pre-fusion stabilized vs native-like spike; human vs chimpanzee adenovirus). Moreover, different variants (B.1.177 and B.1.1.7 (Alpha)) were circulating at the sites where the COV002 trial was conducted26. Future correlates analyses should help clarify whether the binding antibody markers are also correlates for Ad26.COV2.S.

In the range of overlapping titres, similar curves of vaccine efficacy by nAb ID50 titre were observed in ENSEMBLE-US and COVE. In both trials, the vast majority of circulating strains were similar to the reference strain1,2,34 (used in the nAb assay). Thus, the most transportable correlate across vaccine platforms may involve assessing nAbs against circulating strains, which can be evaluated in the future.

Our study has limitations. First, other Ad26.COV2.S-induced immune responses (for example, spike-specific T-cell responses35, Fc effector antibody functions36) were not assessed. Analyses of D29 spike-specific antibody-dependent cellular phagocytosis (ADCP) are underway; future work will address how ADCP and other immune markers may work together with bAb and/or nAb markers as correlates of protection. A second limitation is the relatively short follow-up (slightly over 2 months post D29), which prevented assessment of D29 antibody marker correlates over longer-term risk. Measurement of the D29 markers in vaccine breakthrough COVID-19 events occurring after the cut-off of the primary analysis will enable a future analysis of correlates for COVID-19 through 6–7 months. The primary analysis of ENSEMBLE showed waning of overall vaccine efficacy from 67% at 2–3 months post vaccination1 to 53% at 6–7 months post vaccination, with the waning evidently restricted to variants of concern2, yet antibody levels did not decrease from 2 to 7 months. The future analyses may help understand these results by directly assessing D29 antibodies as correlates for COVID-19 through 6–7 months. A third limitation is that the study took place before the emergence of the Delta and Omicron variants (with analysis pooled over all SARS-CoV-2 strains that were mainly reference, Beta, Zeta and other1,2) and before any boosters were given. Future work is being planned to assess ENSEMBLE levels of post vaccination nAbs against spike-pseudotyped viruses of each sufficiently prevalent variant of concern as correlates of risk and protection against COVID-19 with the matched variant of concern: these include Beta in South Africa and Gamma, Lambda and Mu in Latin America. The region-specific differences in circulating strains comprise a fourth limitation in that it is not possible to assess whether strain and/or geographic region had isolated impacts on the correlates of risk and protection. 
A fifth limitation is that the comparison of vaccine efficacy by antibody marker curves across efficacy trials did not use a common reference covariate distribution in the adjustment for prognostic factors, and the estimates of vaccine efficacy by antibody marker can be biased if a confounder of the effect of the marker on COVID-19 risk was not accounted for. Additionally, the primary endpoints differed among studies (COVE, COV002: symptomatic COVID-19 of any severity vs ENSEMBLE: moderate to severe-critical COVID-19; all 14 d post second dose/vaccination in baseline seronegative participants). However, in the ENSEMBLE primary efficacy analysis, only 1 case was mild out of 117 symptomatic COVID-19 events in the vaccine group and only 3 of 351 in the placebo group1, supporting similarity of the endpoints across the three trials.

Our study evaluated antibody levels measured 4 weeks post vaccination (D29) as correlates of COVID-19 occurrence over the subsequent 54 d, whose results can be approximately interpreted as outcome-proximal correlates for vaccine recipients’ average antibody level during follow-up for 54 d after D29. Alternative ‘outcome-proximal’ correlates analyses measure antibody levels over time and assess their association with the instantaneous hazard of COVID-19 occurrence, which account for the fact that antibody levels change over time; these two types of analyses address distinct questions. Antibody levels of one-dose Ad26.COV2.S recipients do not decrease from D29 to D71 and slightly increase37, suggesting that antibody dynamics do not play a major role in complicating the interpretation of the current results, given the short-term follow-up of 54 d.

Considering the interest in assessing correlates against severe COVID-19 and the fact that many Ad26.COV2.S-induced antibody responses show increased magnitude and affinity maturation over time post D29 (refs. 33,38), our study’s restriction to a single clinical endpoint (moderate to severe-critical COVID-19) and a single antibody measurement time point (D29) is a further limitation. Currently, antibody responses are being assayed in D29 and D71 samples from the remaining ~300 vaccine breakthrough COVID-19 events during the entire double-blinded period. Planning is underway to assess correlates for COVID-19 over longer-term follow-up, for severe COVID-19, for asymptomatic SARS-CoV-2 infection and for viral load.

Another important question is how vaccine efficacy depends on SARS-CoV-2 spike features (for example, amino acid motifs, distances to the vaccine insert, neutralization sensitivity scores) and whether/how the immune correlates depend on these spike features. Future work is planned to address these questions, with the overarching objective of building a general model for predicting vaccine efficacy across SARS-CoV-2 strains/spike features and time since vaccination, on the basis of D29 and possibly also D71 antibody markers. The data from the additional vaccine breakthrough cases discussed above will provide an opportunity to construct and evaluate such a model. In the meantime, the contributions of the current correlates study are to: (1) establish that pseudovirus neutralization assay readouts are a correlate of risk for COVID-19 for the Ad26.COV2.S vaccine and (2) provide proof of concept that this marker is probably also a correlate of protection for this vaccine. After the additional evidence about this marker as a correlate of protection is gathered as indicated above, it should be possible to define whether and how to use this marker as a surrogate endpoint for predicting vaccine efficacy.

Methods

Trial design, study cohort, COVID primary endpoints and case/non-case definitions

Enrolment for the ENSEMBLE trial began on 21 September 2020. A total of 44,325 participants were randomized (1:1 ratio) to receive a single injection of Ad26.COV2.S or placebo on D1. Serum samples were taken on D1 and on D29 for potential antibody measurements. Antibody measurements were evaluated as correlates against the moderate to severe-critical COVID-19 endpoint defined in the main text.

While the correlates analysis only included COVID-19 primary endpoints up to 22 January 2021 (the cut-off date of the primary analysis1), the correlates analysis was performed using the analysis database of the final analysis2. Compared to the analysis database of the primary analysis, the analysis database of the final analysis includes changes to the SAP and protocol, as well as information that became available only after the database lock date on cases up to 22 January 2021. Specifically, for the primary analysis, the definition of the moderate to severe-critical COVID-19 endpoint was algorithmically programmed according to the protocol definition (with only severe-critical being assessed by the Case Severity Adjudication Committee). After the primary analysis, severity was assessed by the (blinded) adjudication committee for all case definitions. This also includes central confirmation results which were obtained after the primary analysis on COVID-19 primary endpoints with an onset before 22 January. Other differences between the moderate to severe-critical COVID-19 endpoint for the correlates analysis vs that for the primary analysis are: (1) both analyses included endpoints that occurred at least 28 d post vaccination, where the correlates analysis additionally required that endpoints occurred after the D29 visit (which could have occurred ±3 d around 28 d post vaccination, on the basis of the allowable study visit windows), when the markers were measured; (2) the correlates analysis only required RT–PCR SARS-CoV-2 positivity of a nasal swab at a local laboratory (with or without central confirmation), whereas the primary analysis required that participants with RT–PCR SARS-CoV-2 positivity of a nasal swab at a local laboratory must also have a respiratory tract sample confirmed to be RT–PCR SARS-CoV-2 positive at a central laboratory using the m-2000 SARS-CoV-2 real-time RT–PCR assay (Abbott)1.

Correlates analyses were performed in baseline SARS-CoV-2 seronegative participants in the per-protocol cohort, with the same definition of ‘per-protocol’ as in ref. 1. Within this correlates analysis cohort, cases were COVID-19 primary endpoints in vaccine recipients starting at both ≥1 d post D29 and ≥28 d post vaccination up to the end of the correlates study period, which is up to 54 d post D29 but not later than the data cut-off (22 January 2021). Participants with any evidence of SARS-CoV-2 infection, such as a positive nucleic acid amplification test or rapid antigen test result, up to D29 were excluded. Correlates analyses were also done by counting endpoints starting at 7 d after D29 or later through the same data cut-off, under the rationale that the D29 antibody marker measurements in participants who were diagnosed with the COVID-19 endpoint 1–6 d post D29 might have been influenced by SARS-CoV-2 infection. The point estimates of both analyses were similar; we report only the results that started counting COVID-19 endpoints at both ≥1 d post D29 and ≥28 d post vaccination, given the greater precision (approximately 35% more vaccine breakthrough cases).

Within the correlates analysis cohort, non-cases/controls were vaccine recipients sampled into the immunogenicity subcohort with no evidence of SARS-CoV-2 infection up to the end of the correlates study period, which is up to 54 d post D29 but not later than the data cut-off (22 January 2021).
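
The case window described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `is_correlates_case`, its arguments and the inclusive day-counting convention are hypothetical and are not taken from the SAP, which remains the authoritative definition.

```python
from datetime import date, timedelta

DATA_CUTOFF = date(2021, 1, 22)   # data cut-off stated in the text
MAX_DAYS_POST_D29 = 54            # end of the correlates study period


def is_correlates_case(vaccination: date, d29_visit: date, onset: date) -> bool:
    """Return True if a vaccine-arm endpoint falls inside the correlates window.

    Assumption: all bounds are inclusive and counted in whole calendar days.
    """
    after_d29 = onset >= d29_visit + timedelta(days=1)        # >=1 d post D29
    after_28d = onset >= vaccination + timedelta(days=28)     # >=28 d post vaccination
    within_follow_up = onset <= d29_visit + timedelta(days=MAX_DAYS_POST_D29)
    before_cutoff = onset <= DATA_CUTOFF
    return after_d29 and after_28d and within_follow_up and before_cutoff
```

Because the D29 visit could fall ±3 d around 28 d post vaccination, both lower bounds are checked separately rather than assuming that one implies the other.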

Solid-phase electrochemiluminescence S-binding IgG immunoassay

Serum IgG binding antibodies against spike and serum IgG binding antibodies against RBD were quantitated using a validated solid-phase electrochemiluminescence S-binding IgG immunoassay and the MSD Discovery Workbench software (version 4.0) as previously described25. Within an assay run, each human serum test sample was added to the precoated wells in duplicate in an 8-point dilution series. Antibodies bound to spike or to RBD were detected using an MSD SULFO-TAG anti-human IgG detection antibody (Meso Scale Diagnostics, R32AJ-1, goat polyclonal) diluted to 1X from a 200X vendor-provided stock. Conversion of arbitrary units per ml (AU ml−1) readouts to bAb units per ml (BAU ml−1) was based on the WHO 20/136 anti-SARS-CoV-2 immunoglobulin international standard39 also as previously described25. Antibody response was defined by detectable IgG concentration above the antigen-specific positivity cut-off (10.8424 BAU ml−1 for spike, 14.0858 BAU ml−1 for RBD).
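
As a minimal sketch, the response definition above amounts to an antigen-specific threshold check. The cut-off values are those stated in the text; the dictionary and function names are hypothetical, and the strict inequality reflects the word 'above' in the definition.

```python
# Positivity cut-offs in BAU per ml, as stated in the text.
POSITIVITY_CUTOFF_BAU_PER_ML = {"spike": 10.8424, "RBD": 14.0858}


def is_bab_responder(antigen: str, concentration_bau_per_ml: float) -> bool:
    """Return True if the IgG concentration is above the antigen-specific cut-off."""
    return concentration_bau_per_ml > POSITIVITY_CUTOFF_BAU_PER_ML[antigen]
```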

Pseudovirus neutralization assay

Neutralizing antibody activity was measured at Monogram in a formally validated assay (detailed in ref. 40) that utilized lentiviral particles pseudotyped with full-length SARS-CoV-2 spike protein. The lentiviral particles also contained a firefly luciferase reporter gene, enabling quantitative measurement (via relative luminescence units) of infection of HEK 293T cells transiently transfected to express human ACE2 cell surface receptor protein and the TMPRSS2 protease. Supplementary Table 10 provides the assay limits. Readouts from the Monogram assay (also used in the immune correlates analysis of the COV002 trial of the ChAdOx1 nCoV-19 (AZD1222) vaccine26) had been calibrated to those from the Duke pseudovirus neutralization assay (used in the immune correlates analysis of the COVE trial of the mRNA-1273 vaccine25) on the basis of the WHO 20/136 anti-SARS-CoV-2 immunoglobulin international standard39 and conversion to IU50 ml−1, enabling direct comparison of vaccine efficacies at a given ID50 titre in ENSEMBLE to the same ID50 titre in COVE or in COV002. Neutralizing antibody seroresponse was defined by quantifiable ID50 greater than the LLoQ, 2.7426 IU50 ml−1.

Ethics

All experiments were performed in accordance with the relevant guidelines and regulations. All participants whose serum samples were assayed in this work provided informed consent.

Statistical methods

All data analyses were performed as pre-specified in the SAP (available as a supplementary file), with one exception. We had originally pre-specified to include COVID-19 primary endpoints through the last COVID-19 primary endpoint with antibody data in the vaccine arm, and to let the time of this COVID-19 primary endpoint set the total duration of follow-up for the correlates analyses. However, after learning that the marginalized Cox modelling method yielded confidence intervals about the vaccine-efficacy-by-D29 marker level curve that were wider than statistical theory indicates they should be (precipitated by only a few vaccine recipients in the immunogenicity subcohort being at risk for COVID-19 at 66 d, the time of the last COVID-19 primary endpoint with antibody data in the vaccine arm), we revised this rule to set follow-up through to the last time point at which 15 participants from the immunogenicity subcohort were still at risk, which corresponded to 54 d post D29. Consequently, two COVID-19 primary endpoints and some non-cases beyond 54 d post D29 were excluded from the analysis. The point estimates of the vaccine-efficacy-by-D29 marker level curve were very similar for the two choices (follow-up through 54 vs 66 d post D29).
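
The revised follow-up rule can be illustrated with a short Python sketch. It assumes that a participant counts as at risk on day t if their follow-up time is at least t; the function name and the input format (a list of follow-up times in days post D29 for subcohort vaccine recipients) are hypothetical.

```python
def last_day_with_min_at_risk(follow_up_days, min_at_risk=15):
    """Largest day t at which at least `min_at_risk` participants are still at risk.

    A participant is treated as at risk on day t if their follow-up time is >= t,
    so the answer is the `min_at_risk`-th largest follow-up time.
    Returns None if fewer than `min_at_risk` participants are supplied.
    """
    ordered = sorted(follow_up_days, reverse=True)
    if len(ordered) < min_at_risk:
        return None
    return ordered[min_at_risk - 1]
```

With 30 hypothetical participants followed for 40, 41, ..., 69 d, the function returns 55, the last day on which 15 of them remain at risk.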

Case-cohort set included in the correlates analyses

A case-cohort41 sampling design was used to randomly sample participants for D1 and D29 antibody marker measurements. This random sample was stratified by the following baseline covariates: randomization arm, baseline SARS-CoV-2 serostatus and 16 baseline demographic covariate strata defined by all combinations of: underrepresented minority (URM) within the United States vs non-URM within the United States vs Latin America vs South Africa participant, age 18–59 vs age ≥60, and presence vs absence of comorbidities (see the SAP for details, also Extended Data Fig. 2 and Supplementary Table 1).
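
To make the role of the sampling strata concrete, the following Python sketch computes generic inverse probability of sampling weights for a stratified two-phase sample. It is an illustration under assumptions, not the SAP's weight definition: the record keys (`stratum`, `is_case`, `has_marker`) and the choice to weight within stratum by case status are hypothetical.

```python
from collections import Counter


def ips_weights(records):
    """Inverse probability of sampling weights by (stratum, case status).

    Each record is a dict with keys 'stratum', 'is_case' and 'has_marker'.
    A record with marker data gets weight (cohort count / sampled count) for
    its cell; a record without marker data gets weight 0.0.
    """
    cohort = Counter((r["stratum"], r["is_case"]) for r in records)
    sampled = Counter((r["stratum"], r["is_case"]) for r in records if r["has_marker"])
    weights = []
    for r in records:
        key = (r["stratum"], r["is_case"])
        weights.append(cohort[key] / sampled[key] if r["has_marker"] else 0.0)
    return weights
```

By construction the weights of sampled participants in each cell sum to the number of cohort members in that cell, which is what allows the weighted subsample to stand in for the full cohort.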

Covariate adjustment

All correlates analyses were adjusted for the logit of predicted COVID-19 risk score built from machine learning of data from placebo arm participants (see Supplementary Note 1 and Table 11) and geographic regions (United States, South Africa, Latin America).

Correlates of risk in vaccine recipients

All correlates of risk and protection analyses were performed in per-protocol baseline seronegative participants with no evidence of SARS-CoV-2 infection or right censoring up to D29. For each of the three D29 markers, the covariate-adjusted hazard ratio of COVID-19 (either across marker tertiles or per 10-fold increase in the quantitative marker) was estimated using inverse probability sampling-weighted Cox regression models with 95% CIs and Wald-based P values. These Cox model fits were also used to estimate marker-conditional cumulative incidence of COVID-19 through 54 d post D29 in per-protocol baseline seronegative vaccine recipients, with 95% CIs computed using the percentile bootstrap. The Cox models were fit using the survey package42 for the R language and environment for statistical computing43. The same marker-conditional cumulative incidence of COVID-19 parameter was also estimated using nonparametric dose-response regression with influence-function-based, Wald-based 95% CIs44. Point and 95% CI estimates about marker-threshold-conditional cumulative incidence were computed by nonparametric targeted minimum loss-based regression45.
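
As a small worked example of how a hazard ratio 'per 10-fold increase' can be read, the sketch below rescales it to another fold change. It assumes the marker enters the Cox model linearly on the log10 scale, which is what reporting per 10-fold increase implies; the function name and any numeric inputs are hypothetical.

```python
import math


def hazard_ratio_for_fold_change(hr_per_10_fold: float, fold_change: float) -> float:
    """Hazard ratio implied for an arbitrary fold change in the marker.

    Under a log10-linear Cox model, HR(k-fold) = HR(10-fold) ** log10(k).
    """
    return hr_per_10_fold ** math.log10(fold_change)
```

For instance, a hypothetical hazard ratio of 0.5 per 10-fold increase corresponds to 0.25 per 100-fold increase and to 1.0 for no change in the marker.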

Correlates of protection

Controlled vaccine efficacy

For each marker, vaccine efficacy by marker level was estimated by a causal inference approach using both Cox proportional hazards estimation and nonparametric monotone dose-response estimation44. The causal parameter being estimated is one minus the probability of COVID-19 by 54 d for the vaccine group, supposing the D29 marker was set to a given level for all vaccine recipients, divided by this probability for the placebo arm (see SAP section 12.3.2, 15.1 for details). Two sensitivity analyses of the robustness of results to potential unmeasured confounders of the impact of antibody markers on COVID-19 risk were also conducted, which specified a certain amount of confounding that made it harder to infer a correlate of protection (see the SAP for details). One of the sensitivity analyses was based on E-values28 and assessed the robustness of the inference that vaccine efficacy is greater for the upper marker tertile compared with the lower marker tertile. The other sensitivity analysis estimated how much vaccine efficacy increases with quantitative D29 antibody marker despite the specified unmeasured confounder.
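
The two quantities named above can be written down compactly. The sketch below is illustrative only: `controlled_ve` is just the ratio stated in the text (one minus marker-specific vaccine-arm risk over placebo-arm risk), and `e_value` uses the standard E-value formula for a risk ratio; the function names are hypothetical, and the estimation of the underlying risks and the exact sensitivity analysis are specified in the SAP, not here.

```python
import math


def controlled_ve(risk_vaccine_at_marker: float, risk_placebo: float) -> float:
    """One minus the ratio of marker-specific vaccine risk to placebo risk."""
    return 1.0 - risk_vaccine_at_marker / risk_placebo


def e_value(risk_ratio: float) -> float:
    """E-value for a risk ratio: RR + sqrt(RR * (RR - 1)), using 1/RR if RR < 1."""
    rr = risk_ratio if risk_ratio >= 1.0 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))
```

A larger E-value means that a stronger unmeasured confounder would be needed to fully explain away the observed association.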

Hypothesis testing

For hypothesis tests for D29 marker correlates of risk, Westfall-Young multiplicity adjustment46 was applied to obtain false-discovery rate (FDR)-adjusted P values and family-wise error rate (FWER)-adjusted P values. Permutation-based multi-testing adjustment was performed over both the quantitative marker and tertilized marker correlates of risk (CoR) analyses. All P values were two-sided.
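
To convey the idea behind permutation-based multiplicity adjustment, the sketch below implements a simplified single-step max-statistic adjustment for FWER in the spirit of Westfall and Young. It is not the procedure used in this study, which is defined in the SAP and also yields FDR-adjusted P values; the function name and input layout are hypothetical.

```python
def maxt_adjusted_p(observed, permuted):
    """Single-step maxT FWER-adjusted P values.

    observed: one test statistic per hypothesis.
    permuted: list of rows; each row holds the statistics for all hypotheses
              recomputed on one permuted dataset.
    The adjusted P value for a hypothesis is the fraction of permutations whose
    largest absolute statistic is at least the observed absolute statistic.
    """
    max_null = [max(abs(t) for t in row) for row in permuted]
    n = len(max_null)
    return [sum(m >= abs(t) for m in max_null) / n for t in observed]
```

Comparing each observed statistic with the permutation distribution of the maximum is what controls the error rate across all markers and analysis types jointly.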

Cross-trial comparisons

Calibration of ID50 nAb titres between the Duke neutralization assay (COVE trial samples) and the Monogram PhenoSense neutralization assay (COV002 and ENSEMBLE trial samples), performed using the WHO anti-SARS-CoV-2 immunoglobulin international standard (20/136) and Approach 1 of ref. 40 (with arithmetic mean as the calibration factor), is described in the supplementary material of ref. 25.

Software and data quality assurance

The analysis was implemented in R version 4.0.3 (ref. 43); code was verified using mock data.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.