Thresholding on Cq simplifies the serum miR-371a-3p test without affecting assay performance
The requirement for a normal control serum sample in each assay run for normalization is costly and adds another potential source of variation. To determine whether assay normalization is required, we examined our previously published data from samples taken pre-orchiectomy [10] and pre-RPLND [9]. We examined four metrics with varying levels of normalization: Cq (raw value), ∆Cq (Cq normalized to the internal control miR-30b-5p), corrected ∆Cq (∆Cq corrected with the external control cel-miR-39-3p), and RQ (corrected ∆Cq of the sample normalized to corrected ∆Cq of normal serum).
Calculated sensitivity and specificity were both greater than 0.9 in all cases and did not change appreciably across any of the metrics tested (Table 1). AUC was 0.97–0.99 for all four metrics, and none were statistically different from one another (all p > 0.05). These results suggest that normalization to endogenous or exogenous controls, or to normal healthy serum, does not impact the performance of the serum miR-371a-3p assay.
To examine interlaboratory variation, we conducted a concordance study between the two laboratories. Aliquots of the same serum sample collection were exchanged, and both sites ran identical protocols. miR-371a-3p Cq was highly concordant, with a mean difference of < 0.5 cycles between sites (p = 0.251; Fig. 1). The exogenous non-human spike-in control cel-miR-39-3p was discordant (p = 0.002), likely due to separate preparations of highly concentrated standards. Surprisingly, the endogenous control, miR-30b-5p, was also discordant (p < 0.001). These results suggest that the normalization process introduces additional variation and contributes to interlaboratory heterogeneity. We therefore recommend the use of raw Cq values for cutoffs for the serum miR-371a-3p test going forward.
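As an illustration of what thresholding on raw Cq involves, the following Python sketch classifies samples against a single cutoff and computes sensitivity and specificity. It is a minimal sketch, not the analysis code used in the study: the function name, the convention that a Cq strictly below the cutoff is called positive, and the six example values are assumptions made for illustration. Only the cutoff of 32.75 is taken from Table 1.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    cq_values: Sequence[float],
    has_disease: Sequence[bool],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a raw Cq cutoff.

    Assumption: a lower Cq means more miR-371a-3p, so a sample is
    called positive when its Cq is strictly below the cutoff.
    """
    tp = fn = tn = fp = 0
    for cq, diseased in zip(cq_values, has_disease):
        called_positive = cq < cutoff
        if diseased and called_positive:
            tp += 1
        elif diseased:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Cq values, NOT study data. Undetected samples are set to 40.
example_cq = [24.1, 30.2, 33.5, 40.0, 40.0, 38.7]
example_truth = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(example_cq, example_truth, cutoff=32.75)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

No reference sample or control miRNA enters the calculation, which is the practical simplification that a raw Cq cutoff offers.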
Table 1
Performance metrics of raw and normalized values of serum miR-371a-3p test.
Orchiectomy (n = 69) | Cq | ∆Cq | RQ |
Threshold | 32.75 | 18.03 | 23.46 |
AUC | 0.98 | 0.98 | 0.98 |
Sensitivity | 0.93 | 0.91 | 0.93 |
Specificity | 1.00 | 1.00 | 1.00 |
1° RPLND (n = 24) | Cq | ∆Cq | RQ |
Threshold | 36.35 | 21.92 | 20.59 |
AUC | 0.97 | 0.97 | 0.97 |
Sensitivity | 1.00 | 1.00 | 0.92 |
Specificity | 1.00 | 0.92 | 0.92 |
Identification and establishment of an indeterminate range
The serum miR-371a-3p test is extremely sensitive, due in part to the pre-amplification step used prior to qPCR, which also exposes the assay to a risk of false positives. This risk is heightened by the need to open PCR tubes following pre-amplification to set up the qPCR, which may inadvertently spread amplification products. The inclusion of a water (‘no template’) control (NTC) sample initiated at the reverse transcription step is recommended to combat this: a positive qPCR result on the NTC suggests such upstream contamination. However, we noted occasional cases where known control samples would yield an inconsistent, stochastic positive result despite a negative NTC result on the same qPCR run. Repeating these samples from the reverse transcription step usually yielded the anticipated negative result. In contrast, repeating runs on samples from patients with pathologically verified disease typically returned similar Cq values. Examples of repeated runs for pathologically negative and positive samples are presented in Supplementary Fig. 1.
To investigate the above observation, we aggregated a total of 150 runs from our previously published studies [9, 10]. We examined the distribution of Cq values split by group, Control vs. Viable GCT (Supplementary Fig. 2A). Individual sample Cq values are displayed in Supplementary Fig. 2B. The samples in the Viable GCT group show a broad distribution with a mean Cq and standard deviation (SD) of 26.4 ± 4.33. This wide distribution is expected given the heterogeneous population with differing amounts of disease burden. However, the distribution of Cq values in the Control group appeared to be bimodal, with the mean Cq of the first peak at 32.2 ± 1.53 and the mean Cq of the second peak at 39.8 ± 0.7. The mean of the second peak is anticipated, as undetected samples are assigned a Cq of 40. We were surprised that approximately 25% of all runs in the Control group fell into the first peak. Two separate research laboratories (Cambridge, UK; UTSW, Dallas, US) and one clinical laboratory (Department of Pathology, UTSW, Dallas, US) all independently reported this observation, indicating that it is unlikely to be due to technical errors. We have not found any reliable predictor for this assay behavior; it appears to be an entirely stochastic and unpredictable event. This suggests that, as currently applied, the qPCR-based serum miR-371a-3p assay has an approximately 25% chance of misclassifying any true negative as positive.
Mitigation of this misclassification is critical prior to clinical implementation of the test. We reasoned that defining an ‘indeterminate’ range based on the first distribution and repeating the qPCR for any sample that fell into that range would reduce misclassification from ~25% to ~6% (0.25 × 0.25 = 0.0625), since a true negative would need to fall into the first peak on both runs. Based on our established assay pipeline, we defined the indeterminate range as Cq 28–35, which approximates the mean of the first Cq peak ± 2 SDs in the controls. We then interrogated our aggregated data again to simulate how application of this revised methodology might improve viable GCT classification. To simulate the original methodology, the first chronological run per sample was selected. To simulate our revised methodology, the first chronological run per sample was selected unless its result fell into the indeterminate range (28 < Cq < 35), in which case the second chronological run was selected. Any sample that remained indeterminate after the second run was classified ‘indeterminate’ and removed from performance calculations. With this model, the original method had 81 runs. In the revised method, nine samples (11.1%) had two indeterminate results and were classified as truly indeterminate, leaving 72 runs. Two of these nine samples were in the Control group, and the remaining seven were in the Viable GCT group. We then compared the resulting Cq distributions (Fig. 2A-B). Application of the revised methodology prevented six false positives, with accuracy improving from 0.85 to 0.93 and AUC from 0.909 to 0.954 (Fig. 2C-D and Supplementary Table 2). False positives in the Control group declined from 8/23 (34.8%) to 2/23 (8.7%), supporting the observation that this event is stochastic in nature.
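The repeat-testing rule described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, a sample whose only available run is indeterminate is treated as indeterminate, and the per-run false-positive rate is assumed to be independent between runs, as the 0.25 × 0.25 calculation implies.

```python
from typing import Optional, Sequence, Tuple

# Indeterminate range from the text: 28 < Cq < 35 (bounds exclusive).
INDETERMINATE_LOW = 28.0
INDETERMINATE_HIGH = 35.0


def is_indeterminate(cq: float) -> bool:
    """True when a Cq value falls strictly inside the indeterminate range."""
    return INDETERMINATE_LOW < cq < INDETERMINATE_HIGH


def select_original(runs: Sequence[float]) -> float:
    """Original methodology: always use the first chronological run."""
    return runs[0]


def select_revised(runs: Sequence[float]) -> Optional[float]:
    """Revised methodology: use the first run unless it is indeterminate,
    then fall back to the second run. Returns None when the sample stays
    indeterminate and should be excluded from performance calculations.

    Assumption: a sample with no second run available is also returned
    as None.
    """
    first = runs[0]
    if not is_indeterminate(first):
        return first
    if len(runs) < 2 or is_indeterminate(runs[1]):
        return None
    return runs[1]


def mean_plus_minus_2sd(mean: float, sd: float) -> Tuple[float, float]:
    """Bounds of mean ± 2 SD, used to motivate the indeterminate range."""
    return mean - 2 * sd, mean + 2 * sd


def repeat_false_positive_rate(per_run_rate: float, n_runs: int = 2) -> float:
    """Chance that a true negative is spuriously positive on every run,
    assuming the runs are independent."""
    return per_run_rate ** n_runs


# Runs are listed in chronological order; values are hypothetical.
print(select_revised([31.5, 40.0]))      # first run indeterminate, second used
print(select_revised([31.5, 33.0]))      # both indeterminate, sample excluded
print(mean_plus_minus_2sd(32.2, 1.53))   # first control peak, mean and SD from the text
print(repeat_false_positive_rate(0.25))
```

With the first control peak at 32.2 ± 1.53, the mean ± 2 SD bounds come out at roughly 29.1 and 35.3, which the 28–35 range approximates.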
Application of revised methodology to an updated primary RPLND dataset
Improved performance of the serum miR-371a-3p test would allow for both early detection of recurrence and avoidance of unnecessary treatment. The detection of minimal residual disease (MRD) therefore carries great clinical significance in this context. As serum miR-371a-3p Cq is correlated with tumor burden, detection of MRD demands the greatest performance from this test. We therefore expanded a cohort of chemotherapy-naïve patients receiving primary RPLND and compared the performance of the original and revised methodologies.
Patient characteristics are summarized in Table 2. Thirty-two patients receiving primary RPLND were included in the present analysis. Most patients were clinical stage (CS) II (62.5%); 37.5% were CSI. At RPLND, nine patients (28.1%) had no viable tumor, 12 patients (37.5%) had pure seminoma, and 11 patients (34.4%) had non-seminomatous GCT. Pathologic stage (PS) was PS I in 28.1% and PS II in 71.9%.
The median Cq for the Control group was 40 under both the original and revised methodologies. Median Cq for the Viable GCT group shifted from 27.7 under the original methodology to 26.2 under the revised methodology. After applying the revised method, eight samples remained truly indeterminate and were removed from further analysis (Fig. 3A-B). Three of these samples were in the Control group, all of which harbored pure teratoma. The remaining five indeterminate samples were in the Viable GCT group. The AUC was 0.898 (95% CI: 0.79–1.00) with the original method and 0.934 (95% CI: 0.84–1.00) with the revised method (Fig. 3C). Application of the revised methodology improved most other metrics, including specificity (0.80 to 0.92) and PPV (0.83 to 0.92) (Fig. 3D and Supplementary Table 3).
Table 2
Patient characteristics of minimal residual disease dataset.
Age | Years | Median (IQR) | 28 (23.5–35.0) |
Race/Ethnicity | White Non-Hispanic | N (%) | 18 (56.3) |
| Hispanic | | 13 (40.6) |
| Other | | 1 (3.1) |
Body Mass Index | kg/m2 | Median (IQR) | 27.4 (23.7–29.8) |
Primary Tumor Size | cm | Median (IQR) | 3.3 (2.2–6.0) |
Primary Histopathology | Seminoma | N (%) | 9 (28.1) |
| Non-Seminoma | | 21 (65.6) |
| Burnt-Out Primary | | 2 (6.3) |
Primary Tumor Lymphovascular Invasion (LVI) | No | N (%) | 21 (65.6) |
| Yes | | 11 (34.4) |
Primary Tumor Rete Testis Invasion (RTI) | No | N (%) | 20 (62.5) |
| Yes | | 12 (37.5) |
pT Stage | pT0 | N (%) | 2 (6.3) |
| pT1 | | 14 (43.8) |
| pT2 | | 16 (50.0) |
cN Stage | cN0 | N (%) | 12 (37.5) |
| cN1 | | 15 (46.9) |
| cN2 | | 4 (12.5) |
| cN3 | | 1 (3.1) |
S Stage | S0 | N (%) | 23 (71.9) |
| S1 | | 9 (28.1) |
Clinical Stage (CS) | CS I | N (%) | 12 (37.5) |
| CS II | | 20 (62.5) |
RPLND Histopathology | Benign | N (%) | 9 (28.1) |
| Seminoma | | 12 (37.5) |
| Non-Seminoma | | 11 (34.4) |
pN Stage | pN0 | N (%) | 9 (28.1) |
| pN1 | | 11 (34.4) |
| pN2 | | 11 (34.4) |
| pN3 | | 1 (3.1) |
Pathologic Stage (PS) | PS I | N (%) | 9 (28.1) |
| PS II | | 23 (71.9) |