Published online Aug 19, 2021.
https://doi.org/10.3348/kjr.2021.0579
Mistakes to Avoid for Accurate and Transparent Reporting of Survival Analysis in Imaging Research
INTRODUCTION
A time-to-event analysis is an analysis of any dichotomous outcome (i.e., event vs. no event) occurring over time. Survival analysis is used practically as a synonym for time-to-event analysis, although time-to-event analysis is not restricted to death (as the event) and survival. Survival analysis has been increasingly used in imaging research studies. Examples include studies evaluating the association between imaging findings/biomarkers and patient survival and image-based modelling studies to predict survival [1, 2, 3, 4]. We have noticed numerous such studies reporting the methods and results incompletely or unclearly, both among manuscripts submitted to the Korean Journal of Radiology and among those published in other journals and other fields of medical research [5, 6, 7, 8, 9]. The purpose of this review is to list relatively frequent mistakes in reporting survival analysis observed in imaging research studies. This article focuses on the adequacy of description and clarity in reporting survival analysis. It does not intend to discuss more fundamental issues regarding the methodological appropriateness of survival analysis, such as non-informative censoring, proportional hazards assumption, time-dependent covariates and coefficients, immortal time bias, and competing risks [10, 11, 12, 13]. Researchers should first confirm that their analyses adequately address these fundamental methodological issues. This article also does not cover the reporting of studies that evaluate the performance of survival prediction models, for which a methodologic guide can be found elsewhere [14].
The Basics
Survival analysis observes the development of events of interest (such as death) as follow-up time elapses, and the survival curve is a plot of the probability (%) of staying free of events until a certain follow-up time, referred to as survival probability or cumulative survival, on the y-axis against the follow-up time on the x-axis (Fig. 1). An alternative plot of ‘100% – survival probability’ referred to as cumulative incidence of events or incidence proportion [15], against the follow-up time can also be drawn. Patients may drop out of study observation before developing events, and they are referred to as censored patients. Although we do not know what happened to the censored patients after the censored time, we know that they were free of events until the time of censoring. Therefore, they still contribute useful information that should be included when analyzing survival. The Kaplan-Meier method is a popular method to create a survival curve considering the censored patients (Fig. 1). More explanations about how to construct a Kaplan-Meier survival curve can be found elsewhere [14]. The related statistical parameters and analytic methods commonly used for survival analysis are summarized in Figure 1.
Fig. 1
Example Kaplan-Meier survival curves and a graphic summary of the related statistical parameters and statistical methods commonly used for survival analysis.
Two Kaplan-Meier survival curves, one each for patients with (red) and without (green) the imaging biomarker, are shown. The Kaplan-Meier method recalculates the survival probability every time a patient develops an event; the probability then decreases, as shown by a downward step in the curve. Downward blips represent the censored patients. When a patient is censored, the survival curve does not dip down. The mean survival time, in the sense of the mean length of time a subject can be expected to survive, cannot be calculated until the last patient has developed an event. The median survival time can be obtained only if the survival probability has dropped to 50%. Therefore, the median survival can be obtained for patients with the imaging biomarker (red) but not for patients without it (green). To determine the median survival time, draw a horizontal line at 50% survival, see where it crosses the curve, and look down at the x-axis to read off the time. The median survival time is 622 days for the group with the imaging biomarker (red). The survival of the two groups can be compared in several different ways. The log-rank test and the Cox proportional hazards regression are commonly used to compare the survival curves as a whole across the entire follow-up time. Patients without the imaging biomarker (green) show significantly better survival according to both methods (p = 0.007 and p = 0.011, respectively). The Cox proportional hazards regression calculates the HR. Hazard has the meaning of the slope of a survival curve, which is the rate of developing events in a time period, and the HR (i.e., the ratio of hazards of two survival curves) estimated by the Cox regression is essentially a relative risk. The HR of 2.98 indicates that the risk of death is 2.98 times greater in the patients with the imaging biomarker (red) than in those without the imaging biomarker (green, the reference category).
If one wants to compare the survival probability at a specific follow-up time, for example, at 1 year (90.4% vs. 73.1%), the z-test is commonly used (p = 0.075). HR = hazard ratio
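The product-limit calculation described in the caption can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to draw Figure 1: the function name `kaplan_meier` and the six-patient data set are hypothetical, and patients censored at the same time as an event are assumed to still be at risk at that time (a common convention).

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if the patient had the event at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Downward step: multiply by the fraction surviving this event time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored patients shrink the risk set without causing a step.
        at_risk -= exits[t]
    return curve

# Hypothetical data: follow-up in days; 0 marks a censored patient.
times = [100, 200, 200, 300, 400, 500]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: survival probability {s:.3f}")
```

In this toy data set the curve steps down only at days 100, 200, and 300; the censored patients lower the number at risk for later steps but never produce a step themselves.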
Common Mistakes
Mistake 1. Unclear Definition of Events
A sound survival analysis starts with a clear definition of events. The definitions are well-known for some circumstances, such as the analysis of overall survival in cancer patients, for which the events are death from any cause [16]. However, the definitions may often vary according to research questions, clinical settings, or cancer types [17, 18, 19, 20, 21, 22]. For example, even if disease-free survival in oncologic survival analysis considers disease recurrence or death from any cause as events, the exact definition of disease recurrence may vary across studies. Therefore, providing a clear description of the definition of events accompanied by references when available is helpful [23, 24, 25, 26, 27, 28, 29, 30, 31, 32]. Some examples are shown below.
• “the earliest signs of HCC progression (LTP, intrahepatic distant recurrence, gross vascular invasion, or extrahepatic distant metastasis) as determined by CT or MR imaging using the modified RECIST criteria, or death from any cause (22, 23)” [23]
• “major adverse cardiovascular event (MACE) defined as cardiac death, acute myocardial infarction (AMI), CAD requiring coronary revascularization, or stroke/transient ischemic attack (TIA)” [25]
Mistake 2. Reporting a Comparison between Patients with and without Events to Explore Factors associated with Survival
This approach [29, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42] may sound reasonable at a glance but is generally invalid. This analysis is logical only when the follow-up time is fixed and specified for the grouping of patients (e.g., patients developing events by 6 months after treatment vs. patients free of events until 6 months) and all patients have been followed completely until the specified time point (e.g., all have been followed until 6 months without dropouts unless they had events) [36, 37, 38, 43]. Complete follow-up is difficult to achieve in clinical research, especially in retrospective studies or studies that involve long periods of follow-up. Patients who dropped out before the specified time cannot be categorized into either events or no events as there is no way to know if they would have developed events if they had been followed further. Some investigators then exclude dropouts from the analysis [39]; however, such exclusion is inappropriate and may cause selection bias. Even if the two conditions are met, other questions remain: 1) whether events no longer occur after the specified follow-up time; 2) if events can still occur after the specified time, whether it is acceptable to ignore them by placing those patients in the no-event group; and 3) why the specified time matters more than other follow-up time points. Unless these questions are answered explicitly and soundly, the analysis will be unsatisfactory.
Accordingly, this analytic approach is more reasonable if a study is looking for events that occur within relatively short periods. For example, one study [36] divided patients with glioblastoma multiforme into those who had early progression after treatment (i.e., progression before 6 months) and those who did not because the study had a specific purpose of finding factors associated with the early progression after the treatment. All study patients could be followed completely without dropouts as the follow-up period was relatively short.
For the same reason, case-control design, i.e., separate collection of patients who had events and those who did not, is generally inappropriate for survival analysis. As an exception, case-control design combined with some other specialized methodological features may be used for a huge epidemiological study to determine factors associated with the occurrence of rare events [44, 45, 46].
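The difficulty with grouping patients by event status at a fixed time point can be made concrete with a short sketch. The function `classify_at`, the 6-month cutoff, and the patient records below are hypothetical and serve only to illustrate which patients cannot be assigned to either group.

```python
from collections import Counter

def classify_at(time, event, cutoff):
    """Classify one patient for an 'event by cutoff' vs. 'event-free at cutoff' comparison.

    time   : follow-up time (same unit as cutoff)
    event  : 1 if the event occurred at `time`, 0 if the patient was censored then
    """
    if event == 1 and time <= cutoff:
        return "event"
    if time >= cutoff:
        # Followed at least until the cutoff without an event by then.
        # Note: an event occurring after the cutoff also lands here.
        return "event-free"
    # Censored before the cutoff: status at the cutoff cannot be known.
    return "unknown"

# Hypothetical records: (follow-up in months, event indicator).
patients = [(3, 1), (4, 0), (8, 1), (12, 0), (5, 0), (2, 1)]
counts = Counter(classify_at(t, e, cutoff=6) for t, e in patients)
print(dict(counts))
```

Here two of the six patients dropped out before 6 months and fall into neither group, and the patient with an event at 8 months is counted as event-free. Both situations correspond to the concerns raised above.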
Mistake 3. Inappropriately Reporting the Mean Survival Time
The mean survival time, reported in some studies [39, 47], can be misleading. The mean survival time, in the sense of the mean length of time a subject can be expected to survive, cannot be calculated until the survival time of every patient is known, that is, until every patient has died. We simply do not know the survival time of a patient who has not yet died. Clinical studies in which all patients had events are rare; therefore, the mean survival time is generally not obtainable. The “mean” survival times reported in clinical research studies are typically the area under the survival curve from time zero to the end of study observation, as calculated by statistical software programs [48], or perhaps merely the mean of the follow-up times; neither should be mistaken for the mean length of time a subject can be expected to survive. Authors should first consider whether such a “mean” value is truly informative in the study or is redundant and only creates confusion. Generally, it is more appropriate to present the median survival time as a statistic that represents the survival lengths of the study patients. The median survival time is the length of time by which half of the patients have developed events (Fig. 1). If fewer than half of the subjects have developed events by the end of the study, the median survival cannot be determined either.
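The distinction between the area-under-the-curve “mean” and the median can be illustrated with a small sketch. The two step curves and the 500-day end of observation are hypothetical, and the median is taken here as the first time at which the survival probability is 50% or lower, which is one common convention.

```python
def restricted_mean(curve, horizon):
    """Area under a Kaplan-Meier step curve from time 0 to `horizon`.

    curve : [(time, survival_probability)] at each downward step, sorted by time
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)   # rectangle under the current flat segment
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)

def median_survival(curve):
    """First time at which survival probability is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical step curves; observation ends at day 500 in both groups.
curve_a = [(100, 0.8), (200, 0.6), (300, 0.4)]
curve_b = [(100, 0.9), (300, 0.7)]

print(restricted_mean(curve_a, 500), median_survival(curve_a))
print(restricted_mean(curve_b, 500), median_survival(curve_b))
```

The area under the curve depends on where observation stops, so it describes survival time restricted to the study period rather than the expected length of survival. For the second curve the median is undefined because the survival probability never reaches 50%.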
Mistake 4. Not Clarifying the Unit Amount When Reporting Hazard Ratio for a Continuous Variable
With a continuous variable, the hazard ratio (HR) indicates the change in the risk of events when the parameter rises by one unit amount. Therefore, it is important to state in the report what was considered one unit amount. For example, one study reported an HR of 1.34 for systolic right ventricular mass index measured on cardiac MRI for the development of major adverse cardiac and cerebrovascular events [33]. The systolic right ventricular mass index was a continuous variable measured in g/m². The study specifically states that the HR of 1.34 is per 5 g/m² increase. Without this explanation, one might inadvertently misinterpret it as an HR of 1.34 for a 1 g/m² increase in the index value, which would erroneously make the HR for a 5 g/m² increase 4.32 (= 1.34⁵).
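The arithmetic behind this example can be checked with a short sketch. It assumes the usual Cox model form in which the log hazard is linear in the variable, so that an HR per k units equals the HR per one unit raised to the power k; the helper name `rescale_hr` is hypothetical.

```python
import math

def rescale_hr(hr, from_units, to_units):
    """Convert an HR reported per `from_units` increase to an HR per `to_units` increase.

    Assumes the log hazard is linear in the variable (standard Cox model form).
    """
    return math.exp(math.log(hr) * to_units / from_units)

# Correct reading: HR of 1.34 per 5 g/m2, expressed per 1 g/m2.
hr_per_1 = rescale_hr(1.34, from_units=5, to_units=1)

# Misreading: treating 1.34 as the HR per 1 g/m2 and scaling up to 5 g/m2.
misread_hr_per_5 = rescale_hr(1.34, from_units=1, to_units=5)

print(f"HR per 1 g/m2: {hr_per_1:.3f}")
print(f"Misread HR per 5 g/m2: {misread_hr_per_5:.2f}")
```

Under this assumption, an HR of 1.34 per 5 g/m² corresponds to roughly 1.06 per 1 g/m², whereas the misreading inflates the HR per 5 g/m² to about 4.32.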
Mistake 5. Making Imprecise Reference to the p Values from the Log-Rank Test and the Cox Regression
Some studies cite p values from the log-rank test or the Cox proportional hazards regression alongside survival probabilities at a particular follow-up time or median survival times when contrasting them between groups [23, 47, 49, 50, 51, 52, 53]. Some examples are shown below:
• “The 5-year OS rate was 100% (no event) for mrTRG 1, 92.7% for mrTRG 2, 89.6% for mrTRG 3, 80.1% for mrTRG 4, and 40.0% for mrTRG 5 (p = 0.024 by Cox proportional hazards regression)” [47]
• “The 2-year LTP-free survival rates of patients in the DSM-RFA and SSM-RFA groups were 90.0% and 94.4%, respectively (p = 0.331 by log-rank test), and the 2-year recurrence-free survival rates were 54.9% and 75.7%, respectively (p = 0.265 by log-rank test)” [49]
• “The median overall survival time in the validation set were 137.5 months, 76.1 months, and 44.0 months for low-, intermediate-, and high-risk groups, respectively (p < 0.001 by log-rank test)” [23]
Such reporting requires caution so that it is not read as if the statistical analyses specifically compare the survival probabilities at a particular time or the median survival times; the log-rank test and the Cox proportional hazards regression compare the survival curves as a whole over the entire follow-up time (Fig. 1).
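A minimal sketch of the two-group log-rank test shows why its p value refers to the whole curves: the statistic accumulates observed-minus-expected events over every event time, not at one chosen time point. The function name `logrank_test` and the data are hypothetical, and the code is an illustration in plain Python rather than a substitute for validated statistical software.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)    # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)      # events at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n                       # observed minus expected in A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical follow-up times (months) and event indicators (0 = censored).
times_a, events_a = [5, 8, 12, 15, 20], [1, 1, 1, 1, 0]
times_b, events_b = [10, 18, 22, 25, 30], [1, 0, 1, 0, 0]
chi_square, p_value = logrank_test(times_a, events_a, times_b, events_b)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

Because every event time contributes to the sum, the resulting p value cannot be attributed to the difference at, say, 2 or 5 years alone.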
If the investigators want to specifically compare the survival probability at a specific time, the z-test is commonly used (Fig. 1) [54]. Methods to specifically compare median survival times have also been proposed [55], if the comparison is particularly needed for reasons such as crossing survival curves. However, such statistical testing is rarely used in clinical research studies. Instead, presenting the median survival with its 95% confidence interval would be clear enough as shown below.
• “The multiple Cox's proportional hazard analysis showed that the location of distal end of biliary stent was the only independent predictor of biliary stent patency (hazard ratio, 3.771; 95% CI, 1.157–12.283). The median biliary stent patency rate was significantly longer in patients in whom the distal end of biliary stent was beyond the distal end of the duodenal stent (median, 327 days; 95% CI, 249–405 days), compared with cases in which the distal end of the biliary stent was within the duodenal stent (median, 170 days; 95% CI, 115–225 days)” [56]
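For comparison, a test that does target a specific follow-up time can be sketched as follows. This assumes one common form of the z-test, in which the difference in Kaplan-Meier survival probabilities at the chosen time is divided by the combined Greenwood standard error; the function names and the data are hypothetical, and reference [54] should be consulted for the authoritative description.

```python
import math

def km_with_greenwood(times, events, at_time):
    """Kaplan-Meier survival probability and Greenwood standard error at `at_time`."""
    survival, var_sum = 1.0, 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= at_time})
    for t in event_times:
        n = sum(1 for u in times if u >= t)                          # number at risk
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - d / n
        if n > d:
            var_sum += d / (n * (n - d))                             # Greenwood term
    return survival, survival * math.sqrt(var_sum)

def z_test(s1, se1, s2, se2):
    """Two-sided z-test for a difference between two survival probabilities."""
    z = (s1 - s2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical follow-up times (months) and event indicators (0 = censored).
times_a, events_a = [5, 8, 12, 15, 20], [1, 1, 1, 1, 0]
times_b, events_b = [10, 18, 22, 25, 30], [1, 0, 1, 0, 0]

s_a, se_a = km_with_greenwood(times_a, events_a, at_time=12)
s_b, se_b = km_with_greenwood(times_b, events_b, at_time=12)
z, p = z_test(s_a, se_a, s_b, se_b)
print(f"S_a = {s_a:.2f}, S_b = {s_b:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Unlike the log-rank test, this calculation uses only the estimates at the chosen time, so its p value can legitimately accompany a statement about survival probabilities at that time.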
Mistake 6. Multivariable Cox Regression Followed by Univariable Log-Rank Test
The Cox proportional hazards regression applies regression methodology to the analysis of survival data. Its advantage over the log-rank test, which is a univariable analysis, is that it can compare survival between groups after adjusting for other variables, i.e., multivariable analysis. The multivariable Cox regression analysis is typically used to further interrogate the variables identified as significant in univariable analyses, which can be the univariable Cox regression or the log-rank test [57]. Therefore, the results from the multivariable Cox regression are considered more conclusive than the results from the univariable analysis. The HR from multivariable Cox regression is referred to as the adjusted HR to distinguish it from the unadjusted (or crude) HR from the univariable analysis. Some investigators perform a multivariable Cox regression to identify a factor associated with survival. They then report crude Kaplan-Meier survival curves stratified by the identified factor and additionally compare them using the log-rank test [29, 35, 58]. This reporting may convey the incorrect message that the crude Kaplan-Meier curves and the log-rank test provide the more definitive results. If one wants to show the Kaplan-Meier curves for a risk factor identified by multivariable Cox regression, adjusted Kaplan-Meier curves can be presented accompanied by the adjusted HR [59, 60, 61].
CONCLUSION
Taking care to avoid the mistakes listed above would help make research reports more accurate and transparent. Referring to published papers that report survival analysis relatively adequately [26, 56, 62, 63, 64, 65, 66, 67] would also be helpful.
Conflicts of Interest: The authors have no potential conflicts of interest to disclose.
Author Contributions:
Conceptualization: all authors.
Writing—original draft: Seong Ho Park.
Writing—review & editing: Kyunghwa Han, Seo Young Park.
References
-
Therneau T, Crowson C, Atkinson E. Using time dependent covariates and time dependent coefficients in the cox model. [Published April 25, 2021]. [Accessed July 15, 2021]. Cran.r-project.org Web site. https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf.
-
-
Lee H, Nunan D. Immortal time bias. [Accessed July 15, 2021]. Catalogofbias.org Web site. https://catalogofbias.org/biases/immortal-time-bias/.
-
-
CDC. Lesson 3, Section 2: morbidity frequency measures. [Accessed July 15, 2021]. CDC.gov Web site. https://www.cdc.gov/csels/dsepd/ss1978/lesson3/section2.html.
-
-
FDA. Clinical trial endpoints for the approval of cancer drugs and biologics: guidance for industry. [Published December 2018]. [Accessed July 15, 2021]. FDA.gov Web site. https://www.fda.gov/media/71195/download.
-
-
Bellera CA, Penel N, Ouali M, Bonvalot S, Casali PG, Nielsen OS, et al. Guidelines for time-to-event end point definitions in sarcomas and gastrointestinal stromal tumors (GIST) trials: results of the DATECAN initiative (Definition for the Assessment of Time-to-event Endpoints in CANcer trials). Ann Oncol 2015;26:865–872.
-
-
Gourgou-Bourgade S, Cameron D, Poortmans P, Asselain B, Azria D, Cardoso F, et al. Guidelines for time-to-event end point definitions in breast cancer trials: results of the DATECAN initiative (Definition for the Assessment of Time-to-event Endpoints in CANcer trials). Ann Oncol 2015;26:2505–2506.
-
-
Gourgou-Bourgade S, Cameron D, Poortmans P, Asselain B, Azria D, Cardoso F, et al. Guidelines for time-to-event end point definitions in breast cancer trials: results of the DATECAN initiative (Definition for the Assessment of Time-to-event Endpoints in CANcer trials)†. Ann Oncol 2015;26:873–879.
-
-
Bonnetain F, Bonsing B, Conroy T, Dousseau A, Glimelius B, Haustermans K, et al. Guidelines for time-to-event end-point definitions in trials for pancreatic cancer. Results of the DATECAN initiative (Definition for the Assessment of Time-to-event End-points in CANcer trials). Eur J Cancer 2014;50:2983–2993.
-
-
Yoon SH, Kim E, Jeon Y, Yi SY, Bae HJ, Jang IK, et al. Prognostic value of coronary CT angiography for predicting poor cardiac outcome in stroke patients without known cardiac disease or chest pain: the assessment of coronary artery disease in stroke patients study. Korean J Radiol 2020;21:1055–1064.
-
-
Kang Y, Hong EK, Rhim JH, Yoo RE, Kang KM, Yun TJ, et al. Prognostic value of dynamic contrast-enhanced MRI-derived pharmacokinetic variables in glioblastoma patients: analysis of contrast-enhancing lesions and non-enhancing T2 high-signal intensity lesions. Korean J Radiol 2020;21:707–716.
-
-
Jo SW, Choi SH, Lee EJ, Yoo RE, Kang KM, Yun TJ, et al. Prognostic prediction based on dynamic contrast-enhanced MRI and dynamic susceptibility contrast-enhanced MRI parameters from non-enhancing, T2-high-signal-intensity lesions in patients with glioblastoma. Korean J Radiol 2021;22:1369–1378.
-
-
Kim SH, Song BI, Kim HW, Won KS, Son YG, Ryu SW. Prognostic value of restaging F-18 fluorodeoxyglucose positron emission tomography/computed tomography to predict 3-year post-recurrence survival in patients with recurrent gastric cancer after curative resection. Korean J Radiol 2020;21:829–837.
-
-
Kim EK, Lee GY, Jang SY, Chang SA, Kim SM, Park SJ, et al. The extent of late gadolinium enhancement can predict adverse cardiac outcomes in patients with non-ischemic cardiomyopathy with reduced left ventricular ejection fraction: a prospective observational study. Korean J Radiol 2021;22:324–333.
-
-
BMJ. 12. Survival analysis. [Accessed July 15, 2021]. bmj.com Web site. https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/12-survival-analysis.
-
-
Schaubel DE, Zhang H, Kalbfleisch JD, Shu X. Semiparametric methods for survival analysis of case-control data subject to dependent censoring. Can J Stat 2014;42:365–383.
-
-
Klein JP, Moeschberger ML. In: Survival analysis: techniques for censored and truncated data. 2nd ed. New York: Springer; 2003.
-
-
Choi JW, Lee JM, Lee DH, Yoon JH, Kim YJ, Lee JH, et al. Radiofrequency ablation using a separable clustered electrode for the treatment of hepatocellular carcinomas: a randomized controlled trial of a dual-switching monopolar mode versus a single-switching monopolar mode. Korean J Radiol 2021;22:179–188.
-