Introduction

Shockwave lithotripsy (SWL) has been proven to be an effective treatment for renal tract calculi1,2. Published SWL success rates range from 35% to 89%1,3,4. This variation is likely attributable to differences in clinical practice, and to inconsistent clinical outcome measures of success across the field. For stones less than 10 mm in size in the ureter, or non-lower pole positions of the kidney, SWL or ureteroscopy are the preferred treatment options. For renal stones between 10 mm and 20 mm, SWL or percutaneous nephrolithotomy or ureteroscopy is recommended except for lower pole renal stones where SWL is second line if there are unfavourable factors for SWL5,6. It was previously thought that ureteroscopy, as the alternative to SWL, had superior stone clearance rates to SWL with published success rates of 85–95%7. However, recent prospective studies, which have used CT imaging to measure the rate of being completely stone free after ureteroscopy, have found a much lower success rate of 38–54%8. A recent evidence review conducted as part of the National Institute for Health and Care Excellence (NICE) guidelines on renal and ureteric stones concluded that there was only a small benefit of ureteroscopy for stone removal over SWL for ureteric stones less than 10 mm; and that the clinical and cost effectiveness between SWL and ureteroscopy for renal stones less than 10 mm favoured SWL as the first line treatment choice9. Therefore, we need to improve the efficacy of these stone treatments. Identifying which stone cases are unlikely to be treated successfully with SWL is one way of increasing the efficacy of SWL.

Several studies have investigated factors that may be used in a predictive model for successful SWL4,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. Patient factors associated with decreased probability of SWL success include increasing patient age4,10,13,16, body mass index (BMI) or skin-to-stone distance (SSD)11,12,13,21, longer infundibular length12, ureteropelvic junction diameter13, and female gender11,13. Other unfavourable stone characteristics include higher Hounsfield unit (HU) density11,15,18,21, larger stone diameter and volume and greater number of stones4,12,13,15,20,21,23, greater stone heterogeneity15,22 and stone location in the kidney compared to the ureter4,17. Technical factors such as frequency of shock waves used, energy levels, accuracy of targeting the stone, focus size and patient breathing patterns will also affect SWL efficacy5.

Drawbacks of these previous studies include: the lack of consideration of all the potential factors together; the loss of the information from categorizing continuous variables; and the inaccurate and inconsistent measurement of stone parameters on imaging. To improve characterisation of stone heterogeneity as a predictor variable, we have used a software programme validated for measuring tumour heterogeneity26. This software performs CT textural analysis (CTTA) on every pixel within the area of the stone to give measurements of mean HU and total number of pixels, as well as statistical analysis of the nature of the distribution of the different HU values within the stone27.

We aimed to improve on the methodology of previous studies by including a comprehensive selection of both patient and imaging variables thought to influence SWL success, and by using multivariable analysis methods other than logistic regression to improve variable selection and reduce overfitting of the model.

Results

Patient characteristics

459 consecutive SWL cases (from 420 patients between 2011 and 2016 treated at a single centre) were included for analysis. Patient and stone characteristics are outlined in Table 1. BMI data were available in 176 cases (out of 459) and mean BMI was 28.1 ± 5.2 kg/m². There was a weak correlation between BMI and SSD (Pearson r = 0.421). 41 cases (8.9%) had previous urological intervention including 22 cases of previous SWL in the same kidney, 9 cases of previous ureteroscopy or percutaneous nephrolithotomy (PCNL) for stones, and the rest consisted of previous non-stone interventions including pyeloplasty and ureteric reimplantation. There were also 8 cases (1.7%) of structurally abnormal kidneys including horseshoe kidney, dilated collecting system and solitary kidney.

Table 1 Demographics and clinical outcomes.

SWL treatment and follow-up

Based on the information provided in the SWL treatment report of the first session, 85.7% of cases received 4000 shocks, and 51.4% received the full anticipated energy level of 6 or more during SWL. There was no statistically significant difference in success rates between cases which received 4000 shocks or energy level of 6 or more, and cases which received less than 4000 shocks or an energy level of less than 6. 90.9% of cases reported satisfactory or good pain tolerance, with 9.1% reporting poor pain tolerance during SWL. Based on the radiographer’s assessment of extent of fragmentation seen at time of SWL treatment, 27% reported no fragmentation seen, 30% reported possible fragmentation and 43% of cases showed clear fragmentation. There was no statistically significant difference in stone free rates between different degrees of pain tolerance reported, or different extents of fragmentation seen at the time of SWL.

The median length of follow up from the last SWL session was 124 days (IQR = 27–420). Of the 46.4% of cases with an outcome of ‘Completely Stone Free’, 56.6% required just one SWL session, 28.8% required two sessions, 7.1% required three sessions and 7.5% required four or more sessions.

Univariate analyses

Table 2 and Figs. 1 and 2 summarize the univariate analyses for both the outcomes of ‘Completely Stone Free’ and ‘Stone Free with clinically insignificant residual fragments (CIRFs)’. Variables associated with a significantly (p < 0.05) lower SWL success rate based on an outcome of ‘Completely Stone Free’ were: male gender, increasing age, larger stone size (based on all three axis measurements or stone volume), two or more stones in the same location, a stone located in the kidney compared to the ureter, a stone not in the vesicoureteric junction location, and higher ‘mean HU’, ‘mean of the positive pixels’, ‘entropy’ and ‘total number of pixels’. Increasing SSD, presence of a stent, a lower pole location (compared to upper or midpole locations), and stone laterality were not associated with a significantly lower SWL success rate.

Table 2 Univariate analysis of the pre-treatment factors, number of SWL sessions and follow-up imaging modality.
Figure 1

Forest plot of the log of the odds ratio and its 95% confidence interval for comparison of categorical predictor variables for the odds of having been ‘Completely Stone Free’ and ‘Stone Free with CIRFs’. The odds ratio refers to the first mentioned variable (e.g. Female), and the second listed variable is the reference category (e.g. Male). Therefore, females have slightly better odds of SWL success than males. CIRFs, clinically insignificant residual fragments; CT KUB, computed tomography of the kidneys, ureter and bladder; PUJ, pelviureteric junction; US, ultrasound scan; VUJ, vesicoureteric junction.
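The quantities plotted in a forest plot of this kind can be illustrated with a minimal sketch. It assumes the usual Wald (Woolf) interval for the log odds ratio of a 2×2 table; the text does not state which interval method was used, and the function name and the counts below are hypothetical rather than taken from the study.

```python
import math

def log_odds_ratio_ci(a, b, c, d, z=1.96):
    """Log odds ratio and Wald confidence interval from a 2x2 table.

    a: successes in the first-listed group (e.g. Female)
    b: failures in the first-listed group
    c: successes in the reference group (e.g. Male)
    d: failures in the reference group
    z: normal quantile; 1.96 gives an approximate 95% interval
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se

# Hypothetical counts for illustration only
log_or, lower, upper = log_odds_ratio_ci(30, 20, 40, 40)
```

A log odds ratio above zero favours the first-listed category, and an interval that spans zero corresponds to a non-significant difference at the chosen level.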

Figure 2

Volcano plot of the difference in median and the p-value of the Mann–Whitney U test for the comparison of continuous predictor variables between an SWL outcome of ‘Completely Stone Free’ and ‘Stone Free with CIRFs’. Variables labelled with * have had their median value divided by a factor of ten to allow representation of the ‘Difference in Median’ of all variables on the same scale axis. CIRFs, clinically insignificant residual fragments; HU, Hounsfield units; MPP, mean of the positive pixels; SSD, skin to stone distance.

The significant (p < 0.05) predictor variables on univariate analysis for the outcome of ‘Stone Free with CIRFs’ are the same as for the outcome of ‘Completely Stone Free’ except for the texture analysis variables of Standard Deviation of the HU (which was significantly higher for the unsuccessful cases) and Skewness (which was significantly lower in the unsuccessful cases).

Multivariable analyses

LASSO (least absolute shrinkage and selection operator) analysis was first performed without the inclusion of CTTA variables. Table 3 shows the variables chosen in this initial model which were: sex, age, number of stones in the same location, length of the major axis, length of the vertical axis, SSD and stone location in the vesicoureteric junction (VUJ). Of note here, SSD and vertical axis size contributed an effect in the opposite direction to that expected: higher variable values resulted in a greater likelihood of a stone free outcome. However, along with age, these three variables had a very small effect on the probability of the outcome with coefficients of <0.1. This model had an AUC (area under the curve) of 0.66 on ROC (receiver operating characteristic) analysis and a Hosmer–Lemeshow p value of <0.001, indicating that the model is poorly calibrated. Confirmation using the Random Forests approach yielded a similar AUC of 0.67.

Table 3 Results of multivariable analysis using the LASSO method for the outcome of ‘Completely Stone Free’ showing chosen predictor variables and their corresponding coefficients.

Table 4 summarizes the results of all the multivariable analyses and shows that the ability of the LASSO or Random Forests models to correctly classify a case for both ‘Completely Stone Free’ and ‘Stone Free with CIRFs’ outcomes of SWL was moderate, with AUCs of 0.64 to 0.67 produced from ROC analysis. Although this model was not reliable in predicting successful SWL cases based on this ROC analysis, it could help identify those cases most likely to fail SWL treatment, with a negative predictive value of 84.3% (when using a predicted probability cut-off of 0.29, at which sensitivity and specificity were 95.8% and 19.5% respectively).
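To make the relationship between a probability cut-off and these three measures concrete, the following is a minimal sketch. It assumes that a "positive" is a successful SWL outcome and that a case is predicted positive when its predicted probability is at or above the cut-off; the function name and the toy data are hypothetical and are not the study data.

```python
def classification_metrics(probs, outcomes, cutoff):
    """Sensitivity, specificity and negative predictive value at a cut-off.

    probs:    predicted probabilities of SWL success
    outcomes: 1 for an observed success, 0 for an observed failure
    cutoff:   a case is predicted successful when its probability >= cutoff
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        predicted_success = p >= cutoff
        if predicted_success and y == 1:
            tp += 1
        elif predicted_success and y == 0:
            fp += 1
        elif not predicted_success and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominators) are reported as NaN
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # successes correctly predicted
        "specificity": ratio(tn, tn + fp),  # failures correctly predicted
        "npv": ratio(tn, tn + fn),          # predicted failures that did fail
    }

# Toy data for illustration only
toy_probs = [0.10, 0.20, 0.25, 0.35, 0.50, 0.60, 0.80, 0.90]
toy_outcomes = [0, 0, 1, 0, 1, 0, 1, 1]
metrics = classification_metrics(toy_probs, toy_outcomes, cutoff=0.29)
```

A low cut-off of this kind trades specificity for sensitivity: few cases fall below it, but those that do are mostly true failures, which is what a high negative predictive value expresses.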

Table 4 Comparison of the performance of predictive models for the outcome of ‘Completely Stone Free’ or ‘Stone Free with CIRFs’ using three methods of multivariable analyses; first without the inclusion of CTTA variables and then with the addition of CTTA variables to the models.

There was no additional benefit to the predictive ability of the LASSO model after CTTA variables were included (Table 4). This was confirmed using partial least squares (PLS) analysis, which showed a reduction in the quality assessment (Q²) statistic from 0.117 to 0.086 once CTTA variables were added to the model. LASSO analysis of a subset of the data of solitary stones only (n = 371) produced an AUC of 0.63 for the outcome of ‘Completely Stone Free’ and an AUC of 0.75 for the outcome of ‘Stone Free with CIRFs’, with no improvement in predictive ability after the addition of CTTA variables.

Discussion

This study aimed to evaluate the performance of a model to predict the outcome of SWL treatment using both patient and stone related variables. The resulting model showed moderate predictive ability (AUC of 0.64–0.67) in terms of the ability to discriminate between a successful and an unsuccessful outcome after SWL treatment. Predictive ability in terms of calibration was poor based on a significant Hosmer–Lemeshow test, meaning that the observed and expected outcomes in tested subgroups of our sample were not similar. Including variables calculated from CTTA did not increase the predictive ability of the model. There could be several reasons for this, including bias in the data collection and classification of outcome, and not having selected the most influential factors for SWL efficacy as variables in the model. Our results suggest that current understanding of the important predictive factors for SWL efficacy is insufficient to produce a useful model to aid clinical decision making on which cases are most suitable for SWL treatment.

This is the first study that has examined a wide range of both patient and stone related variables using three different methods of multivariable analyses. Several studies have found single factors to be significant predictors of SWL success4,11,15,20,23. Of the studies which produced a predictive model using a combination of different factors thought to affect SWL success rate12,17,18,19,28, only three studies (which all used logistic regression) have presented ROC analysis on the predictive performance of these models10,16,21. Our AUC values (0.64–0.67) were lower than the AUC of 0.75 to 0.87 found using logistic regression models in these three previous studies10,16,21, even though our study agreed with these three studies in finding that age, SSD, stone size and number of stones were significant predictors in a multivariable model. This may be explained by the problems of overfitting and the inclusion of related or co-correlated variables when using logistic regression. The LASSO method reduces this problem by shrinking the weight of each predictor, and the PLS method also has the advantage of not using univariate results for the pre-selection of variables into the multivariable analysis. The methods used in our study therefore reduce the overestimation of variable significance and of overall predictive ability. We also kept continuous data as continuous (to avoid loss of information through categorization which three other studies have done with some variables before statistical analysis)10,13,16,28.

Furthermore, unlike previous studies, we also performed several multivariable analyses. A comparison of the performance of predictive models for the outcome of ‘Completely Stone Free’ or ‘Stone Free with CIRFs’ using three methods of multivariable analyses (Table 4) shows that neither the LASSO method nor the Random Forests method produced an AUC of more than 0.7. At this predictive level, the models produced in this study are unlikely to be useful in discriminating between cases most likely to succeed and cases most likely to fail SWL.

Use of CTTA on stone imaging

Recent interest in extracting more information from CT images of the stone has produced results suggesting the importance of stone heterogeneity, in addition to stone attenuation, as a predictive factor for successful SWL14,15,22,23,24,25,28. Stone attenuation is a well-researched predictive factor for SWL outcome. Stone attenuation, as measured in this study using the mean HU value, was a significant predictor for SWL outcome on univariate analyses but was not selected in the multivariable analyses, suggesting that size may have had more weight in our model. Previous studies have used a variety of methods to quantify stone heterogeneity, from subjective visualization to statistical methods14,15,22,23,24,25,28. The CTTA method used in this study calculates variables which have been previously validated in tumor imaging26. This method also reduces measurement bias as there is no user-dependent variability in drawing the region of interest (ROI) or interpreting the results. Our study has the advantage of using objective measures of stone heterogeneity by using textural analysis of the distribution of all of the pixels in a cross-section of stone, rather than subjective observation of stone appearance on CT, which can be difficult to differentiate14,23. One study was unable to view the internal structure of stones on non-contrast CT well enough to classify them as having a hyper- or hypodense centre or as homogeneous19. Rather than subjecting patients to higher resolution CT scans, textural analysis may provide a more practical way of assessing stone heterogeneity. However, in our study, the addition of textural analysis variables did not significantly improve the predictive ability of the multivariable model, although many textural features were significant on univariate analysis. It is likely that variables related to the size of the stone, including total number of pixels, and entropy which is correlated to stone size, are the most influential factors. Nevertheless, as methods of CT image analysis develop, it is foreseeable that we will gain more information on stone characteristics, including architecture and composition, to aid treatment.

Study limitations

Limitations of our study include its retrospective nature and the measurement of some variables on the largest stone only if there was more than one stone. Retrospective collection of variable and outcome data may have led to bias and reduced the predictive ability of a multivariable model. However, this also allowed a more pragmatic approach: by including cases with more than one stone and repeated SWL sessions, our results may be more applicable to clinical practice. A previous study has found that the size and HU density of the largest stone were better predictors than the mean where more than one stone was treated18. Our analysis of the solitary stones showed similar predictive ability and suggests that our measurements based on the largest stone have not biased the data.

In summary, analysis of clinical and stone imaging factors, including the more novel CTTA variables, has not produced a useful model for predicting the outcome of SWL. This study supports findings from previous studies on the importance of predictor factors relating to patient age and stone size, while contradicting some popular beliefs on the importance of skin-to-stone distance and the lower pole position. However, our results do not support previous study findings which suggest CTTA variables have additional predictive value above traditional factors related to stone size.

Methods

Data was analyzed from a single center in the UK in accordance with the relevant guidelines and regulations as set out by the approving body, the Health Research Authority, United Kingdom (ethics committee reference 16/HRA/6001). We used an existing anonymized database that was prospectively entered for all patients undergoing SWL, at a single site, available between 2011 and 2016. Individual patient consent was not required as no patient identifiable records were obtained in this study. Inclusion criteria were availability of a pre-treatment CT, and available follow up imaging. This identified 459 consecutive cases (from 420 patients). Each case consisted of a course of SWL treatment focused on one or more stones in a single location in the kidney or ureter. CT scans were performed on a multidetector row helical CT scanner (LightSpeed plus, General Electric Medical Systems, USA) and reconstructed at 1.25 mm slice thickness. SWL was performed using a Storz Modulith SLX F2 lithotripter (Storz Medical AG, Switzerland) by an experienced radiographer. The aim was to deliver 4000 shocks at 2 Hz for each treatment session at energy levels recommended by the manufacturer, and according to pain tolerance. At two weeks after the first two SWL treatments, stone clearance was reviewed using a plain x-ray (or US or CT kidneys ureters bladder if not visible on follow-up x-ray) by a urologist and radiologist to decide further management. If stone fragments were still visible, patients were offered either more SWL or other treatment options. All follow up information relevant to the stone being treated was collected from the date of the last SWL treatment to 2017.

Patient and imaging variables

Supplementary Information Table-A includes a list of the patient and imaging related variables included in the statistical analysis, with descriptions of the methods of measurement. The patient variables included were: patient age, body mass index, skin-to-stone distance (SSD), and, for the initial SWL treatment session, the pain tolerance, energy level reached in joules, number of shocks given and radiographer comment of whether fragmentation was seen during SWL. Measurements of stone size and SSD were taken using radiographic calipers on a workstation (Advantage Windows 4.0, GE Medical Systems). SSD was taken as the distance between the centre of the stone and the skin at 90° (or parallel to the line of the vertebral spinous process) using radiographic calipers (Fig. 3). This method was chosen to reflect the path of the SWL beam during treatment.

Figure 3

Measurement method of skin-to-stone distance from the centre of the stone to the posterior skin, in line with the angle of the spinous process (yellow arrow).

The stone variables included were: major, minor and vertical axis length (measured on the image slice containing the largest cross-sectional diameter of the axis of interest, noting that this may not represent the ‘real’ axes of the stone if this does not coincide with the axial view), number of stones being treated in the same location, volume of stone(s) and stone location (categorized as upper pole, midpole, lower pole, renal pelvis, pelviureteric junction (PUJ), proximal ureter, midureter, distal ureter or VUJ). The volume of the stone was measured using the ellipsoid formula29, and if there was more than one stone being treated in the same SWL session, the volumes of all stones were summed.
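The volume calculation described above can be sketched as follows. This assumes the common form of the ellipsoid formula, V = π/6 × a × b × c, with a, b and c taken as the full major, minor and vertical axis lengths in millimetres; the exact form used in the cited reference is not reproduced in the text, and the function names are hypothetical.

```python
import math

def ellipsoid_volume(major_mm, minor_mm, vertical_mm):
    """Ellipsoid volume in mm^3 from three full axis lengths in mm.

    Assumes V = pi/6 * a * b * c, i.e. 4/3 * pi * product of the semi-axes.
    """
    return math.pi / 6 * major_mm * minor_mm * vertical_mm

def total_stone_volume(stones):
    """Sum the volumes of all stones treated in the same SWL session.

    stones: iterable of (major_mm, minor_mm, vertical_mm) tuples
    """
    return sum(ellipsoid_volume(*axes) for axes in stones)

# Hypothetical measurements for two stones in one location
volume = total_stone_volume([(8.0, 6.0, 5.0), (4.0, 3.0, 3.0)])
```

Because the three axes are measured on axial slices rather than along the true axes of the stone, a volume obtained this way is an approximation, as the text itself notes for the axis lengths.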

To improve characterization of stone heterogeneity as a predictor variable, we have used a CT textural analysis (CTTA) software program (Stone Checker Software Limited, Radstock, UK). This software includes a filtration-histogram technique validated for measuring tumor heterogeneity27, which we have also previously tested on kidney stones22. For calculation of CTTA variables, a region of interest was automatically fitted to just inside the outline of the stone using the CT image slice with the largest cross-sectional diameter (Fig. 4). The following stone variables were measured using the software: mean HU, standard deviation of the HU, mean of the positive pixels (MPP) present, entropy, skewness, kurtosis, and the total number of pixels present. The variables of MPP, skewness and kurtosis were calculated through a process of analyzing histograms of the HU values of all the pixels.
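As a rough illustration of the histogram-based variables listed above, the sketch below computes first-order statistics from the HU values of the pixels inside a region of interest. It is not the commercial software's implementation: the filtration step is omitted, and the specific definitions (population standard deviation, Shannon entropy in bits over integer HU bins, excess kurtosis, MPP as the mean of pixels above zero) are assumptions chosen for illustration. The function name is hypothetical.

```python
import math
from collections import Counter

def first_order_texture(hu_values):
    """First-order histogram statistics for the pixels of one ROI.

    hu_values: list of Hounsfield unit values, one per pixel in the ROI
    """
    n = len(hu_values)
    if n == 0:
        raise ValueError("the region of interest contains no pixels")
    mean = sum(hu_values) / n
    # Population standard deviation (assumed definition)
    sd = math.sqrt(sum((v - mean) ** 2 for v in hu_values) / n)
    # Mean of the positive pixels: average of values above zero
    positives = [v for v in hu_values if v > 0]
    mpp = sum(positives) / len(positives) if positives else 0.0
    # Shannon entropy of the histogram, using integer HU bins (assumed)
    counts = Counter(round(v) for v in hu_values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    if sd > 0:
        skewness = sum((v - mean) ** 3 for v in hu_values) / (n * sd ** 3)
        # Excess kurtosis, so a normal distribution scores zero
        kurtosis = sum((v - mean) ** 4 for v in hu_values) / (n * sd ** 4) - 3
    else:
        skewness = kurtosis = 0.0
    return {
        "mean_hu": mean, "sd_hu": sd, "mpp": mpp, "entropy": entropy,
        "skewness": skewness, "kurtosis": kurtosis, "n_pixels": n,
    }

# Hypothetical pixel values for a very small ROI
features = first_order_texture([100, 200, 300, 400])
```

Under these definitions, entropy grows with the number of distinct HU values in the ROI, which is one reason it can track the number of pixels and therefore stone size.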

Figure 4

Example of a Region of Interest (ROI) automatically drawn (in blue) using CT textural analysis software (CTTA) to fit the largest cross-sectional area of the largest stone being treated by shockwave lithotripsy.

Outcome of treatment

Two definitions of successful outcome of SWL were used in this study: 1) ‘Completely Stone Free’; 2) ‘Stone free with clinically insignificant residual fragments (CIRFs)’ ≤ 4 mm. Outcome was determined by radiologist review of the follow-up imaging modality of either x-ray, US or CT KUB, and confirmed with patient records that no further treatment was required. Cases that had only US or x-ray follow-up imaging (which are known to overestimate stone free rate compared to CT follow-up) and had required more than 3 sessions of SWL treatment were classified as failures of treatment. This aimed to reduce the measurement bias of having different imaging modalities in determining the outcome measure. The length of follow-up was measured from the last SWL session attendance.

Statistical analysis

Summary statistics of the continuous factor variables were presented as medians as the distributions could not be assumed to be normal, and therefore the nonparametric Mann–Whitney U test was performed for univariate analysis to test for a difference between the groups of successful and unsuccessful outcome of SWL treatment. Categorical variables were compared using the chi-square test. Multivariable analysis using the pre-treatment variables was performed using the LASSO (least absolute shrinkage and selection operator) regression analysis method30. This approach combines variable selection with shrinkage of the weights of each variable, which gives better prediction when using a large number of predictor variables. All predictor variables which were significant at p < 0.1 were considered in the LASSO analysis. This method does not produce p values but does produce the weighted coefficients of each variable in the model. The discrimination and calibration of the LASSO model were evaluated using ROC analysis and the Hosmer–Lemeshow goodness of fit test, respectively.
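The discrimination measure used here can be illustrated with a short sketch. The AUC equals the probability that a randomly chosen successful case receives a higher predicted probability than a randomly chosen unsuccessful case, which is the Mann–Whitney U statistic divided by the number of success–failure pairs. The function name and toy data are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(probs, outcomes):
    """Area under the ROC curve by direct pairwise comparison.

    probs:    predicted probabilities of a successful outcome
    outcomes: 1 for an observed success, 0 for an observed failure
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome groups must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # success ranked above failure
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy data for illustration only
auc = roc_auc([0.10, 0.20, 0.25, 0.35, 0.50, 0.60, 0.80, 0.90],
              [0, 0, 1, 0, 1, 0, 1, 1])
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of the two outcome groups.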

In addition to the LASSO method, the Partial Least Squares and Random Forests approaches were used to further ensure our conclusions were robust. Partial Least Squares finds an optimal weighted combination of predictor variables to discriminate between two groups. This weighted combination is based on all supplied variables, meaning there is no pre-selection of variables to enter into the model, unlike LASSO. It is designed for situations with a large number of variables and fewer observations. The Random Forests method draws repeated bootstrap samples from the data set and grows a decision tree on each sample; at each node of a tree, an optimum variable on which to split is selected from only a subset of the variables. Predictions are obtained by consensus over the bootstrapped trees and applied to a holdout test sample. Our sample size was deemed adequate in this study by the statisticians to support the number of variables being investigated for multivariable analysis. All statistical analysis was performed by Exploristics Ltd. (Belfast, UK) using R (R Foundation for Statistical Computing, Vienna, Austria)31.
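The two sources of randomness in a Random Forests model, resampling of cases and subsetting of variables, can be sketched as below. This illustrates only the sampling steps, not tree growing or the analysis performed in R. The variable names are hypothetical, and the use of roughly √p candidate variables per node is a common default assumed here, not a setting reported in the text.

```python
import math
import random

def bootstrap_sample(n_cases, rng):
    """Draw n_cases row indices with replacement.

    Returns the in-bag indices (with repeats) and the indices never drawn.
    """
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    out_of_bag = sorted(set(range(n_cases)) - set(in_bag))
    return in_bag, out_of_bag

def candidate_variables(variable_names, rng):
    """Pick the random subset of variables considered at one node.

    About sqrt(p) candidates is an assumed default for classification.
    """
    k = max(1, round(math.sqrt(len(variable_names))))
    return rng.sample(variable_names, k)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_sample(459, rng)  # 459 cases, as in the study
subset = candidate_variables(
    ["age", "sex", "ssd", "major_axis", "n_stones", "location"], rng
)
```

Each tree would be grown on one such in-bag sample, with a fresh variable subset drawn at every node, and the forest's prediction taken as the consensus across trees.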

Ethical approval

Ethical approval was obtained from the Health Research Authority in the United Kingdom for this study protocol. Local hospital research and development approval was also obtained.