Introduction

The biceps brachii muscle has a well-established role in forearm supination and elbow flexion [1], but the specific role of the long head of the biceps tendon (LHBT) is still debated. Cadaver studies [2–6] suggest that the LHBT plays an essential role in glenohumeral joint stability, whereas the results of in vivo studies are conflicting [7–9].

LHBT pathology includes inflammation, partial or complete rupture (including superior labrum anterior and posterior (SLAP) lesions), and instability [1], which can cause anterior shoulder pain or reduced function [10]. These lesions are often associated with other shoulder pathologies, such as rotator cuff (RC) tears [11–15].

In patients undergoing RC repair, the reported incidence of LHBT pathology varies widely between studies, from 36.1% to 82% [13, 14].

Besides conservative therapy, surgery plays an important role in treatment. The most commonly used methods are tenotomy and tenodesis, and each has more than one surgical variant. Tenotomy is the simpler method: the tendon is released from the supraglenoid tubercle [16]. This can be performed with or without creating a funnel-shaped proximal stump [17], or by releasing the LHBT together with a portion of the superior labrum [18]. Tenodesis can be performed arthroscopically or through an open approach, and the tendon may be fixed at various anatomical locations, either to soft tissue or to bone. The fixation site can be suprapectoral or subpectoral [19], and fixation may involve suturing to tendons, interference screws, bone tunnels, keyholes, suture anchors, or suture buttons [10, 20, 21].

Some studies support the superiority of tenodesis [22–27], whereas others found no relevant difference in functional outcomes between tenotomy and tenodesis [17, 28–33].

Previous meta-analyses either did not reach a firm conclusion [34] or included cohort studies [35–41].

Given the conflicting results of clinical trials and the limitations of previous meta-analyses, we aimed to provide the most comprehensive analysis to date comparing tenodesis with tenotomy in the management of LHBT pathologies.

Methods

We reported our research according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [42].

Protocol

We registered our research protocol on PROSPERO in advance under the registration number CRD42021244613. There were no protocol deviations.

Search strategy, inclusion, and exclusion criteria

We formulated our clinical question using the PICOTS framework. The population (P) was patients who underwent LHBT surgery, the intervention (I) was tenotomy, and the comparison (C) was tenodesis. The outcomes (O) were pain on the ten-point Visual Analog Scale (VAS), bicipital cramping pain events, bicipital groove pain events, Constant score (range: 0–100), American Shoulder and Elbow Surgeons (ASES) score (range: 0–100), Simple Shoulder Test (SST) score (range: 0–12), operative time in minutes, elbow flexion strength, forearm supination strength, and Popeye deformity events. Regarding timing (T), we statistically analysed an outcome whenever at least three studies reported it at the same time point; outcomes that did not qualify for quantitative synthesis were included only in the systematic review section. The study type (S) was randomized controlled trials (RCTs).

On 28 November 2020, we conducted a systematic search of MEDLINE (via PubMed), Embase, the Cochrane Central Register of Controlled Trials (CENTRAL), Web of Science, and Scopus using the search string “bicep* AND teno*”. We used the “all fields” option (or its equivalent) in the first four databases and the “Article title, Abstract, Keywords” search field in Scopus. No filters were applied in any database.

Our inclusion criteria were RCTs comparing tenotomy with tenodesis and reporting on the outcomes of interest.

Our exclusion criteria were reviews, meta-analyses, cohort studies, case reports, surgical technique descriptions, studies comparing different submodalities (for example, different tenodesis techniques), studies of distal biceps tears, biomechanical studies, cadaver studies, and animal studies.

Selection and data extraction

We used EndNote X9 (Clarivate Analytics, Philadelphia, PA, USA) for the selection process. After removing duplicates, two independent review authors (M.V., S.L.) performed the selection, first by title, then by abstract, and finally by full text. After each step, Cohen’s kappa was calculated to assess agreement between the two investigators and was interpreted as follows: 0.00–0.20 no agreement, 0.21–0.39 minimal agreement, 0.40–0.59 weak agreement, 0.60–0.79 moderate agreement, 0.80–0.90 strong agreement, and above 0.90 almost perfect agreement [43]. We also screened the reference lists of eligible records for additional articles to include in the meta-analysis. The same two review authors extracted data using a pre-specified Excel sheet (Office 2016, Microsoft, Redmond, WA, USA). We extracted the first author, year of publication, country, study design, demographic data, indication for surgery, surgical methods, and the reported outcomes. Strength measurements reported in newtons (N) were converted to kilograms (kg) using an online calculator (calculator-converter.com). If a study did not report the Strength Index (SI) but did report strength measurements for both sides, we calculated the SI from them.
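To make the agreement statistic concrete, the following Python sketch computes Cohen’s kappa from two reviewers’ include/exclude decisions and maps the value onto the interpretation bands listed above. The function names and the example counts are hypothetical illustrations, not data from this review, and the text does not state which software was used to calculate kappa.

```python
def cohens_kappa(both_include: int, only_a: int, only_b: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making include/exclude decisions.

    only_a: records included by reviewer A but excluded by reviewer B;
    only_b: records included by reviewer B but excluded by reviewer A.
    """
    n = both_include + only_a + only_b + both_exclude
    if n == 0:
        raise ValueError("no records")
    observed = (both_include + both_exclude) / n
    # Marginal inclusion rates of each reviewer
    a_incl = (both_include + only_a) / n
    b_incl = (both_include + only_b) / n
    # Agreement expected by chance alone
    expected = a_incl * b_incl + (1 - a_incl) * (1 - b_incl)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa: float) -> str:
    """Map kappa onto the agreement bands used in this review."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    if kappa >= 0.21:
        return "minimal"
    return "none"


# Hypothetical title-screening step: 20 + 70 agreements, 10 disagreements
k = cohens_kappa(20, 5, 5, 70)
print(round(k, 3), interpret_kappa(k))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why 90% raw agreement in this example maps only to the "moderate" band.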

Two independent review authors (M.V., L.S.) resolved the disagreements by consensus regarding both the selection and the data extraction process.
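The unit conversion and Strength Index derivation described in the data extraction step can be sketched as shown below. Two assumptions apply. First, the conversion divides by standard gravity (9.80665 m/s²), so the result is strictly kilogram-force, which is presumably what the online calculator computes. Second, the SI is taken as operated-side strength expressed as a percentage of the contralateral side. This is a common definition, but the included studies may not all use it. The function names and inputs are hypothetical.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2; 1 kilogram-force = 9.80665 N by definition


def newtons_to_kgf(force_n: float) -> float:
    """Convert a force in newtons to kilogram-force."""
    return force_n / STANDARD_GRAVITY


def strength_index(operated: float, contralateral: float) -> float:
    """Operated-side strength as a percentage of the contralateral side (assumed definition)."""
    if contralateral <= 0:
        raise ValueError("contralateral strength must be positive")
    return 100.0 * operated / contralateral


# Hypothetical patient: 147.1 N on the operated side, 166.7 N on the other side
operated_kg = newtons_to_kgf(147.1)
contralateral_kg = newtons_to_kgf(166.7)
print(round(operated_kg, 1), round(strength_index(operated_kg, contralateral_kg), 1))
```

The SI is a ratio, so it is the same whether it is computed from newtons or from the converted values.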

Statistical analysis

For dichotomous outcomes, odds ratios (ORs) with 95% confidence intervals (CIs) were calculated from the raw data of the articles. Because some studies reported zero events for the bicipital cramping pain events (final data) outcome, we applied a continuity correction [44] to this outcome. For continuous outcomes, weighted mean differences (WMDs) with 95% CIs were calculated from the raw data of the articles, except where means and standard deviations (SDs) had to be estimated from the minimum, median, maximum, and sample size using Wan’s method [45]. In all cases, we used the DerSimonian–Laird random-effects model [46], which includes an estimate of between-study heterogeneity. Following the Cochrane Handbook, I² values between 30% and 50% were considered moderate heterogeneity, between 50% and 75% substantial heterogeneity, and above 75% considerable heterogeneity. Results are displayed graphically in forest plots. Where statistically possible, we performed trial sequential analysis (TSA) [47] to assess the reliability of the pooled results, calculating the required information size with the significance level adjusted for sparse data.
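The core calculations named in this paragraph can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the Stata code used for the analysis. The continuity correction adds 0.5 to every cell of a 2×2 table that contains a zero; the specific correction of [44] may differ. Wan’s method is shown for the minimum/median/maximum scenario. All helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def log_odds_ratio(events_t, n_t, events_c, n_c, cc=0.5):
    """Log OR and its variance from a 2x2 table.

    Assumption: a constant continuity correction `cc` is added to all four
    cells whenever any cell is zero.
    """
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def mean_difference(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Mean difference (treatment minus control) and its variance."""
    return m_t - m_c, sd_t ** 2 / n_t + sd_c ** 2 / n_c


def wan_mean_sd(minimum, median, maximum, n):
    """Wan et al. estimates of mean and SD from min, median, max and n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    mean = (minimum + 2 * median + maximum) / 4
    xi = 2 * NormalDist().inv_cdf((n - 0.375) / (n + 0.25))
    return mean, (maximum - minimum) / xi


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I^2 in percent
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "pooled": pooled,
        "ci95": (pooled - Z95 * se, pooled + Z95 * se),
        "tau2": tau2,
        "i2": i2,
    }
```

For ORs, pool the log ORs returned by `log_odds_ratio` and exponentiate the pooled estimate and CI limits. For WMDs, pool the output of `mean_difference` directly, after substituting Wan estimates where only medians and ranges were reported.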

We statistically analysed and compared every outcome reported by at least three studies at the same time point. To give a complete picture of the available data, the systematic review section presents the individual results of all included studies comparing the two surgical methods.

All data management and statistical analyses were performed with Stata (version 16.0, StataCorp) and TSA (Trial Sequential Analysis tool, Copenhagen Trial Unit, Centre for Clinical Intervention Research, Denmark).
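TSA compares the accrued number of participants against a required information size (RIS). For a continuous outcome, the RIS can be approximated with the familiar two-arm sample-size formula, inflated by the diversity (D²) of the meta-analysis. The sketch below is a simplified illustration under that assumption. It does not reproduce the Copenhagen TSA software, which also computes monitoring and futility boundaries. The function name and the inputs are hypothetical.

```python
from statistics import NormalDist


def required_information_size(delta, sd, alpha=0.05, power=0.80, diversity=0.0):
    """Approximate total participants (both arms) needed to detect a mean difference.

    delta: minimal relevant mean difference; sd: assumed common standard deviation;
    diversity: D^2 of the meta-analysis, used to inflate the RIS.
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must be in [0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)  # type II error = 1 - power
    ris = 4 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ris / (1 - diversity)  # heterogeneity-adjusted RIS


# Hypothetical inputs: detect a 2 kg difference, SD 5 kg, D^2 = 0.4
print(round(required_information_size(2.0, 5.0, diversity=0.4)))
```

Higher diversity inflates the RIS sharply. This is one reason why a pooled estimate from a few small, heterogeneous trials can be statistically significant yet still inconclusive under TSA.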

Risk of bias assessment and quality of evidence

We assessed the risk of bias for every examined outcome following the Cochrane recommendations, using the revised Cochrane risk-of-bias tool for randomized trials (RoB 2) [48].

To assess the certainty of the evidence, we used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system [49] and classified our results into four levels: high, moderate, low, and very low certainty of evidence.

Two independent review authors (M.V. and L.S.) performed the risk of bias and certainty of evidence assessments. The disagreements were resolved by consensus.

Results

Search and selection

Our selection process, including Cohen’s kappa for each step, is summarized in Fig. 1. We identified 5450 records in the five databases. After the selection process, nine eligible full-text articles remained for the meta-analysis [50–58] and eleven studies for the systematic review section [50–60].

Fig. 1

A Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) flow chart representing the search and selection process

Characteristics of the studies included

The basic characteristics of the included studies are summarized in Table 1. All included studies were RCTs, and ten of them compared tenotomy with tenodesis [50–55, 57–60]. The meta-analysis included nine studies with 572 participants: 293 in the tenotomy group and 279 in the tenodesis group. Two studies [59, 60] did not report any outcome at a matching time point, so they were included only in the systematic review section.

Table 1 Characteristics of the included studies

All studies included patients with LHBT pathology; nine of the eleven studies [50–52, 54–57, 59, 60] also included patients with a concomitant rotator cuff tear, while two [53, 58] excluded them.

Tenotomy was performed arthroscopically in all studies. Tenodesis was also performed arthroscopically, except in 31.5% of patients (17 of 54) in the study by MacDonald et al. [54], in whom surgeons used an open subpectoral approach.

Follow-up times differed between studies, mostly ranging from 12 to 24 months. The time points at which individual outcomes were evaluated also differed.

Meta-analysis results

Post-operative function

Elbow flexion strength (kg) at the 6-month follow-up showed no statistically significant difference [51, 53, 54] (WMD, 2.82; 95% CI, −1.79 to 7.22; p = 0.237; I² = 71.7%; low grade of evidence) (Supplementary Fig. 1). At 12 months, elbow flexion strength (kg) showed a statistically significant difference in favour of tenodesis [52–54] (WMD, 3.67; 95% CI, 1.07–6.27; p = 0.006; I² = 36.6%; moderate grade of evidence) (Fig. 2). Forearm supination strength at 12 months also differed significantly in favour of tenodesis [52–54] (WMD, 0.36; 95% CI, 0.08–0.64; p = 0.012; I² = 7.2%; low grade of evidence) (Fig. 2).
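A quick consistency check links a reported estimate, its 95% CI, and its p-value. Assuming a symmetric Wald interval, the standard error is the CI width divided by 2 × 1.96, and the two-sided p-value follows from the normal distribution. Ratio measures such as ORs are symmetric on the log scale, so they are log-transformed first. For the 12-month elbow flexion estimate (WMD 3.67; 95% CI 1.07–6.27), this gives p ≈ 0.006, as expected for an interval that excludes zero. The function name is hypothetical, and small discrepancies can arise from rounding of the published values.

```python
import math
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, ratio=False, level=0.95):
    """Two-sided p-value implied by a symmetric Wald confidence interval.

    Ratio measures (OR, RR) are symmetric on the log scale, so they are
    log-transformed first.
    """
    if ratio:
        estimate, lower, upper = math.log(estimate), math.log(lower), math.log(upper)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * z_crit)
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# 12-month elbow flexion strength: WMD 3.67 (95% CI 1.07 to 6.27)
print(f"{p_from_ci(3.67, 1.07, 6.27):.3f}")
# 24-month Popeye deformity: OR 0.19 (95% CI 0.08 to 0.41)
print(p_from_ci(0.19, 0.08, 0.41, ratio=True) < 0.001)
```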

Fig. 2

A forest plot that compares the results of elbow flexion strength measurements in kg in tenotomy and tenodesis at the 12-month follow-up and the results of the 12-month forearm supination strength levels of tenotomy and tenodesis. The black diamonds represent the effect of individual studies, and the vertical lines show the corresponding 95% confidence intervals (CI). The size of the grey squares reflects the weight of a particular study. The blue diamond reflects the overall or summary effect. The outer edges of the diamonds represent the CIs

We analysed the Constant score from three studies at the 6-month follow-up [51, 53, 55] (WMD, 0.78; 95% CI, −2.44 to 4.00; p = 0.634; I² = 27.7%; moderate grade of evidence) (Supplementary Fig. 2) and from three studies at the 12-month follow-up [52, 53, 55] (WMD, 2.26; 95% CI, −1.12 to 5.65; p = 0.190; I² = 59.1%; low grade of evidence) (Supplementary Fig. 3). Neither analysis showed a statistically significant difference between the two groups. Lee et al. [60] also reported 6-month and 12-month Constant scores, but these could not be analysed because of insufficient data.

Post-operative pain

Three studies reported 3-month pain scores on the ten-point VAS [50, 54, 58] (WMD, 0.99; 95% CI, 0.51–1.48; p < 0.001; I² = 0.0%; high grade of evidence) (Fig. 3). The difference was significant in favour of tenotomy, indicating earlier pain relief after tenotomy than after tenodesis. Four studies each reported VAS pain scores at 6 months [51, 54, 55, 58] (WMD, 0.05; 95% CI, −0.21 to 0.30; p = 0.724; I² = 0.0%; moderate grade of evidence) (Supplementary Fig. 4), 12 months [52, 54, 55, 58] (WMD, 0.19; 95% CI, −0.26 to 0.63; p = 0.411; I² = 80.1%; very low grade of evidence) (Supplementary Fig. 5), and 24 months [50, 51, 54, 55] (WMD, 0.01; 95% CI, −0.04 to 0.07; p = 0.637; I² = 0.0%; moderate grade of evidence) (Supplementary Fig. 6); the set of studies differed between time points. We found no significant difference at any of these time points. Lee et al. [60] also reported pain levels at 3, 6, and 12 months, but these could not be analysed because of insufficient data.

Fig. 3

A forest plot that compares the level of postoperative pain on the Visual Analog Scale (VAS) in tenotomy and tenodesis, measured three months post-operatively. The black diamonds represent the effect of individual studies, and the vertical lines show the corresponding 95% confidence intervals (CI). The size of the grey squares reflects the weight of a particular study. The blue diamond reflects the overall or summary effect. The outer edges of the diamonds represent the CIs

Bicipital cramping pain events at 6 months did not differ significantly between the groups [51, 53, 56] (OR, 0.92; 95% CI, 0.09–9.07; p = 0.943; I² = 47.8%; moderate grade of evidence) (Supplementary Fig. 7).

Popeye deformity

Three studies [51, 54, 55] reported the occurrence of Popeye deformity at the 24-month follow-up. The difference between tenotomy and tenodesis was significant in favour of tenodesis (OR, 0.19; 95% CI, 0.08–0.41; p < 0.001; I² = 0.0%; moderate grade of evidence) (Fig. 4).

Fig. 4

A forest plot that compares the occurrence of Popeye deformity in tenotomy and tenodesis, measured 24 months post-operatively. The black diamonds represent the effect of individual studies, and the vertical lines show the corresponding 95% confidence intervals (CI). The size of the grey squares reflects the weight of a particular study. The blue diamond reflects the overall or summary effect. The outer edges of the diamonds represent the CIs

Operative time

We found no statistically significant difference in operative time (minutes) between tenotomy and tenodesis [54, 57, 58] (WMD, 17.15; 95% CI, −2.05 to 36.35; p = 0.080; I² = 97.5%; very low grade of evidence) (Supplementary Fig. 8).

TSA (trial sequential analysis)

The results of our TSA are shown in Supplementary Figs. 9–16. Because of insufficient data, TSA was not possible for the following outcomes: 6-month Constant score, 6-month VAS pain score, 24-month VAS pain score, and bicipital cramping pain events at 6 months post-operatively.

Systematic review results

Eight studies reported elbow flexion strength [51–54, 56, 57, 59, 60], six reported forearm supination strength [52–54, 56, 57, 60], seven reported the Constant score [51–53, 55, 57, 59, 60], five reported the ASES score [50, 53, 54, 56, 60], and three reported the SST score [53, 55, 59]. Nine studies reported pain levels [50–52, 54–58, 60], six reported the number of bicipital cramping pain events [51–53, 55–57], and three reported the number of bicipital groove pain events [50, 52, 56]. All studies reported Popeye deformity [50–60]. Lee et al. [60] reported 3-month, 6-month, 12-month, and final data for the Constant score, ASES score, and pain level, but these outcomes could not be analysed because of insufficient data.

The calculated odds ratios and weighted mean differences for outcomes that were not eligible for meta-analysis are summarized in Table 2.

Table 2 Systematic review: comparing the final data in the individual articles

Risk of bias assessment and quality of evidence

The risk of bias assessment is summarized in Supplementary Figs. 17–38. Popeye deformity was the only outcome reported by all studies. For this outcome, four studies had a high risk of bias [55, 58–60], six raised “some concerns” [50–53, 56, 57], and one had a low risk of bias [54]. Lower ratings were mostly due to unclear randomization processes, lack of blinding, and missing trial protocols.

The results of the GRADE analysis are shown for every outcome in the results section. A detailed description of the quality of evidence is found in Supplementary Table 1.

Discussion

With the exception of Ahmed et al. [34], earlier meta-analyses also included non-randomized studies [35–41]; hence, their results must be interpreted with caution.

The biceps brachii plays an essential role in elbow flexion, so we chose elbow flexion strength as one of the primary outcomes. Although the difference was not significant at the 6-month follow-up, elbow flexion strength at 12 months was significantly greater in the tenodesis group. To our knowledge, this finding is new compared with previous meta-analyses that examined this outcome [34, 37–40]. Nevertheless, our TSA indicates that further RCTs are needed for the 6-month results. For the 12-month results, the required information size was reached, but the TSA indicated potential spurious significance, so this result should be considered inconclusive. The individual studies included in the systematic review show mixed results, but because they used different time points, further statistical comparisons were not possible.

Another major function of the biceps brachii is forearm supination. We found a statistically significant difference in 12-month supination strength in favour of tenodesis, which contradicts the literature so far [34, 37–40]. According to our TSA, further clinical trials are needed to reach a more certain conclusion. The final data of the individual studies showed a tendency in favour of tenodesis.

The Constant score is a widely accepted scoring system for evaluating function after shoulder surgery. However, it is not specific to biceps function; it was designed to assess the overall functional state of the shoulder [62]. Although we found no significant difference in Constant scores at 6 or 12 months post-operatively, the systematic review results show a trend suggesting that tenodesis might lead to better post-operative scores than tenotomy. This is in accordance with previous meta-analyses, which either found a statistically significant difference that did not reach the minimal clinically important difference (MCID) [63] [34–36, 38–41] or found no significant difference between the two methods [37].

From the patient’s perspective, post-operative pain may be the most important quality measure after surgery. We analysed VAS pain at 3, 6, 12, and 24 months after surgery. The difference was significant only at 3 months, in favour of tenotomy, and the TSA for this outcome showed that no further studies are needed to confirm the result. We can therefore conclude that patients experience less pain 3 months after tenotomy than after tenodesis. However, we found no significant long-term difference between the two methods. Of the meta-analyses that examined VAS pain [34, 38–40], only Ahmed et al. [34] evaluated multiple time points (6, 12, and 24 months), and they found no significant differences between tenotomy and tenodesis. The systematic review results did not show a clear tendency favouring either tenotomy or tenodesis.

According to some previous articles, one drawback of tenotomy is a higher incidence of cramping pain events [35, 37]. Our results at the 6-month follow-up do not support this assumption and agree with the analyses that found no difference between tenotomy and tenodesis [34, 36, 38–41]. The conclusion remained the same after we evaluated the systematic review data.

In a recent study of 1723 patients, tenotomy was associated with a higher incidence of Popeye deformity than tenodesis [23]. Our results confirm this finding: we also found a significant difference between the two groups in favour of tenodesis, in accordance with earlier meta-analyses [34–41]. The TSA showed that no further clinical trials are needed to confirm this result.

Operative times can vary greatly for several reasons, including concomitant procedures such as rotator cuff repair and the surgical team’s experience. According to a recent systematic review and meta-analysis, shorter operative time is one of the advantages of tenotomy [35]. Surprisingly, although all included RCTs that examined this outcome [54, 57, 58] found that tenodesis takes longer to perform, our pooled analysis showed no statistically significant difference between tenotomy and tenodesis, probably reflecting the considerable heterogeneity between the three trials (I² = 97.5%). Given the results established in the literature and the inconclusive result of our TSA, no conclusion can be drawn on this topic at present.

Strengths and limitations

This meta-analysis of nine studies has considerable strengths. Unlike previous analyses, it applied a strict methodology, pooling outcomes only when they were assessed at the same time points. Because we included only randomized controlled trials, this analysis represents the highest achievable level of evidence on this topic. We performed trial sequential analyses to assess whether further clinical trials are needed; the TSA was conclusive for 3-month VAS pain and for Popeye deformity at the 24-month follow-up.

Our meta-analysis has some limitations, including the small sample size, which affected some of the TSA results. In addition, the indications for treatment differed among the included trials, and the studies were heterogeneous regarding intervention submodalities and rehabilitation protocols. In some cases, means and SDs had to be estimated from the minimum, median, maximum, and sample size. TSA was inconclusive for the following outcomes: 6-month elbow flexion strength (kg), 12-month elbow flexion strength (kg), 12-month forearm supination strength (kg), 12-month Constant score, 12-month VAS pain, and operative time (minutes).

We suggest conducting further randomized controlled trials focusing on elbow flexion strength, forearm supination strength, pain, and operative time, as our TSA found these outcomes inconclusive. RCTs should be designed with exact, prespecified time points for outcome assessment. Future RCTs should also emphasize biceps-specific outcomes such as flexion and supination strength. The LHB score [61] might be useful in studies focusing on LHBT treatment methods because it is specific to the biceps, unlike the scoring systems used by most studies (Constant, ASES, SST, University of California at Los Angeles (UCLA), etc.). Creating and reporting subgroups would also be beneficial (e.g., patients with and without concomitant rotator cuff surgery, or different tenotomy methods with the potential for autotenodesis).

Conclusions

Based on our results, tenodesis should be preferred over tenotomy because of the lower incidence of Popeye deformity, better post-operative biceps function, and non-inferior long-term pain outcomes.