Background

Hypotension is considered a major side effect of spinal anaesthesia for caesarean section that can compromise foetal circulation, causing hypoxia and acidosis in the unborn baby [1]. Many randomised controlled trials (RCTs), the gold standard of clinical research, have been undertaken to study measures aiming to prevent hypotension. In a previous report, we demonstrated that research in this field is limited by a wide variety of definitions of hypotension, making it difficult, if not impossible, to compare the results of the RCTs [2].

The present study sought to evaluate the quality of RCTs in this field. Incomplete data reporting and methodological flaws can limit the value of an RCT. To prevent poor studies from receiving undue credibility, readers need to evaluate the quality of the methods. In response to growing concerns about the quality of clinical trials, a group of 30 experts including medical journal editors, clinical trialists, epidemiologists and methodologists identified 22 items for which there was evidence that inadequate reporting can introduce bias. A checklist containing all 22 items and a flow diagram were designed, both instruments examining transparent reporting of chronological enrolment, intervention allocation, follow-up and data analysis. Thus, the Consolidated Standards of Reporting Trials (CONSORT) statement was developed and first published in 1996; a revised version was issued in 2001 [3].

There is evidence that studies with low-quality reporting tend to overestimate the effect of the evaluated intervention by 30–50% [4]. For the assessment of the quality of reporting of clinical studies, the CONSORT tool offers several advantages. Proper performance of randomisation, blinding of randomisation and blinding of the subjects and investigators to the study interventions help to reduce bias in clinical studies [5]. Eligibility criteria relate to the external validity of the study, assisting the clinician in deciding whether the results of the trial can be applied to his or her own patients. Scientific validity is reflected by the aspects involving statistical analysis. In addition, standards for the abstract and title facilitate the process of finding relevant articles.

Since the quality of reporting of RCTs of interventions to prevent hypotension due to spinal anaesthesia in parturients undergoing caesarean section has not been systematically assessed, the present study applied the CONSORT checklist to such RCTs. We compared a period before CONSORT (1990–1994) with a post-CONSORT period (2004–2008).

Methods

To identify relevant literature, we performed a PubMed search with the terms “caesarean section”, “hypotension” and “randomised controlled trial”, as well as a hand-search of anaesthesiologic journals and of the journals Obstetrics and Gynecology and the American Journal of Obstetrics and Gynecology. The search was restricted to articles in English. Two periods were searched: the first comprised January 1, 1990 until December 31, 1994; the second ranged from January 1, 2004 until December 31, 2008.

The retrieved articles were then independently screened for eligibility by two authors. Studies were included in this review when healthy parturients scheduled for caesarean section under spinal anaesthesia were randomly assigned to at least one intervention arm that aimed to prevent or treat anaesthesia-induced hypotension. In agreement with others [6], the graders were not blinded, since research on the impact of blinding on the evaluation of the quality of RCTs has yielded ambiguous results [7, 8].

The CONSORT statement was used to evaluate the reporting quality of the included trials. Each of the 22 CONSORT items was graded as either “yes” or “no”; when blinding was not possible, the corresponding item could also be graded as “not applicable”. The CONSORT score of each RCT was calculated by adding the correctly reported domains of the CONSORT checklist and expressing the sum as a percentage of the maximum attainable number of items (in general 22, or 21 when blinding was not possible). All domains were weighted equally.
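As a purely illustrative sketch of this scoring rule (not the scoring sheet used in the study), the following Python snippet shows how such a percentage can be computed from per-item gradings; the item names and gradings are hypothetical.

# Minimal sketch of the CONSORT scoring rule described above (hypothetical data).
# Items graded "not applicable" are removed from the denominator; all remaining
# items carry equal weight.

def consort_score(gradings):
    """Return the percentage of correctly reported, applicable CONSORT items."""
    applicable = [g for g in gradings.values() if g != "not applicable"]
    if not applicable:
        raise ValueError("No applicable CONSORT items to score")
    return 100.0 * sum(g == "yes" for g in applicable) / len(applicable)

# Hypothetical grading of a single RCT (only a few of the 22 items shown).
example = {
    "title_abstract": "yes",
    "sample_size": "no",
    "blinding": "not applicable",  # e.g. blinding not possible
    "participant_flow": "yes",
}
print(round(consort_score(example), 1))  # 66.7 (2 of 3 applicable items reported)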

For the purpose of this study, the reviewers underwent systematic training. Initially, they received the German version of the “Revised CONSORT statement for reporting randomised trials: Explanation and elaboration” document, which explains the meaning of each item and gives examples of good practice. Subsequently, they were trained by scoring two articles that were not part of this review.

All reports were independently graded by two investigators. According to our protocol, a third reviewer had to be involved if consensus was not reached between the first two investigators.

To assess chance-adjusted interrater reliability, Cohen’s kappa statistic was determined. This statistic quantifies the degree of agreement between reviewers on whether a domain was reported or not, corrected for the agreement expected by chance. It was calculated for three articles that were not included in this study.
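For readers unfamiliar with the statistic, the sketch below shows one way to compute Cohen’s kappa for two raters from per-item gradings; the data are hypothetical and the implementation is ours, not the analysis code used in the study.

# Illustrative computation of Cohen's kappa for two raters (hypothetical data).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-adjusted agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical "yes"/"no" gradings of eight CONSORT items by two reviewers.
reviewer_1 = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no"]
reviewer_2 = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "no"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.71 in this toy example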

The articles were grouped by publication period (1990–1994: pre-CONSORT; 2004–2008: post-CONSORT). The pre- and post-CONSORT periods were compared by calculating the odds ratio and its 95% confidence interval for each domain. CONSORT scores are given as mean ± standard deviation. Student’s t test was used to compare the CONSORT scores of the two periods. The level of statistical significance was set at 0.05 (two-sided).
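As a worked illustration of these comparisons (again with hypothetical numbers, and assuming a Woolf/logit-type confidence interval for the odds ratio, which the study does not specify), the sketch below computes an odds ratio with a 95% confidence interval for one item and compares two sets of CONSORT scores with Student’s t test using SciPy.

# Illustrative per-item odds ratio with 95% CI and Student's t test for
# CONSORT scores (hypothetical numbers; not the study's analysis code).
import math
from scipy import stats

def odds_ratio_ci(a, b, c, d, z=1.96):
    """2x2 table: a/b = reported/not reported post-CONSORT, c/d = pre-CONSORT."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical item: reported in 20/24 post-CONSORT vs 6/13 pre-CONSORT articles.
print(odds_ratio_ci(20, 4, 6, 7))

# Hypothetical CONSORT scores (%) in the two periods, compared with Student's t test
# (equal-variance, two-sided by default in scipy.stats.ttest_ind).
pre = [55, 60, 68, 72, 64, 70, 59, 66]
post = [82, 88, 90, 85, 79, 92, 87, 84]
t_statistic, p_value = stats.ttest_ind(pre, post)
print(t_statistic, p_value)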

Findings

We identified 48 articles with our search strategy, of which 37 [9–45] met the inclusion criteria (Fig. 1). All articles found by the hand-search were also retrieved by the PubMed search. Of these 37 articles, 13 were published in the pre-CONSORT period (1990–1994) and 24 in the post-CONSORT period (2004–2008). Table 1 shows the sources of the retrieved articles: 33 were published in anaesthesiologic journals, two in obstetric journals and two in general medical journals.

Fig. 1 Flow diagram with included and excluded articles

Table 1 Source of articles

The CONSORT scores increased from 66.7 ± 12.5% in the pre-CONSORT period to 87.4 ± 6.9% in the post-CONSORT era (p < 0.01). Agreement between the evaluators was good, with κ = 0.94 (0.92–0.96).

Figure 2 portrays the percentage of correctly described CONSORT items in the pre- and post-CONSORT eras. More than a third of all articles from both periods reported 90–100% of the items correctly, and only 5% correctly reported just 40–50% of the CONSORT items. When comparing the two time periods, a significant improvement was observed in eight items: sample size calculation, method of randomization, implementation of randomization, blinding, statistical methods, participant flow, intention-to-treat analysis (ITT) and generalizability. A non-significant improvement was found for endpoints, allocation concealment, baseline data, outcomes and estimation of effect, ancillary analysis, interpretation of results and overall evidence (Table 2). For two items, recruitment and follow-up as well as adverse effects, the proportion of correctly reporting articles decreased. In the pre-CONSORT period, five domains were correctly reported in all 13 articles.

Fig. 2 Percentage of correct CONSORT items in individual articles in the pre- (1990–1994) and post-CONSORT (2004–2008) periods

Table 2 Proportion of correctly reported CONSORT items in randomised controlled trials in the pre- (1990–1994) and post-CONSORT (2004–2008) periods

Discussion

To our knowledge, this is the first study evaluating compliance with CONSORT of RCTs dealing with hypotension due to spinal anaesthesia for caesarean section. Our approach of comparing a period before publication of CONSORT with a period thereafter has several precedents [6, 46, 47].

As a major result, we found a significant increase in the CONSORT scores from 66.7% to 87.4%. A statistically significant improvement was observed for eight items after CONSORT was published, and an improvement that did not reach statistical significance was found for seven further items. Of the 22 items, five were already correctly reported in all articles of the pre-CONSORT era.

A remarkable improvement was found in the items relating to randomization. Correct reporting of these items is critical for the detection of selection bias, and deficiencies have been associated with an exaggeration of the treatment effect [4, 48]. Another area of improvement was the participant flow diagram, which is intended to explicitly report the number of subjects undergoing randomization, the number receiving treatment, the number of dropouts and the number finally analysed [49].

ITT was used in all reports of the post-CONSORT era versus ten of 13 pre-CONSORT articles. ITT is strongly recommended since it preserves the randomization process and allows for non-compliance and deviations from policy [50]. ITT, together with the methods of randomization and blinding, is crucial for internal validity and helps to avoid selection, performance, detection and attrition bias [5].

Sample size calculation was the item that showed the sharpest increase over time: the percentage of trials correctly reporting on this item increased from 23% (three of 13 reports) in the pre-CONSORT period to 71% (17 of 24 reports) in the post-CONSORT period. Sample size calculation is required to quantitatively estimate the power of a trial to answer the studied question [46]. Underpowered trials are prone to bias and can negatively affect the quality of meta-analyses [46]. However, it has to be emphasized that this CONSORT item only evaluates whether a sample size calculation has been performed; it does not indicate whether the calculation is correct.

Interestingly, our analysis found a lower percentage of correct reporting for the items recruitment and follow-up as well as adverse effects in the post- versus the pre-CONSORT period, although the difference was not statistically significant. It could be speculated that the reporting of adverse effects was omitted because authors feared that the occurrence of side effects and complications could call their positive results into question and lead to rejection of the manuscript. The fact that inadequate reporting is found in the more recent papers could be due to increasing competition in the academic field leading to growing pressure to produce publications, since the number of authored papers is one of the parameters for evaluating scientific careers [51, 52]. A report that evaluated surgical papers published in 2005 confirmed our finding of a high rate of inadequate reporting of adverse effects [53]. Editors should probably draw the conclusion that authors have to be encouraged to report on undesired effects. Proper definition and reporting of adverse events is crucial for the critical appraisal of study results and, in addition, facilitates systematic reviews and meta-analyses. Previous studies showed that active surveillance, i.e. actively asking study subjects whether undesired events occurred by use of structured questionnaires, interviews or diagnostic tests at predefined time intervals, is more effective than passive disclosure [44, 45]. Reporting of adverse events should already be considered during study design, since data on adverse events are less susceptible to bias and confounders when they are collected prospectively rather than retrospectively [54, 55].

There are many other tools to evaluate the quality of RCTs. We chose the CONSORT checklist for several reasons. Firstly, CONSORT is officially supported by the World Association of Medical Editors and the International Committee of Medical Journal Editors. Secondly, several reports demonstrated that poor compliance with CONSORT criteria is associated with an exaggeration of the effect size. Moher and colleagues found that RCTs that were not double-blind overestimated the effects by 17% [4]. Inadequate or unclear allocation concealment exaggerated odds ratios by 41% and 30%, respectively. A similar finding was obtained by another group [56]. An exaggeration of the treatment effect in single RCTs by inappropriate reporting also has consequences for subsequent meta-analyses, which are considered the highest level of evidence-based medicine and often guide our clinical practice. In their meta-analysis on anticoagulants, Lensing et al. [57] clearly favoured low molecular weight heparins. However, when the quality of reporting of the underlying RCTs was taken into consideration, the superiority of low molecular weight heparin with respect to mortality due to venous thrombosis was no longer evident [4]. Exclusion of low-quality trials may thus directly impact our clinical practice.

It is probably a consequence of its widespread use that CONSORT has already been shown to be effective: a meta-analysis including 248 articles clearly demonstrated an improvement in the reporting of RCTs after the adoption of this tool [49]. It is noteworthy that these authors compared a time period before the publication of CONSORT with a period thereafter, the approach also chosen in our report.

Thirdly, CONSORT has in the meantime been applied in many other medical specialities [53, 58]. We used the CONSORT checklist in its 2001 version. An update has been published very recently [59], after our study was completed. However, use of the 2001 version allows us to compare our results with those obtained in other medical specialities and therapeutic areas.

Sample size calculation was reported in only 12% of trials of analgesics given for pain relief after trauma or orthopaedic surgery [60]. This percentage was 54% in our survey when the pre- and post-CONSORT periods were combined. The discrepancy could be due to the fact that the other group included trials published from 1966 onwards, thus covering a time period when probably little or no attention was paid to this issue, whereas our early period began much later, in 1990. In a study of nutritional support trials [58], the use of blinding was found to increase significantly from 19% to 41% between a pre- (before 1996) and a post-CONSORT (after 1996) era; we observed an increase from 31% to 80%. In our study, intention-to-treat analysis was used in 76% of the pre-CONSORT and in 100% of the post-CONSORT articles, an improvement reaching statistical significance. No improvement was found for this parameter in the study by Doig and colleagues [58].

The CONSORT scores of urological and non-urological surgical trials published between 2000 and 2003 were 11.1 and 11.2 (corresponding to 50.45% and 50.90%), respectively [61], which is much lower than the CONSORT scores in either of our study periods.

A comparison of our results with a study of obstetric anaesthesia trials is of particular interest. This report by Halpern and colleagues [61] focussed on articles published from June 2000 to June 2002 and provided no information on whether reporting quality might have changed over time. When comparing the reports included in that study [61] with the articles from our post-CONSORT era (2004–2008), we found a higher percentage of correctly reported CONSORT items in our study. More than 80% of domains were correctly reported in 62% of our articles, whereas less than 5% of the articles from the study by Halpern et al. [61] reached this value. The majority of the obstetric anaesthesia articles correctly reported only between 50% and 70% of the items, compared with less than 5% of the articles in our sample of RCTs of interventions to prevent hypotension after spinal anaesthesia. We think it is the time difference between the two studies (2004 to 2008 for our post-CONSORT period versus June 2000 to June 2002 in the study by Halpern et al. [61]) which accounts for the higher compliance with some CONSORT items in our study, since awareness of the CONSORT statement published in 1996 (and of the revised version published in 2001) had less time to spread during the study period of Halpern et al. [61]. This notion receives further support from studies by others: only a few improvements could be observed when comparing trials from before 1996 with those appearing immediately after 1996 [60]. Moher et al. [46] also considered it a flaw of their study that their post-CONSORT period began only 12 to 18 months after the publication of CONSORT. They argued that “effective dissemination is a slow process and that to estimate the true influence of CONSORT requires more time.”

It is a limitation of our study that we do not provide data on the association between quality of reporting and adoption of CONSORT. We did not test for an association between the CONSORT scores of adopters and of non- or late adopters, since every additional analysis increases the likelihood of a type I error, i.e. of incorrectly inferring a significant difference when there is no such difference.

In summary, the CONSORT scores increased over time from 66.7% to 87.4%, reflecting a remarkable improvement. We observed significantly better reporting in eight of 22 items in the post-CONSORT compared with the pre-CONSORT period. However, we saw a decrease, although not statistically significant, in the reporting of adverse events, which deserves further attention.

Conclusion

We conclude from our study that reporting quality has improved significantly in the period after dissemination of the CONSORT statement. Therefore, journal editors, reviewers and authors should be encouraged to adhere to the CONSORT checklist in order to ensure high-quality trial reporting. Consequently, clinicians can spend more time considering the findings rather than scrutinizing the quality of reporting of the trial.