INTRODUCTION

The use of reporting guidelines1 by authors has been shown to improve the transparency of reporting of randomized controlled trials2, 3 and observational studies.4, 5 Reporting guidelines are also important for clinical practice guidelines (CPGs) because CPGs are used to translate evidence into practice6 by synthesizing strong evidence into actionable recommendations.7, 8

Out of more than 400 reporting guidelines available for different study designs and approaches,1, 9 eight have been developed for CPGs, including guidelines for updates,10 patient involvement,11 palliative care,12 complementary and alternative medicine,13 and guideline-based performance measures.14 Following the 2003 proposal from the Conference on Guideline Standardization,15 two recent guidelines address the reporting of CPGs: the Appraisal of Guidelines, Research and Evaluation (AGREE) Reporting Checklist from 2016,16 and the Reporting Items for Practice Guidelines in Healthcare (RIGHT Statement) from 2017.17

The AGREE checklist is based on items from the AGREE II tool for evaluating the methodological quality and transparency of CPGs.18 The RIGHT Statement was developed in collaboration with World Health Organization experts to complement methodological guidelines and assist guideline developers, readers, editors, reviewers, and health care practitioners.17 As the two guidelines were developed independently and used different methodological approaches, it is not clear whether these differences are reflected in their performance. Thematic comparison of the two reporting guidelines has been performed,19 but they have not been compared in practice, on published CPGs.

We used the AGREE and RIGHT checklists to compare completeness of reporting in two sets of CPGs: national (Croatian) and corresponding international (European) CPGs. We also compared the two checklists to see whether they assess the same concepts of CPG reporting, and compared their language characteristics, as there is evidence that the language and tone of health messages are important for decision-making in health.20, 21

METHODS

Setting and Sample

We tested the two checklists on a set of 79 Croatian (n = 37) and European (n = 42) CPGs, collected for a previous study which used the RIGHT checklist to compare Croatian and European CPGs.22 The final search for guidelines was performed in January 2019. Two authors (RT and MV) independently identified guidelines for inclusion, and the final decision was made by consensus and consultations with the third expert (AM). CPGs that were not available in Croatian or English were excluded from the analysis.

Data Extraction

Two authors (RT and MV), medical practitioners whose mother tongue is Croatian (as Croatian CPGs are published in Croatian by the Croatian Medical Association) and who are also fluent in English, independently assessed the adherence of published CPGs to the RIGHT and AGREE checklists. Before data extraction, the assessors were trained in using the checklists and piloted the data extraction. The Kappa statistic was used to measure the level of agreement.
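
As an illustration of the agreement measure, the following is a minimal sketch of Cohen's kappa for two assessors rating one checklist item across several CPGs. The text does not specify which kappa variant or software routine was used, so the choice of Cohen's kappa, the function name, and the ratings below are assumptions made only for this example.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' labels over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings (1 = item reported, 0 = not reported) for ten CPGs.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
kappa = cohen_kappa(rater_1, rater_2)  # about 0.78 for these made-up ratings
```

Computing kappa separately for each item, as in this sketch, matches the per-item range of coefficients reported in the Results.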

Data Analysis

As the two reporting checklists have different numbers of items (35 for RIGHT and 87 for AGREE), the completeness of reporting was expressed as the percent of reported items out of the total number of items on a reporting checklist for each CPG. Median percentages of reporting completeness (with interquartile range, IQR) for each of the two checklists were compared using the Mann-Whitney test.
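
The scoring described above can be sketched as follows. This is an illustrative outline only: the function names and the scores are hypothetical, the quartile method (inclusive) is an assumption because the text does not state one, and only the Mann-Whitney U statistic is computed, since the P value would normally come from a statistics package.

```python
from statistics import median, quantiles

def percent_reported(item_flags):
    """Percent of checklist items reported in one CPG (flags are True/False)."""
    return 100 * sum(item_flags) / len(item_flags)

def median_iqr(values):
    """Median with the first and third quartiles (inclusive method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs where x is larger, ties count one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Hypothetical CPG reporting 15 of the 35 RIGHT items.
one_cpg = percent_reported([True] * 15 + [False] * 20)

# Hypothetical completeness scores (percent) for five CPGs on each checklist.
right_pct = [40.0, 43.0, 46.0, 37.0, 49.0]
agree_pct = [21.0, 23.0, 29.0, 25.0, 38.0]
right_summary = median_iqr(right_pct)
u_stat = mann_whitney_u(right_pct, agree_pct)
```

Expressing each score as a percentage of the checklist's own item count is what makes the 35-item and 87-item checklists comparable on one scale.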

We also compared the completeness of reporting for national and international CPGs, separately for each item and for each checklist. These data, expressed as the percent of CPGs reporting the individual checklist item, were compared using Fisher’s exact test.
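
For a single checklist item, the comparison between national and international CPGs reduces to a 2x2 table of reported versus not reported. A minimal sketch of a two-sided Fisher's exact test is given below; the function name and the counts are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention that may differ from the one implemented in the software used for the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical item: reported in 30 of 37 Croatian and 20 of 42 European CPGs.
p_value = fisher_exact_two_sided(30, 7, 20, 22)
```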

The correlation of reporting completeness between the two checklists was calculated using Spearman’s rho correlation coefficient.
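
Spearman's rho is the Pearson correlation of the rank-transformed scores, which the sketch below makes explicit. The helper names and the scores are hypothetical and serve only to illustrate the calculation; the P value is not computed here.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical completeness scores (percent) for the same five CPGs.
right_pct = [40, 43, 46, 37, 49]
agree_pct = [21, 23, 29, 25, 38]
rho = spearman_rho(right_pct, agree_pct)  # 0.7 for these made-up scores
```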

A Bland Altman plot23 was used to analyze agreement between the two checklists, for which we used CPGs’ adherence to checklists expressed as proportions. We set the acceptable difference of the two measures at 0.2.24
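
The quantities behind such a plot can be sketched as follows: for each CPG, the difference between the two proportions and their mean, together with the mean difference and the 95% limits of agreement. The limits are taken here as the mean difference plus or minus 1.96 sample standard deviations, which is the conventional definition and an assumption about the exact settings used; the function name and proportions are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Mean difference (bias) and 95% limits of agreement for paired measures."""
    diffs = [a - b for a, b in zip(first, second)]         # y-axis values
    means = [(a + b) / 2 for a, b in zip(first, second)]   # x-axis values
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # based on the sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - spread,
        "upper": bias + spread,
    }

# Hypothetical adherence proportions for the same five CPGs.
right_prop = [0.40, 0.43, 0.46, 0.37, 0.49]
agree_prop = [0.21, 0.23, 0.29, 0.25, 0.38]
result = bland_altman(right_prop, agree_prop)
```

The mean difference can then be judged against the prespecified acceptable difference of 0.2.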

We compared the content of the checklists according to the domains stated in the checklists. We used the English versions of both checklists. Two authors (RT and AM) compared their content independently, and then discussed the results with the third author (MV) to resolve any disagreements.

The text of the two reporting guidelines was analyzed for linguistic content using the Linguistic Inquiry and Word Count (LIWC) program.25 We used four summary variables, which are algorithms made from various LIWC variables based on previous language research.25 The variables represent the proportion of words from each specific summary category within the overall text (ranging from 0 to 100).

  1. Analytical thinking measures the use of words suggesting logical, formal, and hierarchical thinking. Lower scores indicate language that is more narrative and focused on the “here and now” situation.

  2. Clout refers to confidence, leadership, or social status. Higher clout scores suggest that the author is speaking from the perspective of high expertise and confidence and lower clout scores suggest a more tentative and humble style.

  3. Authenticity measures the linguistic presentation in an authentic or honest way, with higher scores indicating a more personal and humble tone.

  4. Emotional tone measures positive and negative emotions in the text. Scores below 50 indicate a more negative emotional tone.25

The proportions of words in the two checklists that belong to each of the 4 summary LIWC variables were compared using Fisher’s exact test.

Readability of the two reporting guidelines was assessed using the SMOG (“simple measure of gobbledygook”) index.26 We used the SMOG index because it is considered to be the most appropriate for health communication.27 The SMOG score corresponds to the years of education required for a reader to understand the text, so a SMOG readability index of 8 suggests that a 13- to 14-year-old (an 8th grader in the US educational system) would be able to understand it. The interpretation of the SMOG index in the context of health information is that values over 6 are difficult for a non-specialist reader.28
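
For reference, the SMOG grade follows McLaughlin's published formula, which depends only on the number of polysyllabic words (three or more syllables) and the number of sentences. The sketch below assumes these counts are already available; the function name and the counts are hypothetical, and the text does not state which SMOG implementation was used for the checklists.

```python
from math import sqrt

def smog_grade(polysyllable_count, sentence_count):
    """SMOG grade from the count of words with three or more syllables."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    # Polysyllable count is normalized to a 30-sentence sample.
    return 1.0430 * sqrt(polysyllable_count * 30 / sentence_count) + 3.1291

# Hypothetical text: 30 sentences containing 25 polysyllabic words.
grade = smog_grade(25, 30)  # 1.0430 * 5 + 3.1291 = 8.3441
```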

Statistical analyses were performed using MedCalc Statistical Software version 16.4.3 (MedCalc Software, Ostend, Belgium), and JASP software, version 0.8.6 (JASP team, Amsterdam, Netherlands, 2018).

RESULTS

Comparison of Checklists in Assessing CPGs Reporting Completeness

We analyzed 79 CPGs: 37 from the Croatian Medical Association and 42 corresponding European CPGs (the full list of included guidelines is in Tables 1 and 2 in the Appendix). The Kappa coefficient between the two assessors ranged from 0.86 to 1.0 for individual items on the two checklists. CPGs scored higher on the RIGHT than on the AGREE checklist for the total sample (median 43% [IQR 37–46] vs 23% [IQR 21–29] of reported items; P < 0.001, Mann-Whitney test), as well as for the Croatian and the European CPGs analyzed separately. Both checklists showed differences in reporting completeness for individual items between Croatian and European guidelines (Tables 3 and 4 in the Appendix).

Overall, the items with the highest reporting completeness were from the domains of Basic information, Background of the health problem, and Recommendations for RIGHT, and the domains of Scope and purpose and Clarity of presentation for AGREE. In both checklists, these items indicate the purpose of the CPG, the members of the development group, and statements of recommendation. The items with the lowest completeness of reporting were in the Evidence and Recommendations domains for RIGHT, and the Rigor of development and Applicability domains for AGREE. In both checklists, those items correspond to the outcome and evidence selection, critical review of the evidence, barriers for implementation, and cost implications. The checklist items with the lowest reporting frequencies were patient values and preferences in RIGHT, and monitoring/auditing criteria in AGREE.

There was a high correlation between the two checklists in the completeness of reporting in the total sample of CPGs (Spearman’s ρ = 0.72, P < 0.001, Fig. 1). This was true for both the subsamples of Croatian (ρ = 0.72, P < 0.001) and European (ρ = 0.71, P < 0.001) CPGs (Figure 1 and Figure 2 in the Appendix).

Fig. 1

Correlation plot of percent reported items from the RIGHT and AGREE checklists for 79 clinical practice guidelines (CPGs). Each dot represents an individual CPG (n = 79, Spearman’s ρ = 0.72, P < 0.001).

A Bland Altman plot showed no significant differences between the two checklists (Fig. 2): 95% of item reporting scores were between the limits of agreement, and the mean difference between the two checklists was 0.18 (95% CI 0.17–0.19). There was no difference in agreement between the Croatian and European CPGs (Figure 3 and Figure 4 in the Appendix).

Fig. 2

Bland Altman plot of the differences between two checklists against the mean completeness of checklist item reporting for the full set of clinical practice guidelines (CPGs, n = 79). Dots represent data points for individual CPGs. The y-axis shows the difference between the two checklist item reporting proportions, individually for each CPG. The x-axis shows the mean proportion of the two measures for each CPG individually: (RIGHT proportion + AGREE proportion)/2.

Linguistic Analysis

The RIGHT Statement has 845 words and the AGREE Reporting Checklist has 1529 words. Their SMOG readability scores were 13.3 and 12.3, respectively.

Linguistic analysis showed that AGREE contained more words related to analytical tone, clout, and authenticity than RIGHT (Table 1). However, RIGHT was characterized by a higher emotional tone than AGREE (Table 1).

Table 1 Linguistic Analysis of the RIGHT and AGREE Reporting Checklists

Comparison of Checklist Item Contents

The AGREE Reporting Checklist is considerably longer than RIGHT, with 87 items spread over 6 domains, compared with 35 items over 7 domains. Matching of the individual items showed that 11 of the 35 items (31%) in the RIGHT checklist were not covered by AGREE and 10 out of 87 items (11%) in AGREE were not in RIGHT (Table 5 in the Appendix).

A prominent difference between the checklists (Table 6 in the Appendix) was the requirements for methods of searching for evidence. The RIGHT checklist requires specific information on systematic reviews and a risk-of-bias analysis for included studies. The AGREE Reporting Checklist includes 7 sub-items for the assessment of evidence—methodology, appropriateness of outcomes, consistency and direction of results, magnitude of benefit vs harm, and applicability to practice.

The RIGHT checklist expects authors to state the strength of each recommendation, whereas AGREE recommends evidence summaries and/or tables. Another difference is that AGREE focuses on voting outcomes and how they influenced the final recommendation, whereas RIGHT considers how the values and preferences of the target populations influenced individual recommendations. The RIGHT checklist also asks if cost and resources were considered in the formulation of recommendations and requires an explanation if they were not assessed or considered in the final recommendation.

DISCUSSION

The RIGHT and AGREE reporting checklists perform very similarly in measuring the completeness of reporting of clinical practice guidelines. The results from both checklists were similar in terms of reporting completeness for the overall sample and for a set of national (Croatian) and international (European) CPGs, demonstrating that these checklists can be applied to CPGs that have been created in different settings. The consistent finding of greater adherence of the CPGs in our study to the RIGHT checklist (median 43% for RIGHT vs median 23% for AGREE) may indicate that the RIGHT checklist covers a wider range of issues relevant and/or present in currently available CPGs.

A recent study by Yao et al.19 compared the items of the AGREE and RIGHT checklists qualitatively. Our study expanded the qualitative analysis of the checklists by assessing their language and by comparing them in practice, on published CPGs. To the best of our knowledge, this is the first use of language and readability analysis of reporting checklists. Our study is also the first comparison of the two checklists in practice, using sets of national and international CPGs. In contrast to the study of Yao et al.,19 we used methods for comparing diagnostic tools in our comparison of the two reporting checklists, as they “diagnose” the completeness of reporting of essential items. Whereas Yao et al. concluded that a new reporting guideline should be developed based on the content of the two existing guidelines,19 our interpretation of the results from our study is that there is sufficient guidance in the existing reporting guidelines. For end users, who may be practitioners, CPG developers, or patients, the RIGHT checklist may be more user-friendly. It also puts more emphasis on reporting the limitations and exceptions in methodology. For greater rigor in reporting CPG methodology, the AGREE Reporting Checklist could be an additional tool for CPG developers.

The developmental background of the two checklists is reflected in their content. One of the important differences is in their approach to “non-reporting.” While all of the AGREE Reporting items require reporting of an action, the RIGHT checklist requires an explicit explanation of reasons why some procedures and analyses were not performed. For example, the item on stakeholder involvement in AGREE Reporting (domain 2) requires a description of methods and strategies for collecting the views and preferences of the target population. The corresponding item in the RIGHT checklist (item 14a) asks for the same information, but also states that if the views and preferences were not sought or considered, the authors should explain why. It is sometimes difficult to address stakeholder issues in practice, so not all CPGs will include them in their development.17 Nevertheless, an explanation should be provided.

Thus, the RIGHT checklist requires more reflection from the authors on the guideline development in the form of self-criticism and acknowledgment of gaps and limitations. Such a reflective approach could be particularly relevant for health systems lacking rigor or structure for the development and audit of CPGs.29 It is also helpful to readers of CPGs when a limitation is clearly stated, rather than silently omitted.

The two checklists had similar readability levels: final high-school (12th) grade level for AGREE and early college level for RIGHT. These SMOG index scores indicate low readability of the text and requirement for a specialized reader, but are nevertheless better than some online health information about diseases, such as Wikipedia pages on autoimmune disorders, which had a SMOG index score of 15,30 or published scientific articles generally.31

Both readability and linguistic characteristics have been shown to be important for understanding and interpretation of a text.20, 32 Linguistic analysis of the checklists showed that both use a high proportion of words related to analytical tone, with AGREE having significantly more such words than RIGHT. High analytical linguistic content reflects formal and logical thinking, while lower content represents more narrative and informal thinking.25 The AGREE Reporting Checklist also uses significantly more words related to clout, which suggests that the checklist content was written from the level of high expertise and confidence. Words reflecting authenticity tone were lower for the RIGHT checklist, which would suggest a slightly more guarded tone. However, words with a positive emotional tone were significantly more frequent for RIGHT, indicating a more positive style of writing.25

Simplicity of language, selection of words, and the way in which CPGs are written have been shown to influence their use and implementation.6, 32, 33, 34, 35 This may also be true for reporting guidelines. Our results warrant further investigation into how language affects the understanding and use of such checklists, as current evidence mostly addresses the effects of different formats and user interfaces for reporting guidelines.36, 37 Reporting guidelines, just like CPGs,6 should be easy to use and understand, employing language suitable for busy researchers and practitioners. Use of simpler language might increase adherence to reporting guidelines, and, consequently, improve knowledge transfer and uptake of evidence into practice.38

One limitation of our study lies in the use of a relatively small set of CPGs from a small national health care system and those at the European level, which limits the generalizability of our findings. We purposely used CPGs from various sources to control for geographical, conceptual, and methodological approaches to CPG creation and reporting.29 The similarity of the results for both sets of CPGs and high agreement between two independent assessors strengthen the validity of our findings.

We compared the completeness of reporting of all checklist items, but it is important to keep in mind that the items do not have the same weight or value for practitioners. For example, items relating to the title of the guideline and those on cost and resource implications would not be of the same importance in the development of a CPG. In the item-by-item comparison, the items with the lowest adherence for both checklists (Tables 3 and 4 in the Appendix) were related to outcome and evidence selection and critical review of evidence. An evidence-to-recommendation process cannot be adequately assessed without this information, and that may be the most important reporting item for practitioners looking to implement recommendations in clinical practice.

In conclusion, our comparison of the two CPG reporting checklists, RIGHT and AGREE, showed that they could be used interchangeably as assessment tools because they produce consistent results. For CPG development, higher scores for the RIGHT checklist indicate that it has a broader coverage of current issues relevant for CPGs, and the AGREE Reporting Checklist could be considered aspirational guidance for reporting methodological rigor of CPG development. The RIGHT checklist is also shorter and may have linguistic characteristics which make it simpler to use, both for clinical practitioners and patients.