Introduction

Published cancer research is increasing rapidly [1], and guideline users are demanding accessible, trustworthy health recommendations [2]. For decision-makers, clinical practice guidelines (CPGs) have emerged as a reference to improve the quality of care by translating the best available scientific evidence into specific recommendations [3]. Their development involves identifying and refining a subject area, forming a multidisciplinary panel of experts, conducting a systematic review of the evidence, formulating recommendations based on the evidence, and grading the strength of the recommendations [4]. The entire process should be transparent, evidence-based, and involve stakeholder input [5].

The European Society for Medical Oncology (ESMO) [6], the American Society of Clinical Oncology (ASCO) [7] and the Spanish Society of Medical Oncology (SEOM) [8] guidelines are prepared and reviewed by leading experts and provide evidence-based recommendations to guide healthcare professionals and outline appropriate methods of treatment and care. In particular, SEOM is a national, non-profit scientific society that promotes studies, training, and research activities. SEOM aims to strengthen its role as a reference society and a source of opinion and rigorous knowledge about cancer for all the agents involved, for patients, and for society in general [8]. SEOM's project to develop guidelines started in 2010 in response to a need perceived by Spanish oncologists for clinical practice documents tailored to the peculiarities of the Spanish healthcare system. Since 2014, open-access CPGs have been available to facilitate clinical practice, providing an eminently practical view of the most relevant considerations concerning various cancer-related scenarios [9].

To date, no independent assessment of the quality of these guidelines has been made, despite reports indicating that critical reviews of guidelines worldwide show they are not sufficiently robust [10, 11]. The quality of CPGs can be assessed using various tools, such as AGREE II and AGREE-REX, which evaluate the methodological quality, rigor, and transparency of guideline development. These tools can help identify areas for improvement in CPGs and ensure that they are evidence-based and relevant to clinical practice. In this context, this study aims to critically assess the quality of CPGs on cancer treatment published by SEOM since 2014.

Methods

Study design

This is a critical review of CPGs. We followed rigorous standards [12] and reported our results according to the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) 2020 checklist [13] (Supplementary file 1). Before starting the review process, we published a research protocol online in the Open Science Framework (OSF) repository [14].

Eligibility criteria

We considered the definition of CPGs previously reported by the Institute of Medicine (IOM) [15]. The inclusion criteria (all criteria required) were as follows: (a) CPGs for cancer treatment; (b) supported by SEOM; (c) published in English or Spanish; and (d) published from 2014 onwards.

Exclusion criteria (any single criterion sufficient) were: (a) CPGs on cancer prevention, screening, detection, diagnosis, mapping, staging, imaging, scanning, or follow-up without treatment recommendations; (b) CPGs not containing recommendations for a specific cancer (pathology-related guidelines); or (c) unavailable papers, surveys, audits, editorials, letters to the editor, case reports or notes.

Literature search

In February 2022, we identified eligible CPGs through electronic searches of MEDLINE (via PubMed), the SEOM website, and the website of Clinical and Translational Oncology, the international journal where SEOM guidelines are published. The search strategy for MEDLINE is presented in Supplementary file 2.

Screening and data extraction

Two reviewers independently screened titles and abstracts, followed by full texts. A third reviewer resolved disagreements. We used Rayyan®, a free web-based software tool for conducting reviews [16]. Two reviewers independently extracted data from the included guidelines using a previously piloted form. We extracted a minimum dataset covering the general characteristics of the included CPGs, as well as data for the development of a recommendation map. Disagreements were resolved through discussion until consensus was reached. When we found more than one guideline for a specific cancer (including different versions of the same guideline), we created a publication thread, analyzing the CPGs as a whole with all their references.

Quality assessment

Three independent reviewers assessed the quality of the included CPGs using the AGREE II tool [17], developed by the international Appraisal of Guidelines, Research and Evaluation (AGREE) research team. The tool has become a widely used standard for evaluating the methodological quality and transparency of CPGs internationally [18]. The reviewers rated 23 key items across six domains: (1) scope and purpose; (2) stakeholder involvement; (3) rigor of development; (4) clarity of presentation; (5) applicability; and (6) editorial independence. Each item, including the two global rating items, was rated on a 7-point scale (1—strongly disagree to 7—strongly agree). As a complement to AGREE II, and only for the guidelines rated as “high quality”, we used the AGREE-REX instrument, a tool released in 2019 to evaluate and optimize the clinical credibility and implementability of CPG recommendations [19, 20]. AGREE-REX comprises nine items across three key quality domains: clinical applicability (domain 1), values and preferences (domain 2), and implementability (domain 3), which must be considered to ensure that guideline recommendations are of high quality. This tool was applied by the same three independent reviewers on a 7-point scale (1—strongly disagree to 7—strongly agree). Furthermore, each evaluator was asked whether they would recommend the guideline for use in its intended context and in the reviewers' own context. All assessments were performed independently and blinded, using an internally piloted data extraction spreadsheet in Microsoft Excel 2019.
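For illustration, the 23 AGREE II items can be mapped to their six domains as follows (a sketch in R, the language used for our analyses; this is not the reviewers' actual rating spreadsheet, and the object name is illustrative):

```r
# Illustrative mapping of the 23 AGREE II items (I1-I23, see Fig. 2) to their
# six domains; a structure like this can be used to aggregate item ratings
# into domain scores.
agree2_domains <- list(
  scope_and_purpose       = 1:3,    # I1-I3
  stakeholder_involvement = 4:6,    # I4-I6
  rigor_of_development    = 7:14,   # I7-I14
  clarity_of_presentation = 15:17,  # I15-I17
  applicability           = 18:21,  # I18-I21
  editorial_independence  = 22:23   # I22-I23
)
```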

Statistical analysis

As suggested by the AGREE II instructions, we calculated each domain score as the sum of the individual item scores from all evaluators' assessments in that domain. We then standardized each domain total by scaling it between the minimum and maximum possible scores for that domain, so that the range of possible scores was 0–100%, representing the worst and best possible ratings, respectively. Each domain with a score of ≥ 60% was considered effectively addressed. We considered CPGs to be of “high quality” if they scored ≥ 60% in at least three of the six AGREE II domains, including domain 3. If three or more domains scored ≥ 60% but domain 3 did not, the CPG was considered of “moderate” overall quality. Finally, CPGs that scored < 60% in two or more domains and < 50% in domain 3 were considered of “low quality”.
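As a minimal sketch (not the actual analysis script used in this study), the scaled domain score and the pre-defined quality classification described above could be computed as follows; the function and domain names are illustrative, and boundary cases are resolved as noted in the comments:

```r
# Scaled AGREE II domain score: 'ratings' is a matrix of item scores (1-7)
# with one row per appraiser and one column per item of the domain.
scaled_domain_score <- function(ratings) {
  obtained     <- sum(ratings)
  max_possible <- 7 * nrow(ratings) * ncol(ratings)
  min_possible <- 1 * nrow(ratings) * ncol(ratings)
  100 * (obtained - min_possible) / (max_possible - min_possible)
}

# Overall quality classification following the pre-defined criteria above.
# 'domains' is a named numeric vector of the six scaled domain scores (%),
# where "rigor" refers to domain 3 (rigor of development).
classify_quality <- function(domains) {
  n_adequate <- sum(domains >= 60)
  if (n_adequate >= 3 && domains["rigor"] >= 60) {
    "high quality"
  } else if (n_adequate >= 3) {
    "moderate quality"
  } else {
    "low quality"  # e.g., < 60% in two or more domains and < 50% in domain 3
  }
}
```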

We performed all statistical analyses in RStudio [21], generating boxplots with the ggplot2 package. Descriptive analyses were performed using estimators of central tendency and dispersion, including the mean and standard deviation (SD) or the median and interquartile range (IQR). We calculated inter-rater reliability using the average intraclass correlation coefficient (ICC) (two-way random mixed model), including the 95% confidence interval. ICC values were interpreted as follows: < 0.2, poor agreement; 0.2–0.5, fair agreement; 0.5–0.75, moderate agreement; 0.75–0.9, strong agreement; and > 0.9, almost perfect agreement.
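A hedged example of this reliability analysis is shown below; the specific package and settings are not reported above, so the use of the irr package, the two-way/agreement options, and the example data are assumptions for illustration only:

```r
# Hedged sketch: average-measures, two-way ICC with 95% CI using the 'irr'
# package (package choice and settings are assumptions, not the study script).
library(irr)

# Hypothetical example data: one row per guideline (n = 33), one column per
# appraiser (n = 3), holding AGREE II ratings on the 1-7 scale.
set.seed(42)
ratings <- matrix(sample(1:7, 33 * 3, replace = TRUE), nrow = 33, ncol = 3)

# Two-way model, absolute agreement, averaged across the three raters.
icc(ratings, model = "twoway", type = "agreement", unit = "average")
```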

Results

The PRISMA flow diagram (Fig. 1) depicts the flow of information through the different phases of our critical review. In total, 208 relevant references remained for full-text review, and 69 met the inclusion criteria for detailed analysis (excluded studies are presented in Supplementary file 3). After grouping subsequent updates of the same CPGs, a total of 33 CPGs were included [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54].

Fig. 1

PRISMA 2020 flowchart. From: Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. https://doi.org/10.1136/bmj.n71. For more information, visit: http://www.prisma-statement.org/

Table 1 presents the characteristics of the reviewed CPGs. We identified 25 cancer types; colorectal and breast cancer were the locations with the most publications. On average, SEOM published seven guidelines per year. The year with the highest output was 2015 (13 guidelines), while 2017 had the lowest (no guidelines published). Among the identified guidelines, 28 had been published more than three years earlier.

Table 1 Characteristics of included clinical practice guidelines by the Spanish Society of Medical Oncology (SEOM) (n = 33)

Table 2 summarizes the standardized AGREE II domain scores and the overall quality rating of the CPGs. As per the pre-defined criteria of this study, 84.8% of CPGs (28) were considered of “high quality” [23,24,25,26,27,28, 30,31,32,33, 39, 45, 55,56,57,58,59,60,61,62,63,64,65,66], 12.1% (4) were assessed as of “moderate quality” [22, 38, 44, 52, 67,68,69,70], and 3.0% (1) were considered of “low quality” [29]. Moderate agreement was present across all appraisers in this study (average-measures ICC = 0.6; 95% CI 0.4, 0.7). Among the six AGREE II domains, four median scores were rated ≥ 60% (domains 1, 3, 4, and 6). The highest median standardized score (96.3) was observed in domain 4 (clarity of presentation), whereas domain 5 (applicability) was distinctly low (31.4), with only one of the CPGs scoring above 60%. Regarding domain 3 (rigor of development), the standardized scores ranged from 55.6 to 86.1 (median = 74.3), and 28 of the CPGs scored above 60.0%. The CPG for gastrointestinal stromal tumors (GIST) [48] obtained the highest scores, fulfilling 100% of the criteria in two domains and > 85.0% in domain 3.

Table 2 AGREE II standardized domain scores: CPGs by SEOM (n = 33)

Figure 2 summarizes the AGREE II item scores. SEOM's guidelines did not include the views and preferences of the target population in their development, as can be inferred from item 5 (scores of 1–2, the lowest). Moreover, they did not clarify the updating methods either, so all obtained the worst score on the Likert scale for item 14 [45]. Items 18–21, regarding applicability, scored below 4 in most cases. Specifically, item 21, which refers to monitoring and auditing criteria, had a median score of 1.7, the glioblastoma guideline being the only one with a high score (6.3) on this item [45].

Fig. 2

AGREE II item scores of included clinical practice guidelines by the Spanish Society of Medical Oncology (SEOM) (n = 33). I1—The overall objective(s) of the guideline is (are) specifically described; I2—The health question(s) covered by the guideline is (are) specifically described; I3—The population (patients, public, etc.) to whom the guideline is meant to apply is specifically described; I4—The guideline development group includes individuals from all the relevant professional groups; I5—The views and preferences of the target population (patients, public, etc.) have been sought; I6—The target users of the guideline are clearly defined; I7—Systematic methods were used to search for evidence; I8—The criteria for selecting the evidence are clearly described; I9—The strengths and limitations of the body of evidence are clearly described; I10—The methods for formulating the recommendations are clearly described; I11—The health benefits, side effects, and risks have been considered in formulating the recommendations; I12—There is an explicit link between the recommendations and the supporting evidence; I13—The guideline has been externally reviewed by experts prior to its publication; I14—A procedure for updating the guideline is provided; I15—The recommendations are specific and unambiguous; I16—The different options for management of the condition or health issue are clearly presented; I17—Key recommendations are easily identifiable; I18—The guideline provides advice and/or tools on how the recommendations can be put into practice; I19—The guideline describes facilitators and barriers to its application; I20—The potential resource implications of applying the recommendations have been considered; I21—The guideline presents monitoring and/or auditing criteria; I22—The views of the funding body have not influenced the content of the guideline; I23—Competing interests of guideline development group members have been recorded and addressed

Table 3 summarizes the standardized AGREE-REX domain scores. Considering only the 28 high-quality CPGs, the mean overall AGREE-REX score was 48.5 (SD 11.0), with variability in performance across the nine individual items. The overall average score of the recommendations was 4.2 out of 7 (SD 0.6). Domain 1, “clinical applicability”, obtained the highest scores (mean 75.8, SD 14.3), and domain 2, “values and preferences”, the lowest (mean 26.0, SD 12.2). The AGREE-REX items with the highest scores were “2. Applicability to Target Users” (mean 6.2; SD 0.7) and “1. Evidence” (mean 5.8; SD 0.5), while the lowest scores were observed for item “5. Values and Preferences of Patients/Population” (mean 1.6; SD 1.1) and item “6. Values and Preferences of Policy/Decision-Makers” (mean 1.6; SD 1.4).

Table 3 AGREE-REX standardized domain scores: CPGs by SEOM (n = 28)

Discussion

Overall, this critical review provides a thorough analysis of the quality of the CPGs on cancer treatment published by the Spanish Society of Medical Oncology over the last nine years. The study also identified the characteristics of the guidelines, including the types of cancer covered and the timeline of their publication. Ultimately, 33 guidelines were included and assessed, with 28 (84.8%) considered of “high quality” according to pre-defined criteria; however, their applicability was found to be poor. One of the main strengths of the guidelines is the “clarity of presentation” domain, in which they achieved the highest scores, whereas the “applicability” domain was distinctly low (31.4), with only one of the CPGs scoring above 60.0%. SEOM's guidelines did not include the views and preferences of the target population in their formulation, nor did they specify updating methods.

Our results indicate that SEOM is producing CPGs that meet established standards of methodological rigor. The high scores in “clarity of presentation” are encouraging because they suggest that SEOM is effectively communicating its recommendations to clinicians and patients. This result is particularly noteworthy given that clear and understandable guidelines are essential for their effective implementation in clinical practice, ensuring that the recommendations are specific and unambiguous, the different options for management of the condition or health issue are clearly presented, and key recommendations are easily identifiable [73].

However, it is concerning that the guidelines scored low in “applicability”, because it suggests that they may not be as useful for clinicians and patients as they could be. Previous research indicates that for clinical guidelines to have an actual impact on processes, and ultimately on outcomes of care, they need to be not only well developed and based on scientific evidence but also disseminated and implemented in ways that ensure they are actually used by clinicians [71, 72]. Implementation science frameworks have been used to address challenges in implementing clinical practice guidelines [72]. Likewise, the fact that SEOM's guidelines did not include the views and preferences of the target population or specify updating methods indicates that there is room for improvement in the guideline development process. Incorporating patient perspectives into guideline development can improve the relevance and applicability of guidelines to clinical practice [73]. Specifying updating methods is also important to ensure that guidelines remain up to date and reflect the latest evidence [5].

Our critical review used both AGREE II and AGREE-REX, two tools for evaluating practice guidelines with different focuses. AGREE II is designed to assess the methodological quality and transparency of practice guidelines, while AGREE-REX is designed to evaluate the clinical credibility and implementability of their recommendations [17, 20]. AGREE-REX is a complement to AGREE II, rather than an alternative, and provides a blueprint for practice guideline development and reporting. Although scoring high on AGREE II is essential, it guarantees neither that recommendations are optimal for targeted users nor that they will be optimally implemented [19]. In our review, we found that recommendations from guidelines rated as “high quality” on AGREE II did not align with the values and preferences of their target users, whether patients or policy-makers, according to AGREE-REX. This result has been previously reported elsewhere [20]. In this context, we consider that guidelines developed without considering these values and preferences may not be relevant or applicable to their target users' needs, which can lead to poor adherence and outcomes.

Finally, there have been several critical appraisals of the quality of guidelines in cancer [20, 74,75,76], but we did not identify any reports indicating whether ASCO or ESMO have conducted critical appraisals of the quality of their cancer guidelines. One study used the AGREE tool to assess the quality of oncology guidelines developed in different countries [77]. Another used the AGREE II tool to assess the methodological quality of clinical practice guidelines with physical activity recommendations for people diagnosed with cancer [76]. These studies showed heterogeneous quality among existing cancer guidelines, indicating a need for improvement in the development and reporting of guidelines.

Strengths and limitations

Our research has multiple strengths. We implemented a thorough search strategy to locate SEOM guidelines and used a standardized, globally recognized guideline appraisal tool (AGREE II). While our study is not the first to critically appraise guidelines [11, 20], it is noteworthy that ours is one of the few studies to use the AGREE-REX tool (developed in 2019) to assess cancer guidelines. Furthermore, to the best of our knowledge, this is one of the first independent evaluations of the quality of cancer treatment guidelines from a scientific society, which adds to the significance of our findings.

Nevertheless, there are also some limitations to our study that need to be acknowledged. First, our study only assessed the methodological quality of the SEOM guidelines and did not evaluate their impact on clinical practice or patient outcomes. Second, while AGREE II and AGREE-REX are recognized appraisal tools, they have limitations, and their application alone may not fully encompass all aspects of guideline quality. Third, it should not be assumed that a rigorous methodology means that all issues have been dealt with exhaustively and accurately; some recommendations may not be sufficiently detailed to guide treatment decisions in specific situations, such as advanced cancer, end-of-life care, elderly patients, and comorbidities. Fourth, we cannot assume that clinicians' adherence to these guidelines is high, so having high-quality guidelines does not necessarily mean that clinicians are making the right decisions, as many studies have previously reported [78,79,80]. Finally, due to the nature of the study design, our results are limited to the period of guideline publication and do not account for any subsequent updates or changes to the guidelines.

It is worth considering the potential redundancy with other cancer treatment guidelines developed by international organizations or societies. While our study focuses on the SEOM guidelines, it is important to acknowledge that other guidelines exist that may provide valuable insights and recommendations, and to reflect on the extent to which these guidelines, taken individually, contribute different nuances and perspectives on cancer treatment and management. In light of this, one might ask whether a policy of adapting existing guidelines [81] would be a more efficient approach than developing new guidelines from scratch. Such an approach could help reconcile differences between guidelines and promote the uptake of best practices across multiple contexts.

Implications for practice and research

Overall, this review emphasizes the importance of producing high-quality and applicable CPGs in oncology to guide clinical practice and improve patient outcomes. The findings provide insights into the strengths and limitations of SEOM's guidelines and highlight areas where improvement can be made to enhance their relevance and usefulness.

Conclusions

SEOM guidelines on cancer treatment have been developed with acceptable methodological rigor, although they have some drawbacks that could be improved in the future, such as clinical applicability, the incorporation of patient and policy-maker views and preferences, and explicit updating procedures.