Introduction

Atherosclerotic cardiovascular disease (ASCVD) is a chronic disorder that develops gradually during life, and its progress can be seen in the appearance of symptoms in patients1. Despite recent therapeutic advances, ASCVD remains the leading cause of death worldwide2. ASCVD is a pattern of atherosclerosis wherein the artery wall develops abnormalities called lesions. Depending on which artery is affected, ASCVD can lead to coronary heart disease, cerebrovascular disease and other peripheral vascular diseases, congestive heart failure, carotid heart disease, aneurysm, or kidney problems3,4,5. Ischemic heart disease and cerebrovascular disease are the first and third leading causes of death worldwide5,6.

ASCVD is a major contributor to mortality, accounting for more than 80% of all cases7. In China, ASCVD is the leading cause of death, with over 40% of deaths attributable to the disease, and its incidence is increasing8. Despite a decrease in the incidence of coronary heart disease in the United States over the past 30 years, ASCVD remains the leading cause of death among US residents, affecting approximately 5.2% of the population9. In Europe, ASCVD is responsible for approximately 3.9 million deaths annually, which accounts for 45% of all deaths10. In Iran, ASCVD is one of the main causes of death and disability, and the prevalence of its risk factors has increased over the past few decades11,12. A global survey found that approximately 422.7 million people were living with ASCVD, with 17.9 million deaths due to the disease in 201513. Global ASCVD mortality is driven primarily by population growth and aging, together with a lack of recognition and appropriate treatment of people with ASCVD risk factors13,14.

ASCVD is influenced by various risk factors, including dyslipidemia, inflammation, hypertension, diabetes, excessive consumption of sugar-sweetened beverages and alcohol, smoking, obesity, lifestyle, and associated conditions15,16,17. Of these, dyslipidemias, obesity, high glucose levels, high blood pressure, and insulin resistance are the most significant and common physiological and metabolic changes contributing to increased ASCVD mortality18,19,20,21,22. Long-term complications and mortality from ASCVD can be reduced by controlling modifiable risk factors such as maintaining a healthy diet, regular exercise, not smoking, and maintaining a healthy weight23,24,25. As such, taking a multifactorial approach to control ASCVD risk factors and modifying several overall risk factors, rather than just individual risk factors, is a more effective way to reduce ASCVD risks26,27.

Bayesian networks (BNs) are probabilistic graphical models that provide a principled basis for managing uncertainty in artificial intelligence. Historically known as probabilistic causal networks28, BNs combine graph theory and probability theory to describe the associations among a set of variables (depicted as nodes in the graph) and their conditional probabilities (CPs). Although best known for their capacity to model and predict causal relationships, BNs can also represent complex probabilistic interdependencies among variables28,29. These relationships are encoded in directed acyclic graphs (DAGs), which show the directed dependencies and conditional associations among variables28,30,31. Notably, BNs can infer the probabilities of unobserved variables from known variables, so that a change in the state of a single variable updates the probabilities of the other variables in the network. BNs also allow data and domain-specific knowledge to be integrated systematically in healthcare, in our case for the risk factors of ASCVD, and they have a growing range of applications across scientific domains, including medicine and the social sciences32,33,34,35.

The prevention of ASCVD mortality is a global public health priority2,27, which motivates the application of BNs to cardiovascular disease (CVD) risk prediction, diagnosis, evaluation, and clinical decision-making36,37,38. In this study, we use BNs to explore the most dominant risk factors of ASCVD in a population of healthcare providers at Tabriz University of Medical Sciences.

Methods

Study population

The data were collected as part of the larger AZAR Cohort Study, a prospective epidemiological study conducted by the Liver and Gastrointestinal Diseases Research Center of Tabriz University of Medical Sciences (TBZMED) in Iran39. The cohort aims to evaluate 3000 participants, including healthcare providers in hospitals, schools, and health networks of TBZMED. In 2020, a total of 500 people participated in the present cross-sectional study40.

Data collection

Our study involved face-to-face health interviews and health examinations of full-time and long-term healthcare providers aged 18–75 years in hospitals, schools, and health networks of TBZMED. We excluded individuals who were pregnant or breastfeeding, those who planned to retire within the next five years, and those with a history of debilitating psychiatric disorders or physical illnesses reported by a health professional. Participants provided information on their demographic characteristics, such as age, sex, marital status, and education level, as well as behavioral factors, including smoking status, and self-reported family history. All eligible healthcare providers, official staff, and lecturers of TBZMED were invited to participate in the study.

Ethical approval

All participants were informed of the study purpose before giving consent to participate, and then completed and signed the informed consent and assent forms. The Institutional Review Board (IRB) of TBZMED approved the study (ethics code: IR.TBZMED.REC.1400.1006).

Measurements

Anthropometric measurements were performed by trained technicians according to the international standards for anthropometric assessment41. Body weight was measured to the nearest 0.1 kg using an electronic scale (Seca 700 scale, Seca GmbH, Hamburg), while height was measured to the nearest 0.5 cm using a stadiometer (Seca 220 Telescopic Height Rod for Column Scales, Seca GmbH, Hamburg). Waist circumference (WC) was measured at the halfway point between the lower costal border and the iliac crest using a flexible steel tape (Lufkin Executive Thinline W 606, precision 1 mm)42. Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared (kg/m2), and BMI values were categorized according to the World Health Organization's (WHO) criteria43. Serum samples were collected to assess total cholesterol (T-C), high-density lipoprotein cholesterol (HDL-C), fasting blood sugar (FBS), and triglycerides (TG). These samples were analyzed using Miura One automated equipment (I.S.E., Rome, Italy) and a commercial DiaSys kit (DiaSys Diagnostic Systems, Hamburg, Germany)44. Low-density lipoprotein cholesterol (LDL-C) was calculated using the Friedewald equation45. Metabolic syndrome (Mets) is a cluster of risk factors, including central obesity, high blood pressure, high blood sugar, and abnormal cholesterol levels. The diagnosis of Mets is based on the presence of three or more of these risk factors. The metabolic health status of each participant was defined based on the Adult Treatment Panel III definition of Mets46.
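The two derived quantities described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the Friedewald estimate is written in its usual form for values in mg/dL (LDL-C = T-C − HDL-C − TG/5), and the TG < 400 mg/dL restriction is the commonly cited validity limit of that estimate, which the text itself does not mention.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float) -> float:
    """Estimate LDL-C (mg/dL) as T-C - HDL-C - TG/5, all inputs in mg/dL.

    Assumption (not stated in the text): the estimate is rejected for
    TG >= 400 mg/dL, the usual validity limit of the Friedewald equation.
    """
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is unreliable for TG >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5.0


# Hypothetical participant: 70 kg, 1.75 m, T-C 200, HDL-C 50, TG 150 (mg/dL)
print(round(bmi(70.0, 1.75), 1))
print(friedewald_ldl(200.0, 50.0, 150.0))
```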

Study variables

Participants were classified according to the level of each risk factor: ASCVD status (non-ASCVD vs with ASCVD)47, sex (male vs female), age (under 45 vs over 45 years)48, smoking status (non-smokers vs past or current smokers), diabetes (without vs with diabetes), hypertension (without vs with hypertension), and Mets (without vs with Mets). BMI was classified as normal weight (18.5 to 25 kg/m2), overweight (25.01 to 30 kg/m2), or obesity (≥ 30.01 kg/m2)49. LDL-C, FBS, T-C, and TG levels (all in mg/dL) were categorized as follows: LDL-C49: normal (< 130 mg/dL) and high (≥ 160 mg/dL); FBS51: normal (≤ 99 mg/dL) and high (≥ 100 mg/dL); TG51: normal (< 150 mg/dL) and high (≥ 150 mg/dL); and T-C49: normal (< 200 mg/dL) and high (≥ 200 mg/dL). HDL-C (mg/dL) was divided into three levels: low (< 45 mg/dL), normal (45–55 mg/dL), and high (> 55 mg/dL)50.
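The cut-offs above can be expressed as a small discretisation routine. This is only a sketch under stated assumptions: the function and key names are hypothetical, BMI values between the listed bands (for example 25.005) are assigned to the higher band, FBS is treated as normal below 100 mg/dL, and values for which the text gives no category (BMI below 18.5, LDL-C from 130 to 159 mg/dL) are returned as `None` rather than guessed.

```python
from typing import Dict, Optional


def bmi_level(bmi: float) -> Optional[str]:
    """Normal 18.5-25, overweight above 25 up to 30, obesity above 30."""
    if bmi < 18.5:
        return None  # no category given in the text
    if bmi <= 25:
        return "normal"
    if bmi <= 30:
        return "overweight"
    return "obesity"


def ldl_level(ldl: float) -> Optional[str]:
    """Normal below 130, high from 160; the 130-159 band is unspecified."""
    if ldl < 130:
        return "normal"
    if ldl >= 160:
        return "high"
    return None


def hdl_level(hdl: float) -> str:
    """Low below 45, normal 45-55, high above 55 (mg/dL)."""
    if hdl < 45:
        return "low"
    return "normal" if hdl <= 55 else "high"


def two_level(value: float, high_from: float) -> str:
    """Generic normal/high split: high at or above the cut-off."""
    return "high" if value >= high_from else "normal"


def discretise(record: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Map one participant's continuous measurements to categorical levels."""
    return {
        "BMI": bmi_level(record["BMI"]),
        "LDL-C": ldl_level(record["LDL-C"]),
        "HDL-C": hdl_level(record["HDL-C"]),
        "FBS": two_level(record["FBS"], 100),
        "TG": two_level(record["TG"], 150),
        "T-C": two_level(record["T-C"], 200),
    }


# Hypothetical participant, values in kg/m^2 and mg/dL
example = {"BMI": 27.3, "LDL-C": 120, "HDL-C": 58, "FBS": 104, "TG": 140, "T-C": 210}
print(discretise(example))
```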

Bayesian networks

The BN models were developed and evaluated in two stages: (1) structural learning, to determine the topology of the BN (the DAG), and (2) parametric learning, i.e. estimation of the CPs among the nodes once the network topology was established. In our study, the BN provided insight into how a group of ASCVD risk factors can influence the probability of ASCVD occurring, independent of sample size53. Our BNs are graphs with arcs linking nodes and no directed cycles, in which the ASCVD risk factors and the outcome variable are represented as nodes, and the conditional dependencies between them are represented by directed edges or arrows52. Each node is associated with a CP table, which specifies the CP of each of its values for each combination of its parents' values53,54. We learned the BN structure by combining algorithmic search with expert insight and empirical evidence obtained through a systematic literature review. This approach helps ensure that the chosen variables are relevant and capture the relationships and dependencies within the modelled system. It is in line with established research, as exemplified by Ordovas et al.32 and Huang et al.55, who advocate incorporating prior expert knowledge and comprehensive literature reviews into BN variable selection32,55. We therefore trained two BNs: one learned algorithmically by Bayesian search, and one knowledge-based.
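To make the role of the CP tables concrete, the following sketch builds a toy three-node network (Diabetes → ASCVD ← Smoking) and answers queries by enumerating the joint distribution. The structure is a deliberately reduced illustration, and every probability in it is invented for the example; none of the numbers come from the study data, and the variable and function names are hypothetical.

```python
from itertools import product

# Prior tables for the two parentless nodes (illustrative values only).
p_diabetes = {True: 0.1, False: 0.9}
p_smoking = {True: 0.2, False: 0.8}

# CP table of the child node: P(ASCVD = True | diabetes, smoking).
p_ascvd = {
    (True, True): 0.40,
    (True, False): 0.25,
    (False, True): 0.15,
    (False, False): 0.05,
}


def joint(diabetes: bool, smoking: bool, ascvd: bool) -> float:
    """Joint probability as the product of each node's table entry."""
    pa = p_ascvd[(diabetes, smoking)]
    return p_diabetes[diabetes] * p_smoking[smoking] * (pa if ascvd else 1 - pa)


def prob(query: dict, evidence: dict = None) -> float:
    """P(query | evidence) by summing the joint over all consistent states."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for d, s, a in product([True, False], repeat=3):
        world = {"diabetes": d, "smoking": s, "ascvd": a}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(d, s, a)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


print(prob({"ascvd": True}))                        # marginal risk
print(prob({"ascvd": True}, {"diabetes": True}))    # predictive query
print(prob({"diabetes": True}, {"ascvd": True}))    # diagnostic query
```

Enumeration is only practical for very small networks; tools such as those used in this study rely on more efficient inference algorithms.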

Structures from the literature

We aimed to identify the most suitable causal model of the underlying risk factors of ASCVD, including age, sex, diabetes, smoking status, hypertension, BMI, FBS, T-C, LDL-C, HDL-C, and TG. To achieve this, we used BNs and selected the probabilistic models by combining algorithmic search with knowledge-based modelling32.

The BN structure in Fig. 1 illustrates the interconnections between these variables and their impact on ASCVD risk. The structure obtained by algorithmic search, depicted in Fig. 1A, shows the factors that influence ASCVD. Notably, age, smoking, and diabetes have a significant impact on ASCVD probability: Fig. 1A includes direct links between these predictors and ASCVD. Additionally, in the chain Age → Hypertension → FBS, hypertension is not conditioned on, so FBS and age are d-connected.

Figure 1 BN structures and probabilities for ASCVD.

The knowledge-based network structure depicted in Fig. 1B provides a concise overview of the factors influencing ASCVD, as identified by a systematic search of the literature. The network highlights the connections between modifiable and non-modifiable risk factors in predicting various ASCVD conditions. Notably, T-C, TG, HDL-C, and FBS are indirectly associated with ASCVD through other risk factors14,29,56. Likewise, sex is indirectly associated with ASCVD through diabetes (Sex → Diabetes → ASCVD). The path between TG and ASCVD (TG → HDL-C → ASCVD) is blocked after conditioning on HDL-C; that is, TG is conditionally independent of ASCVD given HDL-C, an example of d-separation in this model. On the other hand, ASCVD is directly associated with smoking, hypertension, diabetes, BMI, HDL-C, LDL-C, cholesterol, and FBS57,58. Ultimately, our goal is to provide a precise and easily understandable prediction of ASCVD risk by analyzing the relationships between variables29,62.

In our models we assumed d-separation, and each model consists of a set of independent predictors that lead to the outcome variable. D-separation is a criterion used in BNs to determine whether two sets of variables are independent of each other given a third set of variables; conditional independence between variables can thus be inferred directly from the graph59,60,61.
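The d-separation criterion can be checked mechanically. The sketch below uses the standard ancestral moral graph construction: restrict the DAG to the queried nodes and their ancestors, connect parents that share a child, drop edge directions, remove the conditioning set, and test whether the two nodes are still connected. The function name is hypothetical, and the edge list contains only the two chains quoted in the text, not the full structures of Fig. 1.

```python
from itertools import combinations


def d_separated(edges, x, y, given=frozenset()):
    """Return True if x and y are d-separated by `given` in the DAG `edges`.

    `edges` is an iterable of (parent, child) pairs.
    """
    given = set(given)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())

    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    relevant, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralise: link each child to its parents and co-parents to each other.
    neighbours = {n: set() for n in relevant}
    for child in relevant:
        ps = parents.get(child, set())
        for p in ps:
            neighbours[p].add(child)
            neighbours[child].add(p)
        for a, b in combinations(ps, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # 3. Search from x, never passing through a conditioned node.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False  # a connecting path exists: d-connected
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(neighbours[node] - seen)
    return True


# Only the chains mentioned in the text (assumed fragment of the full DAGs).
edges = [("Age", "Hypertension"), ("Hypertension", "FBS"),
         ("TG", "HDL-C"), ("HDL-C", "ASCVD")]

print(d_separated(edges, "Age", "FBS"))               # hypertension unobserved
print(d_separated(edges, "TG", "ASCVD", {"HDL-C"}))   # conditioning on HDL-C
```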

Statistical analysis

The Bayesian search BN was built using GeNIe Academic Version 4.1.3402.0 (built on 2023-10-03; License ID: 6c8hwje30dfnjbukdej30zg76), the knowledge-based BN was built using Netica 6.05 (Norsys Software Corp., USA)63, and both BNs were drawn using Netica. Categorical data were presented as counts (percentages), and P-values were computed using Fisher’s exact test. We compared the two structures using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); smaller AIC and BIC values indicate a better structure. Furthermore, diagnostic indices, including sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR−), and in particular the area under the ROC curve (AUC), were calculated to compare the BNs. We used leave-one-out cross-validation, a standard method, to compute the AUC, accuracy, and diagnostic indices. Finally, based on the best-suited BN structure, the CPs of ASCVD and non-ASCVD were calculated in the datasets.
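The diagnostic indices listed above all follow from the four cells of a 2 × 2 confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
def diagnostic_indices(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic indices from confusion-matrix counts.

    tp/fn: cases predicted positive/negative;
    fp/tn: non-cases predicted positive/negative.
    Raises ZeroDivisionError if a denominator is zero (e.g. no false positives).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_plus": sensitivity / (1 - specificity),
        "lr_minus": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for illustration only.
indices = diagnostic_indices(tp=30, fp=20, fn=10, tn=140)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Under leave-one-out cross-validation, each participant's predicted class would come from a model fitted to all other participants, and the counts would then be tallied over the whole sample.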

Consent to participate

All participants filled out and signed the informed consent and assent. The participants' privacy was preserved. All methods were carried out according to relevant guidelines and regulations.

Results

Of the 500 participants, 491 (98.2%) completed the study, with a mean age of 43.2 years (SD: 7.2; min–max: 24.0–67.0) and an ASCVD prevalence of 7.7% (95% CI: 5.5–10.5). ASCVD was observed exclusively in males (100%) and predominantly in participants aged over 45 years (89.4%). The majority of participants did not have dyslipidemia, with 50.0%, 52.6%, and 55.3% showing normal LDL-C, T-C, and TG, respectively. Most risk factors were within normal range with 78.9%, 63.2%, 42.1%, 84.6%, and 81.6% of participants having normal FBS, hypertension, and diabetes. Fisher’s exact tests revealed significant differences (P < 0.05) between the ASCVD and non-ASCVD groups for sex, age, diabetes, smoking, Mets, TG, T-C, LDL-C, and HDL-C, but not for hypertension, BMI, and FBS (all P > 0.05). Older individuals had 21.7% (95% CI: 15.3–29.1) higher ASCVD rates than younger ones. Similarly, males had 12.1% (8.3–16.2) higher rates than females, and individuals with past or current smoking had 14.4% (4.9–28.0) higher ASCVD rates than individuals who had never smoked (all P < 0.05). Individuals with diabetes had 28.4% (11.3–50.2) higher ASCVD rates than those without diabetes. In individuals with Mets, the ASCVD rate was 10.5% (4.3–18.5). Additionally, individuals with normal levels of TG, T-C, HDL-C, and LDL-C had 8.5%, 7.3%, 9%, and 11.3% (CI: 2.6–16.1, 1.8–14.1, 0.4–4.1, and 5.1–19.3) higher ASCVD rates, respectively, than those with high levels of TG, T-C, HDL-C, and LDL-C (all P < 0.05). Refer to Table 1 for more detailed participant information.

Table 1 Sociodemographic and clinical characteristics in the study population ASCVD cases and non-ASCVD.

We evaluated the quality of the BN models using AIC and BIC measures, and the results are summarized in Table 2. Lower values of AIC and BIC are indicative of a better model fit. The results suggest that knowledge-based BN with lower AIC and BIC could be considered an appropriate representation of the data.
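For reference, the two criteria can be computed from a fitted network's log-likelihood and its number of free parameters. The sketch below uses the textbook definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, for which lower is better, matching the convention in the text; some software reports rescaled or sign-flipped variants. The function names, the toy structure, and the log-likelihood values are hypothetical; only n = 491 is taken from the study.

```python
import math


def n_free_parameters(n_states: dict, parents: dict) -> int:
    """Free parameters of a discrete BN.

    Each node contributes (states - 1) times the number of joint
    configurations of its parents.
    """
    total = 0
    for node, states in n_states.items():
        configs = math.prod(n_states[p] for p in parents.get(node, ()))
        total += (states - 1) * configs
    return total


def aic(log_likelihood: float, k: int) -> float:
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n_obs: int) -> float:
    return k * math.log(n_obs) - 2 * log_likelihood


# Toy structure (illustrative): Diabetes -> ASCVD <- Smoking, all binary.
n_states = {"Diabetes": 2, "Smoking": 2, "ASCVD": 2}
parents = {"ASCVD": ["Diabetes", "Smoking"]}
k = n_free_parameters(n_states, parents)

log_lik = -250.0   # hypothetical maximised log-likelihood
n_obs = 491        # number of participants analysed in the study
print(k, aic(log_lik, k), round(bic(log_lik, k, n_obs), 2))
```

Because BIC multiplies the parameter count by ln n rather than 2, it penalises densely connected structures more heavily than AIC whenever n exceeds about 7.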

Table 2 AIC and BIC values for comparing the different BN structures.

Figure 2 shows the ROC curves of the two BN models: (1) the BN constructed with the algorithmic search network structure and (2) the BN constructed with the knowledge-based network structure.

Figure 2 Receiver operating characteristic curves of the Bayesian networks.

Predictive performance of BN models

The diagnostic indices showed better performance for the knowledge-based BN (AUC = 0.78, Accuracy = 76.6, Sensitivity = 62.5, NPV = 96.0, LR− = 0.48) than for the Bayesian search BN (AUC = 0.76, Accuracy = 72.4, Sensitivity = 17.5, NPV = 93.2, LR− = 0.83). However, the knowledge-based BN performed worse than the Bayesian search BN in terms of specificity (77.8 vs 98.9), PPV (19.7 vs 58.3), and LR+ (2.8 vs 16.1) (Fig. 3).
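As a quick consistency check, the likelihood ratios can be recomputed from the reported sensitivity and specificity using LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below uses only the rounded percentages quoted above, so small discrepancies are to be expected; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity_pct: float, specificity_pct: float):
    """Return (LR+, LR-) from sensitivity and specificity given in percent."""
    sens = sensitivity_pct / 100
    spec = specificity_pct / 100
    return sens / (1 - spec), (1 - sens) / spec


knowledge_based = likelihood_ratios(62.5, 77.8)
bayesian_search = likelihood_ratios(17.5, 98.9)

print("knowledge-based  LR+ %.2f  LR- %.2f" % knowledge_based)
print("Bayesian search  LR+ %.2f  LR- %.2f" % bayesian_search)
```

The recomputed values agree with the reported LR− for both models and with the reported LR+ for the knowledge-based BN. For the Bayesian search BN the recomputed LR+ is slightly below the reported 16.1, which may reflect rounding of the published specificity, since LR+ is very sensitive to small changes when specificity is close to 100%.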

Figure 3 Model comparisons.

Overall, we selected the knowledge-based BN as the better-performing model on the basis of its better diagnostic indices and its lower AIC and BIC values compared with the Bayesian search BN. Additionally, because the arcs of the knowledge-based BN were derived from a systematic search of the literature, the relationships it encodes are clinically interpretable (Table 3).

Table 3 Diagnostic indices of BN models.

Conditional probabilities of the Bayesian network

The CPs of ASCVD based on the knowledge-based BN structure are shown in Table 4. The CP of ASCVD for smoking, given hypertension, was 18.4%. The CP of ASCVD varied across BMI levels and, conditional on diabetes, was 10.3%. The CP of ASCVD for age, conditional on hypertension, was 11.7%. Notably, the CP of ASCVD for FBS, given hypertension, was 13.1%. The CPs of the other variables based on the knowledge-based BN structure are shown in Table S1.

Table 4 Conditional probabilities of BN for ASCVD & non-ASCVD once variables are instantiated to different values.

Strength of influence of the relationship

The strengths of influence reported in Table 5 indicate that, in the knowledge-based BN, diabetes (0.017), hypertension (0.016), FBS (0.016), and LDL-C (0.016) have the greatest influence on ASCVD. For more details on the strength of the relationships in the BNs, refer to Table S2.

Table 5 Strength of influence of the relationship in knowledge-based BN.

Discussion

In our study of 491 participants, ASCVD was observed exclusively in males and predominantly in participants aged over 45 years, with a prevalence of 7.7%. Our BN models showed a good fit, and their predictive performance for ASCVD was accurate. The CPs revealed that being male, being aged over 45, and having diabetes, Mets, and other risk factors increased ASCVD risk, while high HDL-C reduced it. These results provide valuable insights into ASCVD risk factors and can aid in developing effective prevention strategies, predicting related conditions, and supporting health research.

BN models have been increasingly used in the field of cardiovascular risk prediction due to their ability to model complex relationships among risk factors and incorporate prior knowledge into the model. Several studies have demonstrated the effectiveness of BN models in predicting cardiovascular risk, such as predicting coronary heart disease risk in Korean adults64, identifying important risk factors for stroke in the Chinese population65, and predicting major cardiovascular events in patients with hypertension66. These studies highlight the potential of BN models in improving CVD risk prediction and helping clinicians make more informed decisions.

In one study, two BN models were developed to predict ASCVD risk using data from a large population-based cohort. The models included demographic factors, ASCVD risk factors, and their inter-relationships. The performance of the models was evaluated using various measures, including sensitivity, specificity, accuracy, PPV, NPV, LR+, LR−, and AUC. The results showed that the knowledge-based BN model had good predictive performance and identified several risk factors associated with ASCVD, such as age, sex, smoking status, hypertension, and lipid levels. Overall, the BN approach provides a promising tool for predicting cardiovascular risk and can aid in the development of personalized prevention strategies67.

The finding that ASCVD was exclusively observed in males and patients aged over 45 years is consistent with previous research on cardiovascular risk factors. Multiple studies have identified being male and higher age as independent risk factors for ASCVD68,69,70. This may be due to hormonal differences between men and women, as well as changes in the cardiovascular system that occur with aging, such as endothelial dysfunction and arterial stiffening71,72. It is important to note, however, that the present study did not identify age or sex as significant predictors of ASCVD in the BN models. This may be due to the complex interplay between multiple risk factors and the non-linear relationships between them. Further research is needed to fully understand the relative contributions of age, sex, and other risk factors to the development of ASCVD.

The results of the study suggest that several traditional risk factors, including diabetes and Mets, are associated with an increased risk of ASCVD. This finding is consistent with previous research that has identified diabetes as a strong predictor of CVD73. Mets, which is characterized by a cluster of metabolic abnormalities including abdominal obesity, dyslipidemia, and insulin resistance, has also been shown to be a strong predictor of CVD74. In addition to these risk factors, the study found that high HDL-C was associated with a reduced risk of ASCVD. This is consistent with previous research that has identified HDL-C as a protective factor against CVD75. The findings of this study underscore the importance of identifying and managing traditional risk factors for ASCVD, as well as the potential benefit of interventions to increase HDL-C levels.

Strengths and limitations

Strengths of this study include employing the BN models, which allow for the modelling of complex relationships among various risk factors and the incorporation of prior knowledge into the model. Additionally, the study identified several traditional risk factors associated with an increased risk of ASCVD, such as diabetes and Mets, as well as a protective factor, high HDL-C. The study provides valuable insights into ASCVD risk factors and can aid in developing effective prevention and management strategies, and facilitate treatment.

One potential limitation of the study is the small sample size of 491 participants, which may limit the generalizability of the findings to larger populations. Future studies with larger sample sizes could help confirm the results and identify additional risk factors associated with ASCVD. The absence of ASCVD among female participants may be attributable to several factors, including the low number of women in the study, which also limits the generalizability of the findings. Additionally, the study focused exclusively on healthcare providers, which may have contributed to the lack of ASCVD cases among female participants. Another limitation is that certain risk factors, such as family history, which may be important predictors of ASCVD, were not included; future studies could incorporate additional risk factors into the model to improve its accuracy. A further shortcoming is that continuous variables had to be discretised into categorical variables for the BN models. Although clinical guidelines are more readily expressed in terms of categorical variables, discretisation entails some loss of information in the model. Future work should assess how this process affects the statistical information provided by the BN models and explore ways of incorporating continuous variables into BN models.

Conclusion

In conclusion, the study provides valuable insights into ASCVD risk factors and demonstrates the potential of BN models in predicting cardiovascular risk in a large population-based cohort. The BN models showed good fit and accurate predictive performance for ASCVD risk factors including age, sex, smoking status, hypertension, lipid levels, diabetes, and Mets. The study found that high HDL-C was associated with a reduced risk of ASCVD. The findings underscore the importance of identifying, preventing, and managing traditional risk factors for ASCVD, as well as the potential benefit of interventions to increase HDL-C levels. Overall, the BN approach provides a promising tool for predicting cardiovascular risk and can aid in the development of personalized prevention strategies and health policymaking. However, the study's limitations, including a small sample size, and the complexity of the interplay between risk factors should be taken into account when interpreting the results. Further research is needed to fully understand the complex interplay between multiple risk factors and non-linear relationships between them, as well as to validate the study findings and improve our understanding of ASCVD risk factors.