1 Introduction

Coronary artery disease (CAD) is a condition in which the coronary arteries become narrowed or blocked. It develops when bad cholesterol and plaque (fatty droplets) are deposited inside the artery walls. This process, termed atherosclerosis (clogging of the arteries), reduces blood flow to the heart muscle. Blood carries oxygen and essential nutrients to the heart [1]. An insufficient blood supply can cause angina (chest pain) and can lead to a heart attack by injuring the heart muscle. The death toll due to heart disease is 16.3 million in America each year, which has made it the leading cause of death in the United States. According to the American Heart Association (AHA), one person suffers a heart attack every 40 s. With zero risk factors for heart disease, a man has a 3.6% and a woman a less than 1% lifetime chance of developing cardiovascular disease; with two risk factors, these chances rise to 37.5% and 18.3%, respectively [2]. In Bangladesh, CAD is responsible for a 17% mortality rate [3]. The regular diagnostic approach to CAD relies on the coronary angiogram test [4], the electrocardiogram (ECG) [5, 6], the nuclear scan test, and the exercise stress test. ECG and the exercise stress test do not produce reliable results for CAD prediction because of their non-invasive nature and numerous biases. Moreover, walking on a treadmill during a stress test causes the patient discomfort and pushes heart function beyond its normal condition. Support vector machine (SVM) [7, 8] and artificial neural network (ANN) [5, 8,9,10,11,12,13,14,15,16,17] based Clinical Decision Support Systems (CDSS) [18,19,20] have recently been developed for CAD prediction. Unfortunately, SVM and ANN offer no direct insight into the reasoning process because of their black-box modeling approaches. As a result, the significance of individual factors cannot be determined. Human judgment and clinical data are therefore both essential for CAD diagnosis.
For this purpose, a CDSS combines historical data with doctors' domain-specific knowledge. However, clinical data, such as clinical domain knowledge, signs, and symptoms, contain various uncertainties [21,22,23], which make it challenging to select domain knowledge for constructing the knowledge base. Moreover, reasoning under uncertainty requires an excellent computational algorithm. To mitigate these challenges, researchers introduced different CDSSs based on fuzzy inference systems and Bayesian inference systems, which also have limitations [10, 24,25,26]. In this paper, the proposed expert system predicts CAD in five classes according to severity, as follows:

Class A: (Normal: no sign of heart disease).

Class B: (Unstable angina) - new symptoms appear in addition to regular stable angina; episodes occur frequently (mostly at rest), last longer, are more severe, and can lead to a heart attack. It can be treated with oral medications (such as nitroglycerine).

Class C: (Non-ST-segment elevation myocardial infarction) - the ECG does not indicate this type of myocardial infarction (MI), but chemical markers in the blood show damage to the heart muscle. The damage may not be significant, and artery blockages are usually partial or temporary.

Class D: (ST-segment elevation myocardial infarction) - this type of MI occurs quickly because of a sudden blockage by a blood clot. It can be detected by ECG and chemical markers in the blood, and it damages a large area of heart muscle.

Class E: (Silent ischemia) - a patient with heart disease can suffer a sudden heart attack without any prior or early warning (silent ischemia); diabetic patients are common victims of this type [1].

2 Related Research

Researchers have recently applied machine learning and rule-based systems to different purposes [30,31,32,33]. Many researchers have used the belief Rule-base Inference Methodology using the Evidential Reasoning approach (RIMER) for CAD diagnosis [18, 27]. RIMER uses a belief rule base to model clinical domain knowledge and applies the evidential reasoning approach to perform reasoning. Studies show that RIMER-based clinical decision support systems are highly efficient in supporting and interacting with clinical domain knowledge under uncertainty. In [28], multi-criteria decision-making methods were presented for assessing CAD under uncertainty, where the presence or absence of CAD is predicted from its symptoms and signs. However, these approaches report neither the number of blocked arteries nor the severity of the disease [8, 16, 26, 28, 29]. Weak parameters, such as signs and symptoms, are used for predicting CAD as well as similar diseases, such as mitral regurgitation, dilated cardiomyopathy, congenital heart disease, hypertrophic cardiomyopathy, and myocardial infarction. Some researchers developed Medical Decision Support Systems (MDSS) to predict CAD. Others proposed a nonlinear polygenic risk score (PRS) for CAD prediction, with an area under the receiver operating characteristic curve (AUC) of 0.92 [8].

Experimental analysis reveals that CAD and its severity can be predicted effectively using clinical features together with pathological and demographic features [23, 25, 26, 28]. In this paper, we consider all these parameters and propose a cooperative belief-rule-based prototype CDSS to assist doctors in CAD analysis under uncertainty.

3 Methodology

3.1 Proposed Rule Based Expert System for CAD

In this paper, five separate belief rule bases (BRBs) are developed based on five distinct feature sets of patients, namely i) pathological features, ii) physiological features, iii) demographic features, iv) behavioral features, and v) non-modifiable risk factors. The BRBs are as follows:

$$ D_{A} = f_{A} \left( {{\text{S}}, P_{A} } \right) $$
(1)
$$ D_{B} = f_{B} \left( {{\text{T}}, P_{B} } \right) $$
(2)
$$ D_{C} = f_{C} \left( {{\text{X}}, P_{C} } \right) $$
(3)
$$ D_{D} = f_{D} \left( {{\text{Y}}, P_{D} } \right) $$
(4)
$$ D_{E} = f_{E} \left( {{\text{Z}}, P_{E} } \right) $$
(5)

Here, S = \( \{a_{1}, a_{2}, \ldots, a_{l}\} \), T = \( \{b_{1}, b_{2}, \ldots, b_{m}\} \), X = \( \{c_{1}, c_{2}, \ldots, c_{n}\} \), Y = \( \{d_{1}, d_{2}, \ldots, d_{o}\} \), and Z = \( \{e_{1}, e_{2}, \ldots, e_{p}\} \) represent the demographic, physiological, clinical, behavioral, and non-modifiable features, respectively, where l, m, n, o, and p are the numbers of attributes in each feature set.

Suppose that \( P_{A}, P_{B}, P_{C}, P_{D} \) and \( P_{E} \) are the corresponding parameter vectors of the five BRBs, ω = [\( \upomega_{1}, \upomega_{2}, \upomega_{3}, \upomega_{4}, \upomega_{5} \)] represents the weight coefficients of the respective BRBs, and \( f_{A}, f_{B}, f_{C}, f_{D} \) and \( f_{E} \) are the functions for the demographic, physiological, clinical, behavioral, and non-modifiable factors. The individual matching degree of an input to the referential values of an attribute is calculated with the following equation:

$$ \alpha_{i,j} = \frac{{u\left( {A_{i,j\, + \,1} } \right) - x_{i} }}{{u\left( {A_{i,j\, + \,1} } \right) - u\left( {A_{i,j} } \right)}} $$
(6)

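where u(·) is the utility value, \( \alpha_{i,j} \) is the individual matching degree, \( A_{i,j} \) is the jth referential value of the ith attribute, and \( x_{i} \) is the input for the ith antecedent attribute, with \( u(A_{i,j}) \le x_{i} \le u(A_{i,j+1}) \) (Fig. 1).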
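As a minimal sketch of Eq. (6), the Python function below distributes a numeric input over the referential values of one attribute. It assumes the utility values are sorted in ascending order, follows the usual RIMER convention that the adjacent referential value receives \( \alpha_{i,j+1} = 1 - \alpha_{i,j} \), and clamps inputs outside the utility range to the nearest end point (an assumption). The utility values in the usage line are hypothetical, since Tables 2 to 14 are not reproduced here.

```python
from bisect import bisect_right


def matching_degrees(x, utilities):
    """Distribute input x over referential values with ascending utilities (Eq. 6).

    Returns one matching degree per referential value; at most two adjacent
    entries are non-zero, and all entries sum to 1.
    """
    n = len(utilities)
    degrees = [0.0] * n
    # Assumption: inputs outside the utility range match the nearest end point fully.
    if x <= utilities[0]:
        degrees[0] = 1.0
        return degrees
    if x >= utilities[-1]:
        degrees[-1] = 1.0
        return degrees
    # Find j such that u(A_j) <= x < u(A_{j+1}).
    j = bisect_right(utilities, x) - 1
    lower, upper = utilities[j], utilities[j + 1]
    degrees[j] = (upper - x) / (upper - lower)  # Eq. (6)
    degrees[j + 1] = 1.0 - degrees[j]           # remaining degree goes to A_{j+1}
    return degrees


# Hypothetical utilities for five referential values of one attribute.
print(matching_degrees(60, [0, 25, 50, 75, 100]))  # [0.0, 0.0, 0.6, 0.4, 0.0]
```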

Fig. 1. Rule-based expert system to assess CAD

The activation weight of each rule is calculated with the following equation:

$$ w_{k} = \frac{{\theta_{k} \alpha_{k} }}{{\mathop \sum \nolimits_{i = 1}^{L} \theta_{i} \alpha_{i} }} $$
(7)

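where \( w_{k} \) is the activation weight of the kth rule, \( \theta_{k} \) is its rule weight, L is the number of rules, and \( \alpha_{k} \) is the combined matching degree of the kth rule, which captures the interrelation between its antecedent attributes. \( \alpha_{k} \) is calculated with the following equations: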

$$ \alpha_{k} = \prod\nolimits_{i = 1}^{M} {\left( {\alpha_{i}^{k} } \right)^{{\bar{\delta }_{i}^{k} }} } $$
(8)
$$ \bar{\delta }_{i}^{k} = \frac{{\delta_{i}^{k} }}{{\max_{i = 1, \ldots ,M} \left( {\delta_{i}^{k} } \right)}} $$
(9)

where \( \delta_{i}^{k} \) is the weight of the ith antecedent attribute in the kth rule, \( \bar{\delta }_{i}^{k} \) is its normalized weight, M is the number of antecedent attributes, and \( \alpha_{i}^{k} \) is the individual matching degree of the ith attribute in the kth rule. The five separate BRBs used to predict CAD are BRB_P, BRB_PH, BRB_D, BRB_B, and BRB_N. BRB_PH considers physiological factors such as blood pressure and stress. BRB_P considers pathological factors such as blood sugar level, low-density lipoprotein, and triglyceride level. BRB_D considers demographic factors such as age and body mass index. BRB_B considers behavioral factors such as diet, smoking, and physical activity. BRB_N considers non-modifiable risk factors such as gender, family history, and residential area.
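Equations (7) to (9) can be sketched in Python as follows. The antecedent weights \( \delta_{i}^{k} \) and rule weights \( \theta_{k} \) in the example are hypothetical values chosen for illustration, not parameters of the trained BRBs.

```python
import math


def combined_matching_degree(alphas, deltas):
    """Eqs. (8)-(9): alpha_k = prod_i (alpha_i^k) ** (delta_i^k / max_i delta_i^k)."""
    d_max = max(deltas)
    # Normalizing by the largest antecedent weight keeps every exponent in [0, 1].
    return math.prod(a ** (d / d_max) for a, d in zip(alphas, deltas))


def activation_weights(rule_weights, combined_degrees):
    """Eq. (7): w_k = theta_k * alpha_k / sum_i (theta_i * alpha_i)."""
    products = [theta * alpha for theta, alpha in zip(rule_weights, combined_degrees)]
    total = sum(products)
    if total == 0:
        # No rule is activated by this input.
        return [0.0] * len(products)
    return [p / total for p in products]


# Hypothetical rule: BP matches its referential value with 0.6, stress with 0.8.
alpha_r1 = combined_matching_degree([0.6, 0.8], deltas=[1.0, 1.0])
weights = activation_weights([1.0, 1.0, 1.0], [alpha_r1, 0.32, 0.0])
print(round(alpha_r1, 2), [round(w, 2) for w in weights])  # 0.48 [0.6, 0.4, 0.0]
```

With equal antecedent weights, Eq. (8) reduces to a plain product of the individual matching degrees; a less important attribute (smaller \( \delta \)) has its matching degree raised to a power below 1, which weakens its influence.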

3.2 Uncertainties in the Attributes

Attributes such as blood pressure, stress, blood sugar, lipoprotein, triglyceride, age, body mass index, unhealthy diet, smoking, family history, and race are categorized into five classes, namely physiological, pathological, demographic, behavioral, and non-modifiable risk factors. All attributes except gender carry some level of uncertainty (Table 1).

Table 1. Uncertainties in the attributes

3.3 Explanation of Antecedent Attributes

Five different types of attributes have been considered in this research. Explanations of the numerical values of each attribute are as follows:

Physiological Factor

Blood Pressure (BP)

Blood pressure (BP) is the pressure that circulating blood exerts on the artery walls as the heart beats. Five referential points are considered for BP, namely Usual, Elevated, Hypertension Stage 1, Hypertension Stage 2, and Hypertension Stage 3, as shown in Table 2.

Table 2. Numerical values for blood pressure

Here, the referential points are presented as in Eq. (10).

$$ PH1\;\upepsilon \;\left\{ {{\text{U}},\;{\text{E}},\;{\text{H}}1,\;{\text{H}}2,\;{\text{H}}3} \right\} $$
(10)

Stress Score (SS)

The stress score (SS) can express an intermediate risk of heart problems. It is divided into four referential points, namely regular, mildly irregular, moderately irregular, and severely irregular, as shown in Table 3.

Table 3. Numerical value for stress score

The referential numerical values can be presented as in Eq. (11).

$$ {\text{PH}}2\;\upepsilon \;\left\{ {{\text{R}},{\text{M}},\;{\text{MI}},\;{\text{S}}} \right\} $$
(11)

Pathological Factor

Blood Sugar Level

Blood sugar level is the amount of sugar in the blood. Five referential points are shown in Table 4 and expressed by Eq. (12).

Table 4. Numerical points for blood sugar
$$ {\text{P}}1\;\upepsilon \;\left\{ {{\text{FA}},\;{\text{BM}},\;{\text{A}},\;{\text{B}},\;{\text{BT}}} \right\} $$
(12)

Triglyceride

The triglyceride level measures the amount of triglycerides in the blood. Four referential points are described in Table 5 and Eq. (13).

Table 5. Numerical point for triglyceride
$$ {\text{P}}2\;\upepsilon \;\left\{ {{\text{H}},{\text{BH}},{\text{H}},{\text{EH}}} \right\} $$
(13)

Low Density Lipoprotein (LDL)

LDL contains both lipid and protein and carries cholesterol to body tissues. Five referential values for LDL are shown in Table 6 and Eq. (14).

Table 6. Numerical point for LDL
$$ P3\;\upepsilon \;\left\{ {{\text{D}},{\text{NO}},{\text{BH}},{\text{H}},{\text{VH}}} \right\} $$
(14)

Demographic Factor

Age

Older people are more likely to develop coronary artery disease, especially after the age of 65 (Table 7).

Table 7. Numerical values for age

Four referential values, namely young (< 35 years), mature (35–49 years), old (50–65 years), and extremely old (> 65 years), are derived from Table 7 and used in the following equation.

$$ {\text{D}}1\;\upepsilon \;\left\{ {{\text{Y}},\;{\text{M}},\;{\text{O}},\;{\text{E}}} \right\} $$
(15)
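For illustration only, the age ranges for Eq. (15) can be written as a crisp lookup. In the BRB itself, a raw age would instead be spread over the referential values with Eq. (6); the boundary for "extremely old" (> 65 years) is inferred from the other ranges.

```python
def age_referential_value(age_years: float) -> str:
    """Map an age in years to D1 in {Y, M, O, E} (Eq. 15)."""
    if age_years < 35:
        return "Y"  # young: < 35 years
    if age_years < 50:
        return "M"  # mature: 35-49 years
    if age_years <= 65:
        return "O"  # old: 50-65 years
    return "E"      # extremely old: assumed > 65 years


print([age_referential_value(a) for a in (28, 42, 65, 71)])  # ['Y', 'M', 'O', 'E']
```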

Body Mass Index (BMI)

BMI is an indicator of body fat, calculated as body weight in kilograms divided by the square of height in meters (kg/m²). It is applicable to adults aged 18 to 65 (Table 8).

Table 8. Numerical points for BMI

Four referential values, namely healthy weight (18.5–24.9), overweight (25–29.9), obese (30–39.9), and morbidly obese (≥ 40), are derived from Table 8 and used in the following equation.

$$ {\text{D}}2\;\upepsilon \;\left\{ {{\text{H}},\;{\text{O}},\;{\text{OB}},\;{\text{MO}}} \right\} $$
(16)
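A short sketch of how a BMI value could be computed and assigned to the referential values of Eq. (16), using the ranges listed above. Values below 18.5 (underweight) fall outside the four referential values in the text, so the sketch raises an error for them; how the actual system handles such patients is not stated.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def bmi_referential_value(value: float) -> str:
    """Map a BMI value to D2 in {H, O, OB, MO} (Eq. 16)."""
    if value < 18.5:
        # Underweight is not one of the four referential values in the text.
        raise ValueError("BMI below 18.5 is outside the referential values")
    if value < 25:
        return "H"   # healthy weight: 18.5-24.9
    if value < 30:
        return "O"   # overweight: 25-29.9
    if value < 40:
        return "OB"  # obese: 30-39.9
    return "MO"      # morbidly obese: >= 40


value = bmi(70, 1.75)
print(round(value, 1), bmi_referential_value(value))  # 22.9 H
```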

Behavioral Factor

Unhealthy Diet

A Mediterranean diet, which is mainly plant-based, can reduce the risk of CAD by 30%. Diet is categorized into four sections, shown in Table 9 and expressed by Eq. (17).

Table 9. Numerical values of diet
$$ {\text{B}}1\;\upepsilon \;\left\{ {{\text{L}},\;{\text{H}},\;{\text{M}},\;{\text{ED}}} \right\} $$
(17)

Smoking

Smokers and people exposed to smoke have a high risk of CAD. Smoking is categorized into four sections, shown in Table 10 and expressed by Eq. (18).

Table 10. Numerical values for smoking
$$ {\text{B}}2\;\upepsilon \;\left\{ {{\text{NS}},\;{\text{S}},\;{\text{MS}},\;{\text{CS}}} \right\} $$
(18)

Physical Activities

Inactive and less active people are at high risk of developing CAD. Physical activity is categorized into four sections, shown in Table 11 and expressed by Eq. (19).

Table 11. Numerical values for physical activities
$$ {\text{B}}3\;\upepsilon \;\left\{ {{\text{I}},\;{\text{LA}},\;{\text{A}},\;{\text{VA}}} \right\} $$
(19)

Non-modifiable Risk Factors

Gender

Men have a higher risk of CAD than women and develop it at an earlier age. After the age of 70, however, men and women have similar chances of getting heart disease (Table 12).

Table 12. Numerical values for gender
$$ {\text{N}}1\;\upepsilon \;\left\{ {{\text{M}},\;{\text{F}},\;{\text{O}}} \right\} $$
(20)

Family History

If parents have a history of heart disease, their children have a high risk of developing CAD. The risk is even higher if the parents developed heart disease before the age of 50. The numerical points for family history are 0 (no history of parental heart disease), 1 (history of parental heart disease), and 2 (history of parental heart disease before the age of 50), as given in Table 13 and Eq. (21).

Table 13. Numerical values for family history
$$ {\text{N}}2\;\upepsilon \;\left\{ {{\text{L}},\;{\text{M}},\;{\text{H}}} \right\} $$
(21)

Residential Area

People from mega-cities are more prone to CAD because of higher rates of diabetes and obesity, whereas people from hill tract areas are less likely to develop heart disease. The numerical points for residential area are 0 (mega city), 1 (rural area), and 2 (hill tract area), as given in Table 14 and Eq. (22).

Table 14. Numerical values for residential area
$$ {\text{N}}3\;\upepsilon \;\left\{ {{\text{L}},\;{\text{M}},\;{\text{H}}} \right\} $$
(22)

3.4 Rule Base

All attributes in Eqs. (10) to (22) are applied as input variables to predict the CAD class. Table 15 lists sub-rules 1 to 20 for the two physiological factors in Eqs. (10) and (11), one for each of the 5 × 4 combinations of blood pressure and stress score referential values.
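The 5 × 4 = 20 antecedent combinations behind the physiological sub-rule base can be enumerated with `itertools.product`. This is a sketch only: the numbering order (blood pressure outer, stress score inner) is an assumption, so the rule numbers here need not match those in Table 15, and the consequent belief degrees, which come from domain knowledge, are omitted.

```python
from itertools import product

BP_VALUES = ["U", "E", "H1", "H2", "H3"]  # Eq. (10)
STRESS_VALUES = ["R", "M", "MI", "S"]     # Eq. (11)


def physiological_sub_rules():
    """Enumerate every (BP, stress score) antecedent combination as a numbered sub-rule."""
    return {
        f"R{k}": {"blood_pressure": bp, "stress_score": ss}
        for k, (bp, ss) in enumerate(product(BP_VALUES, STRESS_VALUES), start=1)
    }


rules = physiological_sub_rules()
print(len(rules), rules["R1"])  # 20 {'blood_pressure': 'U', 'stress_score': 'R'}
```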

Table 15. Attributes’ sub rule-base for physiological factors

An example sub-rule for CAD is:

R3: IF blood pressure is usual AND stress score is significantly irregular

THEN Overall Prediction is

{(Stage 1, 0.6), (Stage 2, 0.3), (Stage 3, 0.1), (Stage 4, 0.0), (Stage 5, 0.0)}

In R3, the antecedent attributes are blood pressure and stress score, and the consequent attribute is the overall CAD prediction. The rule states that a patient with usual blood pressure and a significantly irregular stress score belongs to Stage 1 with a belief degree of 0.6 (60%), Stage 2 with 0.3 (30%), Stage 3 with 0.1 (10%), and Stages 4 and 5 with 0.0 (0%). The belief degrees of R3 sum to (0.6 + 0.3 + 0.1 + 0.0 + 0.0 =) 1. If the belief degrees of a rule sum to 1, the rule is complete. When attributes are missing or there is ignorance, the sum may be less than 1, and the rule is incomplete [34].
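A small sketch of the completeness check described above, representing a rule consequent as a dictionary of belief degrees. The tolerance parameter is an assumption added because floating-point sums such as 0.6 + 0.3 + 0.1 may not be exactly 1 in binary arithmetic; the incomplete rule in the example is hypothetical.

```python
import math


def is_complete(belief_degrees: dict, tol: float = 1e-9) -> bool:
    """Return True if the consequent belief degrees of a rule sum to 1."""
    total = sum(belief_degrees.values())
    if total > 1 + tol:
        raise ValueError("belief degrees of a rule cannot sum to more than 1")
    # A sum below 1 means part of the belief is unassigned (ignorance).
    return math.isclose(total, 1.0, rel_tol=0.0, abs_tol=tol)


# Consequent of sub-rule R3 from the text.
r3 = {"Stage 1": 0.6, "Stage 2": 0.3, "Stage 3": 0.1, "Stage 4": 0.0, "Stage 5": 0.0}
# Hypothetical incomplete rule: 0.2 of the belief is unassigned.
partial = {"Stage 1": 0.5, "Stage 2": 0.3, "Stage 3": 0.0, "Stage 4": 0.0, "Stage 5": 0.0}
print(is_complete(r3), is_complete(partial))  # True False
```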

3.5 Data Set Description

The dataset was collected from the National Heart Foundation, Bangladesh, with proper authorization. The dataset is summarized in Table 16.

Table 16. Summary of patients’ data

4 Result and Discussion

In a binary diagnostic test, each patient receives a positive or negative diagnosis. Comparing the diagnosis with the true condition gives four possible outcomes: true positive, true negative, false positive, and false negative.
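The four outcomes can be counted with a short sketch that follows the convention used for Table 17, where class A is CAD-negative and classes B to E are CAD-positive. It assumes every patient receives a predicted class; the labels in the example are made up.

```python
def binary_outcomes(actual, predicted, negative_class="A"):
    """Count TP, TN, FP and FN, treating class A as CAD-negative and B-E as CAD-positive."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        truly_positive = truth != negative_class
        predicted_positive = guess != negative_class
        if truly_positive and predicted_positive:
            counts["TP"] += 1
        elif not truly_positive and not predicted_positive:
            counts["TN"] += 1
        elif predicted_positive:
            counts["FP"] += 1  # CAD-negative patient predicted as CAD-positive
        else:
            counts["FN"] += 1  # CAD-positive patient predicted as class A
    return counts


print(binary_outcomes(list("ABCAE"), list("ABACE")))  # {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 1}
```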

4.1 Success Rate

The success rate is the ratio of the number of correctly identified patients to the total number of patients. Equation (23) is used to calculate the success rate and the average success rate.

$$ {\text{Success}}\;{\text{Rate}} = \frac{{\text{Number of correctly identified patients}}}{{\text{Total patients}}} \times 100\% $$
(23)

4.2 Error Rate

The error rate is the ratio of the number of incorrectly identified patients to the total number of patients. Equation (24) is used to calculate the error rate and the average error rate.

$$ {\text{Error}}\;{\text{Rate}} = \frac{{\text{Number of incorrectly identified patients}}}{{\text{Total number of patients}}} \times 100\% $$
(24)

4.3 Failure Rate

The failure rate is the ratio of the number of non-recognized patients to the total number of patients. Equation (25) is used to calculate the failure rate and the average failure rate.

$$ {\text{Failure}}\;{\text{Rate}} = \frac{{\text{Number of non-recognized patients}}}{{\text{Total patients}}} \times 100\% $$
(25)
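Equations (23) to (25) can be sketched per class as below. Two assumptions are made that the text does not spell out: a "non-recognized" patient is one for whom the system returns no class (represented as `None`), and each average rate is an unweighted mean over the classes.

```python
from collections import defaultdict


def per_class_rates(actual, predicted):
    """Per-class success, error and failure rates (Eqs. 23-25) and their class averages.

    Assumption: predicted[i] is None when the system did not recognize patient i.
    """
    counts = defaultdict(lambda: {"total": 0, "correct": 0, "incorrect": 0, "unrecognized": 0})
    for truth, guess in zip(actual, predicted):
        c = counts[truth]
        c["total"] += 1
        if guess is None:
            c["unrecognized"] += 1
        elif guess == truth:
            c["correct"] += 1
        else:
            c["incorrect"] += 1
    per_class = {
        cls: {
            "success_rate": 100 * c["correct"] / c["total"],
            "error_rate": 100 * c["incorrect"] / c["total"],
            "failure_rate": 100 * c["unrecognized"] / c["total"],
        }
        for cls, c in counts.items()
    }
    # Assumption: averages are unweighted means over classes.
    averages = {
        metric: sum(r[metric] for r in per_class.values()) / len(per_class)
        for metric in ("success_rate", "error_rate", "failure_rate")
    }
    return per_class, averages


per_class, averages = per_class_rates(["A", "A", "E", "E"], ["A", "B", "E", None])
print(averages)  # {'success_rate': 50.0, 'error_rate': 25.0, 'failure_rate': 25.0}
```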

4.4 False Omission Rate (FOR)

The false omission rate (FOR) is the ratio of the number of patients from classes other than class A who are identified as class A to the total number of patients belonging to classes other than class A. Eq. (26) is used to calculate the false omission rate and the average false omission rate.

$$ {\text{FOR}} = \frac{{\text{Number of patients from other classes identified as class A}}}{{\text{Total number of patients belonging to classes other than class A}}} \times 100\% $$
(26)
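A sketch of Eq. (26) as written. Because its denominator counts all CAD-positive patients (classes other than A), the quantity equals FN / (FN + TP), which standard terminology calls the false negative (miss) rate; the conventional false omission rate instead divides by the number of patients predicted negative, FN / (FN + TN). The example labels are made up.

```python
def false_omission_rate(actual, predicted, negative_class="A"):
    """Eq. (26): percentage of patients from classes other than A who are identified as A."""
    positives = [(t, p) for t, p in zip(actual, predicted) if t != negative_class]
    if not positives:
        raise ValueError("no patients outside the negative class")
    missed = sum(1 for _, p in positives if p == negative_class)
    return 100 * missed / len(positives)


print(false_omission_rate(list("ABCDE"), list("AACDA")))  # 50.0
```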

Table 17 presents the results obtained with Eqs. (23), (24), (25), and (26). Class A is considered CAD negative, and the remaining classes are CAD positive. The success rate for class C is the highest (94.08%) among the five classes, whereas that for class E is the lowest (only 50%). Class E is very hard to predict because, most of the time, it does not show any symptoms.

Table 17. Success rate, failure rate, error rate, false omission rate by expert system

5 Conclusion

Heart disease is one of the major threats to public health and the leading cause of death worldwide. Although much research has been carried out in this area, diagnosing CAD for treatment remains challenging. In this paper, the proposed expert system achieves an average accuracy of 89.90%, which is the highest among existing CDSSs. Its average false omission rate (3.54%) is also lower than that of other CDSSs, which satisfies one of the main goals of this research. The average failure rate (11.10%) and average error rate (8.81%) remain marginal. The success rate for Class E (silent ischemia) is the lowest among all classes, because Class E occurs suddenly without any warning signs of heart problems. Class E is also common among people with diabetes, and further research is required to investigate whether diabetes influences the success rate for Class E patients. Overall, our research concludes that the rule-based expert system (RBES) achieves a higher success rate and a lower false omission rate than other existing CDSSs.