Open Access (CC BY 4.0 license). Published by De Gruyter, November 7, 2022

Machine learning models can predict the presence of variants in hemoglobin: artificial neural network-based recognition of human hemoglobin variants by HPLC

  • Süheyl Uçucu, Talha Karabıyık and Fatih Mehmet Azik

Abstract

Objectives

This article presents the use of machine learning techniques such as artificial neural networks, K-nearest neighbors (KNN), naive Bayes, and decision trees for the prediction of hemoglobin variants. To the best of our knowledge, this is the first study to use machine learning models to predict suspicious cases with the HbS or HbD-Los Angeles carrier state.

Methods

We had a dataset of 238 observations, of which 128 were HbD carriers, and 110 were HbS carriers. The features were age, sex, RBC, Hb, HTC, MCV, MCH, RDW, serum iron, TIBC, ferritin, HbA2, HbF, HbA0, retention time (RT) of the abnormal peak, and the area under the peak of the abnormal peak. KNN, naive Bayes, decision tree models, and artificial neural network models were trained. Model performances were estimated using 7-fold cross-validation.

Results

When RT, the key point of differentiation used in high-performance liquid chromatography (HPLC), was included as a feature, all models performed well. When RT was excluded, the deep learning model performed the best (Accuracy: 0.99; Specificity: 0.99; Sensitivity: 0.99; F1 score: 0.99), while the naive Bayes model performed the worst (Accuracy: 0.94; Specificity: 0.97; Sensitivity: 0.90; F1 score: 0.93).

Conclusions

Deep learning and decision tree models have demonstrated high performance and have the potential to be integrated into medical laboratory work practices as a tool for hemoglobinopathy detection. These outcomes suggest that when machine learning models are fed enough data, they can detect a wide range of hemoglobin variants. However, more comprehensive studies with data from a larger number of patients and hemoglobinopathies will be useful for validating our models.

Introduction

The world is changing rapidly, and machine learning algorithms are increasingly important owing to their success in biomedical applications. Machine learning has been used successfully in many biomedical tasks, such as predicting the 3D structure and function of proteins from DNA sequences, decoding motor cortex activity to drive neuroprosthetic devices, estimating disease prognosis, and predicting the life expectancy of cancer patients. Big data and machine learning have opened new horizons in health care and many other fields [1, 2].

Machine learning is a branch of artificial intelligence that uses data to improve predictions and decisions. It builds models from training data, typically collected from various sources; through machine learning, a computer can perform tasks without being explicitly programmed for them. A machine learning algorithm tries to estimate the pattern in the data. An error (loss) function evaluates the quality of the model's predictions, and the model's weights are adjusted so that it fits the training data points better. The algorithm repeats this evaluation and optimization cycle to obtain the best model attainable under the constraints of the learning phase. Artificial neural networks (ANN), a type of machine learning, are modeled structurally and functionally on biological neural networks. Signals (data) are passed between artificial neurons via connections, much like in a biological network of neurons. Artificial neural networks learn from experience and can achieve more specific results as the network deepens. ANN and other machine learning algorithms outperform the human mind in detecting patterns in data with complex, non-linear relationships [1, 3].
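The evaluate-and-adjust loop described above can be illustrated with a toy example unrelated to the study's models: fitting a single weight by gradient descent on a mean-squared-error loss.

```python
import numpy as np

# Toy illustration of the training loop: a weight is repeatedly adjusted
# in the direction that reduces the error (loss) function.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)  # true weight is 3.0

w = 0.0   # initial weight
lr = 0.1  # learning rate
for _ in range(200):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)  # gradient of the MSE loss w.r.t. w
    w -= lr * grad                      # adjust the weight to reduce the error

print(round(w, 1))  # converges near the true weight, 3.0
```

Each pass evaluates the current predictions and nudges the weight downhill on the loss surface, which is exactly the cycle ANNs repeat over many weights at once.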

Machine learning could be promising, particularly for hemoglobinopathies, which affect more than 300 million people worldwide and are frequently misdiagnosed due to clinical, complete blood count (CBC), and electrophoretic similarities [4, 5]. Every year, hundreds of thousands of babies are born with sickle cell trait worldwide [6]. The sickle cell trait, which affects five percent of the world's population, and Hb D-Los Angeles (HBB: c.364G>C) carriers share clinical, CBC, chromatographic, and electrophoretic similarities [6, 7]. The Hb D-Los Angeles variant has alkaline electrophoretic mobility, concentration, and high-performance liquid chromatography (HPLC) profiles similar to those of Hb S (HBB: c.20A>T) [7]. As a result, differential diagnosis of these hemoglobinopathies is difficult. Furthermore, genotype determination is not usually performed because these individuals are regarded as carriers and considered harmless. However, recent research has shown that the Hb S carrier state is not completely harmless and can cause various clinical complications, including acute pain crisis, venous thromboembolism, shock, chronic kidney disease, spleen infarcts, and stroke [8, 9, 10, 11].

Moreover, different co-inherited mutations accompanying carrier status can significantly worsen the clinical phenotype, and misdiagnosis may also affect future generations. Whether heterozygous or homozygous, Hb D inheritance alone does not result in a clinically significant phenotype. However, its co-inheritance with beta-thalassemia or Hb S results in a variable clinical phenotype ranging from mild to severe hemolytic anemia [12].

HPLC, classical electrophoresis, and capillary electrophoresis are frequently used methods for identifying hemoglobin variants. Molecular diagnostic methods such as multiplex polymerase chain reaction (PCR) tests and DNA sequencing are used for definitive diagnosis. However, these tests are not available in small centers, are more expensive, and must be interpreted by trained professionals [7]. When DNA sequencing is impossible or HPLC is insufficient, neural networks and other machine learning models can serve as fast and accurate aids in hemoglobinopathy interpretation, assisting laboratory professionals in the detection of hemoglobinopathies.

HPLC generally separates Hb S and Hb D-Los Angeles effectively, but not always. The two conditions, which produce similar CBC findings, can usually be distinguished by HPLC retention time (RT). HPLC's strength lies in its gradient program, which shapes the elution pattern: by continuously changing the pH during analysis, it alters hemoglobin's net charge and conformation, allowing hemoglobin solubility and RT to be determined [13, 14]. However, because of differences in gradient program software and discrepancies between HPLC models, Hb S and Hb D cannot always be separated from each other reliably [7, 14].

Red blood cell indices, Hb A, Hb A2, Hb F, abnormal hemoglobin values, and RT values can be used to train machine learning models to predict hemoglobin variants. The effect of RT on model prediction performance can be evaluated by also training the models without this feature. The aim of this study was to create machine learning models, based on artificial neural networks and other methods, that assist medical laboratories in differentiating between sickle cell and Hb D-Los Angeles carriers. Here, we report the performance metrics of our trained models.

Materials and methods

In this retrospective study, 90 (38%) women and 148 (62%) men between the ages of 11 and 67 who applied to Mugla Sitki Kocman University Training and Research Hospital Thalassemia Diagnosis, Treatment, and Research Center between 01.01.2015 and 01.06.2021 were included. Patients with a history of surgery, chemotherapy, any infectious disease, cholestasis, thyroid dysfunction, acute or chronic hepatitis, or liver or other organ failure were excluded from the study. The Ethics Committee of Mugla Sitki Kocman University Training and Research Hospital granted ethics approval with decision no. 111 on 01.06.2021. The study was conducted in accordance with the principles of the Declaration of Helsinki.

The patients’ data were obtained from the database of Mugla Sitki Kocman University Thalassemia Diagnosis, Treatment, and Research Center. Sysmex XN 1000 (Sysmex Diagnostics, Japan) was used to measure the red blood cell index parameters. Primus Ultra II (Trinity Biotech Diagnostic, Ireland) was used to analyze hemoglobin variants using HPLC. Serum iron, total iron binding capacity (TIBC), and ferritin levels were measured using spectrophotometric or ECLIA immunoassay methods in a Cobas 601 (Roche Diagnostics, Germany) analyzer.

Our dataset contained 238 observations: 128 suspicious cases with the HbD carrier state and 110 with the HbS carrier state. The dataset was balanced, and there was no missing data. The features were age, sex, RBC, Hb, HTC, MCV, MCH, RDW, serum iron, TIBC, ferritin, HbA2, HbF, HbA0, RT of the abnormal peak, and the area under the abnormal peak. Sex was coded as 0 (female) and 1 (male).

To estimate the relevance of the features, we used the Boruta feature selection algorithm, which applies random forest classification at its core. First, it extends the original dataset with random probes (shadow features) whose values are obtained by shuffling the original features' values across instances. The importance of each feature is then compared, using z-scores, with a threshold derived from the shadow features. The algorithm iteratively removes features that are statistically less relevant than the best shadow feature, and insignificant features are excluded from subsequent iterations. Finally, the features are classified into three types: confirmed, tentative, and rejected.
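The shadow-feature comparison at the heart of Boruta can be sketched as follows. This is a minimal single-pass illustration on synthetic data, not the full iterative algorithm (with z-score statistics and tentative classification) used in the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: one informative feature, one pure-noise feature.
rng = np.random.default_rng(42)
n = 238
informative = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)  # labels depend only on the first feature

# Shadow features: each column shuffled independently across instances,
# destroying any association with the class labels.
shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
X_ext = np.hstack([X, shadows])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_ext, y)
imp = rf.feature_importances_
shadow_max = imp[X.shape[1]:].max()  # importance threshold set by the best shadow

# A feature survives this pass only if it beats the best shadow.
confirmed = [j for j in range(X.shape[1]) if imp[j] > shadow_max]
print(confirmed)  # the informative feature (index 0) should survive
```

Boruta repeats this comparison over many iterations, accumulating statistical evidence before labeling a feature confirmed, tentative, or rejected.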

The features were standardized by removing the mean and scaling to unit variance. K-fold cross-validation was performed with k=7. We used the same training and testing data for all the models, and we trained our models with and without the RT feature to understand its impact. Euclidean distance was used for the k-nearest neighbors (KNN) models; the number of neighbors was 75 for the model trained on data with RT and 61 for the data without RT. The Gaussian naive Bayes algorithm was used for the naive Bayes models. The classification and regression tree (CART) algorithm was used for the decision tree models, with information gain measuring the quality of a split [15]. For the deep learning models, we constructed fully connected multilayer perceptrons (MLP) with two hidden layers. The input layer had 16 nodes for the entire dataset and 15 nodes for the data without the RT feature. The hidden layers had ten nodes each with the ReLU (rectified linear unit) activation function, and the output layer had one node with a sigmoid activation function. The Adam optimizer, binary cross-entropy loss, a batch size of 32, and 100 epochs were used to train the deep learning models [16]. Performances of the models were evaluated according to accuracy, sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and F1 scores.
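The classical-model setup described above can be sketched with scikit-learn. The feature values here are simulated stand-ins (the study's clinical dataset is not public), so the scores are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the 16-feature, 238-observation dataset.
X, y = make_classification(n_samples=238, n_features=16, n_informative=6,
                           random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=75, metric="euclidean"),
    "Naive Bayes": GaussianNB(),
    # criterion="entropy" scores splits by information gain
    "Decision tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
}
for name, model in models.items():
    # Standardize features, then score with 7-fold cross-validation.
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=7, scoring="accuracy")
    print(f"{name}: {scores.mean():.2f}")
```

Wrapping the scaler and classifier in one pipeline ensures the standardization statistics are re-fit on each training fold, avoiding leakage into the test folds.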

We used TensorFlow 2.5.0, scikit-learn 0.22.2, and Python 3.7.10 to create the machine learning models trained on the data [17, 18].
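Assuming the Keras API of the TensorFlow version cited above, the MLP described in Methods (15 input features for the dataset without RT, two hidden layers of ten ReLU nodes, a sigmoid output, Adam optimizer, binary cross-entropy) could be constructed roughly as follows, trained here on random stand-in data:

```python
import numpy as np
import tensorflow as tf

# MLP for the 15-feature dataset (without RT), as described in Methods.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(15,)),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Random stand-in data (the real features are clinical measurements).
X = np.random.rand(238, 15).astype("float32")
y = np.random.randint(0, 2, size=238)
model.fit(X, y, batch_size=32, epochs=2, verbose=0)

# Weight count: (15*10 + 10) + (10*10 + 10) + (10*1 + 1) = 281
print(model.count_params())  # 281
```

In the study, this model was trained for 100 epochs; two epochs here merely demonstrate that the pipeline runs.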

Results

Table 1 shows the five-point summaries of the features. Figure 3 shows the Spearman correlation coefficients of the features.

Table 1:

Characteristics and hematological parameters among sickle cell carriers and Hb D Los Angeles carriers in the modeling group (n=238).

  Carrier status Age, years RBC, 10^6/μL HGB, g/dL Hct, % MCV, fL MCH, pg RDW, % Serum iron, μg/dL TIBC, μg/dL Ferritin, ng/mL Hb A2, % Hb F, % RT, min The area under the peak, %
Minimum HbD 14 4.13 9.2 30.1 61.5 17.9 11.0 15.5 234 2.5 0.3 0.0 0.94 32.2
HbS 11 3.80 7.1 25.9 64.4 17.7 11.7 14.1 265 2.3 2.0 0.0 1.00 24.6
25th percentile (Q1) HbD 19 4.91 13.2 39.3 77.4 26.0 12.5 56.0 311 23.4 2.2 0.0 0.95 37.0
HbS 21 4.75 13.0 38.6 79.1 26.7 12.6 52.0 318 21.9 4.0 0.0 1.02 36.7
Median (Q2) HbD 27 5.24 14.4 42.5 80.7 27.9 13.1 82.8 330 53.7 2.4 0.0 0.96 38.5
HbS 30 5.12 14.2 41.3 82.3 28.2 13.1 76.3 340 46.3 4.2 0.0 1.03 37.7
75th percentile (Q3) HbD 32 5.59 15.9 45.5 84.5 29.3 13.9 115.0 366 81.2 2.6 0.0 0.97 39.7
HbS 38 5.45 15.3 44.7 84.4 29.2 13.7 103.0 367 85.0 4.4 0.0 1.03 38.5
Maximum HbD 67 6.60 18.1 51.9 95.4 33.1 20.7 290.0 464 392.0 3.2 1.0 0.99 43.5
HbS 64 7.11 18.0 50.3 90.8 32.4 20.9 190.0 473 391.0 5.4 2.1 1.06 44.6
  1. RBC, red blood cell; Hb, hemoglobin; Hct, hematocrit; MCV, mean corpuscular volume; MCH, mean corpuscular hemoglobin; RDW, red blood cell distribution width; TIBC, total iron-binding capacity; HbA2, hemoglobin A2; HbF, hemoglobin F; HbA0, hemoglobin A0.

In our study, the naive Bayes models showed the worst performance, albeit with metrics similar to those of the KNN models. The deep learning models were the best performers overall, and the performance differences between the deep learning and decision tree models were minor (Table 2).

Table 2:

Performance metrics of the trained models.

Model Accuracy TP FP TN FN Sensitivity Specificity F1 score Positive predictive value Negative predictive value
KNN 0.99 108 0 128 2 0.98 1.00 0.99 1.00 0.98
KNN without RT 0.94 102 6 122 8 0.93 0.95 0.94 0.94 0.94
Naive Bayes 0.98 105 0 128 5 0.95 1.00 0.98 1.00 0.96
Naive Bayes without RT 0.94 99 4 124 11 0.90 0.97 0.93 0.96 0.92
Decision tree 1.00 110 0 128 0 1.00 1.00 1.00 1.00 1.00
Decision tree without RT 0.99 108 1 127 2 0.98 0.99 0.99 0.99 0.98
Deep learning 1.00 110 0 128 0 1.00 1.00 1.00 1.00 1.00
Deep learning without RT 0.99 109 1 127 1 0.99 0.99 0.99 0.99 0.99
  1. KNN, K-nearest neighbors; RT, retention time; TP, true positive; FP, false positive; TN, true negative; FN, false negative.

When RT, the key point of differentiation used in HPLC, was included for model training, all methods performed well. When RT was excluded, deep learning performed the best, while the naive Bayes model performed the worst (Figure 1, Table 2).
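The metrics reported in Table 2 follow directly from the confusion-matrix counts; for example, the "deep learning without RT" row (TP=109, FP=1, TN=127, FN=1, with the HbS class as positive) can be reproduced as:

```python
# Confusion-matrix counts from Table 2, "Deep learning without RT" row.
tp, fp, tn, fn = 109, 1, 127, 1

sensitivity = tp / (tp + fn)  # recall for the positive (HbS) class
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)          # positive predictive value
npv = tn / (tn + fn)          # negative predictive value
accuracy = (tp + tn) / (tp + fp + tn + fn)
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

print(round(accuracy, 2), round(sensitivity, 2),
      round(specificity, 2), round(f1, 2))  # 0.99 0.99 0.99 0.99
```

Note that TP + FN = 110 and TN + FP = 128 match the HbS and HbD case counts, confirming which class each column refers to.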

Figure 1: 
The architecture of the artificial neural networks (ANN) trained on data without retention time.

Figure 2 shows the individual impact analysis of the attributes.

Figure 2: 
Boruta plot. Green indicates confirmed features, yellow represents tentative features, red represents rejected features, and blue represents shadows.

Boruta performed 99 iterations in 4.71 s (Figure 2). Initially, five attributes were confirmed as important (APeak, HbA0, HbA2, RT, and serum iron), nine were confirmed as unimportant (age, ferritin, HbF, Hct, MCV, MCH, RBC, TIBC, and sex), and two (Hb and RDW) were left tentative. After resolving the tentative attributes with Boruta's rough fix over the 99 iterations, seven attributes were confirmed as important: Hb, RDW, serum iron, HbA2, HbA0, RT, and APeak.

Discussion

In this study, we trained models from four different machine learning algorithms (k-nearest neighbors, naive Bayes, decision trees, and deep learning) as binary classifiers of Hb S and Hb D carriers. Using four algorithms was one of the strengths of our approach.

Well-prepared data is of utmost importance for machine learning model training. Although small, our dataset had no missing data. We performed 7-fold cross-validation to assess how well our models generalize; cross-validation was also a necessity given the small size of our dataset. Well-prepared, feature-engineered data and hyperparameter tuning were behind the success of our machine learning models.

The worst-performing models were the naive Bayes models (Figure 3). The naive Bayes algorithm is called naive because it assumes all variables are independent of each other, an assumption that rarely holds in real-world data; the features in our dataset were not independent of each other either. In our study, decision tree and deep learning models showed similar performances. Deep neural networks require considerable effort to select the right combination of hyperparameters, so tree-based models are often advisable for tabular data such as ours. For prediction and regression problems on tabular data, tree ensemble models (like XGBoost) are known to outperform deep learning models [17]. Essentially, neural networks are continuous models; they can only approximate discontinuities via non-linear activation functions like ReLU. By contrast, tree-based models are fundamentally discontinuous, which gives them an advantage on tabular data. Nevertheless, a recently proposed deep learning method claims to outperform gradient boosting methods, including XGBoost, CatBoost, and LightGBM, on average across various benchmark tasks [17].

Figure 3: 
Spearman correlation coefficients of the features.

In addition to the hematological data, the effect of RT was investigated in our study. While all of the machine learning models performed well when RT, the main point of distinction used in HPLC, was included in training, performance declined when RT was excluded. However, the deep learning (accuracy: 0.99; specificity: 0.99; sensitivity: 0.99; F1 score: 0.99) and decision tree (accuracy: 0.99; specificity: 0.99; sensitivity: 0.98; F1 score: 0.99) models maintained their high performance; the naive Bayes models produced the worst results. In this way, we evaluated how well the models learn from hematology-related data alone, demonstrating that machine learning can distinguish Hb S and Hb D-Los Angeles from relationships in hematological data without the need for RT.

Machine learning methods have been shown to be successful in the differential diagnosis of beta-thalassemia minor and iron deficiency anemia, breast cancer metastasis prediction, heart diseases, and the life expectancy prediction of intensive care patients [1, 19, 20].

Setsirichok et al. used decision tree, naive Bayes, and MLP machine learning approaches to predict the alpha- and beta-thalassemia traits, Hb E, Hb H, hereditary persistence of fetal hemoglobin (HPFH), and beta-thalassemia major. They concluded that naive Bayes and MLP are excellent screening tools for thalassemia [21]. Borah et al. attempted to differentiate patients with beta-thalassemia, thalassemia major, Hb E, and sickle cell anemia, and reported highly accurate predictions with their machine learning methods [22].

Piroonratana et al. attempted to predict thalassemia types by analyzing HPLC chromatograms with decision trees and artificial neural networks. As a result, they concluded that machine learning algorithms could be used to guide thalassemia typing [23].

Chy et al. detected sickle cell anemia from hematological images using machine learning methods and reported strong performance [24].

In the study of Barnhart-Magen et al., an ANN was used in the differential diagnosis of beta-thalassemia and iron deficiency. Both diseases were accurately predicted with a specificity of 0.96 and a sensitivity of 0.99, and the authors stated that population screening could easily be performed using ANN [25].

To the best of our knowledge, this is the first study to use machine learning models to distinguish between Hb S carriers and Hb D-Los Angeles carriers.

The deep learning and decision tree models we trained were able to distinguish two hemoglobin types, very similar hematologically, electrophoretically, and chromatographically, with high performance.

These results suggest that, when fed enough data, machine learning models may detect a wide range of hemoglobin variants.

Conclusions

Deep learning and decision tree models have demonstrated high performance and have the potential to be integrated into medical laboratory work practices as an aid for hemoglobinopathy detection. These outcomes suggest that, when fed enough data, machine learning methods may predict a wide range of hemoglobin variants. However, more comprehensive studies with data from a larger number of patients and hemoglobinopathies will be useful for validating our models.


Corresponding author: Süheyl Uçucu, Department of Medical Biochemistry, Muğla Public Health Care Laboratory, Muğla, Turkiye, Phone: 0 (555) 306 28 74, E-mail:

Acknowledgments

We would like to thank statistician Sanem Şehirbanoğlu for her guidance on statistics during the preparation of this article.

  1. Research funding: None declared.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: The authors state no conflict of interest.

  4. Informed consent: Informed consent was not obtained from all individuals included in this study.

  5. Ethical approval: The Ethics Committee of Mugla Sitki Kocman University Training and Research Hospital granted ethics approval with the decision no. 111 on 01.06.2021.

References

1. Obermeyer Z, Emanuel EJ. Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med 2016;375:1216–9. https://doi.org/10.1056/NEJMp1606181.

2. Bouton CE, Shaikhouni A, Annetta NV, Bockbrader MA, Friedenberg DA, Nielson DM, et al. Restoring cortical control of functional movement in a human with quadriplegia. Nature 2016;533:247–50. https://doi.org/10.1038/nature17435.

3. Mullainathan S, Spiess J. Machine learning: an applied econometric approach. J Econ Perspect 2017;31:87–106. https://doi.org/10.1257/jep.31.2.87.

4. Ashorobi D, Ramsey A, Yarrarapu SN, Bhatt R. Sickle cell trait. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2021.

5. Piel FB, Hay SI, Gupta S, Weatherall DJ, Williams TN. Global burden of sickle cell anaemia in children under five, 2010–2050: modelling based on demographics, excess mortality, and interventions. PLoS Med 2013;10:e1001484. https://doi.org/10.1371/journal.pmed.1001484.

6. Hazzazi AA, Ageeli MH, Alfaqih AM, Jaafari AA, Malhan HM, Bakkar MM, et al. Epidemiology and characteristics of sickle cell patients admitted to hospitals in Jazan region, Saudi Arabia. J Appl Hematol 2020;11:10. https://doi.org/10.4103/joah.joah_67_19.

7. Bain BJ. Haemoglobinopathy diagnosis, 3rd ed. London, UK: Blackwell Publishing; 2020. 448 p. https://doi.org/10.1002/9781119579977.

8. Xu JZ, Thein SL. The carrier state for sickle cell disease is not completely harmless. Haematologica 2019;104:1106. https://doi.org/10.3324/haematol.2018.206060.

9. Naik RP, Smith-Whitley K, Hassell KL, Umeh NI, De Montalembert M, Sahota P, et al. Clinical outcomes associated with sickle cell trait: a systematic review. Ann Intern Med 2018;169:619–27. https://doi.org/10.7326/m18-1161.

10. Goodman J, Hassell K, Irwin D, Witkowski E, Nuss R. The splenic syndrome in individuals with sickle cell trait. High Alt Med Biol 2014;15:468–71. https://doi.org/10.1089/ham.2014.1034.

11. Austin H, Key KS, Benson JM, Lally C, Dowling NF, Whitsett C, et al. Sickle cell trait and the risk of venous thromboembolism among blacks. Blood 2007;110:908–12. https://doi.org/10.1182/blood-2006-11-057604.

12. Randolph TR. Hemoglobinopathies (structural defects in hemoglobin). In: Rodak's hematology: clinical principles and applications. St Louis, MO: Elsevier; 2019:394–423. https://doi.org/10.1016/B978-0-323-53045-3.00033-7.

13. Cummins PM, Rochfort KD, O'Connor BF. Ion-exchange chromatography: basic principles and application. Methods Mol Biol 2017;1485:209–23. https://doi.org/10.1007/978-1-4939-6412-3_11.

14. Ou CN, Rognerud CL. Diagnosis of hemoglobinopathies: electrophoresis vs. HPLC. Clin Chim Acta 2001;313:187–94. https://doi.org/10.1016/s0009-8981(01)00672-6.

15. Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. New York: Routledge; 2017. https://doi.org/10.1201/9781315139470.

16. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980; 2014.

17. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.

18. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467; 2016.

19. Ayyıldız H, Tuncer SA. Determination of the effect of red blood cell parameters in the discrimination of iron deficiency anemia and beta-thalassemia via neighborhood component analysis feature selection-based machine learning. Chemom Intell Lab Syst 2020;196:103886. https://doi.org/10.1016/j.chemolab.2019.103886.

20. Takada M, Sugimoto M, Naito Y, Moon HG, Han W, Noh DY, et al. Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model. BMC Med Inf Decis Making 2012;12:1–10. https://doi.org/10.1186/1472-6947-12-54.

21. Setsirichok D, Piroonratana T, Wongseree W, Usavanarong T, Paulkhaolarn N, Kanjanakorn C, et al. Prediction of complete blood count and hemoglobin typing data by a C4.5 decision tree, a naïve Bayes classifier and a multilayer perceptron for thalassemia screening. Biomed Signal Process Control 2012;7:202–12. https://doi.org/10.1016/j.bspc.2011.03.007.

22. Borah MS, Bhuyan BP, Pathak MS, Bhattacharya P. Machine learning in predicting hemoglobin variants. Int J Mach Learn Comput 2018;8:140–3. https://doi.org/10.18178/ijmlc.2018.8.2.677.

23. Piroonratana T, Wongseree W, Assawamakin A, Paulkhaolarn N, Kanjanakorn C, Sirikong M, et al. Prediction of hemoglobin typing chromatograms by neural networks and decision trees for thalassemia screening. Chemometr Intell Lab Syst 2009;99:101–10. https://doi.org/10.1016/j.chemolab.2009.07.014.

24. Chy TS, Rahaman MA. A comparative analysis by KNN, SVM & ELM prediction to detect sickle cell anemia. In: ICREST; 2019:455–9. https://doi.org/10.1109/ICREST.2019.8644410.

25. Barnhart-Magen G, Gotlib V, Marilus R, Einav Y. Differential diagnostics of thalassemia minor by artificial neural networks model. J Clin Lab Anal 2013;27:481–6. https://doi.org/10.1002/jcla.21631.

Received: 2022-04-18
Accepted: 2022-09-22
Published Online: 2022-11-07

© 2022 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
