Introduction

The large-scale emission of greenhouse gases such as CO2 has led to two major global challenges, known as “Global Warming” and “Climate Change”1. During the last decade, the atmospheric CO2 concentration has exceeded the acceptable limit (i.e. 350 ppm)2,3,4. Hence, extensive efforts are required to reduce the amount of CO2 in the atmosphere. Several advanced technologies, such as Carbon Capture and Storage (CCS), are available for decreasing CO2 emissions. CCS technologies are mainly categorized into three groups: pre-combustion, post-combustion (PCC), and oxy-combustion (oxy-fuel)5. Among these methods, PCC is considered the most practical and economical. Nevertheless, several economic, technological, environmental, and safety challenges remain to be solved, such as (i) improving the efficiency of CO2 capture, (ii) reducing process costs, and (iii) ensuring that CO2 storage is environmentally sustainable6. Aqueous alkanolamine solvents (e.g. MEA) are conventionally applied in PCC because of their high reactivity with CO2, availability, low cost, and low viscosity. However, such solvents still have several drawbacks, including high solvent loss, degradation, corrosion, high energy consumption during regeneration, environmental problems, and high regeneration costs7,8,9. As a result, it is essential to develop new green and inexpensive solvents for CO2 capture processes.

In recent years, research has increasingly focused on the development of novel solvents such as Ionic Liquids (ILs) and Deep Eutectic Solvents (DESs) to replace conventional volatile organic compounds (VOCs) in different chemical and industrial processes10,11,12.

Compared to conventional CO2 capture solvents (i.e., amines), ILs are more promising due to attractive intrinsic properties such as low volatility, high thermal stability, and excellent CO2 solubility13,14. ILs are well known as efficient physical sorbents of CO2, and their properties can be tuned by selecting proper cations and anions. Despite these advantages, the industrial use of ILs for CO2 capture has several drawbacks, including high viscosity, complicated and expensive synthesis and purification, and high cost. There is also growing concern regarding the toxicity of several ILs15. DESs are a newer class of solvents with the additional merits of low cost, low toxicity, biodegradability, easy preparation, and no need for purification16. DESs can be synthesized by mixing a hydrogen bond donor (HBD) (e.g. carboxylic acids, amides, amines, alcohols, or metal halides) with a hydrogen bond acceptor (HBA) (e.g. quaternary phosphonium or ammonium salts) in appropriate molar ratios17. The most promising property of DESs is their structural diversity. Because of their inherent benefits, including low vapor pressure, high thermal and chemical stability, non-flammability, and a wide range of tunability, DESs have gained considerable attention18,19. In particular, choline-based DESs have been investigated intensively. Since choline-based DESs are mainly composed of natural compounds, they are considered to have no harmful environmental impact. Among the widely used choline salts, choline chloride (ChCl) is a non-toxic, biodegradable, and inexpensive material that can either be synthesized from fossil-derived (i.e. petroleum) products and by-products or extracted from biomass19.

Theoretical and experimental studies of DESs have been conducted for different applications, such as CO2 capture20, desulfurization21,22, and separation processes23. Several experimental studies in the literature report measurements of CO2 solubility in different types of DESs.

In the first experimental study, Li et al.24 examined ChCl-based DESs as CO2 absorbents at various pressures (0.85 − 12.52 MPa), temperatures (313.15 − 333.15 K), and molar ratios (1:1.5, 1:2, or 1:2.5). Their results demonstrated that the CO2 solubility decreased with increasing temperature and increased with increasing pressure. Moreover, the molar ratio was confirmed to have a strong effect on the CO2 solubility in DESs; in particular, ChCl/Urea (1:2) (the so-called reline system) showed a higher CO2 solubility than the ChCl/Urea DESs with molar ratios of 1:1.5 and 1:2.5. Li et al.25 also studied a series of ChCl-based DESs, including ChCl/Phenol (1:2, 1:3, 1:4), ChCl/Triethylene glycol (1:3 and 1:4), and ChCl/Diethylene glycol (1:3 and 1:4), and found that the CO2 solubility in ChCl/Triethylene glycol (1:4) was the highest among them. In another work, Leron et al.26 measured the solubility of CO2 in the reline system over an expanded temperature range of 303.15 to 343.15 K. Leron et al.27,28 also reported the CO2 solubility in ChCl/Glycerol (1:2) and ChCl/Ethylene glycol (1:2) and found it to be higher than in imidazolium-based ILs. Sarmad et al.20 reported 209 data points on the CO2 solubility in 35 different DESs at 298.15 K and pressures lower than 2 MPa. Chen et al.29 studied the CO2 solubility in ChCl/1,2-Propanediol, ChCl/1,4-Butanediol, and ChCl/2,3-Butanediol (1:3 and 1:4), and their results showed that ChCl/2,3-Butanediol (1:4) had the highest CO2 absorption capacity. Lu et al.30 investigated the CO2 solubility in ChCl/Levulinic acid and ChCl/Furfuryl alcohol (1:3, 1:4, and 1:5). According to their results, ChCl/Levulinic acid (1:5) showed a higher CO2 absorption capacity than ChCl/Furfuryl alcohol. Therefore, for a fixed HBA (i.e., ChCl), the choice of HBD can significantly influence the CO2 solubility in the DES.
It was verified that the CO2 solubility in DESs depends on the type of HBD and HBA, the HBA to HBD molar ratio, viscosity, and water content of DESs, and operating pressure and temperature31.

Numerous DESs can be synthesized by combining different HBAs and HBDs, so selecting the most suitable DESs for CO2 capture processes on the basis of experimental studies alone is challenging. Therefore, an efficient theoretical method is needed to predict the CO2 solubility in DESs.

The Quantitative Structure–Property Relationship (QSPR) approach is one of the most popular Machine Learning (ML) methods used to complement experimental and analytical techniques. To develop QSAR (quantitative structure–activity relationship) and QSPR models, chemical structures are fragmented into structural groups, and mathematical algorithms are applied to the data. The general idea is to derive an expression of the form Property = f(X1, X2, X3, …, Xn), where each variable can be a structural feature of the molecule (i.e. a molecular descriptor) or a physicochemical property. The derived function can provide deeper molecular insight into the relationships between the process-relevant properties of molecules, predict the properties of new but related materials, and help explain the measured characteristics32.

Lemaoui et al.33,34 presented new QSPR models to predict the viscosity, density, and electrical conductivity of DESs using multiple linear regression (MLR) analysis. Their results confirmed that the developed models could predict the density, viscosity, and electrical conductivity of the studied DESs with satisfactory accuracy (i.e. R2 values of 0.9839, 0.9874, and 0.985, respectively). Balali et al.22 presented QSPR models that take into account the effect of HBD structures on the thiophene distribution (β2) between the hydrocarbon phase and ChCl-based DESs in ternary systems. Table S1 in the supporting information file lists the QSPR models available in the literature for predicting different properties of DESs.

Numerous studies in the literature have predicted the CO2 solubility in DESs using different approaches. The CO2 solubility in choline chloride-based DESs has been predicted successfully using thermodynamic and molecular modeling approaches such as the modified Peng-Robinson (PR) EoS35,36, density functional theory (DFT)37,38, and molecular dynamics (MD) simulation38,39,40. Zubeir et al.41 applied the Perturbed-Chain Statistical Associating Fluid Theory (PC-SAFT) to describe the CO2 solubility in DESs at pressures up to 2 MPa and temperatures between 298.15 and 318.5 K, using both the pseudo-pure component and the individual-component strategies. Kamgar et al.42 employed the COSMO-RS and NRTL models to predict the solubility of CO2, CH4, CO, N2, and H2 in reline; their results indicated that the models could predict the solubility only at high temperatures and low pressures. Recently, Alioui et al.43 combined MD and COSMO-RS to study the solubility of CO2 in seven ChCl- and phosphonium-based DESs. Liu et al.44 assessed the ability of COSMO-RS to predict the CO2 solubility and Henry’s constants of CO2 in DESs against experimental data. The thermodynamic methods developed for the solubility of CO2 in different DESs are summarized in Table S2 in the supporting information file.

As can be observed in Table S2, few QSPR models have been developed to predict the CO2 solubility in DESs.

In the first QSPR study on the prediction of CO2 solubility in DESs, Wang et al.45 developed both linear and non-linear models using COSMO-RS-derived descriptors of the HBA and HBD structures, together with temperature, pressure, and the HBA to HBD molar ratio. Despite the numerous merits of their work, a few drawbacks can be noted. Although they used a large number of descriptors for each DES component (i.e. HBA and HBD), their linear model has limited prediction capability. In addition, their model is not descriptive, because the sigma-profile descriptors it relies on are not interpretable. Furthermore, they used the molar ratio as an independent variable in their linear model, although the relationship between the HBA to HBD molar ratio and solubility is not linear (see Fig. S1 in the supporting information file). Therefore, the present study aims to find the most important interpretable descriptor of the HBD in the presence of a fixed HBA (i.e. choline chloride).

Kumar et al.46 presented 12 QSPR models for predicting the CO2 capture capacity of DESs, considering the effects of HBA and HBD structures, HBA to HBD molar ratio, temperature, and pressure. The Monte Carlo method was used to determine appropriate coefficients for each quasi-SMILES descriptor for 72 different DESs (comprising 19 different HBAs and 20 different HBDs). Their models were based on four random splits of the dataset and three target functions, with and without criteria for examining predictive potential (i.e. the index of ideality of correlation (IIC) and the correlation intensity index (CII)). They then identified the model with the highest accuracy according to different statistical parameters. Although their work is comprehensive and valuable because of its diverse dataset and the high prediction accuracy of the model, the parameters of their model do not appear to be interpretable, so the role of each parameter in the CO2 absorption mechanism cannot be investigated. In other words, they appear to have focused more on the predictability of the model than on explaining why and how each variable in the model affects the CO2 capture capacity. Therefore, the present study aims to develop descriptive and predictive QSPR models with meaningful and interpretable descriptors.

Halder et al.47 applied multicriteria decision-making techniques to develop multi-objective models that address two properties (i.e. viscosity and CO2 uptake capacity) simultaneously. Their work is valuable because the viscosity of DESs plays a significant role in the final choice of solvent. They developed two separate linear QSPR models for predicting the CO2 uptake capacity and the viscosity of DESs and then used Derringer’s desirability function to combine the two models and identify DESs with high CO2 absorption capacity and low viscosity. Although their work is innovative and comprehensive, it has a few shortcomings. First, according to the MD simulations performed by Alizadeh et al.48, the HBD structure and the anion of the HBA have a strong effect, and the cation of the HBA only a slight effect, on CO2 solubility in DESs. Moreover, the HBD–CO2 interaction is dominant at lower pressures, whereas the anion–CO2 interaction dominates at higher pressures. In other words, HBD structures have a greater effect on CO2 absorption at low pressures and HBA structures at high pressures. However, Halder et al.47 treated the contributions of the HBA (both cation and anion) and the HBD as equal under all conditions. Second, temperature and pressure were not included in their model, and predictions were made using structural variables only, although temperature and pressure are known to have a significant effect on CO2 absorption. Thus, the present study investigates the effect of HBD structures on CO2 solubility at low pressures (i.e. physical absorption) while including the key variables of temperature and pressure in the developed models. In this way, this study attempts to fill the gaps identified in these recent valuable studies.

In this study, the QSPR method is applied as a robust tool to develop predictive models for the solubility of CO2 in DESs with a fixed HBA (i.e. ChCl) at HBA to HBD molar ratios of 1:3 and 1:4. First, QSPR models are developed that account for the effects of HBD structure and pressure at a fixed temperature (i.e. 293, 303, 313, or 323 K). Then, the temperature dependence of the CO2 solubility is included together with pressure and the HBD descriptor. This approach can efficiently predict the CO2 solubility of new ChCl-based DESs at new temperatures. Moreover, two additional datasets are used for further external validation to confirm the robustness of the unfixed temperature models.

The QSPR method

Dataset

First, the available experimental data on CO2 solubility in ChCl-based DESs with molar ratios of 1:3 and 1:4 were collected from the literature. The ranges of P, T, and CO2 solubility for each dataset are shown in Table 1. The total number of CO2 solubility data points is 390, and, as can be seen in Table 1, nine different HBDs are involved. In the present study, the CO2 solubility values (x: moles of CO2 per mole of DES) were converted to their natural logarithm (i.e., ln(x)) for model development. A common technique for ensuring the reliability of QSPR models is to divide the data into two separate sets, called the “train” and “test” sets. The QSPR model is developed using the train set, to which internal validation techniques can be applied. The model should then be validated externally by taking some HBDs out of the dataset and placing them in the test set, so that its prediction capability and accuracy can be assessed. To increase the robustness of the external validation, the test set was selected to include HBD structures that differ from those in the train set. In addition, datasets no. (11) and (12) were used for further external validation of the unfixed temperature models, applying them at new temperatures and pressures and to new HBD structures. Furthermore, the applicability domain of the constructed models was checked for both the train and test sets, which indicates that both sets contain DESs with considerable differences from a molecular structure viewpoint.
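The grouped hold-out described above, in which all points of a given HBD go to the same set, can be sketched in Python as follows. This is a minimal illustration, not the authors' workflow: the `SolubilityPoint` record, the helper names, and the numbers in `demo` are hypothetical placeholders, not values from Table 1.

```python
import math
from typing import List, NamedTuple, Tuple


class SolubilityPoint(NamedTuple):
    hbd: str    # hydrogen bond donor paired with ChCl
    T: float    # temperature, K
    P: float    # pressure (unit assumed consistent across datasets)
    x: float    # CO2 solubility, mol CO2 per mol DES


def split_by_hbd(points: List[SolubilityPoint], test_hbds) -> Tuple[list, list]:
    """Send every data point of the listed HBDs to the test set (structure hold-out)."""
    held_out = set(test_hbds)
    train = [p for p in points if p.hbd not in held_out]
    test = [p for p in points if p.hbd in held_out]
    return train, test


def ln_target(points: List[SolubilityPoint]) -> List[float]:
    """Model target used in the text: ln(x)."""
    return [math.log(p.x) for p in points]


# Placeholder values for illustration only, not experimental data.
demo = [
    SolubilityPoint("Phenol", 303.0, 2.0, 0.010),
    SolubilityPoint("Phenol", 303.0, 5.0, 0.024),
    SolubilityPoint("Diethylene glycol", 303.0, 2.0, 0.012),
    SolubilityPoint("Furfuryl alcohol", 303.0, 2.0, 0.011),
]
train, test = split_by_hbd(demo, ["Diethylene glycol", "Furfuryl alcohol"])
print(len(train), len(test))  # 2 2
```

Splitting by HBD rather than by individual data point keeps every structure in the test set unseen during fitting, which is what makes the external validation a test of structural extrapolation.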

Table 1 The variation ranges of pressure and solubility for each studied dataset in the present study.

Optimization of HBD structures and descriptors calculation

Before the descriptors of each HBD are calculated, its molecular structure must be optimized. The 3D structures of the 9 HBD molecules were drawn using GaussView software51 and then subjected to geometry optimization using density functional theory (DFT) at the B3LYP/6-31+G(d,p) level52. Afterward, Dragon software53 was applied to calculate different kinds of 1D, 2D, and 3D descriptors. To reduce the number of computed descriptors (i.e. 3224), constant and semi-constant descriptors, as well as descriptors with high intercorrelation (> 98%), were eliminated. The remaining 444 molecular descriptors of the HBD structures were used for model construction.
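The descriptor reduction step can be sketched as below. This is a hedged illustration that assumes a descriptor is semi-constant when a single value is shared by more than 80% of the molecules, and that “> 98%” intercorrelation means |r| > 0.98; the exact rules applied in the original workflow may differ, and the toy matrix is not Dragon output.

```python
import numpy as np


def prefilter_descriptors(D, names, max_abs_corr=0.98, max_mode_fraction=0.8):
    """Remove constant and semi-constant descriptors, then drop one of each highly
    intercorrelated pair (|r| > max_abs_corr), keeping the first one encountered."""
    D = np.asarray(D, dtype=float)
    n_rows = D.shape[0]
    varying = []
    for j in range(D.shape[1]):
        col = D[:, j]
        if col.max() == col.min():                      # constant descriptor
            continue
        _, counts = np.unique(col, return_counts=True)
        if counts.max() / n_rows > max_mode_fraction:   # semi-constant (assumed rule)
            continue
        varying.append(j)
    kept = []
    for j in varying:
        # keep column j only if it is not too correlated with any column already kept
        if all(abs(np.corrcoef(D[:, j], D[:, k])[0, 1]) <= max_abs_corr for k in kept):
            kept.append(j)
    return D[:, kept], [names[j] for j in kept]


# Rows = HBD molecules, columns = descriptors (toy values).
D = [[1, 1, 2, 5],
     [1, 2, 4, 3],
     [1, 3, 6, 4],
     [1, 4, 8, 1],
     [1, 5, 10, 2]]
D_kept, kept_names = prefilter_descriptors(D, ["const", "a", "b", "c"])
print(kept_names)  # ['a', 'c']
```

Here "const" is dropped as constant and "b" (exactly twice "a") as redundant. Which member of a correlated pair survives depends on column order, a detail the text does not specify.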

Basic theory and model construction procedure

Basic theory

CO2 solubility in the gas–liquid systems (i.e. CO2 in DES) is defined as follows:

$$x=\frac{mole\, {CO}_{2}}{mole\, DES}.$$
(1)

According to Li et al.24, the CO2 solubility is dependent on the temperature and pressure as well as the HBA to HBD molar ratio.

At a constant HBA to HBD molar ratio, the relationship between ln(x) and ln(P) can be expressed as follows (see Fig. S2 in the supplementary file):

$$\mathrm{ln}\left(x\right)=a\times \mathrm{ln}\left(P\right)+b,$$
(2)

where a and b are adjustable parameters. The molecular structure of HBDs can play a key role in different processes, such as desulfurization22 and CO2 solubility20. In this study, the QSPR method is used to correlate ln(x) with ln(P) and a relevant molecular descriptor of the HBDs by replacing the parameter “b” with a function of HBD descriptors plus a constant c. To investigate the effect of HBD molecular structure on the CO2 solubility, eight separate fixed-temperature datasets were modeled using Eq. (3):

$$\mathrm{ln}\left(x\right)=a\times \mathrm{ln}\left(P\right)+F\left(HBD \,descriptors\right)+c.$$
(3)

Using Eq. (3), the CO2 solubility can be predicted only at a fixed temperature (i.e., 293, 303, 313, or 323 K). To take the effect of temperature into account along with the descriptor and ln(P), Eq. (4) was obtained by replacing the constant “c” in Eq. (3) with a linear temperature term, “\(b\times T\)”, plus a new constant d. Based on the observed trend of CO2 solubility with temperature (see Fig. S3 in the supplementary materials), T was included as a linear variable in the models that account for temperature:

$$\mathrm{ln}\left(x\right)=a\times \mathrm{ln}\left(P\right)+b\times T+F\left(HBD\, descriptors\right)+d.$$
(4)
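A minimal sketch of fitting Eq. (4) by ordinary least squares is given below. It assumes that F(HBD descriptors) is linear in a single descriptor with a coefficient called e here (a hypothetical name), as in the three-variable MLR described next. The data are synthetic, generated from known coefficients so that the fit can be checked; the paper's actual models were built in QSARINS.

```python
import numpy as np


def fit_eq4(ln_P, T, desc, ln_x):
    """Least-squares estimates of (a, b, e, d) in ln(x) = a*ln(P) + b*T + e*desc + d."""
    X = np.column_stack([ln_P, T, desc, np.ones(len(ln_x))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ln_x, dtype=float), rcond=None)
    return coef


def predict_eq4(coef, ln_P, T, desc):
    """Evaluate Eq. (4) with the fitted coefficients."""
    a, b, e, d = coef
    return a * np.asarray(ln_P) + b * np.asarray(T) + e * np.asarray(desc) + d


# Synthetic check with known coefficients (illustrative only, not fitted to real data).
rng = np.random.default_rng(0)
ln_P = np.log(rng.uniform(1.0, 10.0, 40))
T = rng.uniform(293.0, 323.0, 40)          # K
desc = rng.uniform(1.0, 2.0, 40)           # hypothetical descriptor values
ln_x = 0.9 * ln_P - 0.01 * T + 0.5 * desc + 1.0
coef = fit_eq4(ln_P, T, desc, ln_x)
```

With real data, `desc` would hold the descriptor value of each point's HBD, repeated for every (T, P) point of that DES.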

Model development strategy

In the present study, two types of QSPR models were developed: Eq. (3) is used for the fixed temperature datasets, and Eq. (4) for the models that account for the effect of temperature on the CO2 solubility. For Eq. (4), a multiple linear regression (MLR) model with three variables (i.e., ln(P), T, and a molecular descriptor of the HBDs) was used to derive a predictive and descriptive QSPR model. The most suitable HBD descriptor has to be selected from a pool containing the 444 HBD descriptors together with the ln(P) and T variables. Variable selection for QSPR models can be performed using several approaches54; in this study, a Genetic Algorithm (GA) was applied. Further information on genetic algorithm-multiple linear regression (GA-MLR) can be found elsewhere55,56. The GA-MLR models were built using QSARINS software57.
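To make the GA-MLR idea concrete, the toy genetic algorithm below searches fixed-size column subsets of a candidate matrix and scores each subset by the leave-one-out Q2 of an MLR model. It is a generic sketch, not QSARINS's implementation: the population size, tournament size, crossover rule, mutation rate, and the choice of Q2LOO as fitness are all assumptions, and the data are synthetic with two planted informative columns.

```python
import numpy as np


def q2_loo(X, y):
    """Leave-one-out Q2 of an MLR model with intercept (hat-matrix shortcut).
    Assumes no training point has leverage exactly 1."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    h = np.diag(A @ np.linalg.pinv(A.T @ A) @ A.T)
    press = np.sum((resid / (1.0 - h)) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def ga_select(X, y, k=2, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Toy GA-MLR: search k-column subsets of X that maximize Q2_LOO."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cache[subset] = q2_loo(X[:, list(subset)], y)
        return cache[subset]

    def as_key(genes):
        return tuple(sorted(int(g) for g in genes))

    pop = [as_key(rng.choice(n_feat, size=k, replace=False)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        children = pop[:2]                       # elitism: keep the two best subsets
        while len(children) < pop_size:
            # tournament selection: best of three random individuals, done twice
            parents = [max(rng.choice(pop_size, size=3, replace=False),
                           key=lambda i: fitness(pop[i])) for _ in range(2)]
            # crossover: draw k genes from the union of both parents
            pool = sorted(set(pop[parents[0]]) | set(pop[parents[1]]))
            child = set(int(g) for g in rng.choice(pool, size=k, replace=False))
            # mutation: swap one gene for a column not currently in the subset
            if rng.random() < mutation_rate and k < n_feat:
                child.remove(int(rng.choice(sorted(child))))
                child.add(int(rng.choice([f for f in range(n_feat) if f not in child])))
            children.append(as_key(child))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Synthetic data: only columns 1 and 4 carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(40, 6))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.01 * data_rng.normal(size=40)
best, score = ga_select(X, y, k=2)
```

In a setting like Eq. (4), one would typically force ln(P) and T into every subset and let the GA choose only among the descriptor columns; that constraint is omitted here for brevity.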

Validation of developed models

The predictive capability of every QSPR model should be assessed through both internal and external validation. The training set is used for internal validation, while the test set is used for external validation. Several statistical parameters can be used to examine the capability of a constructed QSPR model, including the coefficient of determination (R2), the adjusted coefficient of determination (R2adj), the standard error (S), the Fisher criterion (F), the Root Mean Square Error (RMSE), the Leave-One-Out Cross-Validated coefficient of determination (Q2 LOO-CV), and the average absolute relative deviation (AARD%). More details on these statistical parameters are provided in Table S3 in the supplementary file. Both internal and external validation were applied in the present study, and the outcome is presented in the following section.
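The statistics listed above can be computed with standard textbook formulas, sketched below. The exact definitions used in this work are those of Table S3; the versions here are common conventions and are assumptions. For example, F is computed from the regression and residual sums of squares, which presumes least-squares fitted values from a model with an intercept.

```python
import numpy as np


def regression_stats(y, y_pred, p):
    """Fit statistics for a model with p independent variables plus an intercept."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    n = len(y)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "R2": r2,
        "R2adj": 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1),
        "S": np.sqrt(ss_res / (n - p - 1)),                        # standard error
        "F": ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1)),     # Fisher criterion
        "RMSE": np.sqrt(ss_res / n),
    }


def aard_percent(x_exp, x_pred):
    """Average absolute relative deviation, in percent."""
    x_exp, x_pred = np.asarray(x_exp, float), np.asarray(x_pred, float)
    return 100.0 * np.mean(np.abs((x_pred - x_exp) / x_exp))


stats = regression_stats([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], p=1)
aard = aard_percent([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```

For the logarithmic scale one would pass ln(x) and the predicted ln(x); for the non-logarithmic scale, x and the exponential of the predicted ln(x).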

Results

The developed QSPR models

Table 2 shows the developed models for unfixed temperature (datasets no. (1) and (2)) and fixed temperature (datasets no. (3)–(10)).

Table 2 The obtained QSPR models for the fixed and unfixed temperature datasets after train/test splitting.

Notably, the same descriptor (i.e. “EEig02d”) appears in all developed models at both fixed and unfixed temperatures. “EEig02d” is a molecular descriptor derived from the edge adjacency matrix of a molecule weighted by dipole moments, and it is related to the molar volume of the molecule58.

As can be seen in Table 2, for datasets no. (3)–(10), the best combinations of the ln(P) variable and the selected descriptor were obtained for each fixed temperature (i.e. 293, 303, 313, or 323 K) and its corresponding molar ratio (i.e., 1:3 and 1:4). In addition, models containing three variables (i.e. ln(P), T, and the selected descriptor) were developed for the unfixed temperature datasets.

The models developed for the fixed temperature datasets (i.e., Eqs. (11)–(26)) can be applied only at the corresponding temperature (293, 303, 313, or 323 K), whereas the unfixed temperature models (i.e. Eqs. (6), (7), (9), and (10)) can be used to account for the effect of temperature on the CO2 solubility.

Validation of the models and statistical evaluation

Following Sarmad et al.20, the correlation between ln(x) and ln(P) was tested for each system in each dataset (see Table S4 and Fig. S2 in the supplementary file).
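The per-system check of Eq. (2) amounts to a straight-line fit of ln(x) against ln(P). The minimal NumPy sketch below uses a synthetic isotherm rather than literature data, and the helper name is hypothetical.

```python
import numpy as np


def fit_ln_x_vs_ln_P(P, x):
    """Fit Eq. (2), ln(x) = a*ln(P) + b, for one DES system; return (a, b, R2)."""
    ln_P, ln_x = np.log(P), np.log(x)
    a, b = np.polyfit(ln_P, ln_x, 1)          # slope first, then intercept
    resid = ln_x - (a * ln_P + b)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((ln_x - ln_x.mean()) ** 2)
    return a, b, r2


# Synthetic isotherm that obeys Eq. (2) exactly (illustrative only).
P = np.array([1.0, 2.0, 5.0, 10.0])
x = 0.01 * P ** 0.95
a, b, r2 = fit_ln_x_vs_ln_P(P, x)
```

For measured isotherms, an R2 close to 1 for every system supports the linear ln(x) to ln(P) term shared by Eqs. (2)–(4).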

To evaluate the performance of the developed QSPR models, external validation is required. First, the data were split into training and test sets using principal component analysis (PCA)59. Based on the PCA, the test set of every dataset was chosen so as to contain structures not present in the train set.

For datasets no. (3) and (4), all data for one HBD (i.e., Diethylene glycol) were placed in the test set because of the limited structural variety. For the other datasets (i.e. (5)–(10)), all data for two HBDs (i.e. Furfuryl alcohol and Diethylene glycol) were assigned to the test set. All data points in the two remaining datasets (i.e. no. (11) and (12)) were used only for further external validation. The most appropriate molecular descriptor of the HBD was then selected using Eq. (3) for the fixed temperature datasets and Eq. (4) for the unfixed temperature datasets. As shown in Table 2, models with one or two variables were obtained for the fixed temperature datasets, and models with one, two, or three variables for the unfixed temperature datasets. The same descriptor (i.e. “EEig02d”) appeared in every developed QSPR model. The values of the statistical parameters of the fixed and unfixed temperature models are given in Table 3 for the train and test sets.

Table 3 Statistical parameters of the obtained models for each dataset on the logarithmic and non-logarithmic scales.

According to Table 3, for the fixed temperature datasets, the two-variable models (i.e. ln(P) and “EEig02d”), which account for the effect of HBD structure (i.e. Eqs. (12), (14), (16), (18), (20), (22), (24), and (26)), have better prediction capability than the one-variable models (i.e. ln(P) only; Eqs. (11), (13), (15), (17), (19), (21), (23), and (25)). Moreover, the one-variable (i.e. ln(P)) and two-variable (i.e. ln(P) and T) models (i.e. Eqs. (5), (6), (8), and (9)) are not appropriate for the unfixed temperature datasets because they cannot account for the effect of HBD structure on the CO2 solubility. Hence, a molecular variable must be added to the other variables to distinguish the effects of different HBD structures on the CO2 solubility (Eqs. (7) and (10)). It can be concluded that accounting for HBD structure through the “EEig02d” descriptor significantly improves the estimation of CO2 solubility. Table 3 reports the statistical parameters on the non-logarithmic as well as the logarithmic scale.

The experimental versus the predicted values of CO2 solubility are shown in Figs. 1 and 2 for dataset no. (1) with variable temperature and dataset no. (5) with fixed temperature, respectively. These figures for other datasets can be found in the supporting information file (Figs. S4a–S13a).

Figure 1

Predicted versus experimental values of CO2 solubility for unfixed temperatures dataset (i.e. dataset no. (1)) using (a) Eq. (5) and (b) Eq. (7).

Figure 2

Predicted versus experimental values of CO2 solubility for a fixed temperature dataset (i.e. dataset no. (5)) using (a) Eq. (15) and (b) Eq. (16).

As can be seen in Figs. 1a and 2a, the prediction capability of the models using Eqs. (5) and (15) is not acceptable because these models consider only the effect of pressure on the CO2 solubility. However, according to Figs. 1b and 2b, taking the HBD structural effect into account in Eqs. (7) and (16) leads to a considerable improvement in the estimation of CO2 solubility for both the train and test sets.

Figures 3a and 4a show the residuals versus the experimental values of CO2 solubility for dataset no. (1) using Eq. (7) and dataset no. (5) using Eq. (16), respectively. As can be observed, the residuals for the train and test data are approximately normally distributed. Figures 3b and 4b show the standardized residuals versus leverage values (i.e. the Williams plot) for dataset no. (1) with variable temperature and dataset no. (5) with fixed temperature. As can be observed, there are no outliers in these datasets. These plots can also be used to identify the applicability domain of the constructed models. The corresponding figures for the remaining datasets are available in the supplementary file (parts b and c of Figs. S4–S13).
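The quantities in a Williams plot can be computed as below. This sketch relies on common conventions that are assumed here rather than taken from the paper: leverages are the diagonal of the hat matrix of an MLR model with intercept, the warning leverage is h* = 3(p + 1)/n, residuals are standardized by the model standard error S, and points with |standardized residual| > 3 are treated as outliers. The same function returns leverages for new points, which is one way new data (such as datasets no. (11) and (12)) can be checked against the applicability domain.

```python
import numpy as np


def williams_plot_data(X_train, y_train, X_new=None):
    """Leverages, standardized residuals and warning leverage h* = 3(p+1)/n."""
    X_train = np.asarray(X_train, float)
    y_train = np.asarray(y_train, float)
    n, p = X_train.shape
    A = np.column_stack([X_train, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    resid = y_train - A @ coef
    s = np.sqrt(np.sum(resid ** 2) / (n - p - 1))        # model standard error
    core = np.linalg.pinv(A.T @ A)
    h_train = np.einsum("ij,jk,ik->i", A, core, A)        # diag(A core A^T)
    result = {"h_train": h_train, "std_resid": resid / s, "h_star": 3.0 * (p + 1) / n}
    if X_new is not None:
        X_new = np.asarray(X_new, float)
        B = np.column_stack([X_new, np.ones(len(X_new))])
        result["h_new"] = np.einsum("ij,jk,ik->i", B, core, B)
    return result


# Synthetic two-variable model (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -0.7]) + 0.3 + 0.05 * rng.normal(size=30)
res = williams_plot_data(X, y, X_new=[[0.1, -0.2], [10.0, 10.0]])
outside_domain = res["h_new"] > res["h_star"]
outliers = np.abs(res["std_resid"]) > 3.0
```

The training leverages always sum to the number of fitted parameters (p + 1), a quick sanity check on the hat-matrix computation.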

Figure 3

Residuals versus experimental values of CO2 solubility (a) and Standard residuals versus leverage (b) for unfixed temperatures dataset (dataset no. (1)) using Eq. (7).

Figure 4

Residuals versus experimental values of CO2 solubility (a) and Standard residuals versus leverage (b) for fixed temperature dataset (dataset no. (5)) using Eq. (16).

According to the developed models, the “EEig02d” descriptor is the appropriate structural variable for predicting CO2 solubility. Because “EEig02d” was selected in every model, its selection is unlikely to be a matter of chance. The CO2 solubility values predicted by the QSPR models in Table 2 for each data point of all datasets are available in the supporting Excel file. Table 3 presents the outcome of the statistical examination of the constructed models; as can be observed, the models including the EEig02d descriptor showed the best statistical parameters on both the logarithmic and non-logarithmic scales for both internal and external validation.

To investigate the applicability of the unfixed temperature models at new temperatures and pressures, datasets no. (11) and (12) were used. These datasets contain new HBDs (i.e. Glycerol in dataset no. (11), and Urea and Ethylene glycol in dataset no. (12)), as well as new temperatures (i.e., 298 and 333 K) and pressures (i.e. 10 bar) that differ from those in datasets no. (1) and (2), which were used for model development. According to Fig. S14 in the supplementary file, all data points in these two new datasets were within the applicability domain; therefore, Eqs. (7) and (10) can be applied to datasets no. (11) and (12), respectively. Figure 5 shows the predicted versus the experimental values of CO2 solubility for dataset no. (11) using Eq. (7) and dataset no. (12) using Eq. (10). The proposed models showed very good predictive capability at low pressures (i.e. low solubility), and at high pressures (i.e., high solubility) the predictions showed acceptable deviations. These results support the robustness and applicability of the models at different temperatures and pressures, even for new structures.

Figure 5

Predicted versus experimental values of CO2 solubility for (a) dataset no. (11) using Eq. (7) and (b) dataset no. (12) using Eq. (10).

Discussion

It should be demonstrated that the selected descriptor performs best for predicting the CO2 solubility. For this purpose, sub-datasets were randomly selected from datasets no. (1) and (2) such that, within each sub-dataset, temperature, pressure, and molar ratio were almost constant and only the HBD structure varied. Models with only one variable (i.e., a structural descriptor) were then developed and compared statistically. For instance, Fig. 6 shows the values of R2 and Q2 for one of these sub-datasets, consisting of data at a pressure of approximately 5 bar, a temperature of 313 K, and an HBA to HBD molar ratio of 1:4. The corresponding figures for the other sub-datasets are shown in the supplementary file.

Figure 6

R2 and Q2 for sub-dataset with P = 5 bar, T = 313 K and molar ratio 1:4.

As is clear from Fig. 6 and Fig. S15, the statistical parameters of several models satisfy the Golbraikh criterion (R2 > 0.6 and Q2 > 0.5)60. The values of the descriptors with acceptable statistical parameters are given in Table 4. Some descriptors (i.e. H6m and RDF065u) are zero for several HBDs, which means that they cannot distinguish between some structures and are therefore not appropriate for model development. Moreover, it is preferable to choose a descriptor that not only appears in all of the sub-datasets but also has acceptable statistical parameters. This confirms that the selected descriptor (i.e., EEig02d) is an appropriate molecular descriptor for the developed models.
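The screening of single-descriptor models on a sub-dataset can be sketched as follows. Assumptions, not taken from the paper: R2 and Q2 are computed for simple linear regression with explicit leave-one-out refits, and a descriptor is discarded when it is zero for more than one HBD (`max_zeros=1`). The descriptor columns and target values are toy numbers.

```python
import numpy as np


def loo_q2_simple(d, y):
    """Q2 from explicit leave-one-out refits of y = m*d + c."""
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        m, c = np.polyfit(d[mask], y[mask], 1)
        press += (y[i] - (m * d[i] + c)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def screen_single_descriptors(D, names, y, r2_min=0.6, q2_min=0.5, max_zeros=1):
    """Keep descriptors whose one-variable model meets R2 > r2_min and Q2 > q2_min."""
    D, y = np.asarray(D, float), np.asarray(y, float)
    passed = []
    for j, name in enumerate(names):
        d = D[:, j]
        if np.count_nonzero(d == 0.0) > max_zeros:   # zero for several HBDs: poor discrimination
            continue
        m, c = np.polyfit(d, y, 1)
        r2 = 1.0 - np.sum((y - (m * d + c)) ** 2) / np.sum((y - y.mean()) ** 2)
        q2 = loo_q2_simple(d, y)
        if r2 > r2_min and q2 > q2_min:
            passed.append((name, r2, q2))
    return passed


# Toy sub-dataset: 7 HBDs at (nearly) fixed T, P and molar ratio.
d1 = [1.0, 1.5, 2.0, 1.2, 1.8, 2.5, 1.1]
d2 = [2.0, 9.0, 1.0, 5.0, 3.0, 1.0, 8.0]
d3 = [0.0, 0.0, 0.0, 1.2, 1.8, 2.5, 1.1]
noise = np.array([0.01, -0.02, 0.0, 0.015, -0.01, 0.02, -0.005])
y = 0.5 * np.array(d1) - 3.0 + noise
passed = screen_single_descriptors(np.column_stack([d1, d2, d3]), ["d1", "d2", "d3"], y)
```

In this toy case, "d2" fails the R2 threshold and "d3" is rejected by the zero-value rule, so only "d1" survives. Repeating the screen over several sub-datasets and keeping descriptors that pass in all of them mirrors the selection logic described above.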

Table 4 Values of the descriptors for each involved HBD.

After model development, the molecular descriptor that appears in the QSPR models (i.e. “EEig02d”) should be interpreted to explain why it is related to the CO2 solubility in DESs. The “EEig02d” descriptor, developed by Estrada et al.58,61, corresponds to the second eigenvalue of the edge adjacency matrix of the molecule weighted by the dipole moments of atoms. The edge adjacency matrix is obtained from the hydrogen-depleted molecular graph, i.e. a graph whose nodes correspond to the atoms of the molecule and whose edges correspond to the chemical bonds. Molecular graphs are converted into mathematical representations such as matrices to correlate structure and properties quantitatively. The edge adjacency matrix (EA(G)) of a graph G is defined as follows62:

$${(\mathrm{EA})}_{\mathrm{ij}}=\left\{\begin{array}{l}1\, \quad if\, edges\, {\mathrm{e}}_{\mathrm{i}}\,and\, {\mathrm{e}}_{\mathrm{j}}\, are\, adjacent\\ 0\, \quad otherwise\end{array}\right.$$
(27)
$${(\mathrm{EA})}_{\mathrm{ii}}=0$$
(28)

For a weighted graph, Eq. (27) is modified as follows62:

$${(EA)}_{ij}=\left\{\begin{array}{l}K\,\quad if\, edges\, {\mathrm{e}}_{\mathrm{i}}\, and\, {\mathrm{e}}_{\mathrm{j}}\, are\, adjacent\, and \,{\mathrm{e}}_{\mathrm{j}}\, is\, K-weighted\\ 0\,\quad otherwise\end{array}\right.,$$
(29)

where \({e}_{i}\) and \({e}_{j}\) denote chemical bonds (edges), and K is the weight of edge \({e}_{j}\).
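Eqs. (27)–(29) can be sketched in a few lines of NumPy for a hydrogen-depleted graph given as a bond list. This only illustrates the definitions. The helper names, the atom numbering, and the weights are hypothetical, the second eigenvalue is taken here as the second-largest one, and Dragon's actual dipole-moment weighting scheme is not reproduced, so the output should not be expected to match the EEig02d values in Table 5. With positive weights, the matrix in Eq. (29) equals the unweighted one multiplied by a positive diagonal matrix, so its eigenvalues are real.

```python
import numpy as np


def edge_adjacency(bonds, weights=None):
    """Edge adjacency matrix of a hydrogen-depleted graph.
    bonds: list of (atom_i, atom_j) pairs; weights: optional edge weights K (Eq. 29)."""
    m = len(bonds)
    EA = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # Eq. (28): zero diagonal; Eq. (27)/(29): entry set when edges share an atom
            if i != j and set(bonds[i]) & set(bonds[j]):
                EA[i, j] = 1.0 if weights is None else weights[j]
    return EA


def second_eigenvalue(EA):
    """Second-largest eigenvalue (real for symmetric EA or positive edge weights)."""
    eig = np.sort(np.linalg.eigvals(EA).real)[::-1]
    return eig[1]


# 1,2-Propanediol without hydrogens: atoms 0=C1, 1=C2, 2=C3, 3=O on C1, 4=O on C2
bonds = [(0, 3), (0, 1), (1, 4), (1, 2)]
EA = edge_adjacency(bonds)
EA_w = edge_adjacency(bonds, weights=[1.5, 0.4, 1.5, 0.4])  # hypothetical bond weights
print(second_eigenvalue(EA), second_eigenvalue(EA_w))
```

Adding bonds (a longer chain or an extra ether linkage) enlarges the matrix and generally shifts its spectrum, which is the structural route by which such eigenvalue descriptors track molecular size.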

Table 5 shows the values of EEig02d along with the molar volume and the molecular structure of all HBDs involved in the datasets. It should be mentioned that the EEig02d descriptor can be related to the molar volume of the molecule58.

Table 5 The values of the EEig02d and molar volume of the different HBDs involved in the datasets.

As shown in Table 5, the value of EEig02d increases with the length of the HBD alkyl chain. For example, EEig02d is 1.054 for 1,2-Propanediol, with three carbons in its alkyl chain, and 1.519 for 1,4-Butanediol and 2,3-Butanediol, with four carbons. The presence of an ether group also increases the value of EEig02d: guaiacol has a higher value than phenol (1.983 versus 1.521) because of the ether group in its structure. Increasing the alkyl chain length increases the molecular free volume in the DESs. In addition, ether groups increase the flexibility of the alkyl chain and thus the free volume. A larger free volume, in turn, enhances the solubility of CO2 in the DES because the absorption is physical in nature (i.e. the free volume mechanism)16,20.

Moreover, according to Li et al.24, increasing pressure has a positive effect and increasing temperature a negative effect on the CO2 solubility. These findings are consistent with the models in Table 2, in which EEig02d and ln(P) appear with positive coefficients and temperature with a negative coefficient. The enhancement of CO2 solubility with increasing alkyl chain length has also been observed in experimental studies.

Conclusion

In the current study, the QSPR approach was employed to develop linear models for predicting the CO2 solubility in DESs. The main aim was to investigate the effect of HBD structure on the solubility of CO2 in ChCl-based DESs. The main findings are as follows:

  • Notably, the same descriptor (i.e. EEig02d), along with ln(P), appeared in all developed models, regardless of whether temperature was included. The EEig02d descriptor was found to be related to the molar volume and dipole moment of a molecule. Examination of the models indicated that the solubility increases with increasing values of the EEig02d descriptor, because physical absorption is directly related to the free volume of the molecule.

  • Two general models, for HBA to HBD molar ratios of 1:3 and 1:4, were constructed by combining ln(P), T, and EEig02d as the structural descriptor to predict the CO2 solubility in ChCl-based DESs at any desired temperature. These models were further validated externally using two additional datasets containing new HBD structures.

  • This study provided reliable and simple QSPR models for predicting the CO2 solubility in ChCl-based DESs, which can be applied in the preliminary screening of the DESs in the PCC processes.