Structure-flammability relationship study of phosphoester dimers by MLR and PLS

Dedicated to the 150th anniversary of the Romanian Academy.

Abstract

Polyphosphonates and polyphosphates with good flame retardancy represent an important class of organophosphorus-based polymer additives. In this analysis the flammability of 28 previously synthesized polyphosphoesters, modelled as dimers, was explored using multiple linear regression (MLR) and partial least squares (PLS) methodology. The statistical quality of the final MLR and PLS models was estimated using the following parameters: the squared correlation coefficient ($r^2_{training}$ = 0.917 and 0.976), the training root-mean-square error ($RMSE_{tr}$ = 0.029 and 0.016) and the leave-seven-out cross-validation correlation coefficient ($q^2_{L7O}$ = 0.748 and 0.881), respectively. External validation was checked for a test set of seven compounds using several criteria. The MLR models had somewhat inferior fitting results. The final MLR and PLS models can be used to estimate the limiting oxygen index (LOI) values of new polyphosphoester structures. The presence of phosphonate groups and increasing molecular branching in an isomeric series favour the dimer flammability.

Keywords:
quantitative structure-property relationships; polyphosphonate; polyphosphate; limiting oxygen index; flame retardancy

1 Introduction

An important requirement for most commercial polymers is that they be non-flammable or flame retardant[1 Irvine, D. J., McCluskey, J. A., & Robinson, I. M. (2000). Fire hazards and some common polymers. Polymer Degradation & Stability, 67(3), 383-396. http://dx.doi.org/10.1016/S0141-3910(99)00127-5]. Other polymer properties, such as the glass transition temperature and the thermal decomposition temperature, have previously been studied by quantitative structure-property relationships[2 Le, T., Epa, V. C., Burden, F. R., & Winkler, D. A. (2012). Quantitative structure–property relationship modeling of diverse materials properties. Chemical Reviews, 112(5), 2889-2919. http://dx.doi.org/10.1021/cr200066h. PMid:22251444; 3 Barbosa-da-Silva, R., & Stefani, R. (2013). QSPR based on support vector machines to predict the glass transition temperature of compounds used in manufacturing OLEDs. Molecular Simulation, 39(3), 234-244. http://dx.doi.org/10.1080/08927022.2012.717282].

Flame retardant polymeric materials containing phosphorus, like poly(alkyl or aryl)phosphonates, display good flame retardancy[4 Troev, K. D. (2012). Polyphosphoesters: chemistry and application (pp. 263-320). London: Elsevier Insights].

Different polyphosphoesters with fire retardant properties have been reported in the literature as components of materials such as polycarbonates, polyamides and thermosets[5 Chen, L., & Wang, Y. Z. (2010). Aryl polyphosphonates: useful halogen-free flame retardants for polymers. Materials, 3(10), 4746-4760. http://dx.doi.org/10.3390/ma3104746]. The flammability of phosphorus-containing polymers has also been investigated in order to determine structure-property relationships[6 Funar-Timofei, S., Iliescu, S., & Suzuki, T. (2014). Correlations of limiting oxygen index with structural polyphosphoester features by QSPR approaches. Structural Chemistry, 25(6), 1847-1863. http://dx.doi.org/10.1007/s11224-014-0474-7; 7 Iliescu, S., Avram, E., Visa, A., Plesu, N., Popa, A., & Ilia, G. (2011). New technique for the synthesis of polyphosphoesters. Macromolecular Research, 19(11), 1186-1191. http://dx.doi.org/10.1007/s13233-011-1111-6]. Two types of chirality (R and S) were found for the polyphosphoester monomers, which were geometry optimized using the MMFF94s force field[6]. Multiple linear regression (MLR), artificial neural networks (ANNs) and support vector machines (SVMs) were applied to correlate the limiting oxygen index (LOI) values with the calculated structural descriptors. Good fitting results and predictive models were obtained using the MLR and ANN approaches, whereas SVM modelling gave the poorest results. It was concluded that the monomer geometry is important for flame retardancy.

Our goal was to develop robust multiple linear regression (MLR) and partial least squares (PLS) models that select a set of variables able to predict the limiting oxygen index (LOI) values efficiently and to provide new information on the flammability mechanism of polyphosphoester[6] dimers. This parallel approach gives the opportunity to compare the quality of the results supplied by the two methodologies.

2 Materials and Methods

2.1 Data set

We used a series of 28 previously synthesized polyphosphoesters[6], which were modelled in the present study as dimers. The dataset in this investigation consisted of 28 RR, RS, SR and SS phosphoester dimers for compounds 1 to 14; compounds 15 to 28 had only one chiral centre, at the P2 phosphorus atom (see Figure 1).

Figure 1
Dimer phosphoester structure. RR series: R chiral centre at P1, R chiral centre at P2; RS series: R chiral centre at P1, S chiral centre at P2; SR series: S chiral centre at P1, R chiral centre at P2; SS series: S chiral centre at P1, S chiral centre at P2; compounds 15 to 28 had only one chiral centre, at the P2 phosphorus atom.

Experimental data for the limiting oxygen index (LOI), expressed in % (Table 1) and used as the dependent variable in this study, were previously reported in references[6] and[7]. Dimer molecular structures were built using the Marvin program[8 ChemAxon. (2015). Marvin sketch 15.2.16 software. Záhony: ChemAxon. Retrieved in 27 April 2015, from http://www.chemaxon.com], which was used for drawing, displaying and characterizing chemical structures. Dimer conformers were pre-optimized using the 94s variant of the MMFF (Merck Molecular Force Field)[9 Halgren, T. A. (1999). MMFF VI. MMFF94s option for energy minimization studies. Journal of Computational Chemistry, 20(7), 720-729. http://dx.doi.org/10.1002/(SICI)1096-987X(199905)20:7<720::AID-JCC7>3.0.CO;2-X] with Coulomb interactions and the attractive part of the van der Waals interactions, included in the OMEGA software[10 OpenEye Scientific. (2013). OMEGA version 2.5.1.4 software. Santa Fe: OpenEye Scientific. Retrieved in 29 April 2015, from http://www.eyesopen.com; 11 Hawkins, P. C. D., Skillman, A. G., Warren, G. L., Ellingson, B. A., & Stahl, M. T. (2010). Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. Journal of Chemical Information and Modeling, 50(4), 572-584. http://dx.doi.org/10.1021/ci100031x. PMid:20235588; 12 Hawkins, P. C. D., & Nicholls, A. (2012). Conformer generation with OMEGA: learning from the data set and the analysis of failures. Journal of Chemical Information and Modeling, 52(11), 2919-2936. http://dx.doi.org/10.1021/ci300314k. PMid:23082786]. The following parameters were used for the conformer generation: a maximum of 400 conformers per compound and an energy cut-off of 10 kcal/mol relative to the global minimum identified in the search. SMILES notation was used as program input. The stereoisomers were generated using the ‘Flipper’ utility inside the OMEGA program. To avoid redundant conformers, any conformer with an RMSD fit within 0.5 Å of another conformer was removed.

Table 1
Experimental and predicted LOI values, structural descriptors included in the final MLR_RR model.

2.2 Molecular descriptor calculation

Molecular descriptors were calculated for the optimized dimer structures using the DRAGON[13 Talete SRL. (2007). Dragon professional 5.5 software. Milano: Talete SRL. Retrieved in 4 May 2015, from http://www.talete.mi.it] and Instant JChem (which was used for structure database management, search and prediction)[14 ChemAxon. (2015). Instant JChem 15.2.23 software. Záhony: ChemAxon. Retrieved in 4 May 2015, from http://www.chemaxon.com] software. The 1511 Dragon molecular descriptors were divided into twenty-two logical blocks, as follows: constitutional descriptors, topological descriptors (MSD: mean square distance index (Balaban); PW4: path/walk 4 Randic shape index), walk and path counts, connectivity indices, information indices (IC5: information content index (neighborhood symmetry of 5-order)), 2D autocorrelations (GATS5e: Geary autocorrelation of lag 5/weighted by atomic Sanderson electronegativities), edge adjacency indices (EEig09d: eigenvalue 09 from edge adj. matrix weighted by dipole moments), BCUT descriptors, topological charge indices (GGI1: topological charge index of order 1; JGI2: mean topological charge index of order 2), eigenvalue based indices, Randic molecular profiles, geometrical descriptors, RDF descriptors, 3D-MoRSE descriptors (Mor15e: 3D-MoRSE signal 15/weighted by atomic Sanderson electronegativities; Mor13p: 3D-MoRSE signal 13/weighted by atomic polarizabilities; Mor13m: 3D-MoRSE signal 13/weighted by atomic masses), WHIM descriptors, GETAWAY descriptors (R2m+: R maximal autocorrelation of lag 2/weighted by atomic masses), functional group counts (nP(=O)O2R: number of phosphonates), atom-centered fragments, charge descriptors, molecular properties, 2D binary fingerprints, and 2D frequency fingerprints. The molecular descriptors were then checked, and constant or near-constant variables were eliminated. The calculated molecular descriptors play a fundamental role in transforming the chemical information into a numerical code suitable for computation[15 Todeschini, R., Consonni, V., Mannhold, R., Kubinyi, H., & Folkers, G. (Eds.). (2009). Molecular descriptors for chemoinformatics. Weinheim: Wiley-VCH].

2.3 Training and test set selection

The series of phosphoester dimers was divided into training and test sets using several approaches: the partitioning around medoids (PAM) algorithm[16 Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley] (“cluster” package available in R[17 R Development Core Team. (2011). R: A language and environment for statistical computing. Version 2.13.1. Vienna: R Foundation for Statistical Computing. Retrieved in 11 May 2015, from www.r-project.org], based on the Euclidean distance), the decreasing response order and random splitting. In order to use the same test set in both the MLR and PLS approaches, seven out of twenty-eight (25%) phosphoester dimers (compounds 2, 10, 11, 15, 17, 19 and 22, see Figure 1) were chosen as the test set to validate the final models. The data structures and the LOI value ranges (in %) covered by the test set (0.22-0.50) and the training set (0.18-0.55) are commensurate.

2.4 Multiple Linear Regression (MLR) and Partial Least Square (PLS)

Multiple linear regression (MLR)[18 Wold, S., & Dunn, W. J., 3rd (1983). Multivariate quantitative structure-activity relationships (QSAR): conditions for their applicability. Journal of Chemical Information and Computer Sciences, 23(1), 6-13. http://dx.doi.org/10.1021/ci00037a002] was applied after variable selection carried out by means of the genetic algorithm included in the QSARINS v. 2.2 program[19 Chirico, N., Papa, E., Kovarich, S., Cassani, S., & Gramatica, P. (2012). QSARINS, software for QSAR MLR model development and validation. Varese: University of Insubria/QSAR Res Unit in Environ Chem and Ecotox., DiSTA. Retrieved in 11 May 2015, from http://www.qsar.it; 20 Gramatica, P., Chirico, N., Papa, E., Cassani, S., & Kovarich, S. (2013). A new software for the development, analysis, and validation of QSAR MLR models. Journal of Computational Chemistry, 34(24), 2121-2132. http://dx.doi.org/10.1002/jcc.23361], using the RQK fitness function with the leave-one-out cross-validation correlation coefficient as the constrained function to be optimized. The 1549 calculated descriptors far outnumber the compounds (N = 28), so an appropriate variable selection method was required for MLR. MLR calculations were carried out separately for each dataset: RR, RS, SR and SS.

In MLR calculations the structural data was normalized based on the autoscaling method, which can be described as:

$$X^{T}_{mj} = \frac{X_{mj} - \bar{X}_{m}}{S_{m}} \tag{1}$$

where, for each variable m, $X^{T}_{mj}$ and $X_{mj}$ are the j-th values of the variable after and before scaling, respectively, $\bar{X}_{m}$ is the mean and $S_{m}$ the standard deviation of the variable.
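As an illustration of Equation 1, the following minimal Python sketch autoscales a small table of descriptors. The descriptor values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, stdev

def autoscale(columns):
    """Scale every descriptor column to zero mean and unit standard deviation.

    `columns` maps a descriptor name to its values over all compounds.
    Assumption: the sample standard deviation (n - 1) plays the role of S_m.
    """
    scaled = {}
    for name, values in columns.items():
        m, s = mean(values), stdev(values)
        if s == 0:
            raise ValueError(f"descriptor {name!r} is constant and cannot be autoscaled")
        scaled[name] = [(x - m) / s for x in values]
    return scaled

# Hypothetical descriptor values for five compounds (illustration only).
raw = {"MSD": [0.21, 0.25, 0.19, 0.30, 0.27], "nP(=O)O2R": [0, 2, 2, 0, 2]}
scaled = autoscale(raw)
```

After scaling, every column has mean 0 and standard deviation 1, so regression coefficients of different descriptors become directly comparable.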

The PLS methodology is a generalization of MLR whose main advantage is the ability to analyze data with correlated, noisy and numerous independent variables[21 Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109-130. http://dx.doi.org/10.1016/S0169-7439(01)00155-1]. In the PLS equation the latent variables were expressed as functions of the original variables $X_{ij}$ (i = 1, 2,..., N; j = 1, 2,..., K), resulting in the following equation:

$$\hat{Y}_{i} = b_{0} + b_{1} X_{i1} + b_{2} X_{i2} + \dots + b_{j} X_{ij} + \dots + b_{K} X_{iK} \tag{2}$$

where $\hat{Y}_{i}$ represents the calculated dependent variable and $b_{j}$ the PLS coefficients. The models obtained were optimized by outlier detection and by retaining the variables whose coefficients differ significantly from zero. Once the variable selection was complete, only these significant descriptors were preserved in the final models (for noise elimination).

Both methodologies aim to find a mathematical model with a minimum number of parameters and good estimation capability.

2.5 Model validation

For the external validation of the MLR and PLS models several parameters were calculated: $Q^2_{F1}$[22 Shi, L. M., Fang, H., Tong, W., Wu, J., Perkins, R., Blair, R. M., Branham, W. S., Dial, S. L., Moland, C. L., & Sheehan, D. M. (2001). QSAR models using a large diverse set of estrogens. Journal of Chemical Information and Modeling, 41(1), 186-195. http://dx.doi.org/10.1021/ci000066d. PMid:11206373], $Q^2_{F2}$[23 Schüürmann, G., Ebert, R. U., Chen, J., Wang, B., & Kuhne, R. (2008). External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean. Journal of Chemical Information and Modeling, 48(11), 2140-2145. http://dx.doi.org/10.1021/ci800253u. PMid:18954136], $Q^2_{F3}$[24 Consonni, V., Ballabio, D., & Todeschini, R. (2009). Comments on the definition of the Q2 parameter for QSAR validation. Journal of Chemical Information and Modeling, 49(7), 1669-1678. http://dx.doi.org/10.1021/ci900115y. PMid:19527034] (models with values higher than 0.7 were considered acceptable), $CCC_{ext}$ (the concordance correlation coefficient, with satisfactory values higher than 0.85)[25 Chirico, N., & Gramatica, P. (2011). Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. Journal of Chemical Information and Modeling, 51(9), 2320-2335. http://dx.doi.org/10.1021/ci200211n. PMid:21800825], $RMSE_{ext}$ (root-mean-square error) and $MAE_{ext}$ (mean absolute error)[26 Goodarzi, M., Deshpande, S., Murugesan, V., Katti, S. B., & Prabhakar, Y. S. (2009). Is feature selection essential for ANN modeling? QSAR & Combinatorial Science, 28(11-12), 1487-1499. http://dx.doi.org/10.1002/qsar.200960074], and $R^2_{pred}$ (values higher than 0.5 were considered acceptable)[27 Roy, P. P., Paul, S., Mitra, I., & Roy, K. (2009). On two novel parameters for validation of predictive QSAR models. Molecules, 14(5), 1660-1701. http://dx.doi.org/10.3390/molecules14051660. PMid:19471190]. The intercomparable thresholds used in this study for the different validation criteria were rigorously determined in earlier work[25; 28 Chirico, N., & Gramatica, P. (2012). Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. Journal of Chemical Information and Modeling, 52(8), 2044-2058. http://dx.doi.org/10.1021/ci300084j. PMid:22721530]. Other statistical parameters[29 Tropsha, A., & Golbraikh, A. (2010). Predictive quantitative structure–activity relationships modeling: development and validation of QSAR models. In J. L. Faulon & A. Bender (Eds.). Handbook of chemoinformatics algorithms (pp. 213-233). London: Chapman & Hall/CRC] were used for the external test set: (i) the squared correlation coefficient ($r^2_{test}$) between the predicted and observed activities, as well as the squared correlation coefficient by cross-validation ($q^2$); (ii) the coefficients of determination for linear regressions with intercepts set to zero, i.e. $r_0^2$ (predicted versus observed activities) and $r_0'^2$ (observed versus predicted activities); (iii) the slopes k and k' of these two regression lines. The following conditions should be satisfied for a model with acceptable predictive ability:

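$$q^2 > 0.5 \tag{3}$$
$$r^2_{test} > 0.6 \tag{4}$$
$$\frac{r^2 - r_0^2}{r^2} < 0.1 \quad \text{and} \quad 0.85 \le k \le 1.15 \tag{5}$$
$$\frac{r^2 - r_0'^2}{r^2} < 0.1 \quad \text{and} \quad 0.85 \le k' \le 1.15 \tag{6}$$
$$\left| r_0^2 - r_0'^2 \right| < 0.3 \tag{7}$$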
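Conditions 3-7 can be checked mechanically once the observed and predicted test-set values are available. The sketch below is one possible reading, not the implementation used in the study: it assumes that each through-origin regression is an ordinary least-squares fit and that its $r_0^2$ is computed as 1 minus the residual sum of squares over the total sum of squares of the regressed variable. The assignment of k to the "observed on predicted" fit and k' to the "predicted on observed" fit is likewise an assumption, and the LOI values are hypothetical.

```python
from statistics import mean

def pearson_r2(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def origin_fit(x, y):
    """Slope and determination coefficient of the regression y = k * x (no intercept)."""
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    my = mean(y)
    ss_res = sum((b - k * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return k, 1.0 - ss_res / ss_tot

def golbraikh_tropsha(y_obs, y_pred, q2):
    """Return the truth value of conditions (3)-(7) for one model."""
    r2 = pearson_r2(y_obs, y_pred)
    k, r0_sq = origin_fit(y_pred, y_obs)          # observed regressed on predicted
    k_prime, r0p_sq = origin_fit(y_obs, y_pred)   # predicted regressed on observed
    return {
        "eq3": q2 > 0.5,
        "eq4": r2 > 0.6,
        "eq5": (r2 - r0_sq) / r2 < 0.1 and 0.85 <= k <= 1.15,
        "eq6": (r2 - r0p_sq) / r2 < 0.1 and 0.85 <= k_prime <= 1.15,
        "eq7": abs(r0_sq - r0p_sq) < 0.3,
    }

# Hypothetical test-set LOI values (fractions), for illustration only.
y_obs = [0.22, 0.30, 0.35, 0.41, 0.50]
y_pred = [0.24, 0.29, 0.36, 0.40, 0.48]
checks = golbraikh_tropsha(y_obs, y_pred, q2=0.864)
```

A model is accepted under these criteria only if every entry of `checks` is true.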

For the internal validation of the final models other parameters were employed: $r^2_{training}$ (determination coefficient), $q^2_{L7O}$ (leave-seven-out cross-validation coefficient; values higher than 0.7 were considered acceptable), $q^2_{LOO}$ (leave-one-out cross-validation coefficient), $RMSE_{tr}$, $MAE_{tr}$ and $CCC_{tr}$, calculated for the training set.
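The validation statistics named in this section have compact closed forms. The following sketch computes $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$, CCC, RMSE and MAE for an external set, using the definitions given in the cited papers as we understand them; the function name and the LOI values are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean

def external_validation(y_train, y_ext, y_ext_pred):
    """External validation statistics for observed/predicted test-set values."""
    n_ext, n_tr = len(y_ext), len(y_train)
    mean_tr, mean_ext, mean_pred = mean(y_train), mean(y_ext), mean(y_ext_pred)
    # Prediction error sum of squares over the external set.
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    ss_ext_tr = sum((y - mean_tr) ** 2 for y in y_ext)    # test values around training mean
    ss_ext = sum((y - mean_ext) ** 2 for y in y_ext)      # test values around test mean
    ss_tr = sum((y - mean_tr) ** 2 for y in y_train)      # training values around training mean
    ss_pred = sum((p - mean_pred) ** 2 for p in y_ext_pred)
    cov = sum((y - mean_ext) * (p - mean_pred) for y, p in zip(y_ext, y_ext_pred))
    return {
        "Q2_F1": 1 - press / ss_ext_tr,
        "Q2_F2": 1 - press / ss_ext,
        "Q2_F3": 1 - (press / n_ext) / (ss_tr / n_tr),
        "CCC": 2 * cov / (ss_ext + ss_pred + n_ext * (mean_ext - mean_pred) ** 2),
        "RMSE": sqrt(press / n_ext),
        "MAE": sum(abs(y - p) for y, p in zip(y_ext, y_ext_pred)) / n_ext,
    }

# Hypothetical LOI values (fractions), for illustration only.
y_train = [0.18, 0.25, 0.28, 0.33, 0.37, 0.42, 0.47, 0.55]
y_ext = [0.22, 0.30, 0.35, 0.41, 0.50]
y_ext_pred = [0.24, 0.29, 0.36, 0.40, 0.48]
stats = external_validation(y_train, y_ext, y_ext_pred)
```

Passing the training set as both the reference and the evaluated set gives the corresponding training-set values of RMSE, MAE and CCC.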

At the same time, high $r^2_{training}$ values must be accompanied by $q^2$ values as close to them as possible[30 Gramatica, P. (2007). Principles of QSAR models validation: internal and external. QSAR & Combinatorial Science, 26(5), 694-701. http://dx.doi.org/10.1002/qsar.200610151] in order to avoid over-fitting, which was also checked by the RMSE and MAE values.
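The comparison between the fitted and the cross-validated coefficients can be sketched as follows. For brevity the example uses a single hypothetical descriptor and leave-one-out cross-validation; the leave-seven-out variant used in the study would exclude seven compounds per round instead of one. Using the full training-set mean in the denominator of $q^2$ is an assumption, as conventions differ between programs.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one descriptor."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def r2_and_q2_loo(x, y):
    """Return (r2_training, q2_LOO) for a one-descriptor linear model."""
    ss_tot = sum((b - mean(y)) ** 2 for b in y)
    b0, b1 = fit_line(x, y)
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict the left-out compound.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (c0 + c1 * x[i])) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot

# Hypothetical descriptor and LOI values, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.20, 0.23, 0.29, 0.30, 0.36, 0.38, 0.45, 0.46]
r2_training, q2_loo = r2_and_q2_loo(x, y)
```

Because each left-out residual is at least as large as the corresponding fitted residual, $q^2_{LOO}$ never exceeds $r^2_{training}$; a wide gap between the two is the over-fitting signal discussed above.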

The risk of chance correlation was also verified by the Y-scrambling procedure ($r^2_{Scr}$ and $q^2_{Scr}$), whose values must be lower than those of the original model. To calculate $r^2_{Scr}$ and $q^2_{Scr}$ the scrambling was repeated 999 times for the PLS calculations and 2000 times for the MLR ones.
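A minimal sketch of the Y-scrambling idea follows, under simplifying assumptions: one hypothetical descriptor, synthetic response values, and only $r^2_{Scr}$ (the cross-validated $q^2_{Scr}$ would be obtained by cross-validating each scrambled model in the same loop). The response vector is shuffled, the model is refitted, and the average scrambled statistic is compared with that of the original model.

```python
import random
from statistics import mean

def r_squared(x, y):
    """r2 of a one-descriptor least-squares fit (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_scrambling(x, y, n_permutations=999, seed=0):
    """Return (r2 of the original model, mean r2 over scrambled responses)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scrambled = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)    # break the descriptor-response pairing
        scrambled.append(r_squared(x, y_perm))
    return r_squared(x, y), mean(scrambled)

# Synthetic data for 21 training compounds, for illustration only.
x = [float(i) for i in range(21)]
y = [0.18 + 0.017 * i + 0.01 * (-1) ** i for i in range(21)]
r2_original, r2_scr = y_scrambling(x, y)
```

A scrambled average far below the original value indicates that the original correlation is unlikely to have arisen by chance.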

Once all validation parameters have been checked, the applicability domain of the models must be defined, because even robust and validated models cannot be expected to predict the modelled property reliably for every type of compound. The applicability domain is a theoretical region in the response and chemical structure space for which a QSAR model should make predictions with a given reliability[30]. Only the predictions for compounds that fall within this domain can be considered reliable; predictions outside it are extrapolations of the model. The Williams plot of the standardized residuals versus the leverages ($h_i$) was used to visualize the applicability domain of our final MLR models.
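The leverage axis of a Williams plot can be computed from the descriptor matrix alone. The sketch below is an illustration under stated assumptions: an intercept column is added to the descriptors, leverages are the diagonal of the hat matrix, and the warning leverage h* = 3(p + 1)/n (p descriptors, n training compounds) is the cut-off commonly used in the QSAR literature; the text does not state which cut-off was applied. The descriptor values are hypothetical.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def leverages(descriptors):
    """Diagonal of the hat matrix X (X^T X)^-1 X^T, with an intercept column added."""
    X = [[1.0] + list(row) for row in descriptors]
    xtx_inv = inverse(matmul(transpose(X), X))
    size = len(X[0])
    return [
        sum(xi[a] * xtx_inv[a][b] * xi[b] for a in range(size) for b in range(size))
        for xi in X
    ]

def warning_leverage(n_compounds, n_descriptors):
    """Assumed cut-off h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds

# Hypothetical two-descriptor table for six compounds, for illustration only.
descriptors = [[0.1, 1.0], [0.4, 0.0], [0.35, 2.0], [0.8, 1.0], [0.6, 2.0], [0.9, 0.0]]
h = leverages(descriptors)
h_star = warning_leverage(n_compounds=21, n_descriptors=4)
```

In a Williams plot each compound is drawn at its leverage and its standardized residual; compounds with leverage above h* lie outside the structural domain of the model.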

3 Results and Discussions

The major objective of this paper was to estimate the limiting oxygen index (LOI) of phosphoester dimers using molecular descriptors that can be computed directly from the molecular structure, and to gain new information on the flammability mechanism.

3.1 MLR results

The relationship between the molecular descriptors and LOI values of the dimer derivatives is illustrated by the following Equations 8-11:

RR model

$$\mathrm{LOI} = 0.56(\pm 0.03) - 0.23(\pm 0.03)\,\mathrm{MSD} - 0.19(\pm 0.03)\,\mathrm{EEig09d} + 0.20(\pm 0.03)\,\mathrm{R2m{+}} + 0.05(\pm 0.02)\,\mathrm{nP(=O)O2R} \tag{8}$$
$$SEE = 0.03 \quad r^2_{adj} = 0.896 \quad F = 44.09 \quad q^2_{LOO} = 0.864$$

RS model

$$\mathrm{LOI} = 0.55(\pm 0.05) + 0.18(\pm 0.04)\,\mathrm{PW4} - 0.26(\pm 0.05)\,\mathrm{GGI1} + 0.16(\pm 0.06)\,\mathrm{JGI2} - 0.35(\pm 0.06)\,\mathrm{Mor13m} \tag{9}$$
$$SEE = 0.05 \quad r^2_{adj} = 0.787 \quad F = 19.48 \quad q^2_{LOO} = 0.745$$

SR model

$$\mathrm{LOI} = 0.38(\pm 0.02) + 0.19(\pm 0.04)\,\mathrm{IC5} + 0.13(\pm 0.05)\,\mathrm{Mor15e} - 0.33(\pm 0.04)\,\mathrm{Mor13p} \tag{10}$$
$$SEE = 0.04 \quad r^2_{adj} = 0.839 \quad F = 35.60 \quad q^2_{LOO} = 0.790$$

SS model

$$\mathrm{LOI} = 0.52(\pm 0.04) + 0.18(\pm 0.04)\,\mathrm{IC5} - 0.15(\pm 0.04)\,\mathrm{GATS5e} - 0.29(\pm 0.03)\,\mathrm{Mor13p} \tag{11}$$
$$SEE = 0.04 \quad r^2_{adj} = 0.870 \quad F = 45.41 \quad q^2_{LOO} = 0.831$$

where SEE represents the standard error of estimate, $r^2_{adj}$ the adjusted $r^2$, F the Fisher test value and $q^2_{LOO}$ the leave-one-out cross-validation coefficient. Other statistical results of models 8-11 are included in Tables 2, 3 and 4.

Table 2
Internal validation parameters of the MLR and PLS models (training set).
Table 3
External validation parameters of the MLR and PLS models (test set).
Table 4
Golbraikh and Tropsha criteria[29] calculated for external validation of the MLR and PLS models (test set).

The Williams plot (standardized residuals versus leverages) was used to visualize the applicability domain of the final best MLR_RR model (Figure 2). This plot confirms the absence of outliers and influential points. All compounds were located within the applicability domain and were predicted accurately.

Figure 2
Williams plot: standardized residuals of the MLR_RR model versus leverages, predicted by fitting. Training compounds are marked by white circles and test compounds by black circles.

The MLR_RR model is fully satisfactory in fitting and has high predictive power. The LOO (leave-one-out) cross-validation shows that the model is stable and not obtained by chance: the difference between $r^2_{training}$ and $q^2_{LOO}$ is small (5.3%). The model is internally predictive, with a difference of -4.5% between $q^2_{LMO}$ and $q^2_{LOO}$ and of 9.8% between $r^2_{training}$ and $q^2_{LMO}$.

The risk of chance correlation was also verified by the Y-scrambling procedure. The extremely low calculated $r^2_{Scr}$ and $q^2_{Scr}$ scrambling values (Table 2) indicate no chance correlation for the chosen models.

The RMSE (root-mean-square error) values for the training and validation sets are similar. The chosen MLR_RR model demonstrates satisfactory stability in internal validation and has high fitting, internal and external predictive power.

The high values of the $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$ and $CCC_{ext}$ external validation parameters (see section 2.5) included in Table 3, together with all calculated Golbraikh and Tropsha terms (Table 4), confirm the predictive power of all MLR models.

Better statistical results and a more stable model to simulate polymer flammability were noticed in case of the RR dataset model compared to the others.

The edge adjacency matrix encodes information about the connectivity between graph edges[15]. EEig09d (eigenvalue 09 from edge adj. matrix weighted by dipole moments) takes into account the molecular polarity and is unfavourable for dimer flame retardancy.

The mean square distance index, denoted as MSD[15], is calculated from the second-order distance distribution moment[31 Balaban, A. T. (1983). Topological indices based on topological distances in molecular graphs. Pure and Applied Chemistry, 55(2), 199-206. http://dx.doi.org/10.1351/pac198855020199]. The MSD index decreases with increasing molecular branching in an isomeric series, which is favourable for dimer flammability.

GETAWAY (Geometry, Topology, and Atom-Weights Assembly) descriptors are geometrical descriptors encoding information on the effective position of substituents and fragments in the molecular space[15]. Moreover, they are independent of molecule alignment and also account, to some extent, for information on molecular size and shape as well as for specific atomic properties. Increased R2m+ (R maximal autocorrelation of lag 2/weighted by atomic masses) values favour the dimer flammability. Compounds containing phosphonate groups are favourable for the dimer flame retardancy.

3.2 PLS results

PLS calculations were performed with the SIMCA-P+12[32 Umetrics AB. (2013). SIMCA-P+ version 12.0 software. Umeå: Umetrics. Retrieved in 18 May 2015, from http://www.umetrics.com] program using 21 stereoisomers as the training set and 7 stereoisomers as the test set, i.e. 75% and 25% of the whole series of compounds, respectively. The large difference between the $r^2_{training}$ and $q^2_{L7O}$ values of the first calculated PLS model (a difference lower than 0.3 is accepted) indicated over-fitting and suggested the need to improve the model quality. Therefore, the noise variables (with insignificant coefficient values) were removed. Several PLS models were developed for the RR, RS, SR and SS datasets to increase their predictive power. In the final PLS_SS model compound 5 was omitted because it was identified as an outlier according to the Hotelling's T2 range plot[32].

The final PLS models (four components for the RR, RS and SR datasets and two components for the SS dataset) are satisfactory in fitting (Table 2). Over-fitting of the models was ruled out by the remarkably high and close values of $r^2_{training}$ and $q^2_{L7O}$, and was also checked by the RMSE and MAE (mean absolute error) parameters. At the same time, similar RMSE values are observed for the training and validation sets (Tables 2 and 3).

PLS models with predictive power were obtained (see Tables 3 and 4), except for the PLS_SS one, as seen from the values of the $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$ and $CCC_{ext}$ parameters. The predicted LOI values for the RR dataset are given in Table 1.

The PLS models were also internally validated using 999 permutations in Y-scrambling. The calculated $r^2_{Scr}$ and $q^2_{Scr}$ scrambling values (Table 2) indicate no chance correlation for the chosen models.

In the PLS modelling the terms having VIP values greater than 1 are the most relevant for explaining the dependent variable, and usually only these descriptors were interpreted. The descriptors showing the largest VIP values can simulate polymer flammability and are discussed below.
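For reference, the variable importance in the projection (VIP) can be sketched with the commonly cited formula in which each descriptor's normalized squared weight in every component is weighted by the response variance that component explains. This is an assumption about the definition, since the text does not reproduce the formula used by SIMCA-P+; the weights and explained sums of squares below are hypothetical.

```python
from math import sqrt

def vip_scores(weights, ss_explained):
    """VIP of each descriptor.

    weights[a][j]   : PLS weight of descriptor j in component a
    ss_explained[a] : sum of squares of Y explained by component a
    """
    n_desc = len(weights[0])
    total = sum(ss_explained)
    scores = []
    for j in range(n_desc):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss_explained):
            norm_sq = sum(w * w for w in w_a)       # normalize the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(sqrt(n_desc * acc / total))
    return scores

# Hypothetical two-component model with four descriptors (illustration only).
weights = [[0.7, -0.5, 0.4, 0.3], [0.1, 0.6, -0.2, 0.75]]
ss_explained = [0.80, 0.15]
vip = vip_scores(weights, ss_explained)
```

By construction the squared VIP values average to 1, which is why 1 is the usual threshold for calling a descriptor relevant.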

For all models higher values of the Randic shape indices (path/walk 4 and path/walk 5, PW4 and PW5) are favourable for the flammability, while the MSD (Balaban mean square distance index) descriptor is unfavourable for flammability. These are topological descriptors obtained from the molecular graph[15].

Another group of significant descriptors is the class of 2D autocorrelation descriptors, which are computed from the molecular graph as the sum of the products of the atom weights of the terminal atoms of all paths of the considered path length (the so-called lag)[15]. The most important 2D autocorrelation descriptors involved in our models are the Geary parameters. GATS6m (Geary autocorrelation of lag 6 weighted by mass), with positive coefficients, increases the flame retardancy of the RR, RS and SR series, while for the SS dimers the same effect was observed for the descriptor GATS5v (Geary autocorrelation of lag 5 weighted by van der Waals volume).

The 3D-MoRSE descriptors provide 3D information from atomic coordinates using the same transform as in electron diffraction (which uses it to prepare theoretical scattering curves)[15]. For the RR and SR datasets Mor13m (signal 13/weighted by mass) decreases the flame retardancy; for the RS dataset Mor15m (signal 15/weighted by mass) is favourable for flammability; for the SS dataset these descriptors are insignificant.

The topological and frequency fingerprint descriptors are expressed as the sum of topological distances between two elements or as the frequency of two atoms at a given topological distance. The descriptors T(O..P) (sum of topological distances between O..P), F07[O-S] (frequency of O - S at topological distance 7) and F10[C-S] (frequency of C - S at topological distance 10), which have negative coefficients, are unfavourable for flammability for the RR, RS and SS datasets.

Three GETAWAY descriptors increase the dimer flame retardancy: one in the RR set, R5m (R autocorrelation of lag 5/weighted by mass), and two in the SR set, HATS6v (leverage-weighted autocorrelation of lag 6/weighted by van der Waals volume) and HATS6p (leverage-weighted autocorrelation of lag 6/weighted by polarizability).

Better fitting and predictivity results were obtained by the PLS calculations than by the MLR ones. For both MLR and PLS, the best statistical results were observed for the RR series; therefore, the R chirality of the phosphorus atom is significant for dimer flammability. The structural descriptors finally selected in the MLR_RR model have VIP values close to or above 1 in the PLS_RR model: EEig09d (VIP = 1.358, CoeffCS = -0.0086 ± 0.0022), MSD (VIP = 1.670, CoeffCS = -0.0096 ± 0.0021), R2m (VIP = 1.058, CoeffCS = 0.0025 ± 0.0022) and nP(=O)O2R (VIP = 0.994, CoeffCS = 0.0059 ± 0.0030).
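For readers unfamiliar with the VIP (variable importance in the projection) score, the sketch below shows the commonly used definition: each descriptor's squared, normalised PLS weight is averaged over the components, with each component weighted by the share of the response variance it explains. Since the squared VIP values average to 1 over all descriptors, a value near or above 1 is usually read as above-average importance. The function name, argument layout and numbers are assumptions for this example; whether SIMCA-P+ evaluates VIP in exactly this way is not stated in this article.

```python
import math


def vip_scores(weights, ss_explained):
    """VIP score of each descriptor in a PLS model.

    weights[j][a]   : PLS weight of descriptor j in component a
    ss_explained[a] : sum of squares of the response explained by component a
    """
    n_desc = len(weights)
    n_comp = len(ss_explained)
    # Squared norm of each component's weight vector, used for normalisation.
    norms_sq = [sum(weights[j][a] ** 2 for j in range(n_desc))
                for a in range(n_comp)]
    total_ss = sum(ss_explained)
    scores = []
    for j in range(n_desc):
        acc = sum(ss_explained[a] * weights[j][a] ** 2 / norms_sq[a]
                  for a in range(n_comp))
        scores.append(math.sqrt(n_desc * acc / total_ss))
    return scores


# Hypothetical one-component model with two descriptors.
example_weights = [[3.0], [4.0]]
example_ss = [10.0]
scores = vip_scores(example_weights, example_ss)
print(scores)
print(sum(v ** 2 for v in scores) / len(scores))  # mean squared VIP
```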

Compared to the previously published MLR monomer models[6], the fitting statistics are improved for both the MLR and PLS dimer models. Additional structural information that influences the flame retardancy was included in the final dimer MLR (e.g. the number of phosphonates) and PLS (e.g. 2D frequency fingerprints) models.

4 Conclusions

The MLR and PLS models developed for this series of dimer phosphoesters will be helpful for predicting the LOI values of new, untested compounds. Better statistical results and a more stable model of polymer flammability were obtained for the RR dataset than for the others, the presence of an R chiral centre at the phosphorus atom being important for the dimer flammability. The mean square distance index and the GETAWAY descriptors favour the dimer flammability, as does an increased number of phosphonate groups in the dimer structure, as derived from both the MLR and PLS methodologies. The PLS models gave better fitting and predictivity results than the MLR ones for all datasets except the SS one.

Dimers including structures with R chiral centres gave more stable and more predictive models than the previously published MLR monomer ones.

New structural information which influences the flame retardancy was included in the final MLR and PLS dimer models.

5 Acknowledgements

This project was financially supported by Project 1.1 of the Institute of Chemistry Timisoara of the Romanian Academy. The authors are indebted to Chemaxon Ltd., OpenEye Ltd. and Prof. Paola Gramatica from The University of Insubria (Varese, Italy) for giving access to their software.

6 References

  • 1
    Irvine, D. J., McCluskey, J. A., & Robinson, I. M. (2000). Fire hazards and some common polymers. Polymer Degradation & Stability, 67(3), 383-396. http://dx.doi.org/10.1016/S0141-3910(99)00127-5
  • 2
    Le, T., Epa, V. C., Burden, F. R., & Winkler, D. A. (2012). Quantitative structure–property relationship modeling of diverse materials properties. Chemical Reviews, 112(5), 2889-2919. http://dx.doi.org/10.1021/cr200066h. PMid:22251444.
  • 3
    Barbosa-da-Silva, R., & Stefani, R. (2013). QSPR based on support vector machines to predict the glass transition temperature of compounds used in manufacturing OLEDs. Molecular Simulation, 39(3), 234-244. http://dx.doi.org/10.1080/08927022.2012.717282
  • 4
    Troev, K. D. (2012). Polyphosphoesters: chemistry and application (pp. 263-320). London: Elsevier Insights.
  • 5
    Chen, L., & Wang, Y. Z. (2010). Aryl polyphosphonates: useful halogen-free flame retardants for polymers. Materials, 3(10), 4746-4760. http://dx.doi.org/10.3390/ma3104746
  • 6
    Funar-Timofei, S., Iliescu, S., & Suzuki, T. (2014). Correlations of limiting oxygen index with structural polyphosphoester features by QSPR approaches. Structural Chemistry, 25(6), 1847-1863. http://dx.doi.org/10.1007/s11224-014-0474-7
  • 7
    Iliescu, S., Avram, E., Visa, A., Plesu, N., Popa, A., & Ilia, G. (2011). New technique for the synthesis of polyphosphoesters. Macromolecular Research, 19(11), 1186-1191. http://dx.doi.org/10.1007/s13233-011-1111-6
  • 8
    ChemAxon. (2015). Marvin sketch 15.2.16 software. Záhony: ChemAxon. Retrieved in 27 April 2015, from http://www.chemaxon.com
  • 9
    Halgren, T. A. (1999). MMFF VI. MMFF94s option for energy minimization studies. Journal of Computational Chemistry, 20(7), 720-729. http://dx.doi.org/10.1002/(SICI)1096-987X(199905)20:7<720::AID-JCC7>3.0.CO;2-X
  • 10
    OpenEye Scientific. (2013). OMEGA version 2.5.1.4 software. Santa Fe: OpenEye Scientific. Retrieved in 29 April 2015, from http://www.eyesopen.com
  • 11
    Hawkins, P. C. D., Skillman, A. G., Warren, G. L., Ellingson, B. A., & Stahl, M. T. (2010). Conformer generation with OMEGA: algorithm and validation using high quality structures from the Protein Databank and Cambridge Structural Database. Journal of Chemical Information and Modeling, 50(4), 572-584. http://dx.doi.org/10.1021/ci100031x. PMid:20235588.
  • 12
    Hawkins, P. C. D., & Nicholls, A. (2012). Conformer generation with OMEGA: learning from the data set and the analysis of failures. Journal of Chemical Information and Modeling, 52(11), 2919-2936. http://dx.doi.org/10.1021/ci300314k. PMid:23082786.
  • 13
    Talete SRL. (2007). Dragon professional 5.5 software. Milano: Talete SRL. Retrieved in 4 May 2015, from http://www.talete.mi.it
  • 14
    ChemAxon. (2015). Instant JChem 15.2.23 software. Záhony: ChemAxon. Retrieved in 4 May 2015, from http://www.chemaxon.com
  • 15
    Todeschini, R., Consonni, V., Mannhold, R., Kubinyi, H., & Folkers, G. (Eds.). (2009). Molecular descriptors for chemoinformatics. Weinheim: Wiley-VCH.
  • 16
    Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: an introduction to cluster analysis. New York: Wiley.
  • 17
    R Development Core Team. (2011). R: A language and environment for statistical computing. Version 2.13.1. Vienna: R Foundation for Statistical Computing. Retrieved in 11 May 2015, from www.r-project.org
  • 18
    Wold, S., & Dunn, W. J., 3rd (1983). Multivariate quantitative structure-activity relationships (QSAR): conditions for their applicability. Journal of Chemical Information and Computer Sciences, 23(1), 6-13. http://dx.doi.org/10.1021/ci00037a002
  • 19
    Chirico, N., Papa, E., Kovarich, S., Cassani, S., & Gramatica, P. (2012). QSARINS, software for QSAR MLR model development and validation. Varese: University of Insubria/QSAR Res Unit in Environ Chem and Ecotox., DiSTA. Retrieved in 11 May 2015, from http://www.qsar.it
  • 20
    Gramatica, P., Chirico, N., Papa, E., Cassani, S., & Kovarich, S. (2013). A new software for the development, analysis, and validation of QSAR MLR models. Journal of Computational Chemistry, 34(24), 2121-2132. http://dx.doi.org/10.1002/jcc.23361
  • 21
    Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58(2), 109-130. http://dx.doi.org/10.1016/S0169-7439(01)00155-1
  • 22
    Shi, L. M., Fang, H., Tong, W., Wu, J., Perkins, R., Blair, R. M., Branham, W. S., Dial, S. L., Moland, C. L., & Sheehan, D. M. (2001). QSAR models using a large diverse set of estrogens. Journal of Chemical Information and Modeling, 41(1), 186-195. http://dx.doi.org/10.1021/ci000066d. PMid:11206373.
  • 23
    Schüürmann, G., Ebert, R. U., Chen, J., Wang, B., & Kuhne, R. (2008). External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean. Journal of Chemical Information and Modeling, 48(11), 2140-2145. http://dx.doi.org/10.1021/ci800253u. PMid:18954136.
  • 24
    Consonni, V., Ballabio, D., & Todeschini, R. (2009). Comments on the definition of the Q2 parameter for QSAR validation. Journal of Chemical Information and Modeling, 49(7), 1669-1678. http://dx.doi.org/10.1021/ci900115y. PMid:19527034.
  • 25
    Chirico, N., & Gramatica, P. (2011). Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. Journal of Chemical Information and Modeling, 51(9), 2320-2335. http://dx.doi.org/10.1021/ci200211n. PMid:21800825.
  • 26
    Goodarzi, M., Deshpande, S., Murugesan, V., Katti, S. B., & Prabhakar, Y. S. (2009). Is feature selection essential for ANN modeling? QSAR & Combinatorial Science, 28(11-12), 1487-1499. http://dx.doi.org/10.1002/qsar.200960074
  • 27
    Roy, P. P., Paul, S., Mitra, I., & Roy, K. (2009). On two novel parameters for validation of predictive QSAR models. Molecules, 14(5), 1660-1701. http://dx.doi.org/10.3390/molecules14051660. PMid:19471190.
  • 28
    Chirico, N., & Gramatica, P. (2012). Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. Journal of Chemical Information and Modeling, 52(8), 2044-2058. http://dx.doi.org/10.1021/ci300084j. PMid:22721530.
  • 29
    Tropsha, A., & Golbraikh, A. (2010). Predictive quantitative structure–activity relationships modeling: development and validation of QSAR models. In J. L. Faulon & A. Bender (Eds.), Handbook of chemoinformatics algorithms (pp. 213-233). London: Chapman & Hall/CRC.
  • 30
    Gramatica, P. (2007). Principles of QSAR models validation: internal and external. QSAR & Combinatorial Science, 26(5), 694-701. http://dx.doi.org/10.1002/qsar.200610151
  • 31
    Balaban, A. T. (1983). Topological indices based on topological distances in molecular graphs. Pure and Applied Chemistry, 55(2), 199-206. http://dx.doi.org/10.1351/pac198855020199
  • 32
    Umetrics AB. (2013). SIMCA-P+ version 12.0 software. Umea: Umetrics. Retrieved in 18 May 2015, from http://www.umetrics.com

Publication Dates

  • Publication in this collection
    14 June 2016
  • Date of issue
    Apr-Jun 2016

History

  • Received
    20 July 2015
  • Reviewed
    23 Sept 2015
  • Accepted
    06 Nov 2015
Associação Brasileira de Polímeros Rua São Paulo, 994, Caixa postal 490, São Carlos-SP, Tel./Fax: +55 16 3374-3949 - São Carlos - SP - Brazil
E-mail: revista@abpol.org.br