Vibration frequencies using least squares and neural networks for 50 new s and p electron diatomics

https://doi.org/10.1016/j.jqsrt.2009.08.004

Abstract

Least-squares forecasts for vibration frequencies of diatomic molecules, most with 10–12 valence electrons, are combined with those obtained from neural networks, both trained on critical data. It is required that the standard deviation bounds of the one prediction lie within the bounds of the other; this requirement results in 69 molecules, 50 of which may not have been studied before. The composite standard deviations of the composite predictions average 5.9%, so there is a 68% chance that each of these 50 predictions will prove to be within 5.9% of its ultimately correct value. As a test, 28 literature values, for 12 of the molecules, were found; of these 28 values, 78.6% fall between the lower and upper composite standard deviation limits.

Introduction

Once the architecture of the Kronecker product periodic system for diatomic molecules was proposed [1], [2], it became obligatory to show its consistency with data for as many properties as possible and, furthermore, to show that it allows the prediction of new data. The predictions for neutral, gas-phase, ground-state species [3] had not thus far reached the desired state, in which approximately 68% of independently obtained test data lie within the standard deviations of the predicted data.

The desired state has now been reached for ground-state vibration frequencies (ωe) by combining the results of a new least-squares (LS) analysis with a previous neural-network (NN) analysis. Least squares necessarily favors data that are smooth, whereas neural networks can fit irregularities like “data cliffs” [4]; in this study we use both methods. We require that the interval between the lower and upper σ limits of an NN prediction lie entirely within the analogous interval of the LS prediction for the same molecule, or vice versa.
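The nesting requirement stated above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the numbers are invented rather than taken from the paper, and treating touching endpoints as "inside" is an assumption.

```python
from typing import Tuple

Interval = Tuple[float, float]


def sigma_interval(prediction: float, sigma: float) -> Interval:
    """Interval between the lower and upper sigma limits of a prediction."""
    return (prediction - sigma, prediction + sigma)


def is_nested(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies within `outer` (endpoints may coincide; an assumption)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def passes_requirement(nn_pred: float, nn_sigma: float,
                       ls_pred: float, ls_sigma: float) -> bool:
    """NN interval inside LS interval, or vice versa."""
    nn = sigma_interval(nn_pred, nn_sigma)
    ls = sigma_interval(ls_pred, ls_sigma)
    return is_nested(nn, ls) or is_nested(ls, nn)


# Hypothetical values in cm^-1, not data from the paper.
print(passes_requirement(1000.0, 20.0, 1010.0, 50.0))  # NN interval nested in LS interval
print(passes_requirement(1000.0, 20.0, 1100.0, 50.0))  # intervals not nested
```

A molecule is retained only when the call returns True for its pair of predictions.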

This paper presents composite predicted values for ωe of 69 molecules that satisfy the requirement. The average of their composite standard deviations (σc) is 5.9%. Twenty-eight values, for 12 of the molecules, were found in the literature; of these 28 values, 22 fall within the intervals between the lower and upper σc limits.
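The test described here amounts to counting how many literature values fall inside the σc limits of their predictions. A minimal sketch follows; the function name and the tuple layout are hypothetical, and only the counts 22 and 28 come from the text.

```python
def coverage(cases):
    """Fraction of test values lying within [prediction - sigma_c, prediction + sigma_c].

    `cases` is a list of (test_value, prediction, sigma_c) tuples.
    """
    hits = sum(1 for value, pred, sigma_c in cases
               if pred - sigma_c <= value <= pred + sigma_c)
    return hits / len(cases)


# Counts reported in the text: 22 of 28 literature values inside the limits.
reported = 22 / 28
print(f"{reported:.1%}")
```

The reported fraction, 22/28, is about 78.6%, somewhat above the roughly 68% expected for one-standard-deviation limits.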

Section snippets

Theory and conjectures for the least-squares analysis

The construction and verification of the Kronecker-product periodic system of diatomic molecules have been presented in many publications [1], [2], [5], [6]. The essence of the system is that the atomic period and group numbers of the two atoms (R1, C1, R2, and C2) should be used as independent variables when systematics are done, not the two atomic numbers (Z1 and Z2).
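As an illustration of using (R1, C1, R2, C2) rather than (Z1, Z2) as independent variables, the sketch below maps a few main-group atoms to period and group numbers. The lookup table and function names are hypothetical, and taking the group number as the number of valence electrons is an assumption about the convention.

```python
# (period R, group C) for a few main-group atoms. The group number is taken
# here as the count of valence electrons, which is an assumed convention.
PERIOD_GROUP = {
    "C": (2, 4),
    "N": (2, 5),
    "O": (2, 6),
    "Si": (3, 4),
    "S": (3, 6),
}


def coordinates(atom1, atom2):
    """Return (R1, C1, R2, C2) for a diatomic molecule."""
    r1, c1 = PERIOD_GROUP[atom1]
    r2, c2 = PERIOD_GROUP[atom2]
    return (r1, c1, r2, c2)


def valence_electrons(atom1, atom2):
    """Total valence-electron count under the assumed group convention."""
    _, c1, _, c2 = coordinates(atom1, atom2)
    return c1 + c2


print(coordinates("C", "O"))    # (2, 4, 2, 6)
print(coordinates("Si", "S"))   # (3, 4, 3, 6)
print(valence_electrons("C", "O"))  # 10
```

In these coordinates CO and SiS share the same (C1, C2) and differ only in (R1, R2), a relationship that is not visible from the atomic numbers alone.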

The group numbers: Fig. 1 shows the prototype space (R1, R2)=(2, 2). Since only main-group atoms are being considered in this

Data for the least-squares analysis

Neutral ground-state main-group molecular ωe data were taken from the 2009 Handbook of Physics and Chemistry (HPC) [9]. Molecules from periods 2 to 6 are included; those consisting of one or two rare-gas atoms have close to zero ωe and are excluded; those consisting of two alkaline-earth atoms are also excluded because their ωe are in a local minimum; altogether 147 data are included. The data are arranged in a spreadsheet by R1, R2, C1, and C2. When R1=R2, then the rule C1C2 includes all molecules

Procedure for the least-squares analysis

Eq. (1) was used to compute ωe for each set of molecules with fixed (R1, R2); then the slope of the linear trend line passing through the origin of a graph like Fig. 2 was found. There were insufficient data points to define a good trend line for (R1, R2)=(5, 6) and (6, 6). The point for the alkali–metal pair (C1, C2)=(1, 1)—if present—is usually farthest from the line, so it is omitted (and the trend line adjusts itself). This omission is the second of several [C1 or C2=0 or 8 being the first,
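The slope of a linear trend line constrained to pass through the origin has a closed form, slope = Σxy / Σx². The sketch below assumes an ordinary least-squares fit and uses generic x and y values, because the quantities plotted in Fig. 2 are not given in this excerpt; the function name and the numbers are hypothetical.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of the line y = slope * x (no intercept term)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x values are zero; slope is undefined")
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


# Invented points, not data from the paper.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(slope_through_origin(xs, ys))  # 2.0

# Omitting one point (here the first) and refitting lets the line adjust itself.
print(slope_through_origin(xs[1:], ys[1:]))  # 2.0
```

Dropping a point that lies far from the line, as described for the (C1, C2)=(1, 1) pair, simply means refitting on the remaining points.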

Predictions and tests with HPC least-squares data set

A correlation of the 147 normalized predictions with the associated HPC entries is shown in Fig. 4. The percent errors (differences between normalized predictions and associated HPC data) for all the molecules were ranked in order of increasing magnitude and have the distribution shown in Fig. 5.
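The ranking of percent errors can be sketched as follows. The definition used, error relative to the reference value, is an assumption, and the numbers and function names are hypothetical.

```python
def percent_error(predicted, reference):
    """Signed percent difference of a prediction from its reference value."""
    return 100.0 * (predicted - reference) / reference


def ranked_by_magnitude(pairs):
    """Percent errors for (predicted, reference) pairs, smallest magnitude first."""
    errors = [percent_error(p, r) for p, r in pairs]
    return sorted(errors, key=abs)


# Invented (prediction, reference) pairs in cm^-1, not data from the paper.
pairs = [(1020.0, 1000.0), (950.0, 1000.0), (505.0, 500.0)]
print(ranked_by_magnitude(pairs))  # [1.0, 2.0, -5.0]
```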

Table 1 presents the distribution of errors, averaged over all R1 and R2, on the C1, C2 plane. The average errors of the HPC predictions are shown in boldface.

The top entries are the numbers of data;

Testing the least-squares predictions with other data

The entire process was repeated using 176 HH79 data, showing that this compilation is quite comparable to the HPC compilation; indeed, the HPC makes use of HH79 data. The process was repeated again using 65 experimentally or computationally obtained literature data gathered by Ruette et al. The results are presented in Table 1, italicized and lower in the compartments. All but one of the standard deviations are smaller than the average percent errors, indicating that the large percent errors

Inclusion of the neural-network forecasts

A file with 230 records (for all molecules not excluded in LS study) was built. The NN procedures are detailed in Ref. [3] and the results are found in Table A (supplementary material); the neural-network analysis used HH79 data for training, validation, and testing, and it has just been shown that the entries in HPC and HH79 are practically equivalent. Portions of the records in Table A pertaining to the 230-record file were added to it. After this addition, the file includes: molecule; R1, C1

Discussion

The least-squares forecasting described above is not the first attempt. The previous attempt [13] used a formula of up to second order in R1, R2, and R1R2.

Had we insisted on higher precision, say 1.0% instead of 5.9%, then the confidence that any prediction will fall within the precision limits would be correspondingly reduced (in this case from 68% to 12%).
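The trade-off between precision and confidence can be illustrated under the assumption that prediction errors are normally distributed with standard deviation 5.9%. The probability of falling within ±t of the true value is then erf(t / (σ√2)). The function name is hypothetical, and the normal model is an assumption, not something stated in this excerpt.

```python
import math


def prob_within(tolerance, sigma):
    """P(|error| <= tolerance) for a zero-mean normal error with std `sigma`."""
    return math.erf(tolerance / (sigma * math.sqrt(2.0)))


p_wide = prob_within(5.9, 5.9)    # tolerance equal to one standard deviation
p_narrow = prob_within(1.0, 5.9)  # tighter 1.0% tolerance

print(f"{p_wide:.0%}")
print(f"{p_narrow:.0%}")
```

Under this model the one-sigma tolerance gives about 68%, and the 1.0% tolerance gives roughly 13%, of the same order as the 12% quoted above. The small difference may reflect rounding or a different distributional assumption.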

Acknowledgments

The author thanks Professor A. Boldyrev (Utah State University) for calling to his attention the possibly anomalous behavior of molecules with non-aufbau electronic configurations and Ms. Amy Beard (Southern Adventist University) for her assistance in massaging the data to obtain many versions of Fig. 2 and Table 1.

References (13)

  • R. Hefferlin et al.

    Periodic systems of N-atom molecules

    JQSRT

    (1984)
  • F. Ruette et al.

    Diatomic molecule data for parametric methods

    J Mol Struct THEOCHEM

    (2005)
  • R. Hefferlin

    Matrix-product periodic systems of molecules

    JQSRT

    (1994)
  • W.B. Davis et al.

    An atlas of forecasted molecular data. 2. Vibration frequencies of main-group and transition-metal neutral gas-phase diatomic molecules in the ground state

    J Chem Inf Model

    (2006)
  • G.M. Maggiora

    On outliers and activity cliffs: why QSAR often disappoints

    J Chem Inf Model

    (2006)
  • Hefferlin R. In: Baird D, Scerri E, McIntyre L, editors. Philosophy of chemistry. Dordrecht: Springer; 2006. p. 221–43...
There are more references available in the full text version of this article.
