Developing new electrocatalysts for oxygen evolution reaction via high throughput experiments and artificial intelligence

Xu, Shaomeng; Chen, Zhuyang; Qin, Mingyang; Cai, Bijun; Li, Weixuan; Zhu, Ronggui; Xu, Chen; Xiang, X.-D.

doi:10.1038/s41524-024-01386-4

Article
Open access
Published: 28 August 2024

Shaomeng Xu^1,2,
Zhuyang Chen²,
Mingyang Qin² (ORCID: orcid.org/0000-0001-5341-7465),
Bijun Cai²,
Weixuan Li²,
Ronggui Zhu²,
Chen Xu^3,4 &
X.-D. Xiang²

npj Computational Materials volume 10, Article number: 194 (2024)

Subjects

Electrocatalysis

Abstract

The development of non-noble metal electrocatalysts for the oxygen evolution reaction (OER) is advancing towards multi-element materials. To reveal the complex correlations in multi-element OER electrocatalysts, we developed an iterative workflow combining high-throughput experiments and AI-generated content (AIGC) processes. A set of 909 universal descriptors for inorganic materials science (compared to 145 in previous literature) was constructed and used as Artificial Neural Network (ANN) input. A large number of statistical ensembles, each an individual ANN with a reduced number of descriptors, were integrated with a new Hierarchical Neural Network (HNN) algorithm. This algorithm addresses the longstanding challenge of balancing overwhelming descriptor numbers with insufficient datasets in traditional ANN approaches to materials science problems. As a result, the combination of AIGC and experimental validation significantly enhanced prediction accuracy, increasing the R² values from 0.7 to 0.98 for Tafel slopes.

Introduction

The electrocatalytic oxygen evolution reaction (OER) involves a four-electron transfer and is a rate-limiting step in water splitting and metal-air batteries^1,2. Tremendous research efforts have been focused on non-noble metal OER electrocatalysts to replace the scarce and expensive noble metal catalysts (e.g., IrO₂ and RuO₂)^3,4,5. Furthermore, multi-element materials, including ternary materials, quaternary materials and even high-entropy materials, which possess multiple active sites and associated synergistic effects, have gained attention recently^6,7. However, as the number of constituent elements increases, predicting the best performer among the vast number of compositional combinations exceeds human capability⁸. Hence, using artificial intelligence (AI) to build predictive models based on limited but expanding datasets is an urgent need of the materials research community.

Owing to improvements in computational power and statistical algorithms, AI has been employed in various scientific fields^9,10,11. Xu et al. asserted that integrating AI approaches into the Materials Genome Initiative (MGI) is instrumental in empowering the forthcoming generation of materials scientists and facilitating a fundamental paradigm shift in materials research¹². The accuracy of an AI model is highly dependent on the quantity and quality of data^13,14,15. Datasets should also obey the “Findable, Accessible, Interoperable, and Reusable” (FAIR) principles^16,17. Many public online resources contain a large amount of Density Functional Theory (DFT) results, and AI models have been trained on DFT results instead of experimental data to predict material properties¹⁸. Especially in the study of multinary catalysts and even high-entropy alloy (HEA) catalysts, which are characterized by multiple and complex catalytic active sites, DFT combined with Machine Learning (ML) has proved effective in predicting the binding energies of various reaction intermediates on multiple catalytic active sites^19,20,21,22. By integrating ML with first-principles calculations, researchers have identified HEAs with catalytic activities comparable to ruthenium for ammonia decomposition¹⁹ and platinum for the oxygen reduction reaction (ORR)²⁰. They have also uncovered local scaling relationships that constrain the optimization potential for multistep reactions and revealed CoMoFeNiCu alloys as stable and cost-effective catalysts for the hydrogen evolution reaction (HER)^21,22. These findings underscore the powerful role of computational methods in advancing catalyst design.

Further studies have advanced from modeling indirect parameters (e.g., binding energies) towards direct property parameters (e.g., exchange current densities, overpotentials and Tafel slopes)^23,24,25. Saidi and co-workers introduced an electrochemical model that integrates computational and experimental approaches to enhance catalyst efficiency, leveraging computationally efficient hydrogen adsorption energy calculations²³. Additionally, by refining Nørskov’s kinetic model for hydrogen evolution with a metal-dependent rate constant, they aligned theoretical predictions with experimental data and suggested further enhancements through machine learning²⁴. Moreover, another work grounded the Bell-Evans-Polanyi relation in hydrogen evolution kinetics by correlating activation energy with computed hydrogen adsorption free energy across multiple metal electrodes, thereby improving the accuracy of catalyst design predictions²⁵. Nevertheless, some material properties, such as catalytic activity indicators (overpotentials, turnover frequencies) for amorphous materials, are challenging to calculate by DFT but easier to measure experimentally. Therefore, we decided to employ reliable and systematic experimental datasets to develop AI models with better predictive capability for amorphous systems.

Although gathering experimental data from publications provides a viable solution, concerns have been raised about “data corruption” when mining material data from previous literature²⁶. Alexander J. Norquist et al. emphasized the importance of including both successful and unsuccessful experiments in AI-assisted material discovery to achieve more accurate AI predictions²⁷. One should also be cautious because experiments on OER catalysts with identical compositions often exhibit significant variations due to different testing conditions, such as electrodes, electrolytes and potentials. High-throughput experimental methods provide a solution, since they can produce systematic datasets under a specified test condition, reducing uncertainty and inconsistency²⁸.

This study aims to combine the advantages of a new Hierarchical Neural Network (HNN) algorithm-based AI model and systematic data to establish a catalytic performance predictive model for multi-element OER electrocatalysts. First, a total of 119 catalytic datasets were generated using high-throughput experiments under similar synthetic and testing conditions to train and test the HNN-based AI models. The model then generated Tafel slopes and onset overpotentials for a new ternary system. The performance of this system was then validated experimentally, and the results were used to improve the predictive capability of the model. Compared to the experiments, the optimized model showed predictive errors of 2% and 4% for Tafel slopes and onset overpotentials, respectively.

Methods

The present research employed the AI model for inorganic materials science by following these steps: (1) data collection, (2) calculating the values of descriptors (features) for each data point, (3) determining the best dimension of descriptors for each individual neural network ensemble for a given dataset, and (4) constructing HNN algorithm-based AI models. The iterative workflow of this HNN-based AI model is applicable broadly for the discovery of advanced inorganic functional materials. The evaluation metrics for the model, R² and MAE, are detailed in Section 3 of the Supporting Information.
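The two evaluation metrics can be illustrated with a minimal sketch. It assumes the textbook definitions of R² and MAE; the exact definitions used in the Supporting Information are not reproduced here, and the Tafel slope values below are made up for illustration only.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical Tafel slopes in mV/dec (not data from this work)
measured = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 49.0, 61.0, 68.0]

r2 = r2_score(measured, predicted)
mae = mean_absolute_error(measured, predicted)
print(f"R2 = {r2:.3f}, MAE = {mae:.2f} mV/dec")
```

R² compares the residual error with the spread of the measured values, so it is sensitive to the range covered by the dataset, whereas MAE reports the typical error in the units of the property itself.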

The collection of systematic experimental datasets

The OER electrocatalyst data were acquired using high-throughput aerogel synthesis techniques and systematic electrochemical characterization, as detailed in our previous work^29,30. Multiple composition variables were accurately controlled by preparing the metal precursor solutions via a multi-channel feeding system equipped with an in-line mixer. After parallel sol-to-gel transition and supercritical drying, a large number of amorphous samples with different compositions were synthesized, followed by systematic Linear Sweep Voltammetry (LSV) measurements. Ternary plots of the correlation between composition and electrocatalytic performance were constructed. Two composition-Tafel slope correlation plots, for Fe_xCo_yNi_z and Fe_xCo_yCe_z electrocatalysts, were obtained from high-throughput experiments with 52 data points (Fig. 4a, b). In wet-chemical synthesis, the material phases, morphologies and structures are strongly affected by the synthetic parameters (including temperature, pressure, etc.). The high-throughput synthesis ensures identical experimental conditions for all samples, i.e., the amorphous aerogels have similar morphologies and surface area to mass ratios, so that the variation in electrochemical properties is primarily determined by composition.
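For readers unfamiliar with how a Tafel slope is extracted from LSV data, the sketch below fits the standard Tafel relation η = a + b·log₁₀(j) by linear least squares. The function name, the synthetic data, and the choice of fitting window are assumptions for illustration; the fitting procedure actually used for the datasets in this work is described in the cited references, not here.

```python
import numpy as np

def tafel_fit(current_density, overpotential):
    """Fit eta = a + b * log10(j) and return (b, a).

    b is the Tafel slope (V per decade of current density),
    a is the intercept at log10(j) = 0.
    """
    log_j = np.log10(np.asarray(current_density, dtype=float))
    eta = np.asarray(overpotential, dtype=float)
    slope, intercept = np.polyfit(log_j, eta, 1)
    return float(slope), float(intercept)

# Synthetic, noise-free example: 45 mV/dec slope, 0.30 V intercept
j = np.array([1.0, 2.0, 5.0, 10.0])      # current density, mA cm^-2
eta = 0.30 + 0.045 * np.log10(j)         # overpotential, V

b, a = tafel_fit(j, eta)
print(f"Tafel slope = {b * 1000:.1f} mV/dec, intercept = {a:.3f} V")
```

In practice only the kinetically controlled region of the LSV curve should be passed to such a fit, since mass-transport and resistive effects bend the curve at higher current densities.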

Descriptor (Feature) constructions

In previous work, researchers selected a specific group of descriptors for AI models targeting specific material properties³¹. In this work, we propose using a comprehensive and universal collection of descriptors for all functionalities of inorganic materials. This approach will be helpful for the future development of large materials AI models integrating large sets of functionalities.

Ward et al.³² pioneered the construction of universal descriptors by utilizing 22 elemental properties, which are listed as the first 22 elemental properties in Table S1. Based on the 22 elemental properties and the statistical construction rules, a total of 145 descriptors were proposed and then used to train models predicting the formation energy and T_c of superconductors^33,34,35. In this study, an additional 31 elemental properties were added, giving a total of 53 elemental properties, as listed in Table S1. Key thermal, physical and crystallographic information was retrieved from databases (OQMD, Mathematica, ICSD and nuclear-power) or literature^36,37,38. The descriptor construction rules are outlined in Table S2. A total of 901 descriptors were constructed based on the elemental properties and statistical construction rules. The configurational entropy \((\Delta S_{\mathrm{con}})\), the occupation state of valence electrons and the ionicity were also calculated and contribute the remaining 8 descriptors, for a total of 909 descriptors. Weights were allocated to the minimum, maximum, and range values according to the construction rules. It is noteworthy that \(\Delta S_{\mathrm{con}}\) and the Absolute Percentage (AP) error were added here as construction rules for descriptors due to their significant relevance in materials science. Configurational entropy serves as an indicator of the degree of disorder in the atomic distribution. The concept of AP is related to variations in atomic size and electronegativity among the different elements, which have a substantial impact on the material structure and properties.
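A minimal sketch of how composition-based descriptors of this kind can be computed is given below. It assumes the ideal-mixing form of the configurational entropy, ΔS_con = −R Σ xᵢ ln xᵢ, and a generic set of statistics (minimum, maximum, range, composition-weighted mean); the actual construction rules and weights are those of Table S2, which are not reproduced here. The function names are hypothetical, and the Pauling electronegativities are used only as an example elemental property.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1

def configurational_entropy(fractions):
    """Ideal-mixing configurational entropy, -R * sum(x_i * ln x_i)."""
    total = sum(fractions)
    x = [f / total for f in fractions if f > 0]
    return -R_GAS * sum(xi * math.log(xi) for xi in x)

def property_statistics(fractions, values):
    """Simple statistics of one elemental property over a composition."""
    total = sum(fractions)
    x = [f / total for f in fractions]
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": sum(xi * v for xi, v in zip(x, values)),
    }

# Equimolar Fe-Co-Ni as an example composition
fractions = [1.0, 1.0, 1.0]
electronegativity = [1.83, 1.88, 1.91]  # Pauling scale: Fe, Co, Ni

s_con = configurational_entropy(fractions)
stats = property_statistics(fractions, electronegativity)
print(f"dS_con = {s_con:.3f} J/(mol K)")
print(stats)
```

Applying a handful of such statistics to each of the 53 elemental properties is what makes the descriptor count grow into the hundreds.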

The reduction of descriptor dimension for individual ANN ensemble

Due to the scarcity of experimental data in materials science, it is not feasible to train a single neural network on all 909 descriptors. For example, in an Artificial Neural Network (ANN) with only two hidden layers and 909 input descriptors, more than one billion trainable parameters would need to be determined by data training. This has posed a long-lasting challenge for ANN-based AI models in solving materials science problems. The Genetic Algorithm (GA), developed by Holland³⁹, is frequently employed to reduce the dimension of descriptors (feature extraction) in ANN-based ML processes^40,41.

We used the GA to reduce the dimension of descriptors for an ANN (Fig. 1) model of electrocatalyst performance (including Tafel slopes and onset overpotentials). Figure 1 shows the progress of GA iterations on the x-axis, with the descriptor dimension (d) on the left y-axis and the testing R² on the right y-axis. After iterative GA selection, the optimal descriptor dimension for a single ANN on the given dataset is 15 for both Tafel slopes and onset overpotentials, with testing R² values of 0.682 and 0.651, respectively.
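The GA-based selection described above can be sketched in a few lines. This is an illustrative toy, not the implementation used in this work: a least-squares linear model stands in for the ANN, the subset size d is fixed rather than evolved, the data are synthetic, and all function names and hyperparameters (population size, mutation rate) are assumptions.

```python
import numpy as np

def holdout_r2(X_tr, y_tr, X_te, y_te, idx):
    """Held-out R^2 of a least-squares fit on the selected columns (ANN stand-in)."""
    A_tr = np.column_stack([X_tr[:, idx], np.ones(len(X_tr))])
    A_te = np.column_stack([X_te[:, idx], np.ones(len(X_te))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    resid = y_te - A_te @ coef
    return float(1.0 - (resid @ resid) / np.sum((y_te - y_te.mean()) ** 2))

def ga_select(X_tr, y_tr, X_te, y_te, d=3, pop_size=20, generations=30, seed=0):
    """Evolve subsets of d descriptor indices; return (best subset, best score per generation)."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    pop = [rng.choice(n, size=d, replace=False) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = np.array([holdout_r2(X_tr, y_tr, X_te, y_te, ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        history.append(float(scores[order[0]]))
        elite = [pop[i] for i in order[: pop_size // 2]]   # selection: keep the top half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            genes = np.union1d(elite[i], elite[j])         # crossover: pool both parents' indices
            child = rng.choice(genes, size=d, replace=False)
            if rng.random() < 0.3:                         # mutation: swap in an unused descriptor
                unused = np.setdiff1d(np.arange(n), child)
                child[rng.integers(d)] = rng.choice(unused)
            children.append(child)
        pop = elite + children
    scores = np.array([holdout_r2(X_tr, y_tr, X_te, y_te, ind) for ind in pop])
    history.append(float(scores.max()))
    return np.sort(pop[int(np.argmax(scores))]), history

# Synthetic data: 12 candidate descriptors, only columns 0, 3 and 7 carry signal
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(80, 12))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + X[:, 7] + 0.1 * data_rng.normal(size=80)
best, history = ga_select(X[:60], y[:60], X[60:], y[60:], d=3)
print("selected descriptor indices:", best)
print("best held-out R2:", round(history[-1], 3))
```

Because the top half of each generation is carried over unchanged, the best score can never decrease from one generation to the next. Counting how often each descriptor appears in the surviving subsets over many runs is one way to obtain a frequency ranking.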

Numerous ANN models, each based on a different set of reduced-dimension descriptors, reached their upper limit of testing scores. The descriptors that appear most frequently during the GA iterations for Tafel slopes and onset overpotentials are shown in Table 1. Other studies have focused on identifying the most suitable set of descriptors^33,34,35,36,37,38, whereas in this work the top five most frequently appearing descriptors were defined as “main descriptors”.

Table 1 Five main descriptors of Tafel slopes and onset overpotentials

We found that the main descriptors screened out by the GA are consistent with previously reported design rationales for electrocatalysts. The electronegativity differences between elements (MDT1) affect the charge distribution and electron cloud around the catalytic active sites, thereby affecting the adsorption and desorption kinetics of reactants⁴². MDT5 and MDO3 were found to affect the work function, which is correlated with the interfacial charge transfer and activity in electrocatalysts^43,44. MDO1 and MDO4 are related to the characteristics of d-electron orbitals, and tuning the d-orbital electrons was found to be effective in regulating reactant adsorption and the formation of intermediate reactive species³⁴.

Hierarchical Neural Network integrating statistical ensembles

Among the 909 descriptors, those other than the five main descriptors were classified as other descriptors. By retaining the 5 main descriptors to highlight their significance and choosing an additional 10 descriptors from the remaining 904 through a randomized combinatorial algorithm, we can construct up to \(\mathrm{C}_{904}^{10}\approx 9.6\times 10^{22}\) individual neural network ensembles. The optimal descriptor dimension for a specific ANN structure and dataset is selected using the GA. In conventional ML methodologies, typically only a single set of optimal descriptor combinations is chosen, disregarding alternative combinations³⁴.
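The size of this combinatorial space, and the random construction of descriptor sets, can be illustrated as follows. This is a sketch under stated assumptions: the descriptor indices, the placeholder positions of the five main descriptors, and the function name are hypothetical, and the randomized combinatorial algorithm used in this work may differ in detail.

```python
import math
import random

N_DESCRIPTORS = 909
MAIN = [0, 1, 2, 3, 4]  # hypothetical indices standing in for the five main descriptors

def sample_descriptor_sets(n_sets, n_extra=10, seed=0):
    """Each set keeps the main descriptors and adds n_extra others chosen at random."""
    rng = random.Random(seed)
    others = [i for i in range(N_DESCRIPTORS) if i not in MAIN]
    return [MAIN + sorted(rng.sample(others, n_extra)) for _ in range(n_sets)]

# Number of distinct ways to choose 10 of the 904 remaining descriptors
n_possible = math.comb(N_DESCRIPTORS - len(MAIN), 10)
print(f"possible descriptor sets: {n_possible:.3e}")

descriptor_sets = sample_descriptor_sets(n_sets=5)
for s in descriptor_sets:
    print(s)
```

Even tens of thousands of sampled sets cover only a vanishing fraction of this space, which is why the sets are drawn at random rather than enumerated.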

Extensive testing indicates that the various combinations of these 909 descriptors exhibit a range of performance variations from marginal declines to significant drops, but none are completely irrelevant. This observation leads us to infer that different combinations of 15 descriptors reveal distinct collective correlations between descriptors and labels (or properties). These correlations can be incrementally uncovered through training each individual neural network ensemble with 15 descriptors in parallel by the same set of data points. The full breadth of complex relationships between the 909 descriptors and various catalytic material labels is encapsulated across all individual ensembles.

To integrate the knowledge from these ensembles, a specialized statistical algorithm is necessary. Traditional algorithms, such as bagging, boosting, and stacking, often depend on straightforward geometric averaging of ensemble outputs^4,35,45,46,47,48. However, owing to the variable performances across ensembles with different descriptor sets, simple geometric averaging will not effectively capture all valuable insights.

To tackle this challenge, we have devised a novel statistical integration algorithm termed “Hierarchical Neural Network” (HNN), as illustrated in Fig. 2. Each dashed box in the schematic represents an individual ensemble capable of producing one output, labeled as \(\mathrm{O}_{\mathrm{m}}^{\mathrm{i}}\), where m and i denote the index of the ensemble within its layer and the layer number, respectively. Outputs from the ensembles of one layer, taken in random combinations, serve as inputs (descriptors) to the ensembles of the next layer, with the input dimensionality kept the same across layers for uniformity. This layer-by-layer propagation supports continuous knowledge integration and refinement. Through iterative training, appropriate weights for each neuron are determined, and the performance of the model progressively improves.

**Fig. 2: Structure of Hierarchical Neural Network.**

The overall underlying correlation between the 909 descriptors and the key property is learned in two stages. First, more than 10⁴ similar individual ensembles are trained in parallel on the same set of data points, each with a different combination of input descriptors. Second, the overall knowledge of these more than 10⁴ statistical ensembles is integrated and trained by the HNN algorithm. Different ensembles contain both overlapping knowledge and unique information, resulting in a superior final integrated outcome.
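The layered integration described above can be sketched structurally as follows. This is a toy under loudly stated assumptions and not the HNN implementation of this work: ridge-regularized linear models stand in for the individual ANN ensembles, the five main descriptors are not treated separately, the final prediction is taken as the plain mean of the last layer's outputs (the source does not specify this step), and all names, sizes and data are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Ridge regression with an unpenalized intercept (stand-in for one ANN ensemble)."""
    A = np.column_stack([X, np.ones(len(X))])
    reg = lam * np.eye(A.shape[1])
    reg[-1, -1] = 0.0                      # do not penalize the intercept
    return np.linalg.solve(A.T @ A + reg, A.T @ y)

def predict_ridge(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

def train_hnn(X, y, n_layers=3, n_ensembles=40, dim=15, seed=0):
    """Train layer by layer; each ensemble sees a random combination of `dim` inputs."""
    rng = np.random.default_rng(seed)
    layers, inputs = [], X
    for _ in range(n_layers):
        layer = []
        for _ in range(n_ensembles):
            idx = rng.choice(inputs.shape[1], size=dim, replace=False)
            layer.append((idx, fit_ridge(inputs[:, idx], y)))
        layers.append(layer)
        # Outputs O_m^i of this layer become the inputs (descriptors) of the next layer
        inputs = np.column_stack([predict_ridge(w, inputs[:, idx]) for idx, w in layer])
    return layers

def predict_hnn(layers, X):
    inputs = X
    for layer in layers:
        inputs = np.column_stack([predict_ridge(w, inputs[:, idx]) for idx, w in layer])
    return inputs.mean(axis=1)             # assumed final integration: mean of last-layer outputs

# Synthetic data: 60 descriptors, the first 8 carry the signal
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(100, 60))
y = X[:, :8].sum(axis=1) + 0.1 * data_rng.normal(size=100)

layers = train_hnn(X[:80], y[:80])
pred = predict_hnn(layers, X[80:])
print("held-out mean absolute error:", float(np.mean(np.abs(pred - y[80:]))))
```

The point of the sketch is the data flow: no single model ever sees more than 15 inputs, yet information from all descriptors can reach the final layer through the random recombination of intermediate outputs.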

From a different perspective, the number of data points effectively defines the boundary conditions of the mathematical problem. With a mere 119 boundary conditions, feeding all 909 descriptors into a neural network with three hidden layers leads to an excessively high number of trainable parameters, surpassing 909³, which compromises model training. Each individual ensemble can be likened to an individual slice in a CT scan. The hierarchical architecture of the HNN captures these correlations by cutting more than 10⁴ slices under the same 119 boundary conditions. As the number of layers increases, the solution progressively converges to the true value. In this work, the accuracy for the key properties, Tafel slopes and onset overpotentials, saturates once the number of layers exceeds four, corresponding to more than thirty thousand individual ensembles.

We note that Saidi et al. introduced a “hierarchical convolutional neural network” that categorizes the data and then applies independent convolutional neural networks to each category⁴⁹. In contrast, the term “hierarchical” in this work primarily refers to the architecture of the statistical integration of a large number of ensembles.

Results & discussion

The Tafel slopes and onset overpotentials, both of which are considered key material-dependent indicators of catalytic performance, predicted by different models are shown in Fig. 3.

**Fig. 3: Comparison of learning outcomes for Tafel slopes and onset potentials across various models and datasets.**

In Fig. 3, the numbers 15, 145 and 909 represent the number of descriptors. Specifically, “15” refers to the top 15 most frequently appearing descriptors screened by the GA (Fig. 1), “145” is the number of universal descriptors employed in previous literature³², and “909” is the number of universal descriptors proposed in this work. ANN, XGBoost and HNN indicate different AI algorithms. XGBoost⁵⁰ is widely recognized as a powerful and popular machine learning algorithm that leverages tree boosting, a form of ensemble modeling. Dataset1 contains data from the Fe_xCo_yNi_z (Fig. 4a) and Fe_xCo_yCe_z (Fig. 4b) composition-performance correlation diagrams. Dataset2 contains Dataset1 and 30% of the Fe_xCo_yLa_z data (Fig. S3d). Dataset3 contains Dataset1 and 100% of the Fe_xCo_yLa_z data (Fig. 4c). Dataset4 contains Dataset3 and data of La-Co-Al, Li, K. Models trained using the 909-HNN algorithm on Datasets 1-4 are named Model1T through Model4T, respectively.

**Fig. 4: The composition-Tafel slope relation diagrams.**

The analysis uncovers several noteworthy trends: First, the performance of each model improves as the dataset size expands. We observed that with only 52 data points from two relation diagrams, the model (Model1T) showed a tendency to overfit. However, expanding the dataset to 73 samples from three relation diagrams (Model2T) significantly enhanced the prediction accuracy.

Second, the performance of the 909-ANN model fell below that of the 15-ANN model and the 909-XGBoost model. This demonstrates that the contradiction between a large number of descriptors and a small number of data points remains unsolved in the traditional ANN algorithm: adding more descriptors makes the predictions worse.

Third, 909-HNN outperforms all other models with different algorithms. This demonstrates that the HNN algorithm effectively resolves the long-lasting contradiction between a large number of descriptors and insufficient datasets in the traditional ANN approach to materials science problems. In what follows, our study extends beyond training data evaluation by predicting and validating Tafel slope values for 15 new binary and ternary electrocatalysts. This validation step is crucial because it tests the model’s predictive power on unseen data, which is a fundamental way to check for overfitting.

The optimized AI model was further employed to predict the full composition-performance relation diagram based on experimental datasets. The final predictions and experimental comparisons of the ternary composition-Tafel-slopes correlation diagrams for Fe-Co-Ni, Fe-Co-Ce, and Fe-Co-La by Model4T are shown in Fig. 4d–f.

The predictions made by the model are commonly known as generated content (GC). One approach is to consider the GC as novel data and employ an adversarial algorithm to ensure its consistency with the original data. However, we believe that in materials science, GC should be regarded as a prediction, and it can only be considered true and usable as data once it has been experimentally validated. To illustrate the importance of iterating the prediction and validation process, we plotted Figs. S3 and S4 and described the process in Section 4 of the Supporting Information.

Based on Model3T, the Tafel slope values for 15 new binary and ternary electrocatalysts were predicted and then validated by experiments, as shown in Table 2 and Fig. S4b. Considering the feasibility of experimental synthesis for further verification and performance regulation, we selected three categories of elements from the periodic table for this research: transition metals, rare-earth metals, and main-group metals (the alkali metals Li and K, plus Al). Transition metals (training and predicted data: Fe, Co, Ni; predicted data only: Cu, Mn) are ideal non-noble metal OER catalysts due to their partially filled d orbitals, which effectively participate in the multi-electron transfer process. Rare-earth metals (training and predicted data: La, Ce) can enhance OER catalytic activity by modulating the electronic state of transition metals through their unique 4f electronic structure. The incorporation of the main-group metals (predicted data only: Al, Li, K) may further modulate the electronic structure of transition metals, elevate the O 2p bands, and stimulate the release of lattice oxygen to enhance OER activity. Halogens are excluded from consideration due to their high electronegativity and strong ionic bonding with metals, which result in no catalytic effect on OER.

Table 2 The AI-predicted Tafel slope and onset overpotential values of 15 new binary and ternary aerogel OER electrocatalysts, compared with the experimental verification

Interestingly, Model3T is able to predict the behavior of these 15 new electrocatalysts well. This is because the model proposed here is specifically tailored for non-noble metal OER electrocatalysts, predominantly comprising transition metals and rare-earth elements. In the set of 15 newly predicted electrocatalysts, each material incorporates at least one previously encountered element, such as Fe, Co, Ce, Ni, or La. For materials that include elements other than transition metals and rare-earth elements, such as Al, Li, and K, the model’s predictive accuracy is slightly lower, with errors ranging from 2.6% to 10.8%. For example, the Tafel slope prediction error for La₁Co₁Al₁ is 10.8%. Conversely, for materials that contain elements analogous to previously encountered transition metals, the prediction errors are lower, ranging from 0.9% to 6%. For instance, the Tafel slope prediction error for La₁Co₁Cu₁ is 0.9%. Additionally, the Tafel slopes and onset overpotentials of these 15 new datasets also fall within the previously mentioned ranges. This suggests that the model has effectively captured the interactions between transition metals and rare-earth elements in OER electrocatalysis. However, for more complex scenarios, a dataset of this limited size is insufficient for the model to achieve optimal performance.

To further improve the model, we added Al-, Li-, and K-containing compounds to Dataset3 to form Dataset4. The performance of Model4T shows a minor enhancement, with an R² of 0.961 (Fig. S4c). The difference between the experimental results and the Model4T predictions for all 15 catalysts is shown in Fig. S4d, with an R² of 0.981. As shown in Table 2, Model4T shows its largest error of 5.2% for the Ce₁Co₁Ni₁-based compounds and its smallest error of 0.01% for the La₁Ni₁Fe₁-based compounds. These results demonstrate strong predictive power with systematic “small datasets”.

In summary, we employed a new HNN algorithm-based AI model to predict the Tafel slopes and onset overpotentials for multi-element OER electrocatalysts, yielding several noteworthy points. First, we expanded the number of universal descriptors used for ML of inorganic materials from 145 to 909. Notably, none of the five main descriptors (those most frequently selected) in this study were among the originally proposed set of 145 descriptors, highlighting the importance of enriching the universal descriptors. Second, we developed an HNN algorithm to integrate a large set of statistical ANN ensembles with a reduced dimension of descriptors, which resolved the contradiction between the overwhelming number of descriptors and limited scientific datasets. Third, the substantial increase in the total number of descriptors combined with the HNN algorithm led to remarkably improved prediction accuracy. Fourth, we found that even a small amount of GC datasets can significantly enhance the predictive power of AI models. However, it is crucial to validate GC through scientific experiments before it is used further. This work demonstrates the capability to accurately predict the performance of multi-element non-noble metal electrocatalysts using small, systematic datasets, thereby accelerating the path to materials innovation.

Data availability

The Tafel slopes and onset overpotentials data are provided in the supplementary file named “dataset”. The code used to generate the results in this study is available from the corresponding author upon reasonable request.

References

Chu, S. & Majumdar, A. Opportunities and challenges for a sustainable energy future. Nature 488, 294–303 (2012).
Luo, J. et al. Water photolysis at 12.3% efficiency via perovskite photovoltaics and Earth-abundant catalysts. Science 345, 1593–1596 (2014).
Calvillo, L. et al. Insights into the durability of Co–Fe spinel oxygen evolution electrocatalysts via operando studies of the catalyst structure. J. Mater. Chem. A 6, 7034–7041 (2018).
Wang, M. et al. Recent advances in transition-metal-sulfide-based bifunctional electrocatalysts for overall water splitting. J. Mater. Chem. A 9, 5320–5363 (2021).
Zhang, B. et al. Homogeneously dispersed multimetal oxygen-evolving catalysts. Science 352, 333–337 (2016).
Yao, Y. et al. High-entropy nanoparticles: Synthesis-structure-property relationships and data-driven discovery. Science 376, eabn3103 (2022).
Nguyen, T. X. et al. Advanced high entropy perovskite oxide electrocatalyst for oxygen evolution reaction. Adv. Funct. Mater. 31, 2101632 (2021).
Stein, H. S. et al. Functional mapping reveals mechanistic clusters for OER catalysis across (Cu–Mn–Ta–Co–Sn–Fe)Oₓ composition and pH space. Mater. Horiz. 6, 1251–1258 (2019).
Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
King-Smith, E. et al. Probing the chemical ‘reactome’ with high-throughput experimentation data. Nat. Chem. 1–11 (2024).
Xu, Y. et al. Artificial intelligence: A powerful paradigm for scientific research. The Innovation 2 (2021).
Gudivada, V., Apon, A. & Ding, J. Data quality considerations for big data and machine learning: going beyond data cleaning and transformations. Int. J. Adv. Softw. 10, 1–20 (2017).
Liu, Y. et al. Data quantity governance for machine learning in materials science. Natl Sci. Rev. 10, nwad125 (2023).
Jain, A. et al. Overview and importance of data quality for machine learning tasks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (2020).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 1–9 (2016).
Jacobsen, A. et al. FAIR principles: interpretations and implementation considerations. Data Intell. 2, 10–29 (2020).
Wei, J. et al. Machine learning in materials science. InfoMat 1, 338–358 (2019).
Saidi, W. A., Shadid, W. & Veser, G. T. Optimization of high-entropy alloy catalyst for ammonia decomposition and ammonia synthesis. J. Phys. Chem. Lett. 12, 5185–5192 (2021).
Saidi, W. A. Optimizing the catalytic activity of Pd-based multinary alloys toward oxygen reduction reaction. J. Phys. Chem. Lett. 13, 1042–1048 (2022).
Saidi, W. A. Emergence of local scaling relations in adsorption energies on high-entropy alloys. npj Comput. Mater. 8, 86 (2022).
Saidi, W. A., Nandi, T. & Yang, T. Designing multinary noble metal‐free catalyst for hydrogen evolution reaction. Electrochem. Sci. Adv. 3, e2100224 (2023).
Yang, T. T. & Saidi, W. A. Reconciling the volcano trend with the Butler–Volmer model for the hydrogen evolution reaction. J. Phys. Chem. Lett. 13, 5310–5315 (2022).
Yang, T. T. et al. Revisiting trends in the exchange current for hydrogen evolution. Catal. Sci. Technol. 11, 6832–6838 (2021).
Yang, T. T. & Saidi, W. A. The Bell-Evans-Polanyi relation for hydrogen evolution reaction from first-principles. npj Comput. Mater. 10, 98 (2024).
Hong, W. T., Welsch, R. E. & Shao-Horn, Y. Descriptors of oxygen-evolution activity for oxides: a statistical evaluation. J. Phys. Chem. C. 120, 78–86 (2016).
Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).
Chen, Z. et al. Development of high‐throughput wet‐chemical synthesis techniques for material research. Mater. Genome Eng. Adv. 1, e5 (2023).
Chen, Z. et al. High-performance oxygen evolution reaction electrocatalysts discovered via high-throughput aerogel synthesis. ACS Catal. 13, 601–611 (2022).
Cai, B. et al. Developing an FexCoyLaz-based amorphous aerogel catalyst for the oxygen evolution reaction via high throughput synthesis. J. Mater. Chem. A 12, 1793–1803 (2024).
Gheyas, I. A. & Smith, L. S. Feature subset selection in large dimensionality domains. Pattern Recognit. 43, 5–13 (2010).
Ward, L. et al. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2, 1–7 (2016).
Ward, L. et al. Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations. Phys. Rev. B 96, 024104 (2017).
Stanev, V. et al. Machine learning modeling of superconducting critical temperature. npj Comput. Mater. 4, 29 (2018).
Zhang, J. et al. An integrated machine learning model for accurate and robust prediction of superconducting critical temperature. J. Energy Chem. 78, 232–239 (2023).
Miracle, D. et al. An assessment of binary metallic glasses: correlations between structure, glass forming ability and stability. Int. Mater. Rev. 55, 218–256 (2010).
Lide, D. R. CRC Handbook of Chemistry and Physics, Vol. 85 (CRC Press, 2004).
De Boer, F. R. et al. Cohesion in Metals: Transition Metal Alloys (1988).
Holland, J. H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (MIT Press, 1992).
Michalewicz, Z. & Schoenauer, M. Evolutionary algorithms for constrained parameter optimization problems. Evolut. Comput. 4, 1–32 (1996).
Gu, S., Cheng, R. & Jin, Y. Feature selection for high-dimensional classification using a competitive swarm optimizer. Soft Comput. 22, 811–822 (2018).
Wang, X. et al. Electronic structure regulation of the Fe-based single-atom catalysts for oxygen electrocatalysis. Nano Energy. p. 109268 (2024).
Radinger, H. et al. Work function describes the electrocatalytic activity of graphite for vanadium oxidation. ACS Catal. 12, 6007–6015 (2022).
Qin, R. et al. Ru/Ir‐Based Electrocatalysts for Oxygen Evolution Reaction in Acidic Conditions: From Mechanisms, Optimizations to Challenges. Advanced Science. p. 2309364 (2024).
Mishra, A. et al. Ensemble-based machine learning models for phase prediction in high entropy alloys. Comput. Mater. Sci. 210, 111025 (2022).
Wang, X. et al. ThermoEPred-EL: Robust bandgap predictions of chalcogenides with diamond-like structure via feature cross-based stacked ensemble learning. Comput. Mater. Sci. 169, 109117 (2019).
Nguyen, D.-N. et al. Ensemble learning reveals dissimilarity between rare-earth transition-metal binary alloys with respect to the Curie temperature. J. Phys.: Mater. 2, 034009 (2019).
Sun, B. et al. Ensemble learning based on stacking and blending predicts glass forming ability. Mater. Today Commun. 37, 107385 (2023).
Saidi, W. A., Shadid, W. & Castelli, I. E. Machine-learning structural and electronic properties of metal halide perovskites using a hierarchical convolutional neural network. npj Comput. Mater. 6, 36 (2020).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).

Acknowledgements

We acknowledge funding support from the National Key R&D Program of China (Grant No. 2022YFB3807700), the Shenzhen Fundamental Research Funding (Nos. JCYJ20220818100612027 and JCYJ20220818100613028), the National Natural Science Foundation of China (Grant No. 22309075), and the Major Science and Technology Infrastructure Project of Shenzhen Material Genome Big-Science Facilities Platform.

Author information

Authors and Affiliations

School of Materials Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
Shaomeng Xu
School of Materials Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
Shaomeng Xu, Zhuyang Chen, Mingyang Qin, Bijun Cai, Weixuan Li, Ronggui Zhu & X.-D. Xiang
Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, 518055, China
Chen Xu
Shenzhen Polytechnic University, Shenzhen, 518055, China
Chen Xu

Authors

Shaomeng Xu
Zhuyang Chen
Mingyang Qin
Bijun Cai
Weixuan Li
Ronggui Zhu
Chen Xu
X.-D. Xiang

Contributions

S.M.X. developed the AI algorithms, drafted the manuscript, and created the visualizations. Z.Y.C. provided guidance on the high-throughput experimental design and edited the manuscript. M.Y.Q. contributed to the manuscript revision. B.J.C., W.X.L. and R.G.Z. conducted the high-throughput experiments and managed the data collection. C.X. performed the formal analysis, drafted sections of the manuscript, and edited the final version. X.-D.X., the principal investigator, conceptualized the research, provided overall supervision, and secured the financial support for the project.

Corresponding authors

Correspondence to Chen Xu or X.-D. Xiang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supporting Information

dataset

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, S., Chen, Z., Qin, M. et al. Developing new electrocatalysts for oxygen evolution reaction via high throughput experiments and artificial intelligence. npj Comput Mater 10, 194 (2024). https://doi.org/10.1038/s41524-024-01386-4

Received: 30 May 2024
Accepted: 05 August 2024
Published: 28 August 2024
DOI: https://doi.org/10.1038/s41524-024-01386-4
