Introduction

The process of selecting and deciding priorities of research and development (R&D) projects is crucially important for efficiently utilizing limited resources1, 2. In the case of government-funded R&D programs, which support a large number of projects with an immense budget, there is an even greater emphasis on the efficient allocation of resources, and many funding agencies devote a lot of effort to improving the process for selecting R&D projects3, 4. If one is only considering the efficiency of a program, one may select R&D proposals that are anticipated to yield stronger ex-post performance to be the beneficiary projects of the program5. In reality, however, the decision-making process of selecting R&D project is complex and requires consideration of various factors6,7,8. Project selection is complicated because it requires the ex-ante prediction of the performance expected to be achieved by implementing each candidate project, despite the uncertainties involved in such prediction9. To do so, we need to precisely define the concept of performance or project success and adopt a commonly accepted method of measuring it. We also need to determine, through theoretical analysis or practical experience, which of the multiple input factors will have an impact on a project’s performance6. Furthermore, we need to understand the lag time and uncertainties involved in the process by which R&D activity manifests as performance9.

Until now, expert evaluation has been widely utilized to help make complex decisions in the R&D project selection process10, 11. Expert evaluation has been considered one of the most rational means of decision-making, since they provide reliable evaluations from a group of experts knowledgeable in a specific field12, 13. The expert evaluation method is especially useful for evaluating new ideas regarding which there is no other reference data. We focused on public R&D funding for small and medium enterprises (SMEs), and the R&D proposals by SMEs are also evaluated by experts. R&D projects performed by SMEs are more heterogeneous than science-oriented projects and tend to be application/development-oriented projects. Therefore, in practice, it is operationally worthwhile for experts to review whether the R&D grant is approved or not. However, there have also been several observations regarding the limitations of expert evaluation as a reference for R&D project selection. First of all, the presence of various types of bias, including optimism or pessimism bias, cognitive bias, academic bias, and institutional particularism, may influence subjective judgments, leading to unfair or irrational results14, 15. Second, since research disciplines are growing more specialized into sub-areas of expertise while at the same time converging in many aspects, it has become challenging to identify experts or organize groups capable of fully understanding and evaluating all proposals5. Third, the R&D project selection process is not structured15, and the balance of expert evaluations for various criteria must be considered for rational decision-making16. Finally, evaluating a large number of proposals consumes a lot of time and cost, and in cases where the given environment fails to provide adequate time, it may impede evaluators from making optimally rational judgments10, 15, 16.

More recently, researchers have proposed various data-based methodologies designed to overcome some of these limitations of expert evaluation15,16,17,18,19. Most of these new approaches, however, focus on assigning weighted values to the multiple criteria applied to project selection or on systematically and objectively integrating evaluations from multiple experts20, 21. Therefore, they have been limited in addressing the fundamental problems of expert evaluation discussed above. Meanwhile, there has been active empirical research on how to identify the various critical success factors (CSFs) that affect the success of R&D projects22,23,24,25. However, many of these success factors are difficult to quantify objectively, and most of the empirical studies have relied on the Likert scale to measure survey responses from experts6, 16. Several studies utilized machine learning (ML) techniques to determine the relation between project attributes and performance6, 26, 27, however, there have been few studies focusing on ex-ante predictions of performances of individual R&D proposals from SMEs and applying such predictions to project selection.

Is it possible to predict the performance of individual candidate projects implemented by SMEs for project selection using only objective data and ML models, without relying on the qualitative judgments of experts? Are prediction rules based on ML both theoretically explainable and practical? How effective is it compared to the qualitative method? It is, of course, unlikely that it will be possible to entirely replace the qualitative and intuitive judgments of experts with an exclusively data-based approach. Moreover, the selection and prioritization of projects is not an issue that can be determined based on efficiency alone. Nonetheless, to enhance the efficiency of public efforts to stimulate the technological innovation of SMEs, we need to offer an objective methodology that can supplement the qualitative judgments of experts. In response to this need, we derived new models that predict whether a candidate project will achieve various performance indicators, including R&D success, commercialization, and patent applications, based on ML analysis of data from a large number of previously completed R&D projects. In the process, we analyzed which of the various factors involved, such as the attributes of projects, firms, and market environments, will strongly affect the performances. We also applied an analytic hierarchy process (AHP) survey of related experts to establish the weighted values to be assigned to the performance indicators for project selection. Based on this, we suggested a method for objectively scoring the expected performances of individual candidate projects. Lastly, we applied the methodology to propose practical ways to improve the current selection process of public funded R&D project for SMEs.

Methods

Data

In South Korea, firms with average sales of less than 40 to 150 billion KRW (approximately 33 to 125 million USD, depending on the industry) and total assets of less than 500 billion KRW (approximately 417 million USD) are classified as SMEs. The South Korean government has been implementing various R&D subsidy programs to stimulate the technological innovations of SMEs. Among these, we focused on the “SMEs Technological Innovation Development Program” implemented by South Korea’s Ministry of SMEs and Startups. This program was designed to stimulate technological innovation and exports of SMEs by providing SMEs with R&D subsidies. This program is the largest R&D subsidy program for SMEs in South Korea, with an annual budget of 220 million USD. Each year, the program supported around 500 new firms, and once selected as a beneficiary firm, each firm could usually receive up to 450 to 550 thousand USD over two years. Since this program gives us access to data on many cases of R&D projects, we judged it to be the most suitable for applying our ML technique.

Data on all R&D projects conducted with support from the South Korean government are collected in the National Science & Technology Information Service (NTIS). For each project, NTIS collects around 400 fields of data related to issues such as research period and area, collaborative research, budget and personnel composition, and performances. The data is collected through an annual survey of firms that have implemented R&D, under the supervision of the funding agency. All R&D projects that received public funding must mandatorily submit information regarding the project and its performance to NTIS. NTIS also collects data on performance generated after project completion with lag time. In this study, we collected and utilized NTIS data on 1,771 projects initiated from 2011 to 2013 through the program. Projects begun in 2013 lasted, at maximum, up to 2015. The performance data which was collected up to 2020 (covering a period of five years following project completion) were used to measure the performance indicators.

The attributes of the firm that will perform a project, especially its financial attributes, are resources that could affect the process and outcomes of the project22. Accordingly, we used data regarding firm attributes that existed prior to implementing a project as additional attributes data. The market environments in which firms are belonged may also influence the commercialization performance generated by the project23. Therefore, we used data on market environments such as market size and competition as additional attributes data. Data on the firm attributes and market environments were obtained from a Korean credit rating agency.

Variables

If an R&D project is completed normally, without being abandoned or disqualified, an expert committee organized by the funding agency evaluates its performance. The committees give scores and ratings indicating whether projects achieved the technological level set as the target within the given time and budget. Projects that earn a score of 60 or higher are classified as successful R&D projects. We selected the R&D success as one of our performance indicators. In addition, we selected the variable of whether sales were generated from innovative products that applied the developed technology within five years following project completion as the representative indicator of commercialization success. Since the ultimate purpose of a firm’s pursuit of R&D is to generate revenues and profit through successful commercialization, we considered this to be an important criterion of project success. Moreover, we chose the variable of whether a firm applied for a patent within five years of project completion as one of our performance indicators, since it is most closely related to commercialization and the lag time is relatively short.

In this study, we comprehensively considered as many CSFs as possible that could be measured objectively, as well as factors that were not considered in previous studies. We could obtain around 400 features regarding the attributes of projects, firms, and market environments from NTIS and the credit rating agency. First, we screened and eliminated the features that cannot be quantified or input into ML algorithms, such as project titles and research purpose. Then, we reviewed the distribution of each feature and removed the features with excessively skewed distributions. For instance, when dealing with categorical features, we excluded any feature where the majority of cases belonged to a single class as it would not be suitable for classification. Finally, we considered the similarity and multicollinearity among features, and selected representative features. As a result, 41 factor variables were finally selected. Supplementary Material A shows the operational definitions and descriptive statistics of the variables.

Aspects of a project’s scale, including research period, budget, and personnel have important effects on its performance24, 28. Moreover, many studies have reported that collaborative research (CR) has a significant influence on firms’ R&D performance24. Within NTIS, projects are classified according to various classification systems, based on the characteristics or category of the project; this gives us a supplementary means of judging the contents of projects and grouping them. The size of the firm performing the R&D project, the firm’s age, region, and its area of business are also known to be factors that have a significant effect on performance3, 22. In addition, there have been studies indicating that a firm’s financial strength has a positive correlation with R&D intensity and performance29. The South Korean government grants venture certification to firms that have received investment above a certain level from venture capital. Moreover, Innobiz certification is granted to firms that are judged to have innovative technologies. Although the process of certification considers a firm comprehensively from various aspects, whether the firm has these certifications at the time of submitting the R&D proposal is objective and can be officially verified. Therefore, we used whether a firm had those certifications as one of the firm attributes. A project’s risks and likelihood of success may also vary depending on the industry1. Moreover, market environment factors such as market size, and intensity of competition have also been found to be important factors affecting the success of R&D projects23.

Methodology

Since we considered many factors, we prioritized the use of ML algorithms that are better suited for identifying complex relationships among multiple variables. Since the performance indicators are binary variables, we used various classification algorithms. There is a wide variety of classification algorithms, each with its own pros and cons; there is no single best algorithm that is superior to all other models and applicable to all cases30. Therefore, after comparing the performances of models generated using various algorithms, we selected the model that demonstrated the strongest prediction performance as an optimal model. The ML algorithms used in this study included rule-based Decision Tree (DT) such as Classification and Regression Tree (CART), C5.0, Chi-squared Automatic Interaction Detector (CHAID), Quick, Unbiased, Efficient Statistical Tree (QUEST), Random Forest (RF), and non-linear algorithms Neural Network (NN) and Support Vector Machine (SVM). We also compared our results with those obtained using conventional linear algorithms such as Logistic Regression (LR) and Discriminant Analysis (DA). Through this process, we compared the prediction performances of the ML algorithms with those of linear algorithms. Moreover, we compared the rules derived by ML with those by linear algorithms, and interpreted them theoretically. Supplementary Material B provides details on the characteristics, strengths, and weaknesses of each algorithm we used.

To apply such classification algorithms, we used IBM’s SPSS Modeler 18. We divided the data at a ratio of 7:3 into training data and test data. If the dependent variables’ group distribution is overly skewed to one side, there is a strong likelihood that the algorithm will classify most of the cases as belonging to the majority group, just to obtain high accuracy. To prevent this, we used the bootstrapping method to balance the training data’s distribution to be 5:5. In the case of DT and NN, in which results vary depending on the cross-validation data set for preventing overfitting, we generated 100 models with each algorithm and performed bagging. Parameter tuning was performed to minimize overfitting for each ML algorithm. In DA, we used the method of adding or eliminating variables that minimize Wilks' lambda at each stage to select the key factors. In LR analysis, we selected key factors using a forward stepwise method based on the likelihood ratio. The ML prediction models for a binary dependent variable provide raw propensity scores (RPS) for classification. These scores not only give us insight regarding whether each case will be grouped as true or false but also inform us of the probability value of the prediction. They allow us to predict the feasibility of the three kinds of performances for each candidate project. We conducted an AHP survey to determine which of the three performance indicators should be given greater weight when selecting the beneficiary firms of the program. 21 experts, who participated in the program’s project planning and evaluation, responded to our survey. All 21 experts have a doctorate degree, in various scholarly fields, and are currently engaged in R&D planning and evaluation for SMEs and policy research. Supplementary Material C shows the demographics of the 21 experts.

Results and discussion

Machine learning models for prediction of SME’s R&D performances

Table 1 presents the classification performances of the prediction models for the three performance indicators using each ML and linear algorithm. We basically performed a comparison of the classification accuracy and also presented the values of precision, recall, and F-measure. In this study, we assumed a scenario in which firms predicted to achieve significant performance are properly discriminated and assigned additional points, which increase the likelihood of these firms being chosen as beneficiary firms. Therefore, for the binary performance indicators, we compared the precision, recall, and F-measure based on the group that is predicted to achieve the performance (i.e., “true”).

Table 1 Classification performances of the prediction models.

In terms of accuracy and F-measure in the test data, the prediction models by C5.0 algorithm showed the highest classification performance for all performance indicators. The prediction model for R&D success demonstrated classification accuracies of 97.9% for the training data and 84.5% for the test data. This model also yielded an F-measure of 0.912, demonstrating strong performance in classifying the groups found to be “true” in regard to R&D success. The classification accuracy in the test data of the optimal prediction model for commercialization was 71.3%, and the F-measure was 0.806. For patent applications, classification accuracy and F-measure were 63.9% and 0.708, respectively.

Some of the 41 factors played an important role in predicting the performances. Table 2 shows the key factors for each performance indicator by the optimal C5.0 models. In the case of R&D success, the key factors were venture certification, assets, application area, ratio of subsidy, and number of CR in that order. In the case of commercialization, the key factors include research period, venture certification, Innobiz certification, average debt ratio, and firm age. To predict patent applications, research period, venture certification, located in metropolitan, research budget, and ratio of MS & Ph.D. researchers played the most important role. There were factors that played an important role in common for all performance indicators. Among the project attributes, research period, research area, ratio of subsidy, and ratio of MS & Ph.D. researchers were important. Moreover, among the firm attributes, venture certification and located in metropolitan were important variables in common, and among the market environments, the average debt ratio of the industry was important.

Table 2 Key factors for prediction of the R&D performances by the optimal C5.0 models.

Comparison with linear models and theoretical interpretation of the rules

As indicated in Table 1, the prediction models derived from the various ML algorithms showed stronger prediction performance compared to conventional linear statistical techniques such as LR and DA. For R&D success, commercialization, and patent applications, the optimal C5.0 models had higher accuracy in the test data compared to the results from DA, by a margin of respectively 19.8%p, 9.0%p, and 5.5%p. Linear algorithms select key variables based on statistical significance tests on their relationship with performance indicators. Therefore, relatively few variables are included in the classification rule. In addition, in the case of DA, only continuous variables are used to generate a classification model. On the other hand, ML algorithms utilize relatively more variables, which is one of the reasons for the higher performance.

We generated 100 models for each ML algorithm and performed bagging. Among the models generated by the C5.0 algorithm, representative models that appear repeatedly are shown in Supplementary Material D. Each rule by C5.0 can be theoretically explained in connection with the results of preceding studies. In the rule for R&D success, firms that were pre-certified by venture capital or public agencies as having innovative potential and capacity were found to be more likely to achieve R&D success. In addition, compared to projects with a research period of 1 year, projects with relatively sufficient time (2 years) were more likely to achieve R&D success24. A firm's assets can act as an important resource and capability to continuously and stably carry out R&D29. Firms with a high debt ratio or firms that belong to an industry with a high average debt ratio, resulting in lower financial stability, were notably found to have relative weak likelihood of R&D success. It was found that there is a difference in the likelihood of R&D success depending on research area, application area, and industry to which the firm belongs, because the process and difficulty of R&D are different1. Regarding the composition of research budget, it can be understood that the higher the ratio of cash with a high degree of freedom in use, the more effective research is promoted, which contributes to R&D success. In addition, the lower the ratio of subsidy by increasing the firm's own contribution, the more active and responsible for R&D, the higher the likelihood of R&D success.

In terms of commercialization, firms that had both venture and Innobiz certifications were found to have a higher likelihood of success. Sufficient research period was found to have a positive effect on commercialization as well24. Meanwhile, firm age was found to be a negative factor affecting commercialization. It can be attributed to the fact that firms that have operated well for at least a certain number of years tend to have a significant proportion of its production capabilities already devoted to an existing flagship product, which may delay the timing of input for new products and delay sales31. It was found that there is a difference in the commercialization process depending on research area, and thus there is a difference in the likelihood of commercialization success22. Firms in industries with low average debt ratios could be more likely to succeed in commercialization29. In addition, factories of large corporations in South Korea's major industries are not mainly located in the metropolitan area, and most of the major SMEs that supply parts and equipment to the factories are also the same. It is understood that SMEs located close to the factories of large corporations increase their chances of success based on more information and opportunities for commercialization.

As with other performance indicators, venture certification, Innobiz certification, and research period were found to have a positive effect on patent applications. In addition, firms with a low ratio of subsidy due to their high contribution and active involvement were found to have relatively high patent application performance. We found that a higher ratio of researchers with MS & Ph.D. degrees raised the possibility of patent applications. To apply a patent, it is necessary to draw on in-depth knowledge of cutting-edge technologies to persuasively demonstrate novelty and progress, and therefore, participating researchers with more experiences in related fields will increase the possibility of achieving patents24. It was found that patent applications of SMEs differed depending on research area22. Firms with a small debt ratio, high average total asset turnover, and low average debt ratio were more likely to apply for a patent29. Ratio of female researchers is one of the key factors that positively affect R&D success and patent applications. The majority (> 85%) of researchers belonging to South Korean SMEs are male, and gender diversity is very low32. An increase in the ratio of female researchers to a certain level (i.e., improving gender diversity) in the male-dominated teams could have a positive impact on the performances by providing a variety of perspectives, ways of thinking, and sources of information33.

Supplementary Material E shows the LR models on each performance indicator. Although the linear models had somewhat inferior classification performance, they provided more concise rules that were statistically significant. The variables selected based on the statistical significance for each performance indicator are well included in the key factors by ML shown in Table 2. The direction of the effect of key factors selected in the LR models on each performance indicator was also in good agreement with the optimal rules by the ML algorithm. The rules by ML can cover the rules by statistically significant linear models, provide better prediction performance, and can be explained sufficiently theoretically.

Objective prediction and its use in project selection

This study established the prediction models for three performance indicators to objectively predict the performances of newly proposed projects and presents a method of integrating and scoring these results. Figure 1 shows how the objective prediction of performances by ML is utilized in project selection process in harmony with the qualitative evaluation of expert committees.

Figure 1
figure 1

Project selection process applying the objective prediction results for R&D performances.

According to the AHP survey results, commercialization is the most important aspect of expected performance, and its relative magnitude of importance was derived to be 0.514. This is understandable if we consider that the ultimate purpose of SMEs undertaking R&D is to gain revenues and profit from innovative products that apply new technologies. This was followed in order by the possibility of R&D success (0.366) and patent applications (0.120). The consistency ratio was 0.007, indicating that the survey had achieved consistency. Table 3 shows the process by which we predicted the performances and deduced the comprehensive scores of ten projects that newly received the program’s support in 2014. We used the optimal models respectively found to have the strongest performances. For each of the ten new candidate projects, we derived the predicted group for the three performance indicators and the RPS. RPS compares the probability of each candidate project achieving the performances after receiving subsidies under the same conditions. We used RPS to calculate the partial score. As mentioned earlier, our goal is to assign additional points to projects that have a higher probability of success as an outcome of receiving the same support. On the other hand, if the machine predicts that someone's R&D plans will fail to achieve the performances, we should be very careful in accepting it. In cases where a candidate project was predicted to achieve the performances, we applied the RPS directly as the partial score but in cases where this was not true, we assigned the basic score of 0.5 points. Then, we also applied the weighted values for each performance indicator, and calculated the comprehensive score using the weighted sum of the partial scores. This demonstrates that it is possible to objectively and comprehensively score performances using only objectively measurable data and rules derived by ML, while completely excluding the subjective judgments of experts.

Table 3 Deduction of the comprehensive scores of ten candidate projects.

We are not arguing that this objective prediction is perfectly accurate or entirely eradicates bias or that it can completely replace the role of subjective judgments. We aimed to demonstrate its utility as a supplementary method and to prove its superiority to subjective judgments in certain aspects. This study demonstrates the possibility of the objectively scoring based on quantitatively measured data and the objective rules for predicting performances that do not vary depending on the evaluator. The 41 factors listed in Table SA1, are measured objectively, and since their values are already determined, it does not vary depending on the individual performing the measurement. Moreover, the relation between the factors and a performance indicator, which we refer to as a prediction rule, is generated through ML, and since we choose the optimal rules with the strongest prediction performance among those obtained through various algorithms, such rules will not vary depending on who the evaluator may be. Of course, even objectively measured and obtained data may be biased depending on how it is sampled. However, our analysis included all projects supported by the R&D subsidy program and we generated rules applicable to all research fields. Therefore, we believe that it is possible to significantly reduce the various types of bias that may appear in the qualitative judgments of experts in the objective scoring process.

The prediction performance can also be superior compared to that based on subjective judgments. All 1,771 projects that benefited from the program were selected because they were predicted to yield strong performance, based on the evaluation of expert committees. Therefore, the projects’ performance can be interpreted as the precision (TP/P) of the expert committees, and the precisions for R&D success, commercialization, and patent applications were respectively 85%, 65%, and 52%. By contrast, as shown in Table 1, the precisions of our optimal ML models were respectively 91%, 74%, and 64%, superior to the precision from the qualitative judgments of experts. This demonstrates that an entirely objective method is clearly not inferior in prediction performance compared to the current method relying on experts’ qualitative judgments and the objective method enables the selection of projects with a higher probability of success, thereby improving the program’s efficiency.

However, machines cannot entirely replace the role of people. The results generated by machines should be used only as a means to supplement the qualitative judgments of experts. The existing selection process relied only on the qualitative evaluation by experts, and the expert's score accounted for 100% of the score for selecting beneficiary firms. The objective evaluation method proposed in this study can be conducted independently of the experts’ evaluation. We suggest that a composite score can be derived by giving appropriate weights (for example, 70%:30%) to the expert's qualitative score and the machine's objective score. Rather than relying solely on the qualitative scores from experts, the composite score can be used to prioritize new R&D proposals and select beneficiaries. In this way, the qualitative judgment of experts could be supplemented, and the existing subjective evaluation system can be objectified to a certain extent.

Conclusion

Efficiency should not be the only criterion considered for R&D project selection and budget allocation. In addition to efficiency, diversity and urgency should be considered as well. Nonetheless, improving efficiency is a challenge currently confronted by governmental R&D funding agencies in many countries. To improve the efficiency, we aimed to develop ML models to predict the performances of individual R&D projects in advance, and to present an objective method that can be utilized in the project selection.

The key findings and contributions of this study are as follows. First, from a theoretical perspective, we derived key factors that can influence performances that SMEs can achieve through R&D, such as R&D success, commercialization, and patent applications, using ML and linear algorithms. We showed how each key factor affects the performances through the explainable rules derived from the algorithms. In addition, we provided additional empirical evidence for the relationships between them and theoretically interpret the relationships. Moreover, we showed that the relationships are in good agreement with the results of previous studies and theoretically explainable. For practical application, we provide a method for the project selection process that overcomes the limitations posed by previous reliance on experts’ evaluations, and thereby promotes greater efficiency in the execution of public R&D funding. Different from other studies, this study proved that it is possible to use only objectively measured data on 41 factors and objective rules derived from ML to perform ex-ante predictions of the performances. We also demonstrated that the objective method can perform better than the qualitative expert evaluation. This study is a case study that target R&D projects implemented by South Korean SMEs with government subsidies. However, data on R&D projects are rapidly accumulating in many countries, and the methodology presented in this study can be applied sufficiently. In methodological perspective, by using ML algorithms, this study was able to take account of a larger number of factors more flexibly and comprehensively, compared to studies that used conventional econometric models. This study demonstrates that it is possible to perform comprehensive scoring using the RPS for 3 performance indicators, and a significant contribution of this study is that it offers a method of using this scoring in the project selection process.

This study has several limitations. First, the models we developed have room for further improvement in predictive performance. We mainly used relatively simple DT-based algorithms to derive explainable rules and theoretically investigate the effect of each key factor. Further studies applying more advanced algorithms such as multitask learning and transfer learning would also be meaningful in the future for performance improvement. Of course, the efforts to include larger volumes of data and add more important variables in the models are also important. The efforts to empirically validate the methodology proposed in this study are also required. Efforts should continue to find and improve the limitations of ML methodologies by applying them to various R&D subsidy programs in many countries. Moreover, it is also important to find ways to utilize the methodology in individual firms as a follow-up study. According to a recent study, corporate R&D investment decision makers tend to have higher trust in AI-based advisory systems than human advisors34. If the methodology were to be provided in the form of web services with public statistical data that can serve as a data source for various factors, we anticipate that individual firms will also be able to perform their own evaluations in the R&D planning stage. This can be done without the help of experts, to predict performance and identify ways to maximize it.