Assessing the Efficiency of Foreign Investment in a Certification Procedure Using an Ensemble Machine Learning Model

Kemiveš, Aleksandar; Barjaktarović, Lidija; Ranđelović, Milan; Čabarkapa, Milan; Ranđelović, Dragan

doi:10.3390/math12071020

Open Access Communication


¹ Department for Postgraduate Studies, Singidunum University, 11000 Belgrade, Serbia

² PUC Infostan Technologies, 11000 Belgrade, Serbia

³ Singidunum University, 11000 Belgrade, Serbia

⁴ Science Technology Park Niš, 18104 Niš, Serbia

⁵ Faculty of Diplomacy and Security, University Union-Nikola Tesla Belgrade, 11000 Beograd, Serbia

⁶ Faculty of Engineering, University of Kragujevac, 34000 Kragujevac, Serbia

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(7), 1020; https://doi.org/10.3390/math12071020

Submission received: 6 February 2024 / Revised: 14 March 2024 / Accepted: 25 March 2024 / Published: 28 March 2024

(This article belongs to the Special Issue Quantitative Analysis and DEA Modeling in Applied Economics, 2nd Edition)


Abstract

Many methods exist for evaluating the efficiency of different processes. They fall into two basic groups, parametric and non-parametric methods, which can produce significantly different results. In this study, the authors consider the process of assessing the business climate depending on realized foreign investments. Because different approaches are expected to yield different efficiency assessments, the goal of this paper is to create an optimization model of an ensemble for efficiency assessment that uses both types of methods, with the aim of creating a symmetrical approach that achieves better results than each type of method individually. The proposed solution simultaneously analyzes the impact of different factors on foreign investments in order to determine the most important factors and thus enable each local government to ensure the best possible efficiency in this process. The innovative idea of this study lies in the inclusion of machine learning classification and feature selection methods to fulfill the set goal. Our research, focused on a case study of various cities across the Republic of Serbia, evaluated the effectiveness of that process. This study extends previous research and confirms the published results, highlighting the advantages of the newly proposed model.

Keywords:

foreign investments; DEA; SFA; ensemble model; data classification; efficiency analysis

MSC:

68T05

1. Introduction

The economic development of countries in transition, to which the countries of southeastern Europe belong, including the Republic of Serbia, has shown that foreign investments have a significant impact on economic development. In order to attract more foreign direct investment, these countries have changed the social and political conditions of doing business. Various factors influence the level of foreign investments, including political stability, good education and public health systems, good transport connections, and a favorable local economic environment. In addition to the measures taken at the national level in Southeast Europe, local self-governments aim to improve the local business climate by introducing a procedure for certifying a business-friendly environment. This certification has been defined as a logistic process based on data related to previously realized investments. Constant monitoring and quantification of the effect of each task in the certification process, as defined by the local government for attracting foreign investments, is crucial for its successful practical implementation. Logistics processes whose main indicator is efficiency, i.e., the relationship between the achieved results and the invested resources, enable monitoring of the success of the process of attracting foreign investments [1,2]. Determining the most important resources, i.e., factors, in this process in turn allows for suitable prediction of and planning for the best economic development in the future.

In the field of logistics, efficiency can be divided into strategic, tactical, and operational levels. Among them, the operational level has been most commonly referred to [3], and at this level, technical efficiency [4,5] can be used in the evaluation of the success in attracting foreign investments for each of the local governments, enabling the drawing of conclusions for the overall territory of a country.

The main problem in measuring efficiency in practice, regardless of the type of process considered, has been a misunderstanding of which type of efficiency is suitable, the use of partial indicators that do not represent an appropriate measure, and significant differences between the results obtained by different approaches to efficiency measurement. Currently, the main methods for measuring the efficiency of a process include multi-criteria decision aid (MCDA) methods [6,7], such as data envelopment analysis (DEA) and stochastic frontier analysis (SFA).

To correct for these differences, the methods are combined in the form of an ensemble, or expert knowledge or multi-stage methods are used to achieve better results. Currently, the most modern approach is knowledge discovery from data using machine learning (ML)-based methods [8,9,10,11]. One well-known approach is classification with different types of algorithms, which enables feature selection (i.e., identification of significant factors) and, through dimensionality reduction, allows a selected procedure to be optimized, thus providing better results [12].

Therefore, this study examines the possibility of integrating the DEA and SFA methods, using the previous models and definitions, with ML-based classification into one ensemble. According to the related literature, the DEA and SFA methods have better characteristics than similar existing methods; namely, the most commonly used MCDA methods for determining efficiency are the non-parametric DEA method and the parametric SFA method. The symmetry of the combined approach consisting of these two methods enables us to obtain an ensemble method that uses their complementary advantages while eliminating their disadvantages. The DEA and SFA methods are integrated with classification methods with dimensionality reduction in mind, which, in addition to evident optimization, leads to more accurate results. Therefore, this study addresses the following questions: Can an effective efficiency evaluation model be constructed by integrating diverse methods, thereby improving performance compared to using these methods individually? Does this integrated approach enable the ranking of each individual factor's influence in the process? Finally, can the proposed model be applied to many processes, including the evaluation of the effectiveness of business-friendly environment certification, where small sample sizes are common?

The proposed method can measure the efficiency of the business-friendly certification (BFC) process of local governments in the Republic of Serbia, adopting the investment per capita in local governments as a performance indicator. The BFC process efficiency evaluation is performed based on a set of predefined criteria and their importance, as calculated by the National Alliance for Local Economic Development (NALED) of the Republic of Serbia [13,14]. NALED is a government-established professional organization in the Republic of Serbia. This certification process belongs to problems whose input factors are imprecise and susceptible to subjective influence, although meeting the minimum values for the established factors is required to obtain the BFC certificate. The BFC process efficiency can be assessed based on various output features, such as the investment per capita used in this study, the number of new employees, or the average salary of employees. In general, the certification process can be treated as a complex multivariate problem, which could be addressed in future research. In practice, this paper continues research previously conducted by some of its authors with the same objective [14,15]. It aims to demonstrate the benefits of the proposed model compared to the previously suggested solutions, while also validating the proposed ensemble methodology and its results.

The contributions of this paper are twofold:

From a theoretical perspective, the authors propose an innovative model that optimizes the efficiency evaluation of a business-friendly environment. This is achieved using a single primitive stacking ensemble machine learning model, which combines components of the traditional DEA and SFA methods. These two methods, which are of different types, are combined by classification, and the model demonstrates superior results compared to using them individually, as well as to other models in a similar context.
On the practical side, the proposed solution not only evaluates efficiency but also determines the impact of the included factors on foreign investments. This makes it possible to identify the most significant factors, enabling local governments to ensure the highest possible efficiency in this process. Consequently, the proposed solution can be implemented as a software tool for planning local economic development. This has already been accomplished for the city of Niš in the Republic of Serbia, and the results have been publicly presented in the corresponding paper.

Given the large number of relevant publications, recent articles addressing efficiency evaluation are discussed in the following literature review section. In addition to the previously mentioned references, which form the basis for this communication and its continuation of the research, the authors have included other recent references. These additional references tackle the problem of efficiency evaluation in a context similar to the one proposed in this paper, from the methodological standpoint of using machine learning and with regard to dataset size, with small sample sizes discussed in the last paragraph of that section. The remainder of this paper is structured as follows. Section 2 overviews recently published studies related to the subject of this article. Section 3 describes the BFC process and the materials and methods used. Section 4 presents the case study, and Section 5 discusses the obtained results. Finally, Section 6 concludes this study.

2. Literature Review

ML-based methods for measuring efficiency, and the algorithms used in them, are still relatively few in the related literature, primarily because ML is a relatively novel technology that became established at the end of the 20th century. Until the advent of ML, and still at present, traditional methodologies have dominated the measurement of efficiency in various processes of human life, especially in the economy.

As mentioned before, the most commonly used approaches for measuring efficiency are the traditional DEA and SFA methods. In [16,17], DEA and SFA were discussed as multi-criteria decision-making methods. In [18], these two methods were compared through bibliometric analysis. These methods have been applied to various fields of human activity and have been commonly compared in the related literature; for instance, their application and performance in healthcare organizations were compared in [19]. Efficiency measurement has found widespread application across fields of human activity, such as finance and banking [20], healthcare [21], construction and manufacturing [22], environmental conservation [23], tourism [24], computer science and robotics [25], general logistics [26], emergency management [27], electricity supply [28], and many others [29,30].

The BFC process has been applied not only in the Republic of Serbia, where it was implemented with the help of NALED, but also in more than 90 local governments in countries including Bosnia and Herzegovina, Montenegro, Croatia, and Macedonia, where the BFC Southeast Europe (SEE) certification program has allowed these regions to achieve improvements of more than 70% [31]. The regional network brings together various governmental and non-governmental institutions [14,32]. Local economic development (LED) in the Republic of Serbia was evaluated in [33,34], and the NALED BFC process and its effects on the business environment in the Republic of Serbia were analyzed in [34].

In [15,34,35], the application of various traditional MCDA methods to the importance analysis of individual business-related criteria was studied. Many classical approaches (e.g., DEA and SFA methods and their variations) have been employed for the efficiency evaluation of local governments. Expenditure efficiency indicators of the local governments of the Republic of Serbia were analyzed in [36] using the parametric SFA, and the results demonstrated that local governments could not effectively address demographic and socio-economic problems. In [37], the SFA method and Tobit regression were employed for the efficiency evaluation of local governments in Portugal. The efficiency of local government in the city of Valencia in Spain was calculated using the DEA method and the free disposable hull (FDH) method [38]. A similar analysis was conducted for Belgian local governments using the DEA and FDH methods and econometric approaches [39]. In [40,41], the public sector efficiency of German cities and the technical efficiency of major Italian cities were evaluated using the DEA method. Furthermore, mathematical programming and econometric approaches were adopted for the cost-efficiency evaluation of local governments in Australia [42]. The efficiency of South African local governments was also measured using the DEA method [43].

In the literature, there have been different integrations of the DEA method and various MCDA methods, such as the analytical hierarchy process (AHP) [44,45]. Diverse SFA and MCDA methods have been combined (e.g., the TOPSIS method) to improve evaluation performance [46]. A detailed overview of the SFA method has been provided in [47,48,49]. In the related research, there have been many attempts to integrate the SFA and DEA methods [50,51,52,53], as well as to combine these methods with the MCDA strategies [54,55], to obtain methods with improved performance in efficiency evaluation.

Further, there have been a number of SFA-DEA integrations with ML-based methods [56,57,58]. ML algorithms build a mathematical model from sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to perform the task. This strategy was applied to the evaluation of organizational performance by the DEA method [59]. Fethi et al. [60] reviewed 179 recent articles on operations research (OR) and artificial intelligence (AI) applications in the field of bank performance evaluation. The review indicated that there had been relatively few studies on combined prediction using many individual models integrated into meta-classifiers, and it concluded that this field of research could be worthy of further attention. Emrouznejad et al. [61] designed a back-propagation DEA algorithm based on a neural network model, named NNDEA, to handle large-scale datasets. Barros et al. [62] analyzed the efficiency of insurance companies in Mozambique using the DEA-BPNN methods. In [63], a hybrid DEANN method was proposed to improve the prediction of the functional status of patients in organ transplant operations. The relative energy efficiency of residential buildings was studied in [12] using the DEA method; the DMUs were categorized as efficient or inefficient by three ML classification algorithms, and various classification algorithms were then compared. A similar problem of efficiency evaluation among 17 local governments in the Republic of Serbia is considered in [64], employing a methodology largely aligned with the most recent proposals in the field; specifically, ML clustering has been suggested as a solution in this context.

The recent literature reveals the application of machine learning methods in efficiency evaluation, often integrating different DEA (data envelopment analysis) methods. These applications predominantly utilize supervised machine learning methods, as exemplified in references [65,66], and to a lesser extent, unsupervised machine learning methods [67,68]. Applications of machine learning methods in efficiency evaluation with SFA (stochastic frontier analysis) are found even more infrequently [69], and the integration of machine learning methods with both DEA and SFA is very rare [70].

Given the significance highlighted in the introduction, methods intended to solve efficiency evaluation problems must answer two main questions: Can an effective efficiency evaluation model be constructed by integrating various methods that reduce the dimensionality of the considered problem, thus improving performance over using these methods individually? And can such a model work reliably with small sample sizes? The authors explored potential solutions in the recent literature, identifying articles on the reliability of classification models operating with small sample sizes [71,72,73] and studies on the impact of reducing the number of inputs and outputs in DEA-based efficiency evaluation, whose results depend on these inputs and outputs [74,75,76,77,78].

However, the authors could not find any models in the literature that were similar in context to what they propose in this paper.

3. Materials and Methods

This section provides a brief description of the dataset and methods used.

3.1. BFC Process Efficiency Measurement in the Republic of Serbia

Currently, the two main types of functions of local self-government are service-providing and production functions. Production functions include crafts, industry, and construction, whereas service-providing functions include service activities in a local government's region. In addition, there is a group of basic functions that support activities intended for the population inside and outside a local region. Beyond these basic functions, there are functions of global importance for a particular city or province, which are typically related to the economic development of that area and are highly important for it. The basic functions significantly affect city infrastructure and employment. Apart from the basic functions, there is a need for social functions related to education, digitalization, recreation, and entertainment. Nevertheless, from the perspective of economic development, the basic functions are more important than the social functions. Therefore, local governments should provide the best possible conditions and business environment to ensure a satisfactory FDI level, which will in turn provide new jobs for the local population and potential salary increases. Thus, local governments have to design appropriate LED plans to stay competitive in attracting FDI at both the local and national levels.

In recent years, uneven regional development has been one of the biggest problems facing the Republic of Serbia. Foreign investment is one of the reasons for differences in development between cities and local governments in the Republic of Serbia, since investors are not equally interested in all cities and local self-governments. These differences are caused by numerous factors, including geographical location, existing production infrastructure, personnel profiles, and the efforts made by local self-governments to implement investments successfully. Therefore, constant improvement in investment conditions is required to ensure the competitiveness of a local government. Motivated by the FDI success of European Union countries, in 2007, with the institutional support of the Ministry of Economy and Regional Development of the Republic of Serbia, NALED launched a certification program for local cities that defines a favorable business environment, with the aim of increasing FDI in local governments, the number of employees, and their average salary. This project defined the standards that local cities have to satisfy to earn the certificate.

The BFC defines a set of standards for service quality assessment in cities in the Republic of Serbia. In addition to the economic development-related benefits, the BFC is beneficial for improvements in future partnerships, which represent important business-related factors. Namely, ensuring a realistic report on the business conditions in cities is crucial for investors. This type of report should provide information on all relevant parameters, such as duration and cost, of various FDI-related factors, such as traffic, infrastructure, construction services, and company registration terms.

The BFC defines 12 strict resource criteria, i.e., factors for the evaluation of business conditions in a particular city [15]:

1.: F1—A strategic approach in development planning;
2.: F2—Organizational capacity for supporting the local economy;
3.: F3—Involvement of the economy in the work of local government;
4.: F4—An effective system for issuing building permits;
5.: F5—Availability of information for investment;
6.: F6—Promotion of investment;
7.: F7—Creditworthiness and financial stability;
8.: F8—Promotion of employment and development;
9.: F9—Encouraging private–public partnerships;
10.: F10—Adequate infrastructure;
11.: F11—Transparent policy of taxes and incentives;
12.: F12—Application of information technologies.

If a city satisfies at least 75% of each criterion, it is considered favorable from the FDI aspect and can earn the BFC certificate, an official document from a particular local government that provides investors with the necessary conditions for start-up. At present, more than 20 cities in the Republic of Serbia own the BFC certificate, and about one-third of the rest of the local self-governments in the Republic of Serbia are in the process of obtaining this certificate. The certification process provides local cities with a clear insight into changes that are necessary to improve business conditions and thus attract more FDIs. It should be noted that the BFC certification process follows the current development trends in the era of the fourth industrial revolution and has been updated every two years to ensure competitiveness. The certification program is unique and defined according to the laws of the Republic of Serbia.
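The eligibility rule above requires each criterion, not merely the average, to reach 75%. A minimal Python sketch of that check follows, assuming that each criterion's fulfillment is already expressed as a fraction between 0 and 1 and keyed by the hypothetical labels "F1" to "F12"; the actual NALED scoring procedure is more detailed than this.

```python
from typing import Mapping

CRITERIA = [f"F{i}" for i in range(1, 13)]  # the 12 BFC criteria, F1-F12


def meets_bfc_threshold(scores: Mapping[str, float], threshold: float = 0.75) -> bool:
    """Return True if every criterion is fulfilled to at least `threshold`.

    `scores` maps hypothetical keys "F1".."F12" to fulfillment fractions in [0, 1].
    A high average is not enough: a single criterion below the threshold fails.
    """
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return all(scores[name] >= threshold for name in CRITERIA)


# Usage with made-up scores: one weak criterion blocks certification
# even though the average over all 12 criteria is about 0.88.
city = {name: 0.9 for name in CRITERIA}
city["F4"] = 0.70
print(meets_bfc_threshold(city))  # → False
```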

For cities under the BFC certification program, at the beginning of every calendar year, NALED calculates the average scores of the previous assessment for the 12 defined criteria, which indicate the importance of the criteria [13,15,34,44,79], as shown in Table 1.

As mentioned before, if a city satisfies at least 75% of each of the 12 criteria defined by NALED, it can be awarded the BFC certificate. The data in Table 1 can be used to rank the local governments using MCDM methods. This study aims to examine the efficiency of the BFC process for cities in the Republic of Serbia from the perspective of the FDI amount per capita. The investment per capita is selected for the analysis because it is one of the crucial features for measuring the level of economic development in a particular region or country. In this case study, the investment per capita is defined as the output parameter indicating the efficiency of the BFC process in different cities in the Republic of Serbia.

At the time this study was conducted, only 20 cities had completed the BFC process, so 20 DMUs were used in the efficiency evaluation; their scores and importance are given in Table 2. In the case study, the twelve criteria (F1–F12) were used as input data, and the investment per capita was the output. The data on the 12 criteria for each of the 20 cities used in the case study are presented in Table 2. As shown in Table 2, the F1–F12 values range from 0.46 to 1.09, with standard deviations in the range of 0.02–0.15, and the overall BFC evaluation ranged from 0.81 to 0.98. The BFC process evaluation indicated that the 20 cities had roughly the same prospects of attracting FDI, whereas their investments per capita ranged from EUR 111.78 to EUR 995.82, which could affect the efficiency results.

3.2. Methods

The BFC process, as a ranking process, basically classifies local governments into two groups, efficient and inefficient, which can then be ranked. In the literature, such classifications can be obtained using the DEA and SFA methods if the input and output attributes are considered single-function variables [14]; the same applies to other MCDA methods, such as various multi-criteria decision-making (MCDM) methods [12,16,80,81,82]. This study proposes an ensemble method that integrates the DEA and SFA methods with an ML-based classification method, which is described in detail in the following subsections.

3.2.1. DEA Method

The DEA method has been commonly employed for the relative efficiency evaluation of decision-making units (DMUs). In this method, all DMUs consume the same kinds of multiple inputs to produce the same kinds of multiple outputs. Efficiency is defined in [83] as a single ratio, calculated by dividing the weighted sum of outputs by the weighted sum of inputs. Consider n DMUs (DMU_j, j = 1, 2, …, n), whose inputs are denoted by x_ij (i = 1, 2, …, m) and outputs by y_rj (r = 1, 2, …, s); then, the absolute efficiency measurement model is defined by [84]:

E_{j} = \frac{\sum_{r = 1}^{s} u_{r} y_{r j}}{\sum_{i = 1}^{m} v_{i} x_{i j}},

(1)

where v_i is the ith input multiplier (weight), i = 1, 2, …, m, and u_r is the rth output multiplier (weight), r = 1, 2, …, s.
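To make Equation (1) concrete, the following Python sketch computes the ratio E_j for one DMU under two different, arbitrarily chosen weight vectors. The input, output, and weight values are illustrative assumptions, not data from the case study; the point is that the score depends on the chosen weights, which is the difficulty the DEA method addresses.

```python
from typing import Sequence


def ratio_efficiency(x_j: Sequence[float], y_j: Sequence[float],
                     v: Sequence[float], u: Sequence[float]) -> float:
    """Equation (1): weighted sum of outputs divided by weighted sum of inputs for DMU j."""
    if len(x_j) != len(v) or len(y_j) != len(u):
        raise ValueError("each input and each output needs exactly one weight")
    virtual_output = sum(u_r * y_r for u_r, y_r in zip(u, y_j))
    virtual_input = sum(v_i * x_i for v_i, x_i in zip(v, x_j))
    return virtual_output / virtual_input


# Illustrative DMU with m = 2 inputs and s = 1 output.
x_j, y_j = [2.0, 4.0], [3.0]
print(ratio_efficiency(x_j, y_j, v=[0.5, 0.25], u=[1.0]))  # 3 / (1 + 1) = 1.5
print(ratio_efficiency(x_j, y_j, v=[0.25, 0.5], u=[1.0]))  # 3 / (0.5 + 2) = 1.2
```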

Equation (1) defines a discrete MCDM problem, but determining the weights is challenging and complex. The main idea of the DEA method is to eliminate the need for prior weight determination. Charnes et al. [85], who proposed the DEA method, suggested that each DMU select the set of weights most favorable to it, i.e., the weights under which it appears as efficient as possible compared with the other units in the set. The linear programming (LP) weighted form of the basic constant returns-to-scale model (DEA CCR or DEA CRS) with output orientation [86] is defined by

(\min) h_{k} = \sum_{i = 1}^{m} v_{i} x_{i k},

(2)

subject to

\sum_{r = 1}^{s} u_{r} y_{r k} = 1,

(3)

\sum_{i = 1}^{m} v_{i} x_{i j} - \sum_{r = 1}^{s} u_{r} y_{r j} \geq 0, j = 1, \dots, n,

(4)

v_{i} \geq 0, i = 1, \dots, m; u_{r} \geq 0, r = 1, \dots, s .

(5)

An optimal efficiency score h_k is determined by solving the linear model defined by Equations (2)–(5) n times, once for each DMU, comparing it with the other DMUs. In the original Charnes, Cooper, and Rhodes (CCR) DEA model [85] with this output orientation, efficient units are assessed with efficiency scores h_k (k = 1, 2, …, n) equal to one, and inefficient units are assessed with scores greater than one; such a score is the reciprocal of the corresponding input-oriented score, which is less than one. Inefficient units are enveloped by a production frontier composed of efficient DMUs, which form a real efficient or virtual composite peer unit on the efficient frontier for each inefficient DMU. This model was extended into the so-called Banker, Charnes, and Cooper (BCC) model [86] to include the variable returns-to-scale assumption. Compared to the DEA CRS model, the DEA BCC (or DEA VRS) model contains an extra variable u*, which defines the position of an auxiliary hyperplane lying at or above the DMU of interest, and examines whether the specific DMU has achieved the preferred output level with minimal input. From the set of possible overlapping hyperplanes, this study selects DMUs from the hyperplane with the shortest horizontal distance from the observed DMU. The BCC model can be expressed as follows, and for u* = 0 it reduces to the CCR model:

(\min) h_{k} = \sum_{i = 1}^{m} v_{i} x_{i k} - u^{*},

(6)

subject to

\sum_{r = 1}^{s} u_{r} y_{r k} = 1,

(7)

\sum_{i = 1}^{m} v_{i} x_{i j} - \sum_{r = 1}^{s} u_{r} y_{r j} - u^{*} \geq 0, j = 1, \dots, n,

(8)

v_{i} \geq 0, i = 1, \dots, m; u_{r} \geq 0, r = 1, \dots, s; u^{*} \text{ unrestricted in sign}.

(9)

Finally, the DEA CCR and DEA BCC models could be used to develop different improved versions of the DEA method to address various problems in practice.
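Solving Equations (2)–(9) in general requires an LP solver (for the 12-input case study, something like `scipy.optimize.linprog` would be needed). For the special case of a single input and a single output, however, both models can be solved by simple enumeration, which makes the CRS/VRS difference easy to see. The Python sketch below assumes m = s = 1 and uses made-up data; it solves the equivalent output-oriented envelopment problems rather than the multiplier forms shown above (by LP duality, the optimal values coincide).

```python
from typing import List, Sequence


def ccr_output_scores(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Output-oriented CCR (CRS) scores when m = s = 1.

    With one input and one output, the optimum of Equations (2)-(5) is the best
    output/input ratio in the sample divided by the DMU's own ratio, so efficient
    DMUs score 1 and inefficient DMUs score above 1.
    """
    ratios = [yj / xj for xj, yj in zip(x, y)]
    best = max(ratios)
    return [best / r for r in ratios]


def bcc_output_scores(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Output-oriented BCC (VRS) scores when m = s = 1.

    The VRS frontier consists of convex combinations of observed DMUs. For DMU o
    we find the largest output reachable with input <= x[o]; with one input and
    one output, that optimum lies at an observed DMU or on a segment between two
    DMUs whose inputs bracket x[o].
    """
    n = len(x)
    scores = []
    for o in range(n):
        best_y = max(y[j] for j in range(n) if x[j] <= x[o])
        for j in range(n):
            for k in range(n):
                if x[j] < x[o] < x[k]:
                    t = (x[o] - x[j]) / (x[k] - x[j])  # convex weight on DMU k
                    best_y = max(best_y, (1 - t) * y[j] + t * y[k])
        scores.append(best_y / y[o])
    return scores


# Made-up data: four DMUs, one input and one output each.
x = [2.0, 4.0, 6.0, 5.0]
y = [2.0, 5.0, 6.0, 4.0]
print(ccr_output_scores(x, y))
print(bcc_output_scores(x, y))
```

Here only DMU 2 is CRS-efficient, whereas DMUs 1 to 3 all lie on the VRS frontier. DMU 4 scores 1.5625 under CRS but 1.375 under VRS, because the VRS frontier is built only from convex combinations of observed DMUs rather than from rays through the origin, so the VRS score never exceeds the CRS score.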

3.2.2. SFA Method

The SFA method is a parametric approach to efficiency measurement developed by Aigner, Lovell, and Schmidt [87] and Meeusen and Van den Broeck [88]. Unlike DEA, it accounts for random (measurement) error when estimating the efficiency of the firm under observation.

Assume that a firm j (j = 1, 2, …, n) produces the output y_j using the input vector x_j. The corresponding production function is f(x_j, β), where β is a parameter vector to be estimated. The output level is affected by the efficiency ξ_j and by a random error v_j. The output of firm j is then given by:

y_{j} = f (x_{j}, β) \, ξ_{j} \, e^{v_{j}},

(10)

where ξ_j represents the level of efficiency of firm j and lies in the range (0, 1]. A firm is considered efficient if ξ_j = 1; otherwise, it is considered inefficient.

The aim is to estimate the parameter vector β and to separate ξ_j from the random error v_j for the firm under observation. For that purpose, the natural logarithm of Equation (10) is taken, under the assumption that the production function of the m inputs is linear in logs, which gives:

\ln (y_{j}) = β_{0} + \sum_{i = 1}^{m} β_{i} \ln (x_{i j}) + v_{j} - u_{j},

(11)

where u_j = -ln ξ_j ≥ 0 represents the inefficiency level, and v_j is an independently and identically distributed (i.i.d.) random error.

Then, the stochastic frontier is given by:

β_{0} + \sum_{i = 1}^{m} β_{i} \ln (x_{i j}) + v_{j},

(12)

while u_j indicates how far firm j lies below this frontier.

After estimating the parameters of Equation (11) [89], the technical efficiency of firm j can be calculated as the ratio of its actual output to the output on the estimated stochastic frontier, which is expressed by:

T E_{j} = e^{-u_{j}} = ξ_{j} .

(13)
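Equation (13) needs a value for u_j, which is not observed directly: estimation only yields the composite residual ε_j = v_j - u_j from Equation (11). A common way to split it, assuming a normal/half-normal model (v_j ~ N(0, σ_v²) and u_j ~ |N(0, σ_u²)|), is the conditional mean E[u_j | ε_j] of Jondrow, Lovell, Materov, and Schmidt. The Python sketch below assumes that β_0, β_i, σ_u, and σ_v have already been estimated (e.g., by maximum likelihood); the parameter values in the usage lines are hypothetical. Plugging the conditional mean into Equation (13) is one of several point estimators; E[e^{-u_j} | ε_j] (Battese and Coelli) is a frequently used alternative.

```python
import math
from typing import Sequence


def _std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def technical_efficiency(y_j: float, x_j: Sequence[float], beta0: float,
                         beta: Sequence[float], sigma_u: float, sigma_v: float) -> float:
    """TE_j = exp(-E[u_j | eps_j]) for a log-linear (Cobb-Douglas) frontier.

    Assumes v_j ~ N(0, sigma_v^2) and u_j ~ |N(0, sigma_u^2)| (half-normal).
    """
    # Deterministic part of the frontier: beta_0 + sum_i beta_i * ln(x_ij).
    frontier = beta0 + sum(b * math.log(x) for b, x in zip(beta, x_j))
    eps = math.log(y_j) - frontier                 # composite residual v_j - u_j
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    # u_j | eps_j follows N+(mu_star, sigma_star^2), a normal truncated at zero.
    mu_star = -eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)
    z = mu_star / sigma_star
    # Mean of that truncated normal; for very large positive eps the CDF can
    # underflow, so a production version would work in log space.
    u_hat = mu_star + sigma_star * _std_normal_pdf(z) / _std_normal_cdf(z)
    return math.exp(-u_hat)


# Hypothetical estimated parameters for a single input.
params = dict(x_j=[4.0], beta0=1.0, beta=[0.5], sigma_u=0.3, sigma_v=0.2)
te_low = technical_efficiency(5.0, **params)
te_high = technical_efficiency(6.0, **params)
print(round(te_low, 3), round(te_high, 3))
```

Because E[u_j | ε_j] is always positive, this estimator never reports a TE of exactly one, and a firm with a higher output for the same inputs receives a higher efficiency estimate.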

3.2.3. Classification Process

Classification has been broadly used in expert systems to support domain experts in identifying knowledge in massive amounts of data. Classification is a supervised learning task, shared by data mining (DM) and ML, that divides a dataset into two or more classes, where all training instances are labeled with the class they belong to. Supervised ML algorithms generate a classifier from correctly labeled instances, and obtaining such labeled training data can be expensive; the performance of the trained classifier is then validated on a separate test set. Various models have been adopted to develop diverse classifiers, and numerous algorithms have been used for classifier generation, including support vector machines, k-nearest neighbor algorithms, decision trees, logistic regression, and artificial neural network models [90,91,92].

This study combines three types of classifiers to develop the proposed method on a concrete dataset describing the problem considered here, namely, the BFC process in 20 cities in the Republic of Serbia. The three classifiers are naive Bayes (NB), a Bayesian approach; random tree (RT), a decision tree approach; and AdaBoost (AB), a boosting ensemble approach.

The NB classifier applies Bayes' theorem under the simplifying ("naive") assumption that the attributes are conditionally independent given the class, and it assigns each instance to the class with the highest posterior probability. The NB classifier can be trained on a relatively small dataset and can easily be applied to different tasks. Related research indicates that, when the predictors are independent, this classifier can provide better results than other classifiers [93,94].
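To make the independence assumption concrete, the following is a minimal Gaussian naive Bayes sketch for numeric criteria. It assumes normally distributed attributes within each class; the function names and the toy data are hypothetical and do not reproduce the study's Weka configuration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate class priors and the per-class mean/variance of every attribute."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        stats = []
        for column in zip(*rows):  # iterate over attributes within this class
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + var_floor
            stats.append((mean, var))
        model[label] = (len(rows) / len(y), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        # Naive assumption: attributes are independent given the class,
        # so their log-likelihoods simply add up.
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-criterion data for six DMUs.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.2]]
y = ["inefficient"] * 3 + ["efficient"] * 3
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [5.1, 6.1]))
```

Only class counts, means, and variances are estimated, which is why NB can be trained on small datasets such as the 20 DMUs considered here.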

The RT classifier builds a decision tree in which only a randomly chosen subset of attributes is considered for the split at each node. Such randomized trees are commonly used as base learners in bagging-style ensembles, where the term ensemble denotes an approach that averages the predictions obtained by different models on the same dataset [95,96].

The AB classifier, proposed by Freund and Schapire, was one of the first practical boosting algorithms, and it has been widely used in various fields. Boosting is an ML approach that builds a highly accurate prediction rule by combining several weak, moderately inaccurate prediction rules [97,98].
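The boosting idea can be illustrated with a minimal sketch of discrete AdaBoost that uses one-level decision stumps as weak learners and codes the labels as −1/+1. This is a generic textbook version written for illustration; the function names and data are hypothetical, and the sketch is not the exact implementation used in the study.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(j, t, polarity, row):
    return polarity if row[j] >= t else -polarity


def adaboost(X, y, rounds=10):
    n = len(y)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        if err >= 0.5:
            break  # the weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, t, pol))
        # Increase the weights of misclassified instances, decrease the rest.
        w = [wi * math.exp(-alpha * label * stump_predict(j, t, pol, row))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(j, t, pol, row) for alpha, j, t, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical one-criterion data: -1 = inefficient, +1 = efficient.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=5)
print([adaboost_predict(model, row) for row in X])
```

Each round re-weights the instances so that the next weak rule concentrates on the cases the previous rules got wrong; the final prediction is a weighted vote of all rules.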

The basic measure of classifier success is the confusion matrix, which is presented in Figure 1. In addition, this study adopts several commonly used evaluation metrics for binary classification: precision, accuracy, F1-score, the Matthews correlation coefficient (MCC), recall, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, that is, AUC ROC [99].

The accuracy, precision, and recall values are, respectively, calculated as follows: Accuracy = (TP + TN)/N, Precision = TP/(TP + FP), and Recall = TP/(TP + FN), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, and N = TP + TN + FP + FN is the total number of data samples (instances).

It should be noted that the accuracy metric can be misleading for data with a highly unequal distribution of classes (i.e., data with skewed classes), because a classifier that always predicts the majority class can still achieve high accuracy. Therefore, a trade-off between the precision and recall metrics must be considered. The F1-score combines the recall and precision metrics, giving equal importance to both of them, and it is calculated by F1-score = 2 × Precision × Recall/(Precision + Recall). Further, the MCC metric is defined by

M C C = \frac{T P \cdot T N - F P \cdot F N}{\sqrt{(T P + F P) (T P + F N) (T N + F P) (T N + F N)}} .

(14)
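All of these metrics follow directly from the four confusion-matrix counts. The sketch below computes them, using the standard MCC numerator TP·TN − FP·FN. The counts in the example are hypothetical (20 instances, 5 of them positive) and are not taken from Tables 4 or 6; the function name is also hypothetical.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, F1-score, and MCC from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "mcc": mcc}


# Hypothetical confusion matrix: 20 instances, 5 of them in the positive class.
print(binary_metrics(tp=3, tn=14, fp=1, fn=2))
```

For these counts, accuracy is 0.85, precision 0.75, recall 0.6, the F1-score about 0.667, and MCC = 40/√4800 ≈ 0.577. The gap between accuracy and MCC illustrates why accuracy alone is insufficient for skewed classes.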

The ROC curve shows a binary classifier's performance by plotting the recall (sensitivity, or true positive rate) against the false positive rate FPR = FP/(FP + TN) = 1 − specificity, where specificity = TN/(TN + FP), across decision thresholds. For an unbalanced dataset, the precision-recall curve (PRC), which plots precision against recall, is more informative; the area under it is referred to as AUC PRC, and this measure is used in place of AUC ROC.
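Both areas can be computed from classifier scores. The following sketch computes AUC ROC as the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one (ties count one half). It approximates AUC PRC by average precision, which is one common step-wise estimate. Toolkits differ in how they interpolate the PR curve, so values need not match a particular package exactly. The scores and function names are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC ROC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties between a positive and a negative count one half
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Instances are ranked by decreasing score; the precision at the rank of every
    positive instance is recorded and averaged. Tied scores keep input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            precisions.append(true_pos / rank)
    return sum(precisions) / len(precisions)


# Hypothetical scores for five DMUs, two of which are truly efficient (label 1).
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.1]
print(auc_roc(labels, scores), average_precision(labels, scores))
```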

In the proposed method, feature selection algorithms are used for dimension reduction of the considered problem to obtain a model with better characteristics. Optimal feature subset selection methods perform the searching process in the feasible solution domain, and most of them are highly affected by the data dimension and instance/feature ratio.

A subset selection algorithm searches for the attribute subset that yields an optimal result. Such a search is computationally limited, and its outcome depends on the classifier and can be sensitive to the initial order of the input features. Therefore, this study adopts a ranker evaluation approach that ranks attributes individually according to their importance. Further, the data volume can be reduced using Weka software [100], which combines diverse attribute evaluation methods and has been suggested for attribute ranking.

Attribute ranking methods evaluate features combining diverse metrics and assign a rank according to their performance. The evaluation metrics typically consider the statistical properties of features or their expected potential; data dimensionality reduction is performed using the same properties [101].

The methods for attribute selection include filter methods (filters), which evaluate attributes independently of any learning algorithm; wrapper methods (wrappers), which evaluate attribute subsets using a learning algorithm; and embedded methods (embedders), in which selection is part of model training. In this study, three algorithms from the filter group are used, namely, ReliefF (RF), principal components (PCs), and correlation attribute (CA) evaluation. They rank attributes individually and were originally developed for classification into only two classes, which is the case considered in this study.
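As an illustration of how a filter ranks attributes without training a classifier, the following is a minimal sketch of a correlation-based ranker. It scores each numeric attribute by its Pearson correlation with the 0/1 class label and sorts the attributes by that score. The use of the signed correlation, the function names, and the toy data are assumptions made for illustration; the sketch is not claimed to reproduce the evaluator used in the study.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_by_class_correlation(columns, labels):
    """Rank attributes by their (signed) correlation with the 0/1 class label.

    columns: mapping attribute name -> list of values, one per instance.
    """
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical attributes for four instances (not the paper's data).
columns = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 1, 2, 2]}
labels = [0, 0, 1, 1]
for name, r in rank_by_class_correlation(columns, labels):
    print(name, round(r, 3))
```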

Entropy, a fundamental concept of information theory [102], measures the impurity of an arbitrary collection of data samples, that is, the unpredictability of a system. The entropy of Y is calculated by

H (Y) = - \sum_{y \in Y} p (y) \log_{2} (p (y)),

(15)

where p(y) denotes the marginal probability of value y of variable Y.

Consider random variables X and Y in a training set S. The entropy of Y with respect to the partitions of S induced by X is given by the conditional entropy; if it is lower than the entropy of Y before partitioning, X provides information about Y. The conditional entropy is defined as follows:

H (Y | X) = - \sum_{x \in X} p (x) \cdot \sum_{y \in Y} p (y | x) \log_{2} (p (y | x))

(16)

where p(y|x) is the conditional probability of value y of Y given the value x of X.

Finally, entropy can be regarded as an impurity criterion of a training set S. Therefore, the decrease in the entropy of the class Y obtained when information about an attribute X becomes available, IG(Y; X) = H(Y) − H(Y | X), known as the information gain, can be used to measure the relevance of that attribute.
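Equations (15) and (16) and the resulting information gain can be computed directly from frequency counts. The following minimal sketch estimates the probabilities by relative frequencies; the labels and attribute values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """H(Y) in bits, as in Equation (15), with probabilities estimated by frequencies."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y | X), as in Equation (16): entropy of y within each x group, weighted by p(x)."""
    n = len(y)
    total = 0.0
    for x_value, count in Counter(x).items():
        subset = [yi for yi, xi in zip(y, x) if xi == x_value]
        total += (count / n) * entropy(subset)
    return total


def information_gain(y, x):
    """Reduction in the class entropy obtained by knowing attribute x."""
    return entropy(y) - conditional_entropy(y, x)


labels = ["efficient", "efficient", "inefficient", "inefficient"]
print(information_gain(labels, ["high", "high", "low", "low"]))  # perfectly informative
print(information_gain(labels, ["high", "low", "high", "low"]))  # uninformative
```

An attribute that splits the classes perfectly removes all uncertainty (gain of 1 bit for two balanced classes), whereas an attribute independent of the class leaves the entropy unchanged (gain of 0).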

3.2.4. Proposed Ensemble Method

Stewart [103] examined the relationship between the MCDA and DEA formulations. The attributes or criteria for DMU efficiency evaluation represent the inputs and outputs, and the main objective is to minimize the inputs while maximizing the outputs. Thus, DEA can be regarded as a non-parametric approach for classifying data into efficient and inefficient groups. Similarly, the SFA method treats the input and output parameters as the variables of a production function.

As mentioned before, the classification process divides given data into two or more categories based on predefined criteria. Various groups of feature selection algorithms exist for data dimensionality reduction; they select only the necessary attributes, which enables overall optimization of the BFC process. The main benefit is that the DEA applicability condition, which relates the number of DMUs to the number of input and output parameters, can be satisfied. As defined in [104], the number of DMUs should be larger than three times the number of input and output parameters. Moreover, this study adopts an additional, weaker condition that at least two DMUs are required for each input and output parameter.

As stated previously, this study aims to combine ensemble models and efficiency assessment methods to improve classification performance. All the methods adopted in this study are well-established methods in the field of efficiency measurement.

The basic motivations for integrating the DEA and SFA with classification methods can be summarized as follows:

Reduces noise in the data;
Increases the readability of the results;
Improves the calculation speed.

This study addresses the problem of the BFC efficiency evaluation in the Republic of Serbia, considering the BFC of local governments. The main motivations for combining the ensemble and assessment methods to improve performance compared to the included methods adopted individually can be summarized as follows:

The DEA and SFA methods provide significantly different results. In this study, the efficiency of the BFC process is classified into two classes, efficient and inefficient, which implies using the maximum voting method as an ensemble algorithm (n ≥ 3, where n is the number of used algorithms). Namely, only DMUs that are labeled as efficient by two or three methods are marked as efficient;
Classification algorithms are valuable in parameter determination, as they indicate how classification performance changes when specific attributes are selected. In this study, three classifiers, NB, RT, and AB, are considered, taking into account the amount of training data;
The DEA method requires the number of DMUs to be at least three times (under the weaker condition, two times) the number of input and output attributes. This condition motivates data dimensionality reduction;
Attribute selection can reduce the number of attributes, addressing the DEA method's limitations and reducing noise in the data. The intersection of the results of the RF, PC, and CA algorithms is used to estimate the weights of attributes and rank them. For instance, [105] drew a number of conclusions that justify using the ReliefF algorithm alone.

The flowchart of the proposed efficiency evaluation method is shown in Figure 2.

As shown in Figure 2, the proposed algorithm’s steps are as follows:

Data preparation defines the DMU number and efficiency evaluation criterion number, acquires and preprocesses the data, and addresses the problem of missing data;
The efficiency value is calculated individually by the DEA CRS, the SFA, and the DEA VRS methods. The application of the three methods that adopt diverse assumptions provides better insight into data sources’ usefulness and result efficacy. All DMUs are labeled as efficient DMUs or inefficient DMUs, and the latter ones are ranked;
The calculated efficiency values are used as the class attribute for classification into efficient and inefficient groups, and the crucial classification performance metrics are determined: precision, accuracy, F1-score, MCC, recall, AUC ROC, and AUC PRC [99]. The classifier among NB, RT, and AB with the best results is selected, considering the size of the classification dataset [106]. If the dataset is balanced, the dominant evaluation measure is AUC ROC; otherwise, it is AUC PRC. If the quality metrics' values are acceptable, the algorithm proceeds to Step 4; otherwise, it proceeds to Step 7, and the results are analyzed before the end of the algorithm based on the efficiency determined in Step 2;
Attribute selection is performed by evaluating and ranking each attribute with the three algorithms RF, PC, and CA and taking the intersection of their selections. This evaluation uses maximum voting, and only criteria determined as significant by all three algorithms are retained. The ranking allows for the determination of a relevant parameter set and for checking the eligibility conditions for applying the DEA method by examining the ratio of the number of attributes to the number of units. If dimensionality reduction is possible, the algorithm proceeds to Step 5; otherwise, it proceeds to Step 7. The results are analyzed before the end of the algorithm using the efficiency determined in Step 2;
If feature selection enables dimension reduction, that is, a smaller number of criteria are significant for the efficiency measurement of the considered BFC process and the classification evaluation (using the best of the three used classifiers determined in Step 3 of this algorithm) of the optimized problem shows better results than the results obtained by the same classifier in Step 3, the algorithm proceeds to Step 6; otherwise, it proceeds to Step 7. The results are analyzed before the end of the algorithm using the determined efficiency in Step 2;
The efficiency measurement and ranking of the DMUs’ efficiency is repeated;
Result discussion is performed, and the final results are analyzed.

At the conclusion of their description of the proposed ensemble method, the authors feel it is essential to clarify that this model effectively combines elements of both maximum voting and stacking ensemble methods in machine learning. Both methods aim to integrate the outcomes or, more commonly, the predictions from several base methods, which may vary in nature, to achieve a superior final result or prediction. The principal distinction between maximum voting and the stacking approach lies in the mechanism of final integration. Maximum voting compiles the outcomes from each base method, or classifier decisions, selecting the class that is most frequently predicted, whereas stacking uses an additional method or classifier (often referred to as a combiner) for the final integration. In the proposed model, the authors rely solely on previously defined models and concepts for DEA (data envelopment analysis) and SFA (stochastic frontier analysis) as the foundational components, without introducing new concepts. They then combine classification with a feature selection method that operates in a maximum voting mode as a meta-classifier combiner [107].

For the proposed model, the authors opted for a classification approach that utilizes binary, rather than linear, regression due to the dichotomous nature of the dependent variable and the clear association of the values of independent variables with the probability of the occurrence of the modality, or the class, of the dependent variable. At the same time, the authors do not address the issue of ranking efficient DMUs (decision-making units); the model is solely focused on identifying so-called relative efficiency and pinpointing the most critical factors for its achievement. This precise calibration of efficient units is left as a potential subject for future research.

4. Results

The proposed method was verified on the real dataset, and this section presents and discusses the results obtained in the case study. At the outset of this section, it is crucial to highlight two important observations:

The literature commonly suggests an 80/20 split of the dataset into training and test portions in the classification process, though other configurations are also recognized [108]. Given the small sample size of the dataset used in this case study, which comprises 20 instances (i.e., local self-governments) described by 12 relevant criteria, as detailed in Table 2 of this paper, 10-fold cross-validation was employed to evaluate the classification. This method partitions the dataset into 10 subsets (folds), trains on nine of them, and reserves the remaining one for testing. This cycle is repeated 10 times, so that each subset is used for testing exactly once.
The literature sources [71,72] indicate that, generally, a larger dataset offers greater statistical power for pattern recognition and other forms of analytical processing, as an appropriate sample size is critical for achieving precise and reliable results in any study. However, in several domains such as medicine and economics, it is common to encounter studies with small datasets, as is the case with the study at hand. Considering this, along with the fact that the proposed model addresses an economic issue typically associated with small datasets, the authors reviewed the existing literature, identified several strategies for tackling this challenge, and opted for a 10-fold cross-validation approach. This method was chosen to select the data for model development and validation due to its suitability for studies with small sample sizes [73].
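The fold construction described above can be sketched as follows for the 20 DMUs of this case study. This is plain (non-stratified) k-fold cross-validation with a fixed shuffling seed. Toolkits often stratify folds by class instead, and the function name and seed are hypothetical.

```python
import random


def k_fold_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for plain (non-stratified) k-fold CV."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of (almost) equal size
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# 20 DMUs and 10 folds: each fold holds out 2 DMUs and trains on the other 18.
for train, test in k_fold_splits(20):
    print(len(train), sorted(test))
```

With only 20 instances, every DMU is used for testing exactly once and for training nine times, which makes better use of the data than a single 80/20 split.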

4.1. Data Preparation

Step 1 of the proposed method includes the preparation of data on the BFC process in 20 cities in the Republic of Serbia, which has been described in Section 3.1.

4.2. Preliminary Efficiency Evaluation

The BFC efficiency evaluation results for 20 cities obtained using the predefined criteria, which are provided in Table 2, are presented in Table 3.

The efficiency values obtained by the DEA method (Equations (2)–(5)) under the constant returns to scale (CRS) assumption are given in the second and third columns. The fourth and fifth columns show the results obtained by the DEA VRS method (Equations (6)–(9)), which uses the variable returns to scale assumption. Finally, the last two columns in Table 3 show the results achieved by the parametric SFA model.

Although all three methods classified the cities into only two groups (i.e., efficient and inefficient cities), the number of cities in each group differed among the methods. The DEA CRS method classified four cities as efficient (cities 6, 7, 10, and 15), with an average efficiency and standard deviation (St Dev) of 0.684 and 0.275, respectively. In contrast, fourteen cities were assessed as efficient by the DEA VRS method (cities 1, 3, 5–11, 14, 15, 17, 19, and 20); the average efficiency and standard deviation of the DEA VRS method were 0.927 and 0.011, respectively. Compared to the DEA VRS method, the SFA method categorized fewer cities as efficient, namely, eight: cities 1, 3, 4, 5, 6, 11, 14, and 15; the average efficiency and standard deviation of the SFA method were 0.762 and 0.241, respectively.

According to the obtained results, the DEA VRS method produced the least realistic (most optimistic) values of the three methods, which could be due to its larger number of degrees of freedom compared with the other two methods.

It should be noted that these results could be considered predictable, since there were 13 criteria (inputs and outputs) and only 20 DMUs. According to the suggestion given in [104], this number of criteria would require 39 DMUs (i.e., 13 × 3 = 39); conversely, 20 DMUs allow at most six criteria. In practice, only two cities (cities 6 and 15) were classified as efficient according to the maximum voting method, as they were labeled efficient by all three methods.
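The rule-of-thumb arithmetic in the paragraph above can be captured in a few lines. The factor of three follows [104] as stated in the text, and factor=2 corresponds to the weaker condition of two DMUs per criterion. The sketch counts inputs and outputs together as "criteria", and the function names are hypothetical.

```python
def required_dmus(n_criteria, factor=3):
    """Minimum number of DMUs for a given total number of inputs plus outputs."""
    return factor * n_criteria


def max_criteria(n_dmus, factor=3):
    """Largest total number of inputs plus outputs that n_dmus can support."""
    return n_dmus // factor


def rule_satisfied(n_dmus, n_criteria, factor=3):
    """True if the DEA dimensionality rule of thumb holds."""
    return n_dmus >= required_dmus(n_criteria, factor)


print(required_dmus(13))           # DMUs needed for 13 criteria
print(max_criteria(20))            # criteria allowed for 20 DMUs
print(max_criteria(20, factor=2))  # under the weaker two-DMUs-per-criterion condition
```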

4.3. Preliminary Classification Evaluation

In Step 3 of the proposed algorithm, the preliminary classification is performed by three algorithms, namely, the NB, RT, and AB algorithms, using the strictest efficiency measure as the class label, that is, the method that labels the fewest DMUs as efficient (in this case study, DEA CRS, with only four efficient DMUs). The best of the three classifiers is then determined using the standard measures. The obtained results are given in Table 4.

The results showed that AB was the best classifier with respect to all measures. In combination with the feature selection procedure described in the next step of the proposed model, the AB classifier enabled us to obtain an optimized efficiency assessment model with better classification characteristics than the preliminary classification. If the quality measures were satisfactory, the algorithm would proceed to Step 4; otherwise, it would proceed to Step 7. The result was analyzed before the end of the algorithm using the efficiency determined in Step 2 of the algorithm.

4.4. Feature Selection

In the fourth step of the proposed algorithm, feature selection was performed on the case-study dataset using the three algorithms presented in Section 3.2.3 that were suitable for this case study, namely, the RF, PC, and CA algorithms. The results obtained using maximum voting are given in Table 5, which shows that only six attributes, all of them input criteria, were important, namely, F2, F3, F5, F6, F8, and F9.

In the integrated feature selection process, two rules were followed:

Ranked criteria with a negative value were rejected and treated as insignificant;
Criteria that were not significant for all three selected algorithms were not selected as significant in the proposed method.
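The two rules above amount to an intersection with a sign check. The following minimal sketch assumes that each evaluator returns one score per criterion and that "significant" means a non-negative score. The evaluator scores and criterion names are hypothetical and are not the values of Table 5.

```python
def select_significant(scores_by_evaluator):
    """Keep criteria that every evaluator ranks with a non-negative score.

    scores_by_evaluator: mapping evaluator name -> {criterion: score}.
    Rule 1: a negative score marks a criterion as insignificant for that evaluator.
    Rule 2: a criterion is kept only if it is significant for all evaluators.
    """
    evaluators = list(scores_by_evaluator.values())
    candidates = set.intersection(*(set(scores) for scores in evaluators))
    return sorted(c for c in candidates
                  if all(scores[c] >= 0 for scores in evaluators))


# Hypothetical scores (not the values of Table 5).
scores = {
    "RF": {"C1": 0.10, "C2": 0.30, "C3": -0.05},
    "PC": {"C1": 0.20, "C2": 0.10, "C3": 0.40},
    "CA": {"C1": 0.05, "C2": 0.20, "C3": 0.30},
}
print(select_significant(scores))
```

Criterion C3 is dropped because one evaluator scores it negatively, even though the other two rank it highly.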

If dimensionality reduction was possible, the algorithm proceeded to Step 5; otherwise, it proceeded to Step 7. The result was analyzed before the end of the algorithm using the efficiency determined in Step 2 of this algorithm. In this case study, dimensionality reduction was possible, so the algorithm proceeded to the next step.

4.5. Final Classification Evaluation

In the next step of the algorithm, the final classification was performed by the AB classifier, which had been determined as the best for this dataset in Step 3 of the proposed algorithm. The same strictest efficiency measure as in Step 3 was used, but with only the six significant criteria determined in Step 4: F2, F3, F5, F6, F8, and F9.

The results and parameters of the classification with six input criteria are presented in Table 6, which shows that better results were achieved when only the six selected criteria were used in the efficiency evaluation than when all 12 criteria were considered (Table 4). The value of the precision metric was 0.725 (in comparison to 0.622), the recall was 0.750 (in comparison to 0.700), the F1-score was 0.736 (in comparison to 0.659), the MCC was 0.140 (in comparison to −0.167), the AUC ROC was 0.891 (in comparison to 0.703), and the AUC PRC was 0.895 (in comparison to 0.807). These results, especially those for AUC ROC and AUC PRC, indicate that criteria selection and reduction of the problem dimensionality can improve classification performance.

Since the quality measures of the final classification were more than satisfactory, the algorithm proceeded to the next step, Step 6; otherwise, it would have proceeded to Step 7. The result was analyzed before the end of the algorithm using the efficiency determined in Step 2 of this algorithm.

4.6. Final Efficiency Evaluation

In Step 6, the maximum voting method was applied again to the efficiency evaluation by the DEA CRS, DEA VRS, and SFA methods using six criteria determined in Step 4 of the proposed model.

The obtained results are given in Table 7, which shows that four cities (cities 6, 7, 10, and 15) were categorized as efficient by the DEA CRS method, while the DEA VRS method found eleven efficient cities, namely, cities 5–8, 10–13, 15, 19, and 20. This time, the SFA method was the strictest of the three, assessing only two cities as efficient (cities 7 and 8).

The final results are given in Table 7.

In practice, with fewer criteria, the proposed method classified only one city, City 7, as efficient according to the maximum voting methodology. This was the expected result, because fewer criteria lead to fewer efficient DMUs [109].
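The voting step can be reproduced from the sets of efficient cities listed in the text for Tables 3 and 7. In this sketch, the function name is hypothetical and min_votes sets the agreement threshold. The cities reported in this section (cities 6 and 15 before reduction, City 7 after) are exactly those labeled efficient by all three methods (min_votes=3). A two-of-three threshold applied to the Table 7 sets would also admit cities 6, 8, 10, and 15.

```python
def vote(efficient_by_method, min_votes):
    """Return the DMUs labeled efficient by at least min_votes methods."""
    counts = {}
    for efficient in efficient_by_method.values():
        for dmu in efficient:
            counts[dmu] = counts.get(dmu, 0) + 1
    return sorted(dmu for dmu, c in counts.items() if c >= min_votes)


# Efficient cities as listed in the text for Table 3 (all criteria).
table3 = {
    "DEA CRS": {6, 7, 10, 15},
    "DEA VRS": {1, 3, 5, 6, 7, 8, 9, 10, 11, 14, 15, 17, 19, 20},
    "SFA": {1, 3, 4, 5, 6, 11, 14, 15},
}
# Efficient cities as listed in the text for Table 7 (six selected criteria).
table7 = {
    "DEA CRS": {6, 7, 10, 15},
    "DEA VRS": {5, 6, 7, 8, 10, 11, 12, 13, 15, 19, 20},
    "SFA": {7, 8},
}
print(vote(table3, 3), vote(table7, 3))  # agreement of all three methods
print(vote(table7, 2))                   # two-of-three majority
```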

Based on the findings from our case study, by comparing the results presented in Table 3 with those in Table 7 for the efficiency evaluation of DMUs (see Table 8), and the results in Table 4 with those in Table 6 for the quality of classifying DMUs as efficient or not (see Table 9), we can draw several conclusions about the benefits of the proposed method, which uses a subset of six criteria for efficiency measurement:

The method advocates the use of the DEA CCR (i.e., CRS) approach due to its robustness and rigor in validating results for the problem at hand, both as a standalone method and within an ensemble, as demonstrated in this research. Each DEA method is particularly effective for problems involving multiple inputs and outputs, as well as in cases with a limited number of criteria [80]. On the other hand, the SFA method is better suited to situations with a large number of criteria and where the stochastic nature of parameters is crucial. Thus, each method can serve as a corrective measure to the other, with DEA enhancing SFA and vice versa.
It yields superior outcomes compared to the methodology previously suggested by the authors [14], especially due to its adaptation to various strategies depending on the dataset’s balance.
The method develops an effective evaluation model by merging different methods, thereby improving the performance beyond what is possible with individual methods.
It follows the general rule that the total number of outputs plus three times the number of inputs should be less than the number of DMUs analyzed.
This approach effectively minimizes data noise.
It simplifies the computational process.
The method aids in determining the ranking of the impact of individual resources or factors in the efficiency process. This enables local governments to formulate strategies to enhance these factors, ultimately fostering a better environment for FDI.

5. Discussion

At the beginning of the discussion on the results and the proposed model, as outlined in Section 4 (Results), it is crucial to offer some key observations for a better understanding of the context:

The literature review section covers a broad spectrum of case studies across various human activity domains, ranging from medicine and economy to production, culture, and education. However, the authors were unable to identify any model in the literature that addresses the evaluation of the effectiveness of local self-governments’ certification processes in enhancing foreign investments and methodologically integrates machine learning (ML) classification procedures in an ensemble with two fundamental types of efficiency assessment methods—DEA (data envelopment analysis) and SFA (stochastic frontier analysis), other than their previously mentioned paper [14]. This gap in the literature motivated the continuation of their research and led to the writing of this paper. Therefore, the comparison showcasing the advantages of the proposed model is made solely with the previous work. Specifically, papers [32,33,44] investigate the local business environments in the Republic of Serbia in the context of promoting direct foreign investment; papers [15,34,35] suggest approaches for identifying critical factors for increasing foreign direct investments; paper [36] explores the application of SFA methods; and paper [64] discusses the use of the ML cluster method in this context.
It is acknowledged that there is no consensus on the optimal approach for selecting variables for dimensionality reduction in DEA problems because DEA results can be sensitive to the choice of inputs and outputs, with no definitive method for testing their suitability [74]. The proposed model employs various procedures to limit the number of variables in relation to the number of DMUs, as discussed earlier. These procedures include correlation analysis among variables to select a set that is not highly correlated and methods that leverage regression and correlation analysis to determine which variables to exclude from DEA models with minimal information loss [75,76].
In addition to the practical constraint of mandatory, controlled dimension reduction in the efficiency evaluation problem, achieved by limiting the number of variables relative to the number of DMUs as described, the validity and robustness of the proposed model must also be evaluated. Given the small dataset, this evaluation necessitates the use of 10-fold cross-validation, as detailed at the beginning of Section 4.
The final note, deemed by the authors as essential to precede the discussion of the results regarding the proposed model, pertains to the limitation of the SFA method, which is restricted to a single output [110]. This limitation is relevant to the case study discussed in this paper.

The correlation coefficients between the efficiency scores obtained before and after reducing the number of involved factors show that the DEA CRS method was more robust than the others, with a correlation coefficient of 99.21% at a significance level of 0.001. This is considerably higher than the coefficients of the DEA VRS and SFA methods, which were 41.59% and 44.39%, respectively. The selection of criteria and the number of degrees of freedom are crucial for evaluating DEA efficiency, and the study demonstrates that using fewer criteria can lower DMU efficiency. Moreover, the SFA method is highly sensitive to the choice and number of criteria.

The efficiency results of the three models, presented in Table 7, were compared using Spearman's rank correlation. The rank correlation between the DEA CRS and DEA VRS results was 37.09%, which is not statistically significant at the 0.01 level, likely due to the different returns-to-scale assumptions. Notably, all cities deemed efficient under the CRS assumption were also classified as efficient under the VRS assumption, though the number of efficient DMUs increased. The variations in DMU rankings can be understood by comparing them to the NALED experts' ranks, shown in Table 2. For example, City 5, having attracted more investment per capita, served as the benchmark DMU for City 18. Conversely, City 17, with less investment per capita, could not serve as a benchmark for City 5, especially when considering the economy as an important efficiency evaluation parameter.
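Because many DEA scores tie at 1.0, the ranks used in Spearman's correlation should average over ties. The following is a minimal sketch of Spearman's rho computed as the Pearson correlation of tie-averaged ranks. The efficiency scores are hypothetical, not those of Table 7. In practice, `scipy.stats.spearmanr` computes the same statistic together with a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical efficiency scores for five DMUs; two DEA scores tie at 1.0.
dea = [1.0, 0.8, 1.0, 0.5, 0.7]
sfa = [0.9, 0.6, 0.95, 0.4, 0.8]
print(round(spearman(dea, sfa), 3))
```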

The correlation between parametric and non-parametric rankings was 65.82% for the DEA CRS and SFA methods, whereas the correlation between DEA VRS and SFA was only 27.73% and was not statistically significant, as also noted by Silva et al. [111]. In another study [112], the DEA and SFA methods produced contradictory results, attributed to their varying degrees of dispersion. Similarly, a conclusion in [113] highlighted that relying on a single ranking method could lead to incorrect results, especially for methods exhibiting low correlation.

Considering the results obtained by the same authors in a similar case study [14], it can be concluded that the method proposed in this paper significantly improves upon the previously suggested approach [14]. This improvement is not only due to better classification results but also because it accounts for the crucial aspect of dataset balance. Specifically, in the case study conducted, we encountered an imbalanced dataset with a class distribution ratio of 80%:20%.

To sum up, various ensemble models previously published by the coauthors of this study can be found in [114,115,116,117,118]. These models employ diverse strategies, such as maximum voting or triple modular redundancy, combined with various classification and machine learning (ML) algorithms, to address a range of problems in human life. The insights derived from these studies and from the literature addressing the challenges of small dataset sizes [119], appropriate dimension reduction in efficiency evaluation [120], and the relationship between MCDA and DEA [103] can guide future research aimed at resolving the issues discussed in this paper. This future research would involve integrating a broader array of algorithms and methods to develop a more effective ensemble method. Additionally, future work will consider incorporating decision-making methods as outlined in [121] and addressing multiclass classification problems, which were mentioned in the introduction of this paper [122].

6. Conclusions

This study adopts the hypothesis that an innovative assessment model with improved accuracy can be developed by combining various advanced technologies. This paper proposes a method that combines the non-parametric DEA CRS and DEA VRS methods, the parametric SFA method, and ML-based classification models, including a feature selection process. The proposed framework is verified using real data on the BFC certification of 20 cities in the Republic of Serbia that are important for economic development.

The results confirm this hypothesis. Namely, the model achieves the following:

It constructs an effective efficiency evaluation model using diverse methods to improve the performance compared to the adopted methods when they are used separately;
It reduces noise in data;
It reduces the computation time;
It improves upon the method previously proposed by the same authors [14], as it not only yields better results but also incorporates a distinct approach that accounts for the crucial parameter of dataset balance;
It simultaneously determines the ranking of each individual resource, i.e., of each factor's influence on the considered process.
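Tables 4 and 6 describe classification "based on the DEA CRS as the strictest score". The sketch below shows one plausible reading of that labelling step. It rests on two assumptions that the text here does not confirm. First, the "strictest" method is taken to be the one with the lowest mean efficiency. Second, a city is labelled efficient (class 1) when its score reaches 1. The scores are the 12-criterion values from Table 3, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Efficiency scores of cities 1-20 for the 12 input criteria (Table 3).
TABLE3: Dict[str, List[float]] = {
    "DEA CRS": [0.716, 0.845, 0.933, 0.588, 0.505, 1.000, 1.000, 0.631, 0.838, 1.000,
                0.960, 0.277, 0.203, 0.559, 1.000, 0.245, 0.501, 0.324, 0.674, 0.885],
    "DEA VRS": [1, 0.929, 1, 0.999, 1, 1, 1, 1, 1, 1,
                1, 0.999, 0.999, 1, 1, 0.269, 1, 0.339, 1, 1],
    "SFA": [1.000, 0.878, 1.000, 1.000, 1.000, 1.000, 0.582, 0.649, 0.883, 0.676,
            1.000, 0.419, 0.505, 1.000, 1.000, 0.358, 0.658, 0.451, 0.425, 0.759],
}


def strictest_method(scores_by_method: Dict[str, Sequence[float]]) -> str:
    """Assumption: the 'strictest' method is the one with the lowest mean score."""
    return min(scores_by_method, key=lambda m: mean(scores_by_method[m]))


def efficiency_labels(scores: Sequence[float], threshold: float = 1.0) -> List[int]:
    """Assumption: class 1 = efficient (score >= threshold), class 0 = inefficient."""
    return [1 if s >= threshold else 0 for s in scores]


method = strictest_method(TABLE3)
labels = efficiency_labels(TABLE3[method])
print(method)  # DEA CRS
print([i + 1 for i, y in enumerate(labels) if y == 1])  # [6, 7, 10, 15]
```

Under these assumptions, only 4 of the 20 cities fall into the efficient class, so the two classes are strongly imbalanced.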

From another practical perspective, the proposed method could serve two valuable functions critical to the operations of local governments:

It could be utilized as an application on the website of each local government, providing a tool for officials responsible for attracting foreign direct investments. This application would facilitate the monitoring of investment attraction success and assist in identifying and tracking key factors that influence it.
It could also be developed into a concrete system for planning sustainable local economic development. An example of such a system is already implemented on the website of the city of Niš in the Republic of Serbia, serving as a model for other local governments aiming to enhance their economic development strategies.

Considering the existing problems in measuring BFC efficiency, and regardless of which DEA model is used (this study uses both CRS and VRS), combining a classification method with feature selection could be a good choice. This applies when the dimensions must be reduced because the number of included local government units is usually small while the number of influencing factors is large. The proposed method enables proper use of the DEA approach, reduces data noise and computation time, and delivers more accurate results than the integrated methods do when used individually. However, when applying this model, it is crucial to consider its limitations concerning the size of the dataset under examination. These include the data employed and the need for rigorous validation, specifically a mandatory minimum of 10-fold cross-validation. Additionally, the inherent limitations of the DEA method, such as the relationship between the number of variables and the number of observational units, and those of the SFA method, particularly its reliance on a univariate approach, must be acknowledged.
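With only 20 cities, 10-fold cross-validation leaves two cities per test fold, so preserving the class proportions in every fold matters. The following is a minimal sketch of stratified k-fold splitting under stated assumptions. The labels are hypothetical (the four cities with DEA CRS score 1.000 in Table 3 are treated as the minority class), and the exact fold assignment used by the authors' software may differ.

```python
import random
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[Any], k: int = 10,
                     seed: int = 1) -> List[List[int]]:
    """Split instance indices into k folds that roughly preserve class proportions."""
    if not 2 <= k <= len(labels):
        raise ValueError("k must be between 2 and the number of instances")
    by_class: Dict[Any, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    # Deal each class's shuffled indices round-robin over the folds; the
    # position carries over between classes, so fold sizes differ by at most 1.
    for label in sorted(by_class, key=repr):
        members = list(by_class[label])
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def cross_validation_splits(labels: Sequence[Any], k: int = 10,
                            seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical labels: cities 6, 7, 10 and 15 form the minority class.
labels = [1 if city in (6, 7, 10, 15) else 0 for city in range(1, 21)]
for train, test in cross_validation_splits(labels, k=10):
    print(len(train), sorted(test))
```

Each city appears in exactly one test fold, and with this labelling no test fold contains more than one minority-class city.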

Future work could address efficiency measurement in the considered BFC and other processes using the DEA CCR model. It is an effective assessment method for multi-input, multi-output processes, especially those in which the advantages of the input and output data can be exchanged with one another. SFA is the method of choice when there are numerous criteria and the stochastic nature of the parameters must be included. Overall, these three methods can act as corrective and explanatory factors for one another, as shown by the ensemble model proposed in this study. Future work could also integrate a larger number of different algorithms and methodologies, including multi-criteria methods and multiclass classification methods. The aim would be an ensemble method, probably a multi-stage one, that performs better than the method proposed in this paper.
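The DEA CCR (CRS) model of Charnes, Cooper and Rhodes solves one linear program per unit. When each unit has a single input and a single output, the input-oriented program has a closed form: each unit's efficiency is its output/input ratio divided by the best ratio in the sample. The sketch below implements only that special case, for illustration. The 12-input setting of this paper requires a linear-programming solver (for example, SciPy's `scipy.optimize.linprog`). The numbers in the usage line are invented.

```python
from typing import List, Sequence


def ccr_efficiency_single(inputs: Sequence[float],
                          outputs: Sequence[float]) -> List[float]:
    """Input-oriented CCR (CRS) efficiency for one input and one output per unit.

    For unit o the CCR program is: minimise theta subject to
    sum_j lambda_j * x_j <= theta * x_o, sum_j lambda_j * y_j >= y_o, lambda >= 0.
    With a single input and output, its optimum is (y_o / x_o) / max_j (y_j / x_j).
    """
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must be non-empty and of equal length")
    if any(x <= 0 for x in inputs) or any(y < 0 for y in outputs):
        raise ValueError("inputs must be positive and outputs non-negative")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    if best == 0:
        raise ValueError("at least one unit must have a positive output")
    return [r / best for r in ratios]


# Invented data for three units: inputs x and outputs y.
print(ccr_efficiency_single([2.0, 4.0, 5.0], [4.0, 4.0, 10.0]))  # [1.0, 0.5, 1.0]
```

Units on the frontier score 1. Every other unit is scored by how far its productivity ratio falls short of the best one, which is the constant-returns-to-scale assumption that distinguishes CCR from the VRS (BCC) model.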

Author Contributions

Conceptualization, investigation, writing—original draft: A.K.; validation: L.B.; project administration, formal analysis: M.R.; software, writing—review & editing: M.Č.; supervision, methodology, writing—original draft: D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available at ref. [14].

Conflicts of Interest

Aleksandar Kemiveš was employed by PUC Infostan Technologies, and Milan Ranđelović was employed by Science Technology Park Niš. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. PUC Infostan Technologies and Science Technology Park Niš had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

Kilibarda, M.; Andrejić, M.; Vidovic, M. Measuring efficiency of logistics processes in distribution centers. In Proceedings of the 14th QMOD Conference on Quality and Service Sciences 2011—From Learnability & Innovability to Sustainability, San Sebastian, Spain, 29–31 August 2011; pp. 996–1010. [Google Scholar]
Hamdan, A.; Rogers, K.J. Evaluating the efficiency of 3PL logistics operations. Int. J. Prod. Econ. 2008, 113, 235–244. [Google Scholar] [CrossRef]
Jeličić, D. Development of Logistics Controlling Model in Industrial Systems. Ph.D. Thesis, Faculty of Engineering Sciences, University of Novi Sad, Novi Sad, Serbia, 2019. Available online: https://nardus.mpn.gov.rs/bitstream/handle/123456789/11420/Disertacija.pdf (accessed on 12 December 2023).
Ertugrul, I.; Oztas, T. Efficiency Measurement with a Three-Stage Hybrid Method. Int. J. Assess. Tools Educ. 2018, 5, 370–388. [Google Scholar] [CrossRef]
Fried, H.; Lovell, C.; Schmidt, S. The Measurement of Productive Efficiency and Productivity Change; Oxford University Press: Oxford, UK, 2008. [Google Scholar] [CrossRef]
Dyckhoff, H.; Souren, R. Integrating multiple criteria decision analysis and production theory for performance evaluation: Framework and review. Eur. J. Oper. Res. 2022, 297, 795–816. [Google Scholar] [CrossRef]
Ray, P.K.; Sahu, S. Productivity measurement through multi-criteria decision making. Eng. Costs Prod. Econ. 1990, 20, 151–163. [Google Scholar] [CrossRef]
Mirmozaffari, M.; Yazdani, M.; Boskabadi, A.; Ahady Dolatsara, H.; Kabirifar, K.; Amiri Golilarz, N. A Novel Machine Learning Approach Combined with Optimization Models for Eco-efficiency Evaluation. Appl. Sci. 2020, 10, 5210. [Google Scholar] [CrossRef]
Sikimić, V.; Radovanović, S. Machine learning in scientific grant review: Algorithmically predicting project efficiency in high energy physics. Eur. J. Philos. Sci. 2022, 12, 50. [Google Scholar] [CrossRef] [PubMed]
Aparicio, J.; Esteve, M.; Kapelko, M. Measuring dynamic inefficiency through machine learning techniques. Expert Syst. Appl. 2023, 228, 120417. [Google Scholar] [CrossRef]
Zhang, Z.; Xiao, Y.; Niu, H. DEA and Machine Learning for Performance Prediction. Mathematics 2022, 10, 1776. [Google Scholar] [CrossRef]
Gupta, A.; Kohli, M.; Malhotra, N. Classification based on Data Envelopment Analysis and supervised learning: A case study on the energy performance of residential buildings. In Proceedings of the 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES), Delhi, India, 4–6 July 2016; pp. 1–5. [Google Scholar]
Ranđelović, M.; Stanković, J.; Savić, G.; Kuk, K.; Ranđelović, D. An Approach to Determining the Importance of Criteria in the Process of Certifying a City as a Business-Friendly Environment. Interfaces 2018, 48, 156–165. [Google Scholar] [CrossRef]
Jovanović, M.; Nedeljković, S.; Ranđelović, M.; Savić, G.; Stojanović, V.; Stojanović, V.; Ranđelović, D. A Multicriteria Decision Aid-Based Model for Measuring the Efficiency of Business-Friendly Cities. Symmetry 2020, 12, 1025. [Google Scholar] [CrossRef]
Ranđelović, M.; Nedeljković, S.; Jovanović, M.; Čabarkapa, M.; Stojanović, V.; Aleksić, A.; Ranđelović, D. Use of Determination of the Importance of Criteria in Business-Friendly Certification of Cities as Sustainable Local Economic Development Planning Tool. Symmetry 2020, 12, 425. [Google Scholar] [CrossRef]
Sarkis, J. A comparative analysis of DEA as a discrete alternative multiple criteria decision tool. Eur. J. Oper. Res. 2000, 123, 543–557. [Google Scholar] [CrossRef]
Prado, J.; Heijungs, R. Implementation of stochastic multi attribute analysis (SMAA) in comparative environmental assessments. Environ. Model. Softw. 2018, 109, 223–231. [Google Scholar] [CrossRef]
Lampe, H.W.; Hilgers, D. Trajectories of efficiency measurement: A bibliometric analysis of DEA and SFA. Eur. J. Oper. Res. 2015, 240, 1–21. [Google Scholar] [CrossRef]
Katharakis, G.; Katharaki, M.; Katostaras, T. An empirical study of comparing DEA and SFA methods to measure hospital units’ efficiency. Int. J. Oper. Res. 2014, 21, 341–364. [Google Scholar] [CrossRef]
Berger, A.N.; Hunter, W.C.; Timme, S.G. The Efficiency of Financial Institutions: A Review and Preview of Research Past, Present, and Future. J. Bank. Financ. 1993, 17, 221–249. [Google Scholar] [CrossRef]
Jacobs, R.; Smith, P.; Street, A. Measuring Efficiency in Health Care; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
Sarkis, J.; Talluri, S. Ecoefficiency measurement using data envelopment analysis: Research and practitioner issues. Integr. Environ. Assess. Manag. 2004, 6, 91–123. [Google Scholar] [CrossRef]
Carlsson, B. The Measurement of Efficiency in Production: An Application to Swedish Manufacturing Industries 1968. Swed. J. Econ. 1972, 74, 468–485. [Google Scholar] [CrossRef]
Peypoch, N.; Solonandrasana, B. Research note: Technical efficiency in the tourism industry. Tour. Econ. 2006, 12, 653–657. [Google Scholar] [CrossRef]
Klimchik, A.; Ambiehl, A.; Garnier, S.; Furet, B.; Pashkevich, A. Efficiency evaluation of robots in machining applications using industrial performance measure. Robot. Comput.-Integr. Manuf. 2017, 48, 12–29. [Google Scholar] [CrossRef]
Andrejić, M.; Kilibarda, M. The problems of measuring efficiency in logistics. Vojnoteh. Glas. 2013, 61, 84–104. [Google Scholar] [CrossRef]
Haar, S.; Segers, R.M.; Jehn, K. Measuring the effectiveness of emergency management teams: Scale development and validation. Int. J. Emerg. Manag. 2013, 9, 258. [Google Scholar] [CrossRef]
Filippini, M.; Farsi, M. Regulation and Measuring Cost-Efficiency with Panel Data Models: Application to Electricity Distribution Utilities. Rev. Ind. Organ. 2004, 25, 1–19. [Google Scholar]
Ferrera, J.; Pedraja-Chaparro, F.; Salinas-Jimenez, J. Measuring Efficiency in Education: An Analysis of Different Approaches for Incorporating Non-discretionary Inputs. Appl. Econ. 2008, 40, 1323–1339. [Google Scholar] [CrossRef]
Liket, K.C.; Maas, K. Nonprofit organizational effectiveness: Analysis of best practices. Nonprofit Volunt. Sect. Q. 2015, 44, 268–296. [Google Scholar] [CrossRef]
BFC SEE Network. Available online: http://bfc-see.org/about-bfc-see-network (accessed on 1 January 2024).
BFC SEE Standard. Available online: https://naled.rs/images/preuzmite/Program_Certifikacije_opstina_sa_povoljnim_poslovnim_okruzenjem_u_JIE_brosura.pdf (accessed on 30 January 2024).
Stojanović, B.; Stanković, J.; Ranđelović, M. The City of Niš Competitiveness Analysis in the Field of Foreign Direct Investment. Econ. Enterp. 2012, 60, 167–178. [Google Scholar] [CrossRef]
Ranđelović, D.; Stanković, J.; Anđelković, M.; Ranđelović, M. Application of AHP method in cities certification process. Management 2014, 69, 75–84. [Google Scholar]
Randjelovic, M.; Savic, G.; Stojanović, B.; Randjelovic, D. An integrated DEA/AHP methodology for determining the criteria of importance in the process of business-friendly certification the local level. Teme 2020, 285, 285–300. [Google Scholar] [CrossRef]
Radulovic, B.; Dragutinovic, S. Efficiency of local self-governments in Serbia: An SFA approach. Industrija 2015, 43, 123–142. [Google Scholar] [CrossRef]
Afonso, A.; Fernandes, S. Assessing and Explaining the Relative Efficiency of Local Government. J. Soc. Econ. 2008, 37, 1946–1979. [Google Scholar] [CrossRef]
Balaguer, M.T.; Prior, D.; Tortosa, E. On the determinants of local government performance: A two-stage nonparametric approach. Eur. Econ. Rev. 2007, 51, 425–451. [Google Scholar] [CrossRef]
De Borger, B.; Kerstens, K. Cost Efficiency of Belgian Local Governments: A Comparative analysis of FDH, DEA, and econometric approaches. Reg. Sci. Urban Econ. 1996, 26, 145–170. [Google Scholar] [CrossRef]
Geys, B.; Heinemann, F.; Kalb, A. Voter involvement, fiscal autonomy and public sector efficiency: Evidence from German cities. Eur. J. Political Econ. 2010, 26, 265–278. [Google Scholar] [CrossRef]
Lo Storto, C. Evaluating technical efficiency of Italian major cities: A data envelopment analysis model. Procedia-Soc. Behav. Sci. 2013, 81, 346–350. [Google Scholar] [CrossRef]
Worthington, A.C. Cost Efficiency in Australian Local Government: A Comparative Analysis of Mathematical Programming and Econometric Approaches. Financ. Account. Manag. 2000, 16, 201–223. [Google Scholar] [CrossRef]
Westhuizen, G.; Dollery, B. South African Local Government Efficiency Measurement; Centre for Local Government, School of Business, Economics and Public Policy, University of New England: Armidale, Australia, 2009. [Google Scholar]
Stanković, J.; Radenković-Jocić, D. The of business environment in the promotion of investment activities: Case study and cities in the Republic of Serbia. Teme 2017, XLI, 457–473. [Google Scholar] [CrossRef]
Gharizadeh Beiragh, R.; Alizadeh, R.; Shafiei Kaleibari, S.; Cavallaro, F.; Zolfani, S.H.; Bausys, R.; Mardani, A. An integrated Multi-Criteria Decision Making Model for Sustainability Performance Assessment for Insurance Companies. Sustainability 2020, 12, 789. [Google Scholar] [CrossRef]
Chitnis, A.; Vaidya, O. Efficiency ranking method using SFA and TOPSIS (ERM-ST): Case of Indian banks. Benchmark. Int. J. Res. 2018, 25, 471–488. [Google Scholar] [CrossRef]
Kumbhakar, S.; Parmeter, C.; Zelenyuk, V. Stochastic Frontier Analysis: Foundations and Advances; Working Papers 2017-10; University of Miami, Department of Economics: Coral Gables, FL, USA, 2017. [Google Scholar]
Mirmozaffari, M.; Yazdani, R.; Shadkam, E.; Khalili, S.M.; Tavassoli, L.S.; Boskabadi, A. A Novel Hybrid Parametric and Non-Parametric Optimisation Model for Average Technical Efficiency Assessment in Public Hospitals during and Post-COVID-19 Pandemic. Bioengineering 2022, 9, 7. [Google Scholar] [CrossRef]
Keshtkar, L.; Rashwan, W.; Abo-Hamad, W.; Arisha, A. A hybrid system Dynamics-Discrete Event Simulation and Data Envelopment Analysis to investigate boarding patients in acute hospitals. Oper. Res. Health Care 2020, 26, 100266. [Google Scholar] [CrossRef]
Katharaki, M. Approaching the management of hospital units with an operation research technique: The case of 32 Greek obstetric and gynaecology public units. Health Policy 2008, 85, 19–31. [Google Scholar] [CrossRef]
Thoraneenitiyan, N.; Avkiran, K.N. Measuring the impact of restructuring and country-specific factors on the efficiency of post-crisis East Asian banking systems: Integrating DEA with SFA. Socio-Econ. Plan. Sci. 2009, 43, 240–252. [Google Scholar] [CrossRef]
Bezat-Jarzebowska, A. A concept of technical efficiency measurement based on the integrated use of the sfa and dea methods. Sci. Work. Wrocław Univ. Econ. 2012, 261, 11–24. [Google Scholar]
François-Seck, F.; Al-Mouksit, A.; Wassongma, H. DEA and SFA research on the efficiency of microfinance institutions: A meta-analysis. World Dev. 2018, 107, 176–188. [Google Scholar] [CrossRef]
Emrouznejad, A.; Anouze, A.L. Data Envelopment Analysis with classification and regression tree—A case of banking efficiency. Expert Syst. 2010, 27, 231–246. [Google Scholar] [CrossRef]
Amin, G.R.; Emrouznejad, A.; Rezaei, S. Some clarifications on the DEA clustering approach. Eur. J. Oper. Res. 2011, 215, 498–501. [Google Scholar] [CrossRef]
Anouze, A.L.; Bou-Hamad, I. Data envelopment analysis and data mining to efficiency estimation and evaluation. Int. J. Islam. Middle East. Financ. Manag. 2019, 12, 169–190. [Google Scholar] [CrossRef]
Azadeh, A.; Saberi, M.; Moghaddam, R.; Javanmardi, L. An integrated data envelopment analysis-artificial neural network-rough set algorithm for assessment of personnel efficiency. Expert Syst. Appl. 2011, 38, 1364–1373. [Google Scholar] [CrossRef]
Antonucci, L.; Crocetta, C.D.; D'Ovidio, F. Evaluation of Italian Judicial System. Procedia Econ. Financ. 2014, 17, 121–130. [Google Scholar] [CrossRef]
Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; ISBN 978-0-387-31073-2. [Google Scholar]
Fethi, M.; Pasiouras, F. Assessing bank efficiency and performance with operational research and artificial intelligence techniques: A survey. Eur. J. Oper. Res. 2010, 204, 189–198. [Google Scholar] [CrossRef]
Emrouznejad, A.; Shale, E. A combined neural network and DEA for measuring efficiency of large scale datasets. Comput. Ind. Eng. 2009, 56, 249–254. [Google Scholar] [CrossRef]
Barros, C.P.; Wanke, P. Insurance companies in Mozambique: A two-stage DEA and neural networks on efficiency and capacity slacks. Appl. Econ. 2014, 46, 3591–3600. [Google Scholar]
Misiunas, N.; Oztekin, A.; Chen, Y.; Chandra, K. DEANN: A healthcare analytic methodology of data envelopment analysis and artificial neural networks for the prediction of organ recipient functional status. Omega 2016, 58, 46–54. [Google Scholar] [CrossRef]
Radukić, S.; Stanković, J. Evaluation of local business environment in the Republic of Serbia. Procedia Econ. Financ. 2016, 19, 353–363. [Google Scholar] [CrossRef]
Zhu, N.; Zhu, C.; Emrouznejad, A. A combined machine learning algorithms and DEA method for measuring and predicting the efficiency of Chinese manufacturing listed companies. J. Manag. Sci. Eng. 2021, 6, 435–448. [Google Scholar] [CrossRef]
Nandy, A.; Singh, P.K. Farm efficiency estimation using a hybrid approach of machine-learning and data envelopment analysis: Evidence from rural eastern India. J. Clean. Prod. 2020, 267, 122106. [Google Scholar] [CrossRef]
Moragues, R.; Aparicio, J.; Esteve, M. An unsupervised learning-based generalization of Data Envelopment Analysis. Oper. Res. Perspect. 2023, 11, 100284. [Google Scholar] [CrossRef]
Younes, K.; Kharboutly, Y.; Antar, M.; Chaouk, H.; Obeid, E.; Mouhtady, O.; Abu-samha, M.; Halwani, J.; Murshid, N. Application of Unsupervised Learning for the Evaluation of Aerogels’ Efficiency towards Dye Removal—A Principal Component Analysis (PCA) Approach. Gels 2023, 9, 327. [Google Scholar] [CrossRef]
Đokić, D.; Novaković, T.; Tekić, D.; Matkovski, B.; Zekić, S.; Milić, D. Technical Efficiency of Agriculture in the European Union and Western Balkans: SFA Method. Agriculture 2022, 12, 1992. [Google Scholar] [CrossRef]
Azadeh, A.; Ghaderi, S.; Omrani, H.; Eivazy, H. An integrated DEA–COLS–SFA algorithm for optimization and policy making of electricity distribution units. Energy Policy 2009, 37, 2605–2618. [Google Scholar] [CrossRef]
Raudys, S.J.; Jain, A.K. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 252–264. [Google Scholar] [CrossRef]
Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365. [Google Scholar] [CrossRef] [PubMed]
Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006, 7, 91. [Google Scholar] [CrossRef] [PubMed]
Wong, W.-P. A Global Search Method for Inputs and Outputs in Data Envelopment Analysis: Procedures and Managerial Perspectives. Symmetry 2021, 13, 1155. [Google Scholar] [CrossRef]
Cooper, W.W.; Seiford, L.M.; Tone, K. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
Jenkins, L.; Anderson, M. A multivariate statistical approach to reducing the number of variables in data envelopment analysis. Eur. J. Oper. Res. 2003, 147, 51–61. [Google Scholar] [CrossRef]
Appa, G.; Norman, M.; Stoker, B. Data Envelopment Analysis: The Assessment of Performance. J. Oper. Res. Soc. 1992, 43, 919. [Google Scholar] [CrossRef]
Banker, R.D. Hypothesis tests using data envelopment analysis. J. Prod. Anal. 1996, 7, 139–159. [Google Scholar] [CrossRef]
Swinburn, G.; Goga, S.; Murphy, F. Local Economic Development: A Primer Developing and Implementing Local Economic Development Strategies and Action Plan; The World Bank: Washington, DC, USA, 2006; Available online: http://siteresources.worldbank.org/INTLED/423069-1099670772921/20738133/led_primer.pdf (accessed on 1 January 2024).
Figueira, J.; Greco, S.; Ehrgott, M. (Eds.) Multiple Criteria Decision Analysis: State of the Art Surveys; Springer: Boston, MA, USA, 2005. [Google Scholar]
Ishizaka, A.; Nemery, P. Multi-Criteria Decision Analysis: Methods and Software; J. Wiley: New York, NY, USA, 2013. [Google Scholar]
Tsoukiàs, A. From decision theory to decision aiding methodology. Eur. J. Oper. Res. 2008, 187, 138–161. [Google Scholar] [CrossRef]
Belton, V.; Vickers, S.P. Demystifying DEA—A visual interactive approach based on multi criteria analysis. J. Oper. Res. Soc. 1993, 44, 883–896. [Google Scholar]
Sarrico, C. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software. J. Oper. Res. Soc. 2001, 52, 2601257. [Google Scholar] [CrossRef]
Charnes, A.; Cooper, W.W.; Rhodes, E. Measuring the efficiency of decision making units. Eur. J. Oper. Res. 1978, 2, 429–444. [Google Scholar] [CrossRef]
Banker, R.D.; Charnes, A.; Cooper, W.W. Some Models for Estimating Technical and Scale Inefficiencies in Data Envelopment Analysis. Manag. Sci. 1984, 30, 1078–1092. [Google Scholar] [CrossRef]
Aigner, D.; Lovell, C.A.K.; Schmidt, P. Formulation and estimation of stochastic frontier production function models. J. Econom. 1977, 6, 21–37. [Google Scholar] [CrossRef]
Meeusen, W.; Van Den Broeck, J. Efficiency estimation from Cobb–Douglas production functions with composed error. Int. Econ. Rev. 1977, 18, 435–444. [Google Scholar] [CrossRef]
Kumbhakar, S.C.; Lovell, C.A.K. Stochastic Frontier Analysis; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
MacKay, D. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
Mitchell, T. Machine Learning; McGraw-Hill Science/Engineering/Math: New York, NY, USA, 1997. [Google Scholar]
Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach; Prentice Hall: Hoboken, NJ, USA, 2003. [Google Scholar]
Langley, P.; Iba, W.; Thompson, K. An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence-AAAI, San Jose, CA, USA, 12–16 July 1992; pp. 399–406. [Google Scholar]
Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [Google Scholar] [CrossRef]
Stohl, R.; Stibor, K. Predicting Safety Logic Device Solutions via Decision Trees and Rules Algorithms. In Proceedings of the 21st International Carpathian Control Conference (ICCC), High Tatras, Slovakia, 27–29 October 2020; pp. 1–7. [Google Scholar] [CrossRef]
Le Gall, J.F. Random trees and applications. Probab. Surv. 2005, 2, 245–311. [Google Scholar] [CrossRef]
Bartlett, P.L.; Traskin, M. AdaBoost is consistent. J. Mach. Learn. Res. 2007, 8, 2347–2368. [Google Scholar]
Jiang, W. Process consistency for AdaBoost. Ann. Stat. 2004, 32, 13–29. [Google Scholar] [CrossRef]
Gong, M. A novel performance measure for machine learning classification. Int. J. Manag. Inf. Technol. 2021, 13, 11–19. [Google Scholar] [CrossRef]
Weka. University of Waikato: New Zealand. Available online: http://www.cs.waikato.ac.nz/ml/weka (accessed on 1 January 2024).
Liu, H.; Motoda, H. Feature Selection for Knowledge Discovery and Data Mining; Kluwer Academic Publishers: London, UK, 1998. [Google Scholar]
Abe, N.; Kudo, M. Entropy criterion for classifier-independent feature selection. Lect. Notes Comput. Sci. 2005, 3684, 689–695. [Google Scholar]
Stewart, T.J. Relationships between data envelopment analysis and multicriteria decision analysis. J. Oper. Res. Soc. 1996, 47, 654–665. [Google Scholar] [CrossRef]
Cooper, W.; Seiford, L.; Tone, K. Data Envelopment Analysis; Springer: New York, NY, USA, 2000. [Google Scholar]
Urbanowicz, R.; Meeker, M.; La Cava, W.; Olson, R.; Moore, J. Relief-based feature selection: Introduction and review. J. Biomed. Inform. 2018, 85, 189–203. [Google Scholar] [CrossRef]
Domingos, P.; Pazzani, M. On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 1997, 29, 103–130. [Google Scholar] [CrossRef]
Zhou, Z.H. Ensemble Methods Foundations and Algorithms; Chapman & Hall/CRC Machine Learning & Pattern Recognition Series; Herbrich, R., Graepel, T., Eds.; CRC Press Taylor & Francis Group: Boca Raton, FL, USA, 2012; ISBN 978-1-4398-3005-5. [Google Scholar]
Muraina, I. Ideal dataset splitting ratios in machine learning algorithms: General concerns for data scientists and data analysts. In Proceedings of the 7th International Mardin Artuklu Scientific Researches, Mardin, Turkiye, 10–12 December 2021; pp. 496–504. [Google Scholar]
Charles, V.; Aparicio, J.; Zhu, J. The curse of dimensionality of decision-making units: A simple approach to increase the discriminatory power of data envelopment analysis. Eur. J. Oper. Res. 2019, 279, 929–940. [Google Scholar] [CrossRef]
Arsad, R.; Isa, Z.; Shaari, S.N.M. Estimating Efficiency Performance of Decision-Making Unit by using SFA and DEA Method: A Cross-Sectional Data Approach. Int. J. Eng. Technol. 2018, 7, 25–31. [Google Scholar] [CrossRef]
Silva, T.C.; Tabak, B.M.; Cajueiro, D.O.; Dias, M.V.B. A comparison of DEA and SFA using micro-and macro-level perspectives: Efficiency of Chinese local banks. Phys. A 2017, 469, 216–223. [Google Scholar] [CrossRef]
Popović, M.; Savić, G.; Kuzmanović, M.; Martić, M. Using Data Envelopment Analysis and Multi-Criteria Decision-Making Methods to Evaluate Teacher Performance in Higher Education. Symmetry 2020, 12, 563. [Google Scholar] [CrossRef]
Dong, Y.; Hamilton, R.; Tippett, M. Cost efficiency of the Chinese banking sector: A comparison of stochastic frontier analysis and data envelopment analysis. Econ. Model. 2014, 36, 298–308. [Google Scholar] [CrossRef]
Aleksić, A.; Nedeljković, S.; Jovanović, M.; Ranđelović, M.; Vuković, M.; Stojanović, V.; Radovanović, R.; Ranđelović, M.; Ranđelović, D. Prediction of Important Factors for Bleeding in Liver Cirrhosis Disease Using Ensemble Data Mining Approach. Mathematics 2020, 8, 1887. [Google Scholar] [CrossRef]
Ranđelović, M.; Aleksić, A.; Radovanović, R.; Stojanović, V.; Čabarkapa, M.; Ranđelović, D. One Aggregated Approach in Multidisciplinary Based Modeling to Predict Further Students’ Education. Mathematics 2022, 10, 2381. [Google Scholar] [CrossRef]
Ranđelović, D.; Ranđelović, M.; Čabarkapa, M. Using Machine Learning in the Prediction of the Influence of Atmospheric Parameters on Health. Mathematics 2022, 10, 3043. [Google Scholar] [CrossRef]
Aleksić, A.; Ranđelović, M.; Ranđelović, D. Using Machine Learning in Predicting the Impact of Meteorological Parameters on Traffic Incidents. Mathematics 2023, 11, 479. [Google Scholar] [CrossRef]
Mišić, J.; Kemiveš, A.; Ranđelović, M.; Ranđelović, D. An Asymmetric Ensemble Method for Determining the Importance of Individual Factors of a Univariate Problem. Symmetry 2023, 15, 2050. [Google Scholar] [CrossRef]
Nguyen, Q.H.; Ly, H.-B.; Ho, L.S.; Al-Ansari, N.; Van Le, H.; Tran, V.Q.; Prakash, I.; Pham, B.T. Influence of Data Splitting on Performance of Machine Learning Models in Prediction of Shear Strength of Soil. Math. Probl. Eng. 2021, 2021, 1–15. [Google Scholar] [CrossRef]
Adler, N.; Golany, B. PCA-DEA: Reducing the curse of dimensionality. In Modeling Data Irregularities and Structural Complexities in Data Envelopment Analysis; Zhu, J., Cook, W.D., Eds.; Springer: Boston, MA, USA, 2007; pp. 139–153. [Google Scholar]
Hashemi, A.; Dowlatshahi, M.; Nezamabadipour, H. Ensemble of feature selection algorithms: A multi-criteria decision-making approach. Int. J. Mach. Learn. Cybern. 2022, 13, 49–69. [Google Scholar] [CrossRef]
Kumar, A.; Kaur, A.; Singh, P.; Driss, M.; Boulila, W. Efficient Multiclass Classification Using Feature Selection in High-Dimensional Datasets. Electronics 2023, 12, 2290. [Google Scholar] [CrossRef]

Figure 1. Confusion matrix of a binary classification process.

Figure 2. The flowchart of the proposed model for the BFC efficiency evaluation.

Table 1. The importance of the BFC process’s criteria calculated by the NALED.

Expert	F1	F2	F3	F4	F5	F6	F7	F8	F9	F10	F11	F12
NALED’s experts	1.25	0.90	0.67	1.19	0.66	0.71	1.00	0.75	1.08	1.21	1.50	0.83

Table 2. Descriptive statistics of the NALED’s evaluation of 20 local cities.

	Input												Output
City	F1	F2	F3	F4	F5	F6	F7	F8	F9	F10	F11	F12	Investment	Score	Rank
City 1	0.80	1.06	1.00	0.73	0.88	1.00	1.00	0.73	0.64	0.83	1.00	1.00	520.02	0.88	10
City 2	1.00	0.82	0.75	1.00	0.93	1.00	1.00	0.93	1.00	0.88	1.00	1.00	686.57	0.95	3
City 3	0.63	0.95	0.80	0.94	0.86	1.00	0.90	0.75	0.67	0.94	0.93	1.00	580.64	0.86	14
City 4	0.90	0.82	0.88	1.00	0.95	1.00	1.00	0.70	0.68	0.76	1.00	0.75	464.16	0.87	12
City 5	1.00	0.62	1.00	0.78	0.60	0.67	1.00	0.60	0.59	0.98	0.83	1.00	315.94	0.82	19
City 6	1.00	1.06	0.75	0.94	0.90	0.94	1.00	0.87	0.91	0.79	1.00	1.00	942.36	0.94	4
City 7	1.00	0.94	1.00	0.78	0.70	0.78	1.00	0.57	0.73	0.70	1.00	0.50	879.20	0.82	18
City 8	1.00	0.82	1.00	0.89	1.00	1.00	1.00	0.83	0.55	0.88	1.00	1.00	415.97	0.91	7
City 9	1.00	0.82	1.00	0.67	0.65	1.00	1.00	0.87	0.96	0.81	0.83	1.00	622.95	0.88	11
City 10	1.00	0.94	0.75	0.81	0.63	0.94	1.00	0.67	0.91	0.79	1.00	1.00	754.09	0.89	8
City 11	1.00	0.77	0.75	0.83	0.73	1.00	1.00	0.53	0.64	0.76	0.83	1.00	687.33	0.83	17
City 12	0.80	1.00	0.75	0.89	0.90	1.00	1.00	0.53	0.73	0.73	0.83	0.75	200.01	0.83	16
City 13	0.80	1.00	1.00	0.74	0.73	1.00	1.00	0.77	0.46	0.77	1.00	1.00	111.78	0.85	15
City 14	1.00	0.94	1.00	0.87	0.73	1.00	1.00	0.83	0.55	0.68	1.00	0.88	368.21	0.87	13
City 15	1.00	0.94	1.00	1.00	0.90	1.00	1.00	1.00	1.00	0.94	1.00	0.62	995.82	0.96	2
City 16	1.00	0.82	1.00	0.89	0.78	1.00	1.00	0.80	0.91	0.78	1.00	1.00	208.68	0.92	5
City 17	1.00	0.82	1.00	0.78	0.88	0.89	1.00	0.67	0.55	0.83	0.67	0.75	306.58	0.81	20
City 18	1.00	0.88	1.00	1.02	0.90	0.94	1.00	0.87	1.09	0.93	1.00	1.00	295.83	0.98	1
City 19	1.00	0.95	0.60	0.97	0.93	1.00	1.00	0.85	0.58	0.94	1.00	1.00	432.21	0.91	6
City 20	1.00	0.77	0.88	0.81	0.80	1.00	1.00	0.73	0.82	0.77	1.00	1.00	697.12	0.89	9
Max	1.00	1.06	1.00	1.02	1.00	1.00	1.00	1.00	1.09	0.98	1.00	1.00	995.82	0.98
Min	0.63	0.62	0.60	0.67	0.60	0.67	0.90	0.53	0.46	0.68	0.67	0.50	111.78	0.81
Average	0.95	0.89	0.90	0.87	0.82	0.96	1.00	0.76	0.75	0.82	0.95	0.91	524.27	0.88
SD	0.10	0.11	0.13	0.10	0.11	0.09	0.02	0.13	0.18	0.08	0.09	0.15	248.77	0.05

Table 3. Efficiency results with DEA CRS, DEA VRS, and SFA methods for 12 input criteria.

City	DEA CRS		DEA VRS		SFA
	Score Value	Rank	Score Value	Rank	Score Value	Rank
City 1	0.716	10	1	1	1.000	1
City 2	0.845	8	0.929	18	0.878	10
City 3	0.933	6	1	1	1.000	1
City 4	0.588	13	0.999	15	1.000	1
City 5	0.505	15	1	1	1.000	1
City 6	1.000	1	1	1	1.000	1
City 7	1.000	1	1	1	0.582	15
City 8	0.631	12	1	1	0.649	14
City 9	0.838	9	1	1	0.883	9
City 10	1.000	1	1	1	0.676	12
City 11	0.960	5	1	1	1.000	1
City 12	0.277	18	0.999	17	0.419	19
City 13	0.203	20	0.999	15	0.505	16
City 14	0.559	14	1	1	1.000	1
City 15	1.000	1	1	1	1.000	1
City 16	0.245	19	0.269	20	0.358	20
City 17	0.501	16	1	1	0.658	13
City 18	0.324	17	0.339	19	0.451	17
City 19	0.674	11	1	1	0.425	18
City 20	0.885	7	1	1	0.759	11
Average	0.684		0.927		0.762
Max	1.000		1		1.000
Min	0.203		0.270		0.358
St Dev	0.275		0.011		0.241
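The Rank columns appear to follow standard competition ("1224") ranking. In this scheme, tied scores share the best rank and the following ranks are skipped: in Table 3 the four cities with a DEA CRS score of 1.000 all rank 1, and the next city ranks 5. The sketch below reproduces the DEA CRS ranks of Table 3 from the listed scores. It is an illustration, not the authors' code. Ties are detected on the values as printed, so ranks computed from rounded scores can differ from ranks computed on full-precision scores.

```python
from typing import Dict, List, Sequence


def competition_rank(scores: Sequence[float]) -> List[int]:
    """Rank scores in descending order; tied scores share the best (lowest) rank."""
    first_position: Dict[float, int] = {}
    for position, score in enumerate(sorted(scores, reverse=True), start=1):
        first_position.setdefault(score, position)
    return [first_position[s] for s in scores]


# DEA CRS scores of cities 1-20 (Table 3, 12 input criteria).
crs = [0.716, 0.845, 0.933, 0.588, 0.505, 1.000, 1.000, 0.631, 0.838, 1.000,
       0.960, 0.277, 0.203, 0.559, 1.000, 0.245, 0.501, 0.324, 0.674, 0.885]
print(competition_rank(crs))
# [10, 8, 6, 13, 15, 1, 1, 12, 9, 1, 5, 18, 20, 14, 1, 19, 16, 17, 11, 7]
```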

Table 4. The classification results obtained by the NB, DT, and AB classifiers based on the DEA CRS as the strictest score.

Weighted Avg.	TP Rate	FP Rate	Precision	Recall	F1-Score	MCC	AUC ROC	AUC PRC
NB	0.700	0.825	0.622	0.700	0.659	−0.167	0.438	0.667
RT	0.650	0.838	0.612	0.650	0.630	−0.210	0.406	0.657
AB	0.700	0.825	0.622	0.700	0.659	−0.167	0.703	0.807
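The weighted-average rows in Tables 4 and 6 can be understood as each per-class measure averaged with weights equal to the class sizes. Weighting by class size is an assumption about how the reported averages were formed. The sketch below computes these measures from a confusion matrix. The 2 × 2 matrix in the usage lines is our own reconstruction, not a matrix reported by the authors. It assumes 16 cities in one class and 4 in the other, which is the split obtained if only the four cities with a DEA CRS score of 1.000 count as efficient. On that basis it reproduces the NB row of Table 4 (TP rate 0.700, FP rate 0.825, precision 0.622, F1 0.659, MCC −0.167). AUC values require classifier scores and are not computed here.

```python
import math
from typing import Dict, Sequence


def weighted_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Class-size-weighted averages of per-class metrics.

    cm[i][j] is the number of instances of actual class i predicted as class j.
    Each class is treated in turn as the positive class (one-vs-rest).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    avg = {"tp_rate": 0.0, "fp_rate": 0.0, "precision": 0.0,
           "recall": 0.0, "f1": 0.0, "mcc": 0.0}
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        weight = (tp + fn) / total  # share of instances whose actual class is c

        recall = tp / (tp + fn) if tp + fn else 0.0  # identical to the TP rate
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0

        for name, value in (("tp_rate", recall), ("fp_rate", fp_rate),
                            ("precision", precision), ("recall", recall),
                            ("f1", f1), ("mcc", mcc)):
            avg[name] += weight * value
    return avg


# Reconstructed (hypothetical) matrix: rows = actual, columns = predicted.
nb_cm = [[14, 2],   # 16 cities of the majority class
         [4, 0]]    # 4 cities of the minority class
print({name: round(v, 3) for name, v in weighted_metrics(nb_cm).items()})
```

By the same arithmetic, the matrix [[14, 2], [3, 1]] gives the weighted values of the AB row in Table 6 (TP rate 0.750, FP rate 0.625, precision about 0.725, F1 0.736, MCC 0.140). This shows how a single additional correctly classified minority-class city turns the MCC positive.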

Table 5. Feature selection results obtained by the three suitable feature selection algorithms.

Algorithm	F1	F2	F3	F4	F5	F6	F7	F8	F9	F10	F11	F12
RF		*	*		*	*		*	*	*	*	*
PC	*	*	*	*	*	*	*	*	*
CA	*	*	*	*	*	*	*	*	*	*	*	*
Proposed algorithm		*	*		*	*		*	*

* indicates that the attribute was selected as significant by the given method.
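In Table 5, the "Proposed algorithm" row coincides with the features marked by all three methods (RF, PC, and CA). This gives the six criteria used in Table 7. The sketch below applies that intersection rule. Treating the proposed selection as a plain intersection is an inference from the table, and the nine asterisks in the PC row are read as F1 to F9.

```python
# Features marked "*" in Table 5 for each feature selection algorithm.
SELECTED = {
    "RF": {"F2", "F3", "F5", "F6", "F8", "F9", "F10", "F11", "F12"},
    "PC": {"F1", "F2", "F3", "F4", "F5", "F6", "F7", "F8", "F9"},
    "CA": {f"F{i}" for i in range(1, 13)},
}


def consensus(selections):
    """Keep only the features that every algorithm marked as significant."""
    sets = iter(selections.values())
    common = set(next(sets))
    for s in sets:
        common &= s
    # Sort numerically so that F10 follows F9 rather than F1.
    return sorted(common, key=lambda f: int(f[1:]))


if __name__ == "__main__":
    print(consensus(SELECTED))
```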

Table 6. The final classification results obtained by the AB classifier based on the DEA CRS as the strictest score.

Weighted Avg.	TP Rate	FP Rate	Precision	Recall	F1-Score	MCC	AUC ROC	AUC PRC
AB	0.750	0.625	0.725	0.750	0.736	0.140	0.891	0.895

Table 7. Efficiency results with DEA CRS, DEA VRS, and SFA methods using 6 criteria.

City	DEA CRS Score	DEA CRS Rank	DEA VRS Score	DEA VRS Rank	SFA Score	SFA Rank
City 1	0.676	10	0.932	17	0.885	7
City 2	0.845	7	0.987	14	0.606	12
City 3	0.754	9	0.936	16	0.897	6
City 4	0.588	13	0.911	18	0.67	14
City 5	0.505	15	1	1	0.561	16
City 6	1	1	1	1	0.878	8
City 7	1	1	1	1	1	1
City 8	0.631	12	1	1	1	1
City 9	0.787	8	0.986	15	0.685	11
City 10	1	1	1	1	0.681	13
City 11	0.96	5	1	1	0.993	3
City 12	0.275	18	1	1	0.205	20
City 13	0.203	20	1	1	0.37	17
City 14	0.559	14	0.99	13	0.959	5
City 15	1	1	1	1	0.978	4
City 16	0.245	19	0.859	19	0.224	19
City 17	0.465	16	0.993	12	0.634	15
City 18	0.324	17	0.845	20	0.226	18
City 19	0.674	11	1	1	0.821	9
City 20	0.885	6	1	1	0.817	10
Average	0.669		0.972		0.704
Max	1.000		1.000		1.000
Min	0.203		0.262		0.205
St Dev	0.267		0.488		0.270

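Table 8. Comparison of city efficiency scores: standard vs. proposed method. Std. = standard method with 12 criteria (Table 3); Prop. = proposed method with 6 criteria (Table 7).

City	DEA CRS Score (Std.)	DEA CRS Score (Prop.)	DEA CRS Rank (Std.)	DEA CRS Rank (Prop.)	DEA VRS Score (Std.)	DEA VRS Score (Prop.)	DEA VRS Rank (Std.)	DEA VRS Rank (Prop.)	SFA Score (Std.)	SFA Score (Prop.)	SFA Rank (Std.)	SFA Rank (Prop.)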
City 1	0.716	0.676	10	10	1	0.932	1	17	1.000	0.885	1	7
City 2	0.845	0.845	8	7	0.929	0.987	18	14	0.878	0.606	10	12
City 3	0.933	0.754	6	9	1	0.936	1	16	1.000	0.897	1	6
City 4	0.588	0.588	13	13	0.999	0.911	15	18	1.000	0.67	1	14
City 5	0.505	0.505	15	15	1	1	1	1	1.000	0.561	1	16
City 6	1.000	1	1	1	1	1	1	1	1.000	0.878	1	8
City 7	1.000	1	1	1	1	1	1	1	0.582	1	15	1
City 8	0.631	0.631	12	12	1	1	1	1	0.649	1	14	1
City 9	0.838	0.787	9	8	1	0.986	1	15	0.883	0.685	9	11
City 10	1.000	1	1	1	1	1	1	1	0.676	0.681	12	13
City 11	0.960	0.96	5	5	1	1	1	1	1.000	0.993	1	3
City 12	0.277	0.275	18	18	0.999	1	17	1	0.419	0.205	19	20
City 13	0.203	0.203	20	20	0.999	1	15	1	0.505	0.37	16	17
City 14	0.559	0.559	14	14	1	0.99	1	13	1.000	0.959	1	5
City 15	1.000	1	1	1	1	1	1	1	1.000	0.978	1	4
City 16	0.245	0.245	19	19	0.269	0.859	20	19	0.358	0.224	20	19
City 17	0.501	0.465	16	16	1	0.993	1	12	0.658	0.634	13	15
City 18	0.324	0.324	17	17	0.339	0.845	19	20	0.451	0.226	17	18
City 19	0.674	0.674	11	11	1	1	1	1	0.425	0.821	18	9
City 20	0.885	0.885	7	6	1	1	1	1	0.759	0.817	11	10
Average	0.684	0.669			0.927	0.972			0.762	0.704
Max	1.000	1.000			1	1.000			1.000	1.000
Min	0.203	0.203			0.270	0.262			0.358	0.205
St Dev	0.275	0.267			0.011	0.488			0.241	0.270
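Table 8 places the standard (12-criteria) and proposed (6-criteria) scores side by side. Spearman's rank correlation is one way to summarise how much the DEA CRS ordering changes between them. The paper does not report this statistic, so the sketch below is an illustrative addition. It assigns tied scores their average rank before computing the Pearson correlation of the ranks.

```python
import math

# DEA CRS scores for City 1 to City 20 from Table 8: standard vs. proposed method.
STANDARD = [0.716, 0.845, 0.933, 0.588, 0.505, 1.000, 1.000, 0.631, 0.838, 1.000,
            0.960, 0.277, 0.203, 0.559, 1.000, 0.245, 0.501, 0.324, 0.674, 0.885]
PROPOSED = [0.676, 0.845, 0.754, 0.588, 0.505, 1.000, 1.000, 0.631, 0.787, 1.000,
            0.960, 0.275, 0.203, 0.559, 1.000, 0.245, 0.465, 0.324, 0.674, 0.885]


def average_ranks(xs):
    """Rank 1 = highest score; tied scores get the mean of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: -xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(round(spearman(STANDARD, PROPOSED), 3))  # 0.991
```

Only four cities change their DEA CRS position (City 2, City 3, City 9, and City 20), so the correlation stays close to 1. The same functions can be applied to the DEA VRS and SFA columns, where the shifts are larger.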

Table 9. Comparison of the quality of city efficiency classification before and after applying the proposed method.

Weighted Avg.	TP Rate	FP Rate	Precision	Recall	F1-Score	MCC	AUC ROC	AUC PRC
AB (12 criteria)	0.700	0.825	0.622	0.700	0.659	−0.167	0.703	0.807
AB (6 criteria, proposed)	0.750	0.625	0.725	0.750	0.736	0.140	0.891	0.895
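The two rows of Table 9 repeat the AB results from Table 4 (all 12 criteria) and Table 6 (the 6 criteria kept by the proposed method). The sketch below computes the change in each metric and flags whether it is an improvement. It treats FP Rate as the only metric for which lower is better. The values are copied from the table, and nothing beyond simple differences is assumed.

```python
METRICS = ["TP Rate", "FP Rate", "Precision", "Recall",
           "F1-Score", "MCC", "AUC ROC", "AUC PRC"]
PRELIMINARY = [0.700, 0.825, 0.622, 0.700, 0.659, -0.167, 0.703, 0.807]  # 12 criteria
FINAL = [0.750, 0.625, 0.725, 0.750, 0.736, 0.140, 0.891, 0.895]        # 6 criteria
LOWER_IS_BETTER = {"FP Rate"}


def metric_changes(before, after):
    """Map each metric to (after - before rounded to 3 decimals, improved?)."""
    changes = {}
    for name, b, a in zip(METRICS, before, after):
        delta = round(a - b, 3)
        improved = delta < 0 if name in LOWER_IS_BETTER else delta > 0
        changes[name] = (delta, improved)
    return changes


if __name__ == "__main__":
    for name, (delta, improved) in metric_changes(PRELIMINARY, FINAL).items():
        print(f"{name:10s} {delta:+.3f} {'better' if improved else 'worse'}")
```

Every metric moves in the favourable direction. MCC changes sign, from −0.167 to 0.140, which indicates that the final classifier is better than chance on this sample.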

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Kemiveš, A.; Barjaktarović, L.; Ranđelović, M.; Čabarkapa, M.; Ranđelović, D. Assessing the Efficiency of Foreign Investment in a Certification Procedure Using an Ensemble Machine Learning Model. Mathematics 2024, 12, 1020. https://doi.org/10.3390/math12071020

