1 Introduction

The hegemonic idea of a country's progress was related to economic development, which is the historical and systematic process of productivity growth. However, despite promoting improvement in people's living conditions, economic development does not guarantee a fairer society.

According to the United Nations Development Programme (UNDP 2020a), development must be based on what is happening to people; this view gave rise to the concept of human development. The human development approach emerged as an attempt to place human beings back at the center of the discourse and actions related to development (Gor and Guital, 2010). From this perspective, the central concern changed from how much is being produced to how production is affecting people's lives (UNDP 2020a).

The human development concept is based on the capability approach, which was developed by the Nobel Prize winner Amartya Sen. In this approach, freedom is understood as both the end and the primary means of development, so that freedom generates development and, at the same time, constitutes it (Sen, 2000). According to Sen (2000), human development is the process of expanding the freedoms that people enjoy, that is, their capacity to carry out freely chosen and valued activities.

In 1990, the UNDP officially adopted the capability approach for defining a country's development. Since then, it has been spreading this concept through the Human Development Reports (HDR). Following the capability approach's assumptions, several indicators that adopt a multidimensional perspective, also called composite indexes (CIs), were proposed in the HDR (Footnote 1).

The Human Development Index (HDI), created in 1990 by Mahbub Ul Haq, is the most famous of these CIs. One of the main advantages of the HDI is its simplicity, since it is based on a few dimensions (income, health and education) and uses simple construction methods (basically, averages). However, the same simplicity that made the HDI popular has also drawn several criticisms, which led to a number of methodological changes over the past 25 years (Morse, 2014). For example, in 2010 the HDI began to be calculated as a geometric mean with equal weights.

The HDI, as is true of any CI, is subject to arbitrariness, due to the subjective choices made during its construction process. According to Booysen (2002), the construction of a CI involves five steps (selection, normalization, weighting, aggregation, and validation), and there is no "gold standard" to optimize the choices made during each one. In this sense, any CI is usually loaded with arbitrariness and subjectivity.

In this context, the Data Envelopment Analysis (DEA) technique has been used as an alternative strategy for solving issues associated with CIs. DEA is especially useful and presents several advantages in the normalization, weighting (mainly), and aggregation steps of CI construction (Cherchye et al., 2007).

DEA is a technique based on linear programming proposed by Charnes et al. (1978), whose model is known as CCR, to determine the efficiency of decision-making units in transforming a set of inputs into a set of outputs. Over the years, several DEA models have been proposed, modifying the original hypotheses of the CCR model, such as: the Variable Returns to Scale model (VRS or BCC) (Banker et al., 1984), the Additive model (Charnes, 1985), the Multiplicative model (Charnes et al., 1983), the Slack Based Measure (SBM) (Tone, 2001) and the Range Adjusted Model (RAM) (Aida et al., 1998). These models, in addition to measuring efficiency, also determine the relative contribution of variables and the targets that enable the units analyzed to become more efficient. Furthermore, extensions can be added to DEA models, such as weight restrictions, tie-breaking methods, and two-stage approaches (that use DEA results as inputs). For this reason, DEA models have been used for human development issues in several studies.

Mariano et al. (2015) highlighted the gaps in the human development literature using DEA models. According to the authors, among the articles that used DEA for the analysis of human development, some addressed the concept of social efficiency—efficiency in the conversion of economic inputs into human development (e.g. Mariano & Rebelatto, 2014); others addressed the construction of CIs—aggregation of multiple indicators in a single index; and a recent study combines these two approaches in the same index (Ferraz et al., 2020). Further, according to Mariano et al. (2015), DEA can be used in terms of CI construction in two ways, namely: (a) the Benefit of the doubt (BoD) approach, in which only desirable attributes are considered (e.g., Mahlberg & Obersteiner, 2001); and (b) that based on the simultaneous treatment of undesirable (input), and desirable (output) attributes (e.g., Hashimoto et al. 2009).

The BoD approach proposes the construction of CIs using DEA, making all the units compared adopt a constant input equal to 1. The BoD approach was proposed by Melyn and Moesen (1991) and analyzed in detail by Cherchye et al. (2007). The DEA-BoD technique can be used because the HDI comprises only desirable outputs.

The main difference between the HDI measured by BoD and its original form is that the HDIBoD adopts the most advantageous weights for each country analyzed, whereas the original HDI adopts equal weights (Bougnol, 2010). Thus, the HDIBoD offers a perspective for comparing countries, provinces or regions in which strengths are highlighted, while weaknesses are given less weight. In short, BoD-based CIs have three characteristics: the weights adopted for each indicator vary from unit to unit; the weights adopted are the most advantageous for each unit; and the index obtained is always relative to the units analyzed, so that the unit with the best performance will always have a CI equal to 1 (Ramanathan, 2006). The BoD also has two other advantages: it allows variables to be used without normalization, eliminating the need to include more subjectivity in the HDI construction process; and it provides, in addition to the CI, information that is useful for calculating the relative contribution of each variable, and the absolute and relative targets of each country.

According to the BoD approach, each country must adopt a different set of countries, called benchmarks, as reference. The number of times a country has served as a benchmark can be used to rank its level of importance. It is also possible to group the countries that have the same reference set (clustering tool). Both analyses can be improved by integrating DEA with Social Network Analysis (SNA).

The use of SNA is possible because the link between a country and its benchmark can be treated as a network. In this sense, SNA presents several analytical advantages: it allows a better visualization of the performance data of countries, and it measures and illustrates the centrality of benchmarks. The benchmarks of a country are a set of countries with a high level of human development and characteristics most similar to its own, which serve as a guide for the possible improvement of its performance level. However, the connection between a country and its benchmark is not based on any real link; it is just a virtual link between a country and the target it must achieve.

The first application of DEA-BoD to HDI indicators dates back to the early 2000s (Mahlberg & Obersteiner, 2001). Since then, several applications have followed, although most of them have underutilized the considerable range of analyses made possible by this tool, as evidenced by the 20 gaps raised in the work of Mariano et al. (2015). Despite this burgeoning literature, there is a lack of studies analyzing the differences among DEA techniques in human development. The research problem addressed in this study is therefore the lack of systematic work on the advantages and disadvantages of the different DEA approaches and on the main possibilities of applying them to human development indicators. To fill this gap, this study aims to compare, using SNA and information derived from the technique itself, multiple DEA approaches to readjust, expand, and analyze the human development index of 189 countries taken from the UNDP database in 2018.

2 Literature Review

We developed the literature review using a search in the Scopus database on October 8, 2020. We used the keywords "data envelopment analysis" and "human development index", which returned 47 articles in this field. We then filtered these articles by reading titles and abstracts, and 15 articles were selected. This filtering process excluded all articles whose proposal was not to use DEA to recalculate the HDI, specifically: articles that did not use the BoD approach and whose objective was to assess some type of social efficiency; articles that used sub-indicators of the HDI in other CIs; articles that only cited the HDI in the abstract, but did not address the theme of CI construction; and articles that were not written in English. To these 15, the pioneering article by Mahlberg and Obersteiner (2001) was added because, although it was not found within the Scopus database, it is considered highly relevant to the theme.

Mahlberg and Obersteiner (2001) used the BoD model to build an alternative indicator to the Human Development Index (HDI). Raw data from 1998 for 174 countries were used to compare: (a) the traditional HDI (equal weights); (b) the HDIBoD (most advantageous weights); and (c) the HDIBoD with restrictions on the variables' relative contribution (semi-variable weights). Concerning the HDIBoD, 32 countries had an index equal to one, among which were countries with a low HDI, such as Lithuania, Kazakhstan, Latvia, Uzbekistan, and Tajikistan. In the HDIBoD with weight restrictions, on the other hand, the authors did not find ties. The correlation between the HDIBoD with weight restrictions and the HDI was high. However, while the country with the highest HDI was Canada, Luxembourg had the most outstanding performance in the weight-constrained HDIBoD.

Despotis (2005a) applied DEA-BoD to the normalized world HDI data from 2000 and found that the countries with the highest HDIBoD were Canada, Japan, Australia, Sweden, Belgium, the United Kingdom, Luxembourg, Finland, Norway, the United States of America, and Iceland. Using the same approach for only 27 Asian and Pacific countries, Despotis (2005b) determined that the group with the highest HDIBoD was formed by Hong Kong, Singapore, and South Korea.

Although the BoD is useful for investigating extreme cases, Despotis (2005a, b) argued that this approach would not be suitable for constructing rankings, which should preferably be based on the use of a set of common weights. As a solution to this problem, a second stage multi-objective programming model was proposed to determine the single set of weights that maximizes the average efficiency of the units analyzed. It is worth mentioning that the use of common weights highlighted Canada, in the analysis of Despotis (2005a), and Hong Kong, in the analysis of Despotis (2005b), as the countries with the highest HDIBoD.

In a similar approach, Lee et al. (2006) proposed a DEA-BoD model based on fuzzy logic, which also worked with common weights. With this model, the authors evaluated the same group of 27 countries as Despotis (2005b), reaching the same result, and highlighting Hong Kong as the country with the most exceptional human development.

Despotis et al. (2010) revisited their previous work, applying a DEA model with non-linear outputs to determine a worldwide HDI. Their model was specifically developed to deal with income, whose normalization is performed in a non-linear way because it presents decreasing returns on human development. Despite the methodological improvement, the results obtained in this work were very close to those of Despotis (2005a), in which normalized data were used.

Bougnol et al. (2010) proposed an alternative model to determine the HDIBoD, with the presence of a scaling factor. The use of a scaling factor meant the authors could use this model with direct weight restrictions (Dyson & Thanassoulis, 1988) to assess 15 countries intentionally selected in 2005. Bougnol et al. (2010) also clustered the countries using the “onion method” of Barr et al. (2000), obtaining four different clusters. The onion method is based on successive applications of DEA where, after each application, the benchmarks are taken out of the sample, forming a cluster. The cluster with the greatest human development was made up of Luxembourg, Norway, Iceland, and Australia.

Zhou et al. (2010) proposed a multiplicative BoD model, which was in line with the new HDI calculation method. They also combined their multiplicative model with the inverted frontier approach proposed by Zhou et al. (2007). The approach is called inverted frontier because the units adopt as reference the frontier formed by the worst performing units (called anti-benchmarks). However, contrary to what Zhou et al. (2007) stated, the inverted frontier approach does not guarantee the use of the least advantageous weights for each country (Entani et al. 2002; Athanassoglou 2016). The approach of Zhou et al. (2007) combines the normalized inverted HDIBoD and the normalized HDIBoD in the same index using an arithmetic mean (Footnote 2). With this new model, accompanied by weight restrictions (Wong & Beasley, 1990), Zhou et al. (2010) evaluated the HDI of the same set of 27 countries analyzed by Despotis (2005b), identifying Hong Kong, Singapore, South Korea, Brunei and Malaysia as benchmarks.

Following these studies, Tofallis (2013) integrated the common weight approach and the multiplicative BoD to determine the HDI of the countries studied. His approach, however, was based on linear regression to determine the common weights. The countries with the best index were Japan, Australia, Hong Kong, Switzerland, and Norway.

Dominguez-Serrano and Blancas (2011) integrated the inverted frontier approach of Zhou et al. (2007) and the common weights approach of Despotis (2005a, b) to determine, separately, the HDI of men and women. Their model was used to assess 27 European countries, highlighting the Netherlands as the best country regarding men and Spain as the best country regarding women.

Hatefi and Torabi (2010, 2018) proposed a two-stage model to determine the single most advantageous set of weights for all countries on average, which was used to recalculate the HDI and the sustainable energy index. Their model is equivalent to a particular case of the Despotis (2005a, b) model, with the parameter “t” equal to 0 (Footnote 3). In Hatefi and Torabi (2010) the model was proposed and used to recalculate the HDI of Asian and Pacific countries. Hong Kong was the country that stood out the most. Hatefi and Torabi (2018), on the other hand, presented the dual of the previous model to determine targets for low HDI countries.

Alves et al. (2016) analyzed the HDI of 187 countries in 2015. The authors used three BoD models (traditional, SBM and multiplicative models), two extensions (inverted frontier and common weights approaches) and two types of data (raw and normalized). They also tested the inclusion of fictitious countries in the sample.

Van Puyenbroeck (2018) argued that the original BoD formulation, which is based on the input-oriented DEA-CCR model, has no practical significance, as it represents a reduction in the dummy input necessary for a country to become a benchmark. To overcome this limitation, the author proposed a new BoD formulation, based on the output-oriented CCR model, and applied it to evaluate the female HDI of 19 municipalities in the Brussels region.

A relatively recent group of studies about the HDIBoD has analyzed the differences between groups of countries. Rogge (2018a) applied a recent extension of the DEA to determine a region's aggregate HDI. The author tested his model on data from 27 countries in Europe and determined that the region's pooled HDI was 0.9230. Van Puyenbroeck and Rogge (2020) used a derivation of the BoD model, called the "Model of global boundary differences", with the use of weight restrictions, to analyze the difference in the level of human development in 6 regions worldwide. Their results indicated that the regions/groups with the best human development were, in decreasing order: (a) OECD, (b) the Arab States, (c) Asia and the Pacific, (d) Latin America and the Caribbean, (e) South Asia and (f) Sub-Saharan Africa.

Finally, Rogge (2018b) extended the BoD model in two stages, based on index numbers proposed by Van Puyenbroeck and Rogge (2017), to incorporate different types of averages. He used several variations of this model to recalculate the HDI of the countries studied. His results showed that the countries that are most sensitive to the type of aggregation used are those in the middle of the traditional HDI ranking.

3 Method

The first step of this research involves the tabulation of HDI data in its raw form, extracted directly from the UNDP website, and in its normalized form, calculated following the rules of the HDI technical notes (UNDP 2020b) and expressed in Table 1. The raw data refers to the following sub-indicators: life expectancy at birth (LEB), Gross National Income per capita (GNIpc) and the average of the mean years of schooling (MYS) and the expected years of schooling (EYS). The normalized data refers to: the health index (linear normalization of LEB); the education index (mean of the linear normalizations of MYS and EYS); and the income index (logarithmic normalization of GNIpc, which is used to minimize the considerable difference in this indicator that exists between countries) (UNDP 2020b). In addition, the values of EYS and GNIpc were limited, respectively, to 18 years and $75,000, so that any increase in these variables beyond these values does not count toward the HDI.

Table 1 HDI data normalization method
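
As a minimal sketch of the normalization described above, the Python snippet below computes the three dimension indexes and the equal-weight geometric-mean HDI. The goalposts (20 and 85 years for LEB, 0 and 15 years for MYS, 0 and 18 years for EYS, $100 and $75,000 for GNIpc) are assumptions taken from the UNDP technical notes, and the function names and the sample country are hypothetical.

```python
import math

# Goalposts assumed from the UNDP technical notes (UNDP 2020b).
LEB_MIN, LEB_MAX = 20.0, 85.0        # life expectancy at birth, years
MYS_MAX, EYS_MAX = 15.0, 18.0        # schooling maxima (minima are 0)
GNI_MIN, GNI_MAX = 100.0, 75000.0    # GNI per capita, PPP $


def dimension_indexes(leb, mys, eys, gni_pc):
    """Return the (health, education, income) indexes."""
    health = (leb - LEB_MIN) / (LEB_MAX - LEB_MIN)
    # EYS and GNIpc are capped, so values above the cap add nothing.
    eys_index = min(eys, EYS_MAX) / EYS_MAX
    mys_index = mys / MYS_MAX
    education = (mys_index + eys_index) / 2.0
    income = (math.log(min(gni_pc, GNI_MAX)) - math.log(GNI_MIN)) / (
        math.log(GNI_MAX) - math.log(GNI_MIN)
    )
    return health, education, income


def hdi(leb, mys, eys, gni_pc):
    """Equal-weight geometric mean of the three dimension indexes."""
    h, e, i = dimension_indexes(leb, mys, eys, gni_pc)
    return (h * e * i) ** (1.0 / 3.0)


# Hypothetical country, for illustration only.
print(round(hdi(leb=75.0, mys=9.0, eys=14.0, gni_pc=15000.0), 3))
```

The raw data used in the BoD models correspond to the inputs of `dimension_indexes`, and the normalized data to its outputs.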

Regarding the effect of normalization, it is important to mention that the BoD models have a scale invariance property (Cooper et al., 2007), whereby the sub-indicators can be multiplied or divided by any value, without altering the CI result. Almost all of these models, however, are not invariant to translation (adding or subtracting a value) or the application of logarithms or the establishment of a threshold for the variables. The only exception is the Range Adjusted Model (RAM), which is also invariant to translation. So, the use of raw and normalized HDI data could generate different findings in DEA models.
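
The scale invariance property can be checked directly on the multiplier form of the traditional BoD. The lines below are a sketch under stated assumptions: \(y_{jk}\) denotes sub-indicator \(j\) of country \(k\), \(w_{j}\) its weight, \(c_{j} > 0\) a rescaling constant and \(a_{j}\) a translation constant (all introduced here only for illustration), and the infinitesimal lower bound on the weights is ignored.

$$ \begin{aligned} & Min\; v \quad \text{subject to} \quad \sum_{j} w_{j} \cdot y_{j0} = 1, \qquad \sum_{j} w_{j} \cdot y_{jk} - v \le 0, \; \forall k \\ & y_{jk}^{\prime} = c_{j} \cdot y_{jk}, \;\; w_{j}^{\prime} = \frac{w_{j}}{c_{j}} \;\; \Rightarrow \;\; \sum_{j} w_{j}^{\prime} \cdot y_{jk}^{\prime} = \sum_{j} w_{j} \cdot y_{jk}, \; \forall k \\ & y_{jk}^{\prime} = y_{jk} + a_{j} \;\; \Rightarrow \;\; \sum_{j} w_{j} \cdot y_{jk}^{\prime} = \sum_{j} w_{j} \cdot y_{jk} + \sum_{j} w_{j} \cdot a_{j} \\ \end{aligned} $$

In the first case every constraint is unchanged once the weights are rescaled, so the optimal \(v\), and hence the CI, stays the same. In the second case an extra term that depends on the weights appears in every constraint, which is why a translation generally changes the result.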

In the second step, we carried out a statistical analysis of the sub-indicators, both in their raw and normalized forms. This analysis was essential to understand the results obtained after the construction of the indexes, because CIs reflect the patterns of the aggregated variables. In this step, we used descriptive statistics and outlier analysis.

In the next step, we measured the new CIs using several models and extensions of DEA on the raw and normalized HDI data. All the models were implemented using the R software.

Finally, an exhaustive analysis step was necessary so that the vast range of data obtained could be transformed into useful results about both the indexes and the human development of the countries chosen. Social Network Analysis (SNA) was used to build the two-mode network between the countries and their benchmarks, which made it possible to graph the countries and to determine the standardized degree centrality of each benchmark. NetMiner software was used to apply the SNA tools.

The standardized degree centrality is the ratio between the number of edges of a benchmark \((l)\) and the maximum number of edges it could have (Wasserman & Faust, 1994). The maximum number of edges, in turn, is the number of countries \((n)\) minus the number of benchmarks \((b)\), as the benchmarks cannot be linked to each other (Expression 1).

$$ Standard\;Degree = \frac{l}{n - b} $$
(1)
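
Expression 1 can be illustrated with a few lines of Python. This is only a sketch: the function name, the edge-list format and the six-country network are hypothetical, and a benchmark is assumed to be any country that appears as a reference in at least one pair (including a pair with itself).

```python
def standardized_degree(edges, n_countries):
    """Standardized degree centrality of each benchmark (Expression 1).

    edges: iterable of (country, benchmark) pairs taken from the DEA results;
        a benchmark may appear paired with itself.
    n_countries: total number of countries analyzed (n).
    """
    links = {}
    for country, benchmark in edges:
        followers = links.setdefault(benchmark, set())
        if country != benchmark:          # self-references are not edges
            followers.add(country)
    b = len(links)                        # number of benchmarks
    max_edges = n_countries - b           # benchmarks cannot link to each other
    if max_edges <= 0:
        return {bm: 0.0 for bm in links}
    return {bm: len(f) / max_edges for bm, f in links.items()}


# Hypothetical network: six countries, two of which (A and B) are benchmarks.
edges = [("C", "A"), ("D", "A"), ("D", "B"), ("E", "A"), ("F", "B"),
         ("A", "A"), ("B", "B")]
centrality = standardized_degree(edges, n_countries=6)
print(centrality)
```

In this toy network A is the reference of three of the four non-benchmark countries and B of two of them.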

In addition, the following information was derived from the DEA results: the benchmarks of each country, the relative contributions of the variables, and the relative and absolute targets for each country to become a benchmark. With this information, we clustered countries based on their respective benchmarks. In this approach, proposed by Po et al. (2009) and critically analyzed by Krüger (2010), all countries with the same set of benchmarks were grouped in the same cluster, potentially having many characteristics in common (e.g. the relative contribution).
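
The benchmark-based clustering can be sketched as a simple grouping by reference set. The snippet below is an illustration only: the function name and the country labels are hypothetical, and the reference sets are assumed to have been extracted beforehand from the DEA results.

```python
from collections import defaultdict


def cluster_by_benchmarks(reference_sets):
    """Group countries that share exactly the same set of benchmarks.

    reference_sets: dict mapping country -> iterable of its benchmarks.
    Returns a dict mapping each distinct benchmark set (as a frozenset)
    to the sorted list of countries that adopt it.
    """
    clusters = defaultdict(list)
    for country, benchmarks in reference_sets.items():
        # The order of the benchmarks is irrelevant, so a frozenset is the key.
        clusters[frozenset(benchmarks)].append(country)
    return {key: sorted(members) for key, members in clusters.items()}


# Hypothetical reference sets, for illustration only.
reference_sets = {"C": ["A"], "D": ["A", "B"], "E": ["A"], "F": ["B", "A"]}
for benchmarks, members in cluster_by_benchmarks(reference_sets).items():
    print(sorted(benchmarks), members)
```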

4 BoD Models and Extensions

The CIs addressed in this work are based on the results of different DEA models (Footnote 4) and extensions (Footnote 5) applied to the BoD approach. Thus, the following approaches were used:

(a) Different DEA-BoD models in the multipliers and envelopment form, such as:

  a. Traditional BoD: proposed by Melyn and Moesen (1991); the model used in this work was derived from the output-oriented (Footnote 6) CCR model (Charnes et al., 1978);

  b. Multiplicative BoD: developed by Tofallis (2003) (Footnote 7), based on the Multiplicative DEA model (Charnes et al., 1983);

  c. SBM-BoD: derived from the output-oriented Slack Based Measure (SBM) model (Tone, 2001); and

  d. RAM-BoD: used by Zhou et al. (2017), based on the output-oriented Range Adjusted DEA Model (Aida et al., 1998).

Table 2 shows the formulation of the traditional, SBM, and RAM BoD in the multipliers and envelopment form. Table 3 shows the original and linearized multiplicative BoD formulation in the multipliers and envelopment form. To linearize the multiplicative model, it is necessary to apply natural logarithms to the objective function (OF) and restrictions.

Table 2 Traditional, SBM and RAM BoD
Table 3 Multiplicative BoD model

Depending on the model used, the HDIBoD of a country “0” should be calculated by one of the alternatives presented in Expressions 2, 3, 4 and 5:

$$ Traditional\;- BoD: HDI_{BoD} = \frac{{w_{I} }}{v}.I_{0} + \frac{{w_{E} }}{v}.E_{0} + \frac{{w_{H} }}{v}.H_{0} = \frac{1}{\eta } $$
(2)
$$ Multiplicative\;- BoD: HDI_{BoD} = v \cdot I_{0}^{{w_{I} }} \cdot E_{0}^{{w_{E} }} \cdot H_{0}^{{w_{H} }} = \theta $$
(3)
$$ SBM\;- BoD: HDI_{BoD} = \frac{1}{{v - w_{I} \cdot I_{0} - w_{E} \cdot E_{0} - w_{H} \cdot H_{0} }} = \frac{1}{{1 + \frac{1}{3} \left( {\frac{{S_{I} }}{{I_{0} }} + \frac{{S_{E} }}{{E_{0} }} + \frac{{S_{H} }}{{H_{0} }}} \right)}} $$
(4)
$$ RAM\;- BoD: HDI_{BoD} = v + w_{I} \cdot I_{0} + w_{E} \cdot E_{0} + w_{H} \cdot H_{0} = 1 - \frac{1}{3} \left( {\frac{{S_{I} }}{{R_{I} }} + \frac{{S_{E} }}{{R_{E} }} + \frac{{S_{H} }}{{R_{H} }}} \right) $$
(5)
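
To make the linearization of the multiplicative model concrete, taking natural logarithms of Expression 3 gives a form that is linear in the unknowns. This is a sketch that assumes \(v\), \(I_{0}\), \(E_{0}\) and \(H_{0}\) are strictly positive, so that the logarithms exist:

$$ \ln HDI_{BoD} = \ln v + w_{I} \cdot \ln I_{0} + w_{E} \cdot \ln E_{0} + w_{H} \cdot \ln H_{0} = \ln \theta $$

Because \(\ln v\), \(w_{I}\), \(w_{E}\) and \(w_{H}\) enter linearly, the model can be solved by linear programming on the log-transformed data, and the index is recovered by applying the exponential function to the optimal value.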

In addition to the HDI value, the models also provide the benchmarks of each country, the relative contribution of the variables (level of importance of each indicator), the absolute target (value to be achieved) and the relative target (percentage of necessary increase) for a country to become a benchmark. The benchmarks of a country are all those for which the variable \({\lambda }_{k}\), obtained through the BoD models, is different from zero. To exemplify the determination of the other measures, Table 4 presents their calculation method considering the Income of a country “0” in each model presented.

Table 4 Target, relative target and relative contribution of income in BoD models

For all models, we have:

θ: HDIBoD

η: Inverse of HDIBoD

Ik, Ek, Hk: Income, Education and Health of a country k;

I0, E0, H0: Income, Education and Health of the country under analysis;

wI, wE, wH: Weight of the Income, Education and Health;

v: Independent weight (scaling factor)

SI, SE, SH: Slacks of the Income, Education and Health;

RI, RE, RH: The range of the Income, Education and Health of the countries in the sample;

λk: Importance level of benchmark “k” for the target of the country under analysis;

n: Number of countries analyzed;

ε: Non-archimedean number.
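
The idea of "most advantageous weights" behind Expression 2 can be illustrated without a linear programming solver. The sketch below is not the LP formulation of Table 2: it approximates the traditional BoD index by brute-force search over a grid of weights, using the fact that the index equals the largest attainable ratio between a country's weighted sum and the best weighted sum in the sample. The function name, the grid resolution and the three-country data set are hypothetical, and weights equal to zero are allowed instead of the lower bound ε.

```python
def bod_score(data, target, steps=100):
    """Approximate the traditional BoD index of `target` by grid search.

    data: dict mapping country -> (income, education, health).
    The index is the maximum over the weights of w.y_0 / max_k(w.y_k).
    The ratio does not change when all weights are rescaled, so the weights
    are searched on a grid over the simplex w_I + w_E + w_H = 1.
    Returns (score, weights) for the best grid point found.
    """
    y0 = data[target]
    best_score, best_w = 0.0, None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            own = sum(wj * yj for wj, yj in zip(w, y0))
            top = max(sum(wj * yj for wj, yj in zip(w, yk))
                      for yk in data.values())
            if top > 0 and own / top > best_score:
                best_score, best_w = own / top, w
    return best_score, best_w


# Hypothetical normalized data (income, education, health).
data = {"A": (0.9, 0.6, 0.7), "B": (0.5, 0.9, 0.8), "C": (0.6, 0.6, 0.6)}
for country in data:
    score, weights = bod_score(data, country)
    print(country, round(score, 3), weights)
```

In this toy sample A and B reach an index of 1 by placing all the weight on the indicator in which they lead, while C, which leads in none, stays below 1 under any weighting.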

(b) Models with restrictions on the sub-indicators' relative contribution (Footnote 8): the approach of Van Puyenbroeck et al. (2020), based on Wong and Beasley (1990):

Most of the works on BoD that adopt some kind of weight restriction used the Wong and Beasley (1990) approach, which imposes restrictions on the sub-indicators' relative contribution. Sarrico and Dyson (2004), however, warned that this approach, by restricting only the relative contributions of the unit under analysis (country 0), could allow the relative contributions of the other units, including the benchmarks, to fall outside the established limits, biasing the CI value. As a solution, the authors proposed a model that limited the sub-indicators' relative contribution for all compared units. These additional restrictions, however, could make the linear programming problem infeasible. To avoid this, Van Puyenbroeck et al. (2020) proposed a second-stage model, which limited the relative contribution of the unit under analysis and of all the benchmarks previously identified in the first stage (see Expression 6).

$$ Min \frac{1}{{HDI_{BoD} }} = v $$

Subject to:

$$ \begin{aligned} & w_{I} \cdot I_{0} + w_{E} \cdot E_{0} + w_{H} \cdot H_{0} = 1 \\ & w_{I} \cdot I_{k} + w_{E} \cdot E_{k} + w_{H} \cdot H_{k} - v \le 0, \forall\, k \\ & w_{I} \cdot I_{0} \ge L \\ & w_{E} \cdot E_{0} \ge L \\ & w_{H} \cdot H_{0} \ge L \\ & w_{I} \cdot \left( {\mathbf{1} - L} \right) \cdot I_{k} - w_{E} \cdot L \cdot E_{k} - w_{H} \cdot L \cdot H_{k} \ge 0, \,for\, \forall k \in \; benchmarks \\ & - w_{I} \cdot L \cdot I_{k} + w_{E} \cdot \left( {\mathbf{1} - L} \right) \cdot E_{k} - w_{H} \cdot L \cdot H_{k} \ge 0, \,for\, \forall k \in \; benchmarks \\ & - w_{I} \cdot L \cdot I_{k} - w_{E} \cdot L \cdot E_{k} + w_{H} \cdot \left( {\mathbf{1} - L} \right) \cdot H_{k} \ge 0, \,for\, \forall k \in \; benchmarks \\ & v \ge \varepsilon \\ \end{aligned} $$
(6)

wherein:

Ik, Ek, Hk: Income, Education and Health of a country k;

I0, E0, H0: Income, Education and Health of the country under analysis;

\({w}_{I}, {w}_{E},{ w}_{H}\): Weight of the Income, Education and Health;

v: Independent weight (scaling factor).

n: Number of countries analyzed;

ε: Non-archimedean number.

L: Lower bound of the relative contribution of the indicators;
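
The benchmark restrictions in Expression 6 follow from requiring a relative contribution of at least \(L\). As a short worked step for Income, assuming the weighted sum of a benchmark \(k\) is strictly positive:

$$ \begin{aligned} & \frac{{w_{I} \cdot I_{k} }}{{w_{I} \cdot I_{k} + w_{E} \cdot E_{k} + w_{H} \cdot H_{k} }} \ge L \\ & \Leftrightarrow \; w_{I} \cdot I_{k} \ge L \cdot \left( {w_{I} \cdot I_{k} + w_{E} \cdot E_{k} + w_{H} \cdot H_{k} } \right) \\ & \Leftrightarrow \; w_{I} \cdot \left( {1 - L} \right) \cdot I_{k} - w_{E} \cdot L \cdot E_{k} - w_{H} \cdot L \cdot H_{k} \ge 0 \\ \end{aligned} $$

For the country under analysis the denominator equals 1 by the first restriction, which is why its bounds reduce to \(w_{I} \cdot I_{0} \ge L\), \(w_{E} \cdot E_{0} \ge L\) and \(w_{H} \cdot H_{0} \ge L\).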

(c) Common weight approaches of:

  a. Despotis (2005a, b): based on a second-stage multi-objective programming model;

  b. Tofallis (2013): based on linear regression (with an intercept equal to 0) of the traditional BoD result as a function of the income, education and health data; and

  c. Cross-evaluation: based on the approach of Doyle and Green (1994).

Expression 7 shows the Despotis (2005a, b) model, where the parameter ‘t’ sets the share of the objective function (OF) assigned to the average deviation \((\sum_{k=1}^{n}\frac{{d}_{k}}{n})\), with the remainder \((1-t)\) assigned to the maximum deviation \((z)\), between the CI with common weights and the CI with traditional BoD.

$$ Min\; t \cdot \mathop \sum \limits_{k = 1}^{n} \frac{{d_{k} }}{n} + \left( {1 - t} \right) \cdot z $$

Subject to:

$$ \begin{aligned} & w_{I}^{'} \cdot I_{k} + w_{E}^{'} \cdot E_{k} + w_{H}^{'} \cdot H_{{k~}} + d_{k} = \theta _{k} ~{\forall}k \\ & d_{k} - z \le 0,~~\forall k \\ & w_{I}^{'} ,~w_{E}^{'} ,~w_{H}^{'} {\text{~}} \ge \varepsilon \\ \end{aligned} $$
(7)

wherein:

\(w_{I}^{\prime } ,w_{E}^{\prime } ,w_{H}^{\prime }\): Common weights of the variables Income, Education and Health;

Ik, Ek, Hk: Income, Education and Health of a country k;

\({\theta }_{k}\): HDIBoD of a country k (with traditional BoD);

n: Number of countries analyzed;

dk: Deviation between the index with common weights and with BoD of a country k;

z: Maximum deviation of the sample;

t: Parameter of Despotis’ approach;
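
To show how the parameter t trades off the two terms of Expression 7, the Python sketch below evaluates the objective function for a given candidate set of common weights. It does not solve the optimization problem. The function name and the two-country data are hypothetical, and non-negative deviations \(d_{k}\) are assumed, although Expression 7 does not state this restriction explicitly.

```python
def despotis_objective(weights, data, bod_scores, t):
    """Evaluate the objective of Expression 7 for candidate common weights.

    weights: (w_I, w_E, w_H) candidate common weights.
    data: dict country -> (income, education, health).
    bod_scores: dict country -> theta_k from the traditional BoD.
    t: parameter balancing the average and the maximum deviation.
    Returns (objective, deviations).
    """
    deviations = {}
    for country, y in data.items():
        common_ci = sum(w * value for w, value in zip(weights, y))
        # First restriction of Expression 7: common CI + d_k = theta_k.
        deviations[country] = bod_scores[country] - common_ci
    if min(deviations.values()) < -1e-12:
        # Assumption: deviations must be non-negative for feasibility.
        raise ValueError("candidate weights give a CI above the BoD index")
    mean_dev = sum(deviations.values()) / len(deviations)
    z = max(deviations.values())          # smallest z with d_k - z <= 0
    return t * mean_dev + (1 - t) * z, deviations


# Hypothetical data, BoD scores and candidate weights.
data = {"A": (1.0, 0.5, 0.5), "B": (0.5, 1.0, 0.5)}
bod_scores = {"A": 1.0, "B": 0.9}
for t in (0.0, 0.5, 1.0):
    value, _ = despotis_objective((0.5, 0.5, 0.0), data, bod_scores, t)
    print(t, round(value, 3))
```

With t equal to 1 only the average deviation counts, and with t equal to 0 only the maximum deviation counts.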

Regarding the approach of Tofallis (2013), the CIs obtained from the linear regression are not guaranteed to lie in the range 0 to 1. They must therefore be normalized (divided by the highest value in the sample) so that the CI has this property.

The cross-evaluation approach, \({HDI}_{BoD}^{Cross}\), is based on the arithmetic mean of the CI of a country calculated with the most advantageous weights of all the units (which can be expressed in the form of a cross matrix). Using this approach is equivalent to determining the CI with a set of common weights corresponding to the average weight of all units. Thus, although it is often classified as a tiebreaker method, cross-evaluation can also be classified as a common weight approach (see Expression 8).

$$ Cross \;evaluation: HDI_{BoD}^{Cross} = \frac{1}{n} \mathop \sum \limits_{k = 1}^{n} \theta_{0}^{k} = \left( { \frac{{\mathop \sum \nolimits_{k = 1}^{n} \frac{{w_{I}^{k} }}{{v^{k} }}}}{n}} \right) I_{0} + \left( {\frac{{\mathop \sum \nolimits_{k = 1}^{n} \frac{{w_{E}^{k} }}{{v^{k} }}}}{n}} \right) E_{0} + \left( {\frac{{\mathop \sum \nolimits_{k = 1}^{n} \frac{{w_{H}^{k} }}{{v^{k} }}}}{n}} \right) H_{0} $$
(8)

wherein:

\( \theta _{0}^{k} \): HDI of a country 0 calculated with the most advantageous weights for the country k;

I0, E0, H0: Income, Education and Health of the country under analysis;

\({w}_{I}^{k}, {w}_{E}^{k}, {w}_{H}^{k}\): Most advantageous weights of the variables Income, Education and Health for the country k;

\({v}^{k}\): Most advantageous independent weight for the country k;

n: Number of countries analyzed;
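
The equivalence stated in Expression 8, between averaging the cross matrix and using averaged weights, can be checked numerically. The snippet below is a sketch: the function names, the two-country data and the weight sets are hypothetical, and the weights are assumed to come from a previous run of the traditional BoD.

```python
def cross_evaluation(data, optimal_weights):
    """Cross-evaluation index of each country (Expression 8).

    data: dict country -> (income, education, health).
    optimal_weights: dict country k -> (w_I, w_E, w_H, v), the most
        advantageous weights of country k.
    """
    n = len(optimal_weights)
    scores = {}
    for country, (inc, edu, hea) in data.items():
        total = 0.0
        for w_i, w_e, w_h, v in optimal_weights.values():
            # theta_0^k: index of `country` under the weights of country k.
            total += (w_i * inc + w_e * edu + w_h * hea) / v
        scores[country] = total / n
    return scores


def average_weights(optimal_weights):
    """Common weights given by the mean of w/v over all countries."""
    n = len(optimal_weights)
    sums = [0.0, 0.0, 0.0]
    for w_i, w_e, w_h, v in optimal_weights.values():
        for j, w in enumerate((w_i, w_e, w_h)):
            sums[j] += w / v
    return tuple(s / n for s in sums)


# Hypothetical data and weight sets, for illustration only.
data = {"A": (0.9, 0.6, 0.7), "B": (0.5, 0.9, 0.8)}
optimal_weights = {"A": (1 / 0.9, 0.0, 0.0, 1.0), "B": (0.0, 1 / 0.9, 0.0, 1.0)}
scores = cross_evaluation(data, optimal_weights)
common = average_weights(optimal_weights)
for country, y in data.items():
    via_common = sum(w * value for w, value in zip(common, y))
    print(country, round(scores[country], 4), round(via_common, 4))
```

Both columns printed for each country coincide, which is the sense in which cross-evaluation acts as a common weight approach.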

To deal with the existence of multiple optimal weights in the traditional BoD, which can make cross-evaluation unviable, the “aggressive formulation” of Doyle and Green (1994) was used (Expression 9) as a second-stage model to obtain a unique set of weights for each country. The objective of the aggressive formulation is to find, among the optimal sets of weights for one unit, the one that minimizes the average efficiency of the other units.

$$ Min\; w_{I} \cdot I_{0}^{Sum} + w_{E} \cdot E_{0}^{Sum} + w_{H} \cdot H_{0}^{Sum} $$

Subject to:

$$ \begin{aligned} & w_{I} \cdot I_{k} + w_{E} \cdot E_{k} + w_{H} \cdot H_{k} \le 1,\, \forall k \ne country\;in\;analysis \\ & w_{I} \cdot I_{0} + w_{E} \cdot E_{0} + w_{H} \cdot H_{0} = \theta_{0} \\ & w_{I} , w_{E} , w_{H} \ge \varepsilon \\ \end{aligned} $$
(9)

wherein:

\( \theta _{0}\): HDIBoD of the country under analysis (with traditional BoD);

\({I}_{0}^{Sum}, {E}_{0}^{Sum}, {H}_{0}^{Sum}\): Sum of income, education and health of all countries in the sample, except the country under analysis (country 0).

I0, E0, H0: Income, Education and Health of the country under analysis;

\({w}_{I}, {w}_{E}{, w}_{H}\): Weight of Income, Education and Health;

To calculate the HDI with the common weights obtained in the three approaches, it is necessary to use Expression 10:

$$ Common\;weights\;approach: HDI_{BoD} = w_{I}^{^{\prime}} .I_{0} + w_{E}^{^{\prime}} .E_{0} + w_{H}^{^{\prime}} .H_{0} $$
(10)

(d) Tiebreaker methods:

  a. Super-BoD: based on the super-efficiency approach proposed by Andersen and Petersen (1993);

  b. Inverted Frontier: proposed by Yamada et al. (1994) and Entani et al. (2002);

  c. Multiplicative cross-evaluation: proposed by Mariano and Rebelatto (2014); and

  d. Triple index: also proposed by Mariano and Rebelatto (2014) and used by Sobreiro Filho et al. (2016) and Santana et al. (2015).

The super-efficiency approach was proposed by Andersen and Petersen (1993) to rank efficient units. From the BoD perspective, however, this approach is more adequately named the super-BoD approach, \(HDI_{BoD}^{Super}\). Unlike other tiebreaker methods, the application of super-BoD does not interfere with the ranking of units that are not benchmarks. The basic idea of the method is simply to exclude from the linear programming model the constraint that limits the CI of the country analyzed to 1 (see Expression 11), which allows benchmarks to achieve CIs greater than 1.

$$ Min \frac{1}{{HDI_{BoD}^{Super} }} = v $$

Subject to:

$$ \begin{aligned} & w_{I} \cdot I_{0} + w_{E} \cdot E_{0} + w_{H} \cdot H_{0} = 1 \\ & - v + w_{I} \cdot I_{k} + w_{E} \cdot E_{k} + w_{H} \cdot H_{k} \le 0, \forall k \ne country\;in\;analysis \\ & w_{I} , w_{E} , w_{H} \ge \varepsilon \\ \end{aligned} $$
(11)

wherein:

Ik, Ek, Hk: Income, Education and Health of a country k;

I0, E0, H0: Income, Education and Health of the country under analysis;

\({w}_{I}, {w}_{E}{, w}_{H}\): Weight of the Income, Education and Health;

v: Independent weight (scaling factor).

n: Number of countries analyzed;

ε: Non-archimedean number.
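The logic of Expression 11 can be sketched without a linear programming solver. Writing \(c_j = w_j \cdot x_{0j}\), the equality constraint becomes \(\sum c_j = 1\), and \(v\) is the highest weighted score reached by any other country. The sketch below replaces the linear program with a grid search over these shares and takes ε as zero, so it is only an approximation under these assumptions; the function name and the data are hypothetical.

```python
from itertools import product


def super_bod(x0, others, steps=100):
    """Approximate Expression 11 for country 0 by a grid search.

    With c_j = w_j * x0_j, the equality w . x0 = 1 becomes sum(c) = 1 and
    v is the largest weighted score among the other countries.
    Returns 1 / v, the super-BoD index (epsilon is taken as zero).
    """
    best_v = float("inf")
    for a, b in product(range(steps + 1), repeat=2):
        if a + b > steps:
            continue
        shares = (a / steps, b / steps, (steps - a - b) / steps)
        v = max(
            sum(c * xk / x for c, xk, x in zip(shares, other, x0))
            for other in others
        )
        best_v = min(best_v, v)
    return 1.0 / best_v


# Hypothetical normalized data: (income, education, health).
leader = (1.0, 0.5, 0.5)
rivals = [(0.8, 0.9, 0.6), (0.5, 0.6, 0.9)]
leader_score = super_bod(leader, rivals)  # benchmark: index above 1

follower = (0.4, 0.45, 0.3)
follower_score = super_bod(follower, [leader] + rivals)  # non-benchmark: below 1
```

In this example the leader places all of its weight on income, where the best rival reaches only 80% of its value, so its index is 1/0.8 = 1.25. The follower is dominated in every dimension and keeps an index below 1, in line with the statement that super-BoD does not alter the results of non-benchmark units.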

The inverted frontier approach determines the CI using the distance of a country from the frontier of the worst practices (anti-benchmarks).Footnote 9 The result of this approach is the inverted HDI—\({HDI}_{BoD}^{Inv}\), in which a higher objective function value indicates worse relative performance by a country. Table 5 presents the inverted traditional BoD model in the multiplier and envelopment form.

Table 5 Inverted traditional BoD

Wherein:

θ: HDIBoD

Ik, Ek, Hk: Income, Education and Health of a country k;

I0, E0, H0: Income, Education and Health of the country under analysis;

\(w_{I} , w_{E} , w_{H}\): Weight of the Income, Education and Health;

v: Independent weight (scaling factor)

SI, SE, SH: Slacks of the Income, Education and Health;

\(\lambda_{k}\): Importance level of benchmark “k” for the target of the country under analysis;

n: Number of countries analyzed;

ε: Non-archimedean number.

After obtaining \(HDI_{BoD}^{Inv}\), a composite index can be calculated by combining it with the result of the traditional BoD. Following the approach of Leta et al. (2005), this composite index (\(HDI_{BoD}^{CI\_Leta}\)) is based on the normalized weighted average of the result of the traditional frontier and one minus the result of the inverted frontier (Expression 12).

$$ { \rm{Leta\; et al.}}\, \left( {2005} \right)\,CI: HDI_{BoD}^{CI\_Leta} = \alpha \times HDI_{BoD} + \left( {1 - \alpha } \right)\times\left( {1 - HDI_{BoD}^{Inv} } \right) $$
(12)

Another way to use this approach is calculating the inverted index, which is the inverse and normalized value of the result obtained at the inverted frontier (Expression 13).

$$ Inverted\;index\,: HDI_{BoD}^{Inv\_Index} = \left( {\frac{1}{{HDI_{BoD}^{Inv} }}} \right) $$
(13)

Zhou et al. (2007) used the inverted index to build a composite index. However, they used rescaling normalization (based on maxima and minima) for each index component separately. The authors proposed a composite index resulting from the weighted average of the normalized inverted index and the normalized HDIBoD, as can be seen in Expression 14.

$$ {\rm{ Zhou\;et al. }}\,\left( {2007} \right)\,CI:$$
$$ HDI_{BoD}^{CI\_Zhou} = \alpha \times\frac{{HDI_{BoD} - Min\left( {HDI_{BoD} } \right)}}{{Max\left( {HDI_{BoD} } \right) - Min\left( {HDI_{BoD} } \right)}} + \left( {1 - \alpha } \right)\times\frac{{\left( {\frac{1}{{HDI_{BoD}^{Inv} }}} \right) - Min\left( {\frac{1}{{HDI_{BoD}^{Inv} }}} \right)}}{{Max\left( {\frac{1}{{HDI_{BoD}^{Inv} }}} \right) - Min\left( {\frac{1}{{HDI_{BoD}^{Inv} }}} \right)}} $$
(14)

wherein:

\(\alpha\): Weight of the HDIBoD in the composite index (value between 0 and 1);

\(Min()\): Minimum value of an indicator in the sample.

\(Max()\): Maximum value of an indicator in the sample.
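The two composite indexes of Expressions 12 and 14 can be compared with a short sketch. The function names, the choice of \(\alpha = 0.5\) and the three sample values are assumptions made for illustration only; the inverted scores are assumed to lie in (0, 1], with 1 denoting an anti-benchmark.

```python
def leta_ci(hdi, hdi_inv, alpha=0.5):
    """Expression 12: weighted average of HDI_BoD and (1 - HDI_BoD^Inv)."""
    return alpha * hdi + (1 - alpha) * (1 - hdi_inv)


def rescale(values):
    """Min-max rescaling of a list of values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def zhou_ci(hdi_list, hdi_inv_list, alpha=0.5):
    """Expression 14: weighted average of rescaled HDI_BoD and rescaled 1 / HDI_BoD^Inv."""
    direct = rescale(hdi_list)
    inverted = rescale([1 / v for v in hdi_inv_list])
    return [alpha * d + (1 - alpha) * i for d, i in zip(direct, inverted)]


# Hypothetical traditional and inverted scores for three countries.
hdi = [1.0, 0.8, 0.6]
hdi_inv = [0.5, 0.8, 1.0]

leta = [leta_ci(h, inv) for h, inv in zip(hdi, hdi_inv)]
zhou = zhou_ci(hdi, hdi_inv)
```

Note that the rescaling in Expression 14 always assigns 1 to the best and 0 to the worst country in each component, whereas Expression 12 keeps the original scale of both frontiers.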

The cross-evaluation approach admits some variations in its calculation method, for example: normalizing the index obtained, using a geometric mean, and not taking into account the most advantageous weights for the country itself. All of these changes were incorporated into the multiplicative cross-evaluation index-\(HDI_{BoD}^{MCross}\), as shown in Expression 15. It is noteworthy that, unlike cross-evaluation, this method cannot be classified as a common weights approach, since each country will adopt a different set of weights.

$$ Multiplicative\; cross\; evaluation: HDI_{BoD}^{MCross} = \sqrt[{n - 1}]{{ \mathop \prod \limits_{k = 1}^{n - 1} \theta_{0}^{k} }}, \; \forall k \ne country\; itself $$
(15)

wherein:

\(\theta_{0}^{k} \): HDI of a country 0 calculated with the most advantageous weights for the country k;

n: Number of countries analyzed;
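A minimal sketch of Expression 15 follows, assuming that the cross-evaluation scores \(\theta_{0}^{k}\) have already been obtained from the BoD models of each country. The nested dictionary layout and the values are hypothetical.

```python
import math


def multiplicative_cross(cross_scores, country):
    """Expression 15: geometric mean of the scores of `country` under the
    most advantageous weights of every other country.

    cross_scores[k][c] is the index of country c evaluated with the
    weights of country k (theta_c^k).
    """
    peers = [k for k in cross_scores if k != country]
    log_sum = sum(math.log(cross_scores[k][country]) for k in peers)
    return math.exp(log_sum / len(peers))


# Hypothetical cross-evaluation matrix for three countries.
cross_scores = {
    "A": {"A": 1.00, "B": 0.64, "C": 0.50},
    "B": {"A": 0.81, "B": 1.00, "C": 0.50},
    "C": {"A": 1.00, "B": 0.25, "C": 1.00},
}

score_a = multiplicative_cross(cross_scores, "A")  # sqrt(0.81 * 1.00)
score_b = multiplicative_cross(cross_scores, "B")  # sqrt(0.64 * 0.25)
```

The self-evaluation on the diagonal is skipped, which is what distinguishes this index from a cross-evaluation that includes the most advantageous weights of the country itself.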

Finally, the triple index (\(HDI_{BoD}^{Triple}\)) is based on the normalized geometric mean (Expression 16) of three approaches: traditional (most advantageous weights), \(HDI_{BoD}\); inverted (least advantageous weights), \(HDI_{BoD}^{Inv\_Index}\); and multiplicative cross-evaluation (cross-evaluation using a geometric mean without the most advantageous weights), \(HDI_{BoD}^{MCross}\).

$$ Triple\; index\; BoD: HDI_{BoD}^{Triple} = \left( {HDI_{BoD} } \right)^{\alpha } *\left( {HDI_{BoD}^{Inv\_Index} } \right)^{\beta } *\left( {HDI_{BoD}^{MCross} } \right)^{\gamma } $$
(16)

wherein:

\(\alpha , \beta , \gamma\): Weight of each component of triple index, with: \(\alpha + \beta + \gamma = 1\);

One final detail about the tiebreaker methods is that the results of Expressions 12, 13, 15 and 16 are usually normalized with the division by the highest value of the sample (distance to group leader). This procedure ensures that the indexes are between 0 and 1.
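Expression 16 and the normalization by the group leader can be sketched as follows. Equal exponents are assumed here only as a default, and all input values are hypothetical.

```python
def triple_index(hdi, inv_index, mcross, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """Expression 16: weighted geometric mean of the three indexes."""
    if abs(alpha + beta + gamma - 1) > 1e-9:
        raise ValueError("alpha, beta and gamma must sum to 1")
    return hdi ** alpha * inv_index ** beta * mcross ** gamma


def normalize_by_leader(values):
    """Divide every value by the sample maximum (distance to group leader)."""
    top = max(values)
    return [v / top for v in values]


# If the three components coincide, the triple index equals that value.
same = triple_index(0.81, 0.81, 0.81)

# Hypothetical raw tiebreaker results for three countries.
raw = [0.5, 0.25, 0.4]
normalized = normalize_by_leader(raw)
```

After this normalization the best country in the sample always receives the value 1, and every other country is expressed as a fraction of the leader.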

5 Results and Discussion

Before presenting the results of the BoD models, Table 6 presents the primary statistical information and the outliers referring to the raw and normalized data related to the variables of HDI. This information is essential for understanding the difference between the results of the models. We consider as outliers the countries with indicators greater than \(Q3 + 2 \times IQR\), where Q3 is the third quartile and IQR is the interquartile range.

Table 6 Statistics for the raw and normalized indicators
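The outlier rule described above can be sketched as follows. The text does not state which quartile method was used, so the default ("exclusive") method of Python's `statistics.quantiles` is an assumption, and the sample is hypothetical.

```python
import statistics


def upper_outliers(values, k=2.0):
    """Return the values above Q3 + k * IQR (k = 2 in the text)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    threshold = q3 + k * (q3 - q1)
    return [v for v in values if v > threshold]


# Hypothetical indicator values with one extreme observation.
sample = [1, 2, 3, 4, 5, 6, 7, 100]
outliers = upper_outliers(sample)
```

Only the upper tail is checked, since the rule in the text concerns countries with unusually high indicators.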

Table 6 shows that the raw variables become more homogeneous with normalization; GNI per capita was the indicator that showed the highest variability, being the only one in which the standard deviation was greater than the average; the life expectancy at birth presented low variability with no outlier.

It is noteworthy that seven countries were outliers in per capita income, all of which have in common that they are wealthy and have a small population. Given that many of the models presented in the next sections are sensitive to outliers, these countries will occupy the top positions in the rankings of many models.

In the next sections, we present the results of the BoD models (traditional, multiplicative, SBM and RAM) and of their extensions (with weight restrictions, with common weights, and with tiebreaker methods).

5.1 Traditional HDIBoD

5.1.1 Raw Variables

Initially, the application of the traditional BoD model in non-normalized HDI indicators is presented. When each country was allowed to adopt the weights that were most favorable to them in the variables without normalization, a group of five countries reached HDIBoD equal to 1 (benchmarks), namely: Australia, Hong Kong, Singapore, Liechtenstein, and Qatar.

Hong Kong has the longest life expectancy at birth of the sample group (84.1 years), although it is not high enough to be an outlier; Australia has the highest value of the education variable in the sample (17.9 years), and it is considered an outlier; Singapore ($ 82,503), Liechtenstein ($ 97,336) and Qatar ($ 116,818) are positive outliers in GNI per capita. The conclusion drawn is that each of the five benchmarks performed well in one specific data dimension.

Figure 1 shows the two-mode network representing the connections between countries (circles) and their benchmarks (squares). The five clusters in the figure (identified by the colour of the circles) are based on the benchmarks of the countries.

Fig. 1 Network of the traditional BoD with raw data

Table 7 presents the number of links and the standardized degree centrality referring to the five identified benchmarks.

Table 7 Analysis of benchmarks of the traditional BoD and raw data

As can be seen, Hong Kong served as a benchmark for almost all the countries analyzed, except for Norway (cluster A), which was the only country to adopt Liechtenstein as a benchmark. It should also be noted that Singapore, which has a high income, served as a benchmark for only nine countries, all of which have high incomes (present in clusters A, B, and C). On the other hand, Qatar is isolated in the network since it was not a reference for any country.

Analyzing the relative contribution of each variable, health had the most significant impact on the HDIBoD with raw values. On average, considering only non-benchmark countries, it was found that while the health relative contribution was 98.4%, the education relative contribution was 1.4%, and the income relative contribution was 0.2%. The great contribution of life expectancy occurs due to its low variability, as the model tends to assign high weights to variables where the distance between the best and worst performance is not as high (i.e., life expectancy), and low weights for variables where the distance between the top and bottom is greater (i.e., GNI per capita).
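To make the notion of relative contribution concrete, the sketch below assumes the usual BoD definition, in which the contribution of a sub-indicator is its weighted value divided by the sum of the weighted values. The weights and raw indicator values are hypothetical and only mimic the pattern described above.

```python
def relative_contributions(weights, indicators):
    """Share of each weighted sub-indicator in the composite index."""
    parts = [w * x for w, x in zip(weights, indicators)]
    total = sum(parts)
    return [p / total for p in parts]


# Hypothetical raw data: GNI per capita, years of education, life expectancy.
weights = (0.000001, 0.005, 0.011)
indicators = (20000.0, 12.0, 80.0)

income_share, education_share, health_share = relative_contributions(
    weights, indicators
)
```

With these values the weighted parts are 0.02, 0.06 and 0.88, so health accounts for more than 90% of the index even though income has by far the largest raw value.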

Norway, the only country in cluster A, had the most balanced relative contribution among the three variables analyzed. This result explains why this country achieved first position in the HDI ranking. Of the eight countries whose income relative contribution was around 3%, four (Ireland, United States, Switzerland, and Saudi Arabia) present an education relative contribution of 7.5%, and the other four countries (Brunei, United Arab Emirates, Luxembourg, and Kuwait) present an education relative contribution of 0%. It should be noted that the first four countries are part of cluster B, which adopted Australia (the highest educational index), Hong Kong, and Singapore as benchmarks, and the other four are part of cluster C, which did not adopt Australia as a benchmark. For the 40 countries in cluster D (which also adopted Australia as a benchmark), the education relative contribution was 5%, and the GNI per capita relative contribution was zero. Finally, in the 135 countries in cluster E (which did not adopt Australia as a benchmark), health contributed 100% of the index.

Table 8 summarizes all this information regarding the clustering of countries, showing the countries and their respective benchmarks, as well as the average and amplitude (in parentheses) of the relative contributions presented by the countries in each cluster. As can be seen, the relative contributions presented by the countries of the clusters were quite homogeneous.

Table 8 Clustering of countries based on traditional BoD and raw data

Analyzing the other DEA information, we found income slacks in 170 non-benchmark countries, education slacks in 134 non-benchmark countries, and no health slacks. In terms of magnitude, the average income slack was $43,191.42 and the average education slack was 2.71 years.Footnote 10 The slack analysis allows us to conclude that there is an imbalance in the behavior of the three variables used in the HDI. Given that countries are closer to the top in health than in income and education, it was expected that the weight of the HDIBoD of all countries would be concentrated on this variable, and that everything lacking for a country to reach the benchmark in terms of income and education would be treated as slack. It is no coincidence, therefore, that the slacks are inversely proportional to the relative contributions of the indicators.

Finally, there are also relative targets for each non-benchmark country, which represent the percentage of required increase in each indicator for a country to become a benchmark. If there were no slack, a country's relative target would be precisely the inverse of its HDIBoD; the slacks, however, cause the relative targets to vary from indicator to indicator. Corroborating everything that has been argued, it appears that for countries to become benchmarks, they must, on average, increase health by 18.3%, education by 45.0% and income by 1,144.1%. Table 9 summarizes this information.

Table 9 Average relative contributions, slacks and relative targets of the non-benchmark countries in traditional BoD with raw data
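The relation between the index, the slacks and the relative targets can be sketched as below. The sketch assumes the usual radial projection of DEA models, in which the target of an indicator is its value divided by the HDIBoD plus its slack; this formula and all numbers are illustrative assumptions, not values from Table 9.

```python
def relative_targets(indicators, slacks, hdi_bod):
    """Required proportional increase in each indicator to reach the frontier.

    Assumes the radial projection x / theta plus the slack of the indicator.
    """
    increases = []
    for x, s in zip(indicators, slacks):
        target = x / hdi_bod + s
        increases.append(target / x - 1)
    return increases


# Hypothetical country: GNI per capita, years of education, life expectancy.
indicators = (10000.0, 10.0, 72.0)
slacks = (30000.0, 2.0, 0.0)
income_up, education_up, health_up = relative_targets(indicators, slacks, 0.9)
```

Health has no slack in this example, so its required increase is exactly \(1/0.9 - 1 \approx 11.1\%\); the slacks push the required increases in education and income well above that figure.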

The main conclusions that could be drawn from applying the traditional HDIBoD to the raw variables were: (1) an HDIBoD equal to 1 was reached by the countries that stood out the most in each variable alone; (2) the BoD model is susceptible to the presence of outliers (e.g. the countries with small populations), which tend to occupy the highest values in the index, starting to serve as a benchmark for other countries; (3) the fact that countries are closer to the top in life expectancy than in the other variables caused most of them to adopt only Hong Kong as a benchmark, which has the longest life expectancy, but is not an outlier; and (4) this same reason caused countries to concentrate all the weight of the HDIBoD on life expectancy, neglecting the other variables.

5.1.2 Normalized Variables

When the BoD model was applied to the normalized variables, some changes took place: Germany, Switzerland, and Norway became benchmarks, and Qatar, Liechtenstein and Brunei (outliers in GNI per capita) became false benchmarks (units with an HDIBoD equal to 1 but with slack). The three countries that became benchmarks also achieved prominence in the original HDI, since they have an excellent balance in performance across the three aggregated indicators. This shows that the use of normalized data makes the indicator less likely to favor only outliers.

Figure 2 illustrates the network map formed by the connection between countries and their benchmarks, in which it is possible to identify 12 clusters (A to L), in contrast to the five clusters obtained with raw data. The most numerous of these clusters is cluster L, which groups all countries that link only to Hong Kong (which has the longest life expectancy in the sample).

Fig. 2 Network of the traditional BoD with normalized data

Table 10 shows the links and the standardized degree centrality of the six benchmarks identified.

Table 10 Analysis of benchmarks in traditional BoD and normalized data

We found some patterns among the analyzed clusters. Cluster L concentrates countries with a 100% weight on life expectancy. Note that this cluster adopted Hong Kong as a benchmark, and it is similar to the cluster using raw data. Cluster G is the second biggest cluster, and it concentrates countries with weights on life expectancy (83.8%) but it adopted Hong Kong and Australia as benchmarks. Cluster J, composed of Belarus, Fiji, Georgia, Latvia, Lithuania, Mongolia, and Palau, concentrates weights on education, and it uses Germany as a benchmark. Cluster I is composed of the three false benchmarks and eight other countries, which attributed 100% of their weight to income; this cluster adopted Singapore as a benchmark. Cluster K, composed of Botswana, Saudi Arabia and Trinidad and Tobago, concentrates weights on income (86.8%), and adopted Singapore and Norway as benchmarks.

Finally, Denmark and Ireland (cluster A), Belgium, Finland, the Netherlands, Sweden, Hungary, Lesotho and Slovakia (cluster C), and Seychelles (cluster F) presented three benchmarks each. These clusters show a better weight distribution among the three dimensions. Note that cluster A presents Norway, Australia, and Germany as benchmarks, and concentrates weights on education (55.5%). Cluster C presents Norway, Australia and Switzerland as benchmarks, and concentrates weights on health (55.4%). Cluster F presents Norway, Switzerland and Singapore as benchmarks, and concentrates weights on income (56.9%). Table 11 summarizes these results.

Table 11 Clustering of countries in traditional BoD with normalized data

Table 12 shows the relative contribution average for each sub-indicator. This table also shows the number of countries with slacks, the slack average, and the average of the relative target of the non-benchmark countries, when the traditional BoD model was used with normalized data.

Table 12 Average relative contributions, slacks and relative targets of the non-benchmark countries in traditional BoD with normalized data

As can be seen, the relative contributions were more homogeneous with the normalized variables than with the raw data. However, a tendency to concentrate the weight on the health dimension is still observed, since (even normalized) health continues to be the dimension in which countries show the smallest discrepancy between them. In turn, most countries show slacks in the income variable. Correlated with the slacks, the relative targets were higher for income, which means that this is the dimension in which countries must concentrate their efforts to achieve a better ranking position and become a benchmark.

5.2 Multiplicative HDIBoD

The Multiplicative BoD presents similar findings to the traditional BoD. Many countries present similar (or the same) HDIBoD in both models, especially those that concentrated all the weight on a single dimension. This finding is explained because the assumptions of the two models are almost the same, the only difference between them being the aggregation method. Due to this, the descriptive HDIBoD statistics between the Multiplicative and traditional BoD model are very similar (see Table 13).

Table 13 Descriptive statistics of the traditional and Multiplicative HDIBOD

Despite this similarity, the benchmarks between multiplicative and traditional BoD models are not necessarily the same. For example, in the multiplicative raw data model, Norway is now a benchmark, improving its performance compared to the traditional BoD model. In contrast, using normalized data, we found the same benchmarks for both models, including the three false benchmarks.

Analyzing the standardized degree centrality of the benchmarks, the result obtained with the multiplicative model was similar to the traditional model. For example, Hong Kong occupies a central position in both models. However, some differences were found; for example, unlike the traditional model, in the multiplicative model with raw data, Liechtenstein was not recognized as a benchmark for any country. Table 14 summarizes these findings.

Table 14 Analysis of benchmarks in the Multiplicative BoD model

Figure 3 illustrates the network using the multiplicative BoD model with raw data. The network using the multiplicative BoD with normalized data was not given since it presents the same findings as the traditional model.

Fig. 3 Network of the multiplicative BoD with raw data

As can be seen in Table 15, the clusters formed in both the raw and normalized data were quite similar to those obtained in the traditional model. For example, as in the traditional model, it was possible to build 5 clusters in the raw data and 12 in the normalized data from the benchmarks.

Table 15 Clustering of countries in the Multiplicative BoD

There were, however, differences in relation to cluster A. With the raw data, cluster A in the traditional model contained only Norway, which adopted Liechtenstein, Australia and Singapore as benchmarks, whereas in the multiplicative model it contained only Ireland, with Norway, Australia and Singapore as benchmarks. All other clusters, both in raw and normalized data, are defined by the same set of benchmarks in both models.

The multiplicative model does not define the relative contribution in the same way the traditional model does. In this sense, weights denote the importance of each analyzed dimension. For example, while the multiplicative model presents identical weights for countries in the same cluster, the traditional model presents very close relative contributions (range less than 2%) for the countries in the same cluster.

Finally, Table 16 presents the average value of the weights, relative targets, and slacks of the non-benchmark countries for each dimension in the multiplicative model. Except for the slacks, all other information was quite similar to that of the traditional BoD.

Table 16 Average relative contributions, slacks and relative targets of the non-benchmark countries in multiplicative BoD

In the multiplicative model, the effect of the slacks must be combined with the CI's effect for the calculation of the target of a country. Multiplying each sub-indicator by its own slack and by the inverse of its HDIBOD gives the target value. Thus, with raw data, in addition to the increase resulting from the division of each sub-indicator by the HDIBOD, countries that present slacks must increase, on average, their educational level by an additional 28% and their income level by an additional 908% to become a benchmark. With normalized data, the additional increases required are, on average, 25% in education, 26% in income, and 6% in health. Despite this difference, income shows the most frequent and the largest average slack among the dimensions analyzed in both models. In addition, the smallest and least frequent slacks were observed for the health dimension.
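The target calculation in the multiplicative model can be illustrated with a one-line function. The sketch assumes that the slack is expressed as a multiplicative factor (1 meaning no slack, 1.28 meaning an additional 28%), which is an interpretation of the description above; the input values are hypothetical.

```python
def multiplicative_target(indicator, slack_factor, hdi_bod):
    """Target = indicator * slack factor / HDI_BoD (a factor of 1 means no slack)."""
    return indicator * slack_factor / hdi_bod


# Hypothetical country with 10 years of education and an index of 0.8.
without_slack = multiplicative_target(10.0, 1.0, 0.8)
with_slack = multiplicative_target(10.0, 1.28, 0.8)
```

Without slack the target is 10 / 0.8 = 12.5 years; the additional 28% raises it to 16 years.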

5.3 Slack Based HDIBoD

The slack based measure BoD (SBM-BoD) was used to measure the HDIBoD based on the slacks of each variable and not on their equiproportional distance to the frontier as is the case in the traditional and multiplicative BoD. In this model, the slack is no longer a bias and becomes the basis for the construction of the composite index; this is a feature of all non-radial DEA models, such as SBM and RAM.

The properties of this model mean that the resulting CI is always less than or equal to that calculated from the traditional BoD model; and the greater the slack presented in the traditional BoD model, the greater the discrepancy between the indices of the two models. This fact explains some of the patterns presented in Table 17, which shows the descriptive statistics related to the indexes constructed with the two models, using the raw and normalized HDI data.

Table 17 Descriptive statistics of the traditional and SBM HDIBOD

For both types of data, it was found that the mean value for the CIs was always lower, and the standard deviation always higher in the SBM model than in the traditional model. It was also noted that the most considerable discrepancy between the two models occurred within the raw data, since with this data countries showed enormous slacks in the traditional model due to the presence of outliers. The conclusion drawn was that the distance between the outliers and the other units is even more significant in the SBM model.

The benchmarks for the two approaches are precisely the same. However, the ranking positions of the other countries changed from one model to another. For example, considering the raw data, Japan ranked in sixth position in the traditional model (with HDIBOD = 0.998), but in 17th position in the SBM-BoD (with HDIBOD = 0.824). Considering the normalized data, the variation is smaller, and Japan ranks in 10th position in the traditional model (with HDIBOD = 0.997), and in 14th position in the SBM-BoD (with HDIBOD = 0.991). The SBM-BoD model does not present false benchmarks (countries with HDIBOD equal to 1 but with slacks), and therefore Qatar (HDIBOD = 0.916), Brunei (HDIBOD = 0.913), and Liechtenstein (HDIBOD = 0.983) no longer present a CI equal to one with the normalized data.

As was done in the previous sections, Figs. 4 and 5 illustrate the network maps obtained from the results of the SBM-BoD model applied to the raw and normalized data, respectively; 9 clusters were formed when the raw data were applied and 7 when the normalized data were applied.

Fig. 4 Network of the SBM-BoD with raw data

Fig. 5 Network of the SBM-BoD with normalized data

Detailing the patterns of these figures, we found that although the benchmarks were the same as those represented by the traditional model, the standardized degree centrality had changed. This finding can be seen in Table 18.

Table 18 Analysis of benchmarks in SBM-BoD model

For example, Qatar presented the highest standardized degree centrality in the raw data and Norway the highest in the normalized data. On the other hand, Hong Kong was the least central in both cases. This change is due to the property of the SBM model, which works with the maximization of the sum of slacks, and tends to impose more aggressive targets on countries. For this reason, the benchmark of each country will no longer be the one that is the most similar, but the one that is the most distant from it.

Table 19 details the clusters formed in each network. The largest cluster was formed by countries that adopted only Qatar (in the raw data) and only Norway (in the normalized data) as a benchmark, and adopted equal relative contributions (33.33%). In the raw data, clusters C, G and H did not have a regular pattern of relative contributions; despite this, it was decided to keep them together, indicating the range found in the table. Another point to be highlighted is that in the two types of data, no country attributed a zero relative contribution to any sub-indicator.

Table 19 Clustering of countries in the SBM-BoD

Finally, Table 20 presents some average information on the non-benchmark countries obtained through the SBM BoD approach, with raw and normalized data.

Table 20 Average relative contributions, slacks and relative targets of the non-benchmarks in the SBM-BoD model

As can be seen, the relative contribution in the SBM model is much more balanced than that obtained in traditional models, reaching very close to one third for each variable. The explanation for this is that in the form of the multipliers of the SBM model there are a series of restrictions on the weights assigned to each variable.

Another point that deserves to be highlighted is that the slacks and relative targets in the SBM model are much more intense than those of traditional models, in addition to being present in all countries that are not benchmarks. The formulation of the model, based on maximizing the sum of slacks, generates this result. In this model, therefore, each country seeks to reach the frontier by the longest path, which explains why the results of the CIs calculated by the SBM are usually lower than those calculated by the traditional BoD.

5.4 Range Adjusted HDIBoD

The RAM-BoD model presents peculiar characteristics. For the raw data, this model presents an intermediate behavior between the traditional and SBM models (closer to the traditional), both in terms of variability and average value. On the other hand, using the normalized data, the CI shows more significant variability and a lower average value than the SBM model, despite the high correlation. The RAM model results were similar for the raw and normalized data, due to the translation-invariant characteristic of the RAM model. Table 21 summarizes this finding.

Table 21 Descriptive statistics of the traditional, SBM and RAM HDIBOD

We conclude that among the four tested models, SBM-BoD is the most affected by the presence of outliers (predominant with the raw data), and RAM-BoD is the least affected. In turn, the benchmarks are the same in the traditional, SBM-BoD and RAM-BoD models. However, the RAM-BoD and SBM-BoD models are not affected by the presence of false benchmarks.

In Table 22, we compare the HDIBOD generated by the four models presented in this article for a developed country (Denmark), a developing country (Brazil) and a low human development country (Nigeria).

Table 22 HDIBOD of Denmark, Brazil and Nigeria in the traditional, multiplicative, SBM and RAM BoD

In both raw and normalized data, Denmark's index presented the least variability among the models analyzed. On the other hand, with the raw data, the most significant variation occurred for Brazil, which presents the lowest indicator with the SBM model. Using the normalized data, Nigeria presented the lowest indicator with the RAM model. The RAM model results with the raw and normalized data were similar for the three countries analyzed.

Table 23 presents the links and standardized degree centrality of the benchmarks. We found that using raw data, the most important country was Liechtenstein, and using normalized data, the most important country was, again, Norway. The centrality of the benchmarks was precisely the same in the SBM-BoD and RAM-BoD, using normalized data. With the raw data, Qatar, which was the most central country in the SBM-BoD, presents few connections. Given that the income of Liechtenstein (the most central country in the RAM-BoD model) is lower than Qatar's, we argue that the targets suggested by the RAM-BoD model tend to be less aggressive than those suggested by the SBM-BoD.

Table 23 Analysis of benchmarks in the RAM-BoD model

Using the RAM-BoD model, we found 6 clusters with raw data and 7 clusters with normalized data. The clusters formed by the normalized data were the same as those formed in the SBM-BoD model; therefore, the network will not be reproduced in this section. The network and the clusters obtained with the raw data can be seen in Fig. 6.

Fig. 6 Network of the RAM-BoD with raw data

The clusters formed with the RAM-BoD model have a characteristic in common with the multiplicative BoD. The weights attributed to all countries belonging to a given cluster were practically the same. Such behavior is quite different from that found with the traditional BoD model, in which the weights were different, but the relative contributions of the countries were homogeneous within a cluster. Since countries have different sub-indicators, the fact that the weights are the same caused the relative contributions to vary within a cluster. In Table 24, we chose to present the relative contributions (mean and amplitude) of the clusters, as they are easier to interpret.

Table 24 Clustering of countries in the RAM-BoD

With the raw data, clusters A, B, C, and D are formed by the same countries and benchmarks from which they were formed in the SBM model. In both models, cluster E was formed by countries that adopted Singapore and Liechtenstein as benchmarks, but with the RAM-BoD model, this cluster is composed of 6 countries (Luxembourg, Italy, Malta, Cyprus, Andorra, and Portugal), while in the SBM-BoD, the 'cluster' comprises Luxembourg only. In the SBM-BoD model, there were also four other clusters (F, G, H, and I), which were composed of countries that adopted Qatar (alone or accompanied) as a benchmark. In the RAM-BoD model, on the other hand, given that no country adopts Qatar as a reference, only one more cluster (cluster F) remains, comprising the 145 countries that adopted only Liechtenstein as a benchmark. This difference did not occur in the normalized data, and all clusters were the same in the SBM-BoD and RAM-BoD models.

Finally, Table 25 shows the average relative contribution, slack and relative target of the non-benchmark countries in the RAM-BoD model.

Table 25 Average relative contributions, slacks and average relative targets of the non-benchmarks in the RAM-BoD model

As in the traditional model, both in the raw and normalized data, the health sub-indicator predominated in the relative contributions. However, this predominance was not as strong as in the traditional model. In the raw data, unlike the other models, there was no significant discrepancy in the number of countries that had slacks across the three dimensions. The number of countries with slack and the average relative target were the same in the RAM and SBM models in the normalized data. Still, the slacks in the RAM model tended to be less intense than those obtained in the SBM model for both types of data.

5.5 HDIBoD with Weight Restrictions

The next approach to be explored consists of models with restrictions on the relative contribution of the sub-indicators. In principle, we could choose lower bound percentages for the sub-indicators' relative contributions ranging from 0 to 1/3 for each dimension. When adopting the approach of Van Puyenbroeck et al. (2020), however, we face the infeasibility of linear programming problems for some countries in the sample, especially when we use raw data. Table 26 shows the number of infeasible results for each type of restriction, considering the limits of 5%, 10%, 15%, 20%, 25%, 30%, and 33.33%.

Table 26 Number of infeasible results for each type of restriction

Table 27 shows the benchmarks obtained for each established lower bound percentage (with no infeasible results) for the raw and normalized data, as well as the number of connections and the standardized degree centrality.

Table 27 Analysis of benchmarks in the models with weight restrictions
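The two network measures reported for each benchmark can be sketched as follows, assuming the usual definition in which the number of connections is the count of countries that adopt the benchmark as a reference and the standardized degree centrality divides that count by n − 1, where n is the number of countries in the sample. The country labels and the function name `degree_centrality` are hypothetical.

```python
def degree_centrality(reference_sets, n_countries):
    """Return {benchmark: (connections, standardized degree centrality)}.

    Assumes centrality = connections / (n_countries - 1).
    """
    connections = {}
    for country, benchmarks in reference_sets.items():
        for benchmark in benchmarks:
            if benchmark != country:  # a benchmark is not linked to itself
                connections[benchmark] = connections.get(benchmark, 0) + 1
    return {
        benchmark: (count, count / (n_countries - 1))
        for benchmark, count in connections.items()
    }


# Hypothetical sample of six countries: benchmarks B1 and B2, plus four
# non-benchmark countries with their reference sets.
reference_sets = {
    "C1": {"B1"},
    "C2": {"B1", "B2"},
    "C3": {"B1"},
    "C4": {"B2"},
}
centrality = degree_centrality(reference_sets, n_countries=6)
for benchmark, (count, standardized) in sorted(centrality.items()):
    print(benchmark, count, round(standardized, 2))
```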

The greater the restrictions placed on the weights, the fewer the benchmarks that are generated. In the analysis with raw data, for example, the number of benchmarks fell from five (0%) to four (5%). In the normalized data, there were initially six benchmarks (0%, 5% and 10%), which were reduced to four (15%), to three (20%) and finally to one (25%). In addition, the more substantial the restrictions are, the lower the standardized degree centrality of Hong Kong and the greater that of Qatar (raw data) or Norway (normalized data).

Hong Kong lost importance because its prominence in the traditional BoD was mainly due to its excellent performance in the health dimension (the dimension in which countries most concentrated HDIBoD weight). In turn, Norway's importance with the normalized data is due to the more noticeable balance between its variables. Qatar and Liechtenstein were protagonists of the models with restrictions in the raw data due to their excessively discrepant performance in the income variable. When the other units were prevented from attributing zero weight to this indicator, Qatar and Liechtenstein gained an advantage.

Table 28 shows the descriptive statistics of the CI constructed with different types of weight restrictions.

Table 28 Descriptive statistics of HDIBoD with weight restrictions

When weight restrictions are applied, the CI value of all countries becomes less than or equal to that obtained in the unrestricted model; and the greater the restriction, the lower the CI will be (see Table 29). The reason for this is that, by definition, the traditional model works with the most advantageous weights for each unit, so any restriction can only shrink the set of weights a country may choose from. This is why the average value of the CI decreases, and the standard deviation and the range increase, as more substantial restrictions are placed on the weights. Table 29 presents the 10 countries with the highest CI, obtained considering different weight restrictions for the raw and normalized data.

Table 29 Ranking of the 10 best countries in the HDIBOD with weight restrictions

Finally, Table 30 presents the average relative contributions obtained in the models. In the models without restriction, in both raw and normalized variables, most countries concentrated the weight on the health dimension and avoided assigning weight to the income dimension. When weight restrictions were imposed, this pattern did not change: countries continued, insofar as the restrictions allowed, to avoid the income variable and to concentrate weight on the health dimension.

Table 30 Average relative contributions of models with weight restrictions
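The effect of a lower bound on relative contributions can be illustrated with a small sketch. It is not the formulation of Van Puyenbroeck et al. (2020) and it does not reproduce the infeasibility reported above: it assumes the bound applies only to the shares of the evaluated country, and it replaces the linear program with a grid search over those shares (weights are recovered as share divided by the country's own sub-indicator value). All data values, the grid resolution `N` and the function name `bod_score` are hypothetical.

```python
from fractions import Fraction

# Hypothetical raw sub-indicators: (life expectancy in years,
# mean years of schooling, income per capita in thousand USD).
DATA = {
    "A": (84.0, 12.0, 60.0),
    "B": (82.0, 13.0, 130.0),  # income outlier
    "C": (80.0, 14.0, 45.0),
    "D": (70.0, 8.0, 10.0),
    "E": (60.0, 5.0, 2.0),
}
N = 60  # grid resolution: each share is a multiple of 1/60


def bod_score(k, lower_bound):
    """Approximate the BoD score of country k by grid search.

    Each grid point is a vector of relative contributions (shares) for
    country k; every share must be at least `lower_bound`.
    """
    yk = DATA[k]
    best = 0.0
    for a in range(N + 1):
        for b in range(N + 1 - a):
            steps = (a, b, N - a - b)
            if min(steps) < lower_bound * N:
                continue  # violates the lower bound on some share
            shares = [s / N for s in steps]
            # With weights w_r = share_r / y_rk, country k scores 1 before
            # rescaling; divide by the best-scoring country to rescale.
            top = max(
                sum(s * yj[r] / yk[r] for r, s in enumerate(shares))
                for yj in DATA.values()
            )
            best = max(best, 1.0 / top)
    return best


bounds = [Fraction(0), Fraction(1, 20), Fraction(1, 10), Fraction(1, 5), Fraction(1, 3)]
scores = [bod_score("D", lb) for lb in bounds]
for lb, score in zip(bounds, scores):
    print(f"lower bound {float(lb):.2%}: CI = {score:.4f}")
```

Because each tighter bound keeps only a subset of the grid points allowed by the looser one, the score can never increase as the bound grows, which is the pattern described for the restricted models.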

5.6 HDIBoD with Common Weights

The next BoD extension explored is the common weight models of Despotis (2005a, b), Tofallis (2013) and cross-evaluation. When applying Despotis’ (2005a, b) model to the raw HDI data, testing different values for the parameter “t” (in steps of 0.01), three weight sets were reached: (w1) t ranging from 0 to 0.13; (w2) t ranging from 0.14 to 0.36; (w3) t ranging from 0.37 to 1. In normalized data, three sets of weights were also reached: (w1) t ranging from 0 to 0.59; (w2) t ranging from 0.6 to 0.71; (w3) t ranging from 0.72 to 1. In addition, weights were adopted derived from: (a) the simple average of w1, w2 and w3 (Dominguez-Serrano & Blancas, 2011); and (b) the average of these three sets of weights, weighted by the size of their t intervals. In contrast, the common weights used in the Tofallis (2013) approach are obtained by applying the ordinary least squares (OLS) method, and in the cross-evaluation approach they are obtained from the average of the most advantageous weights of all countries in the sample.
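The two averaging schemes can be sketched as follows. The t intervals are those reported above, expressed in hundredths. Two points are assumptions: the size of an interval is taken to be the number of grid points it contains (interval length would be an alternative reading), and the three weight vectors are hypothetical placeholders rather than the weights obtained in the study.

```python
# t intervals in hundredths (step of 0.01), as reported in the text.
RAW_INTERVALS = [(0, 13), (14, 36), (37, 100)]
NORMALIZED_INTERVALS = [(0, 59), (60, 71), (72, 100)]


def grid_points(intervals):
    """Number of tested t values falling in each interval (assumed size)."""
    return [hi - lo + 1 for lo, hi in intervals]


def simple_average(weight_sets):
    """Unweighted mean of the weight sets, dimension by dimension."""
    return [sum(column) / len(weight_sets) for column in zip(*weight_sets)]


def interval_weighted_average(weight_sets, sizes):
    """Mean of the weight sets, each weighted by its interval size."""
    total = sum(sizes)
    return [
        sum(w * size for w, size in zip(column, sizes)) / total
        for column in zip(*weight_sets)
    ]


# Hypothetical weight sets (health, education, income), illustration only.
w1, w2, w3 = (0.6, 0.3, 0.1), (0.5, 0.2, 0.3), (0.4, 0.0, 0.6)

raw_sizes = grid_points(RAW_INTERVALS)
print("grid points per interval (raw):", raw_sizes)
print("simple average:", simple_average([w1, w2, w3]))
print("interval-weighted average:", interval_weighted_average([w1, w2, w3], raw_sizes))
```

With the raw-data intervals, w3 covers most of the tested t values, so the interval-weighted average lies much closer to w3 than the simple average does.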

Table 31 shows the sets of common weights and the average relative contributions (and their ranges) obtained in each approach. It should be noted that even if the same set of weights is used, the relative contribution of countries, which also depends on the magnitude of the sub-indicators, can vary considerably.

Table 31 Weights and average relative contribution obtained in the common weight approaches
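To see why a single set of common weights can still produce very different relative contributions, recall that the contribution of a sub-indicator is its weighted value divided by the sum of all weighted values for that country. The sketch below applies one hypothetical weight vector to two hypothetical countries; none of the numbers come from the study.

```python
def relative_contributions(weights, values):
    """Share of each weighted sub-indicator in a country's composite score."""
    parts = [w * y for w, y in zip(weights, values)]
    total = sum(parts)
    return [p / total for p in parts]


# Hypothetical common weights for (health, education, income).
common_weights = (0.5, 0.3, 0.2)

# Hypothetical normalized sub-indicators for two countries.
country_a = (0.95, 0.90, 0.60)  # strong health and education
country_b = (0.60, 0.40, 0.95)  # strong income

shares_a = relative_contributions(common_weights, country_a)
shares_b = relative_contributions(common_weights, country_b)
print("country A:", [round(s, 3) for s in shares_a])
print("country B:", [round(s, 3) for s in shares_b])
```

Although both countries use identical weights, income accounts for a much larger share of the score of the second country because its income value is high relative to its other sub-indicators.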

As in the traditional model, the highest concentration of weight occurred in the health dimension in all approaches and for both types of data. The average relative contribution of this variable, however, was less intense than in the traditional BoD model (except in the w3 set of weights in the normalized data, in which the weight was concentrated only on the health dimension). The education variable, in turn, received very low or zero weights for most approaches in the normalized data (with the exception of cross-evaluation). It was also found that income received considerable weight, especially in normalized data. For both types of data, the average relative contribution of this variable was greater than in the traditional BoD model. This variable probably did not receive zero weight, as it did in the traditional BoD model, because of the effect that this would have on the high-income countries, which would be severely penalized, creating huge deviations between the result of the traditional BoD and that of the common weights approach. Given that common weights are determined so as to minimize deviations, it is natural that considerable weights were assigned to the income variable.

When the approach based on linear regression was applied to the raw data, education received a negative weight, probably also due to the effect of outliers in the income variable. This makes no practical sense, as weights cannot be negative. To deal with this problem, we set the weight of education to 0 and used only the positive weights in this case.

In practice, as can be seen in Table 32, which compiles the descriptive statistics for the different indices, the different sets of common weights do not make much difference to the HDIBoD value. It is worth mentioning that the common weights based on cross-evaluation and on the average and weighted average of w1, w2 and w3, did not generate any country with a CI equal to 1.

Table 32 Descriptive statistics of HDIBoD with common weights

Table 33 shows the 10 countries with the best CI for each set of common weights.

Table 33 Ranking of the 10 best countries in the HDIBOD with common weights

In the raw data, Liechtenstein, Qatar, Singapore and Norway (income outliers) stood out with all the weights obtained using Despotis’ model. In all approaches with normalized data and in the linear regression and cross-evaluation approaches with raw data, the effect of income was not so high, causing Qatar and Liechtenstein to drop out of the top 10. In these cases, the top of the ranking was occupied by Hong Kong, which has the longest life expectancy in the sample; Singapore, which besides being an income outlier also performs well in the health dimension; and Switzerland.

5.7 HDIBoD with Tiebreaker Methods

Table 34 shows the results obtained for the benchmarks in the super-BoD model. It should be noted that authors such as Banker and Chang (2006) have criticized the use of this method for ranking, claiming that its greatest utility is for the identification of outliers. This is clear from the results in Table 34, in which it can be seen that in the raw data the most outstanding countries were Qatar and Australia, the two largest outliers in the income and education variables respectively. In the normalized data, there was less discrepancy between the results, with the principal highlight being Norway.

Table 34 Super HDIBOD of the benchmarks
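The link between super-BoD scores and outliers can be illustrated with a sketch in which the evaluated country is removed from its own reference set, so that benchmarks may score above 1. This is a grid-search approximation over the evaluated country's relative contributions, not the linear program used in the study, and the data, the grid resolution `N` and the function name `bod_score` are hypothetical.

```python
# Hypothetical raw sub-indicators: (life expectancy in years,
# mean years of schooling, income per capita in thousand USD).
DATA = {
    "A": (84.0, 12.0, 60.0),
    "B": (82.0, 13.0, 130.0),  # income outlier
    "C": (80.0, 14.0, 45.0),
    "D": (70.0, 8.0, 10.0),   # dominated by A, B and C in every dimension
}
N = 20  # grid resolution: each share is a multiple of 1/20


def bod_score(k, exclude_self=False):
    """Grid-search approximation of the BoD score of country k.

    With exclude_self=True, country k is dropped from the reference set,
    which yields a super-efficiency style score that may exceed 1.
    """
    yk = DATA[k]
    peers = [y for name, y in DATA.items() if not (exclude_self and name == k)]
    best = 0.0
    for a in range(N + 1):
        for b in range(N + 1 - a):
            shares = (a / N, b / N, (N - a - b) / N)
            # Weights w_r = share_r / y_rk give country k a score of 1;
            # rescale by the best-scoring country in the reference set.
            top = max(
                sum(s * yj[r] / yk[r] for r, s in enumerate(shares))
                for yj in peers
            )
            best = max(best, 1.0 / top)
    return best


for name in DATA:
    print(name, round(bod_score(name), 4), round(bod_score(name, exclude_self=True), 4))
```

In this toy sample the income outlier obtains a score well above 1 once it is excluded from its own reference set, while the score of a dominated country is unchanged, which is consistent with the view that the method is better suited to flagging outliers than to ranking.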

One of the results that the inverted frontier provides is the identification of anti-benchmarks, which are the countries that constitute the frontier of the worst practices. Both in the raw and normalized data, the anti-benchmarks were the same four countries: Sierra Leone, Chad, Central African Republic and Niger.

Table 35 presents the descriptive statistics related to the following BoD tiebreaker approaches: traditional, inverted index, multiplicative cross-evaluation, the composite indices of Leta et al. (2005) and Zhou et al. (2007), both with α = 0.5, and the triple index, with α = β = γ = 1/3.

Table 35 Descriptive statistics of HDIBOD with tiebreaker methods

As can be seen, the highest average and the lowest variability occurred in the results of the traditional BoD, in which the most advantageous weights are adopted. The lowest average value and the highest variability occurred in the composite index of Zhou et al. (2007). The triple index has neither the lowest average nor the highest variability, since the multiple approaches considered tend to compensate for each other. Thus, the triple index, despite having less discriminating power, has the advantage of incorporating a greater plurality of views.
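The compensation effect can be checked with a short sketch, under the assumption that the combined indices are convex combinations of component scores that are all oriented so that higher is better. The exact definitions used in the study are not restated in this section, so the component names, the function name `combine` and all values below are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical component scores for five countries, all oriented so that
# higher is better. These are illustrative values, not results of the study.
standard = [1.00, 0.98, 0.95, 0.80, 0.60]
inverted_complement = [0.90, 0.95, 0.70, 0.40, 0.05]
cross = [0.93, 0.90, 0.85, 0.65, 0.40]


def combine(components, coefficients):
    """Convex combination of component scores, country by country."""
    if abs(sum(coefficients) - 1.0) > 1e-12:
        raise ValueError("coefficients must sum to 1")
    return [
        sum(c * x for c, x in zip(coefficients, row))
        for row in zip(*components)
    ]


components = [standard, inverted_complement, cross]
composite = combine([standard, inverted_complement], [0.5, 0.5])
triple = combine(components, [1 / 3, 1 / 3, 1 / 3])

for name, scores in [("standard", standard), ("composite", composite), ("triple", triple)]:
    print(f"{name}: mean = {mean(scores):.3f}, std = {pstdev(scores):.3f}")
```

Under this assumption, the mean of a convex combination always lies between the lowest and highest component means, and its standard deviation cannot exceed the largest component standard deviation, which is why an index of this kind cannot be the most extreme on either statistic.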

Table 36 shows the ranking of the top 10 countries for the different tiebreaker approaches used in this work, with raw and normalized data.

Table 36 Ranking of the 10 best countries in the HDIBOD with tiebreaker methods

The first detail that draws attention in Table 36 is that the rankings obtained with the different tiebreaker methods were very similar (including between raw and normalized data). For example, Hong Kong continued to be the country that stood out the most in all tiebreaker criteria, both when using raw and normalized data. Other countries that stood out were: (a) Switzerland, which occupied prominent positions in the multiplicative cross-evaluation and inverted rankings (in both types of data); (b) Japan, which was the highlight in the inverted index, ranking second in both types of data; and (c) Australia and Singapore, which also performed well in all rankings. The main conclusion reached in this section, therefore, was that the tiebreaker methods do not differ much from each other in terms of HDI results and that they are less influenced by normalization than the traditional BoD model.

6 Conclusion and Practical Implications

Several studies have used Data Envelopment Analysis (DEA) to measure human development. This article compared several DEA models and extensions, using raw and normalized data, to measure human development worldwide and to provide 40 different rankings of countries. In this work, we assumed that there is no perfect model and that advantages can be derived from applying several models together. These different models and extensions allow for a detailed analysis of the countries, including ranking, clustering, network building, and goal setting, in different contexts, e.g. using most advantageous weights, less advantageous weights, common weights, cross-weights, and weight restrictions.

This study makes several contributions. First, in the traditional BoD using raw data, the benchmarks were countries (outliers or not) that each performed well in one specific data dimension. Second, the use of normalized data in the traditional BoD contributes to high-HDI countries reaching top positions in the HDIBoD ranking. Third, in the traditional BoD, the sub-indicators' relative contributions were more homogeneous with normalized than with raw data. Fourth, all the results of the multiplicative model were quite similar to those of the traditional model. Fifth, the SBM-BoD and the RAM-BoD did not present false benchmarks. Sixth, the sub-indicators' relative contributions in the SBM-BoD were the most balanced (close to 33.33% for each variable). Seventh, the RAM-BoD model presents the smallest differences between the normalized and raw data results. Eighth, the weight restriction approach revealed that stronger restrictions mean fewer benchmarks and lower index averages, and that normalized data are less subject to the infeasibility problem. Ninth, the indexes showed similar averages and rankings across the different common weights approaches. Tenth, the different tiebreaker techniques had little effect on the HDIBoD ranking and were little affected by sub-indicator normalization.

In terms of practical implications, this paper presents several recommendations. First, researchers should prefer normalized data to reduce the influence of outliers and obtain more homogeneous relative contributions among the variables. Second, we argue in favor of the non-radial models (RAM and SBM), which do not show false efficiencies but have been less frequently adopted in the literature. Third, researchers can use weight restrictions, common weights or tiebreaker methods to reduce the number of benchmarks and avoid ties.

We present some limitations of this study to open avenues for future research. First, although the study used three dimensions of human development, future studies are encouraged to examine other aspects, such as infrastructure and gender inequality, among others. Second, future researchers can also examine the phenomenon using regional datasets. Third, the empirical results represent an aggregate estimate for developed and developing countries; testing the DEA models for each development group may provide further interesting information. Fourth, future studies can contribute by analyzing, in an original way, how the DEA technique should treat small top-ranked countries (generally income outliers) such as Norway, Hong Kong, Switzerland, Singapore, Qatar, Brunei, and Liechtenstein; this advance could contribute to a more effective differentiation between the HDIBoD and the Human Development Index.

Finally, despite these limitations, Data Envelopment Analysis offers several opportunities to contribute to the construction of social indicators, especially for measuring human development across regions.