1 Introduction

A significant number of companies worldwide rely on management systems (MSs) (ISO, 2021) to improve corporate operations (Robson et al., 2007; Sampaio et al., 2009) and address stakeholders’ needs systematically (Poltronieri et al., 2018). Given that achieving “development that meets the needs of the present without compromising the ability of future generations to meet their own needs” (UNWCED, 1987, p. 54) nowadays represents a normative concept (Hahn et al., 2015), corporate executives are under increasing pressure to fulfil one particular stakeholder demand: making their companies more sustainable (e.g. Ashrafi et al., 2020; Silva et al., 2019; Talbot et al., 2021; Yunus et al., 2020).

For example, consumer attitudes towards sustainable products and services are increasingly positive (e.g. de-Magistris & Gracia, 2016; Jacobs et al., 2018) and investors are placing increasing value on data on sustainability-related issues for financial commitments (e.g. Amel-Zadeh & Serafeim, 2018; Grim & Berkowitz, 2020; van Duuren et al., 2016). In this context, such stakeholders often consider firms’ environmental, social and governance (ESG) scores in their decision-making process (Avetisyan & Hockerts, 2017; Rajesh & Rajendran, 2020) and, in return, companies that apply ESG practices can improve stakeholders’ trust by accumulating social capital and strengthening attachment to the firm (La Fuente et al., 2021). Scholars also devote a great deal of attention to the ESG concept (Do & Kim, 2020), which has emerged as a measure of companies’ corporate sustainable performance (CSP) (Avetisyan & Hockerts, 2017; Dorfleitner et al., 2020; Rajesh & Rajendran, 2020).

When it comes to researching CSP in relation to MSs, however, academics focus more on investigating the benefits related to specific issues such as reduced emissions (e.g. Russo, 2009) and sustainable supply chains (e.g. Zimon et al., 2021), as opposed to connecting MSs with the broader ESG concept as a framework for the various CSP demands of stakeholders. Few studies consider ESG ratings alongside MSs. Broadstock et al. (2021), for example, state that, to achieve higher scores in the environmental pillar, companies must perform well in environmental MSs (EMS) certification. Furthermore, Schmid et al. (2017) conclude that ESG themes may be anchored in quality MSs (QMSs), and Chams et al. (2021) deduce that firms with QMSs are less reliant on financial capital to improve ESG ratings. Nevertheless, to the best of the authors’ knowledge, there is a shortage of academic studies that connect MSs to ESG performance and empirically analyse their relationship, which is evidenced by the lack of corresponding search results in databases like Web of Science and Scopus.

Such studies would provide valuable insight into the strengths and weaknesses of individual MSs in terms of meeting specific environmental, social and/or governance needs. This knowledge would make it possible to draw managerial conclusions regarding which MSs to implement and combine to satisfy certain stakeholder CSP demands. Thus, the aim of this work is to start filling this research gap by empirically proving that QMSs and EMSs, which are the most widely adopted MSs on a global level (ISO, 2021), represent powerful business tools to achieve enhanced ESG performance, by answering the following three research questions (RQs):

RQ1: Do companies that operate QMSs and/or EMSs achieve statistically significantly higher ESG scores than firms without such MSs?

RQ2: Which ESG issues are positively impacted by the implementation of QMSs and/or EMSs?

RQ3: Do companies that apply both QMSs and EMSs simultaneously achieve higher ESG performance than firms that operate with only one of these MSs?

To answer these RQs, this study presents a comprehensive exploratory literature review and both descriptive and cluster analyses of ESG data from 2019 for 4292 companies spread among the three leading global economic areas: Europe, East Asia and North America. Refinitiv Eikon serves as the data source. The descriptive analysis describes the fundamental characteristics of the data and measures central tendencies among the sample groups with or without MSs (Mishra et al., 2019). The cluster analysis gradually classifies the sample based on similarities (J. Bu, Liu, et al., 2020; Bu, Qiao, et al., 2020), thus allowing patterns to be defined between companies with QMSs, EMSs or no such MSs.

This paper contributes to the academic literature by directly connecting QMSs and EMSs to the ESG concept and by empirically proving at a global level that both MS types serve as powerful business tools for enhancing ESG scores. The study helps corporate executives to understand the ESG-related strengths inherent in quality and environmental MSs and, in addition, highlights how combining these MSs can impact a corporation’s sustainable performance in different ESG categories. Furthermore, the results give policymakers an insight into the positive relationship between MSs and CSP, as well as the regional and industrial differences in ESG scores, thus emphasizing the importance of pushing forward with the international standardization of best practices in management as well as their global diffusion.

The remainder of the paper is organized in five sections. Section 2 provides extensive background information on MSs and ESG ratings. Section 3 explains the data sampling process and methodologies applied. Section 4 presents the findings and Sect. 5 the discussion. Section 6 offers some conclusions.

2 Literature review

2.1 Stakeholder theory

In accordance with the increasing stakeholder focus on CSP, this paper follows the reasoning that companies must not only fulfil obligations to their shareholders in order to be successful, but that the interests of multiple parties with stakes in the social and financial performance of the firm must be taken into account (Donaldson & Preston, 1995). This aligns with the concept of MSs, which are directed at satisfying specific stakeholder needs (as outlined in the MSs’ underlying standards), as well as the ESG concept, which is linked to numerous stakeholders, including society, suppliers, employees and shareholders (La Fuente et al., 2021; Muñoz-Torres et al., 2019). Thus, this study is grounded in stakeholder theory, which goes beyond simply maximizing the wealth of owners to acknowledging “any group or individual who is affected by or can affect the achievement of an organization’s objectives” (Freeman, 1984, p. 46), while addressing “morals and values explicitly as a central feature of managing organizations” (Phillips et al., 2003, p. 481).

In general, Freeman’s (1984) stakeholder theory offers a pragmatic approach to strategy that urges firms to be aware of their relationships with all stakeholders in order to become more successful (Laplume et al., 2008; Lee & Isa, 2020). At present, stakeholder theory appears to be the prevailing theory in CSP-related research (Daugaard & Ding, 2022). In this context, it should be acknowledged that (i) different stakeholders influence organizations in different ways, (ii) some stakeholders have more influence over organizations than others, (iii) not all stakeholders might be regarded as legitimate by organizations, and (iv) existing organization/stakeholder relations are not static but can change (Friedman & Miles, 2002). Regarding point (iii), stakeholder theory is closely related to legitimacy and institutional theories “in the sense that only those with legitimate claims and institutional identification can be considered stakeholders” (Daugaard & Ding, 2022, p. 2). Developments in relationships in any direction might be induced by (a) changes in the material interests of either side, (b) the emergence of contingent factors, (c) changes in the sets of ideas held by stakeholders and/or organizations, or (d) changes in institutional support (Friedman & Miles, 2002). Nowadays, contingent factors such as those related to global climate change or pandemics are on the rise, causing more and more stakeholder groups, including shareholders, to adjust their material interests and to place increasing value on sustainable development. Accordingly, institutional support for CSP is growing, as is visible in policymaking and media coverage. Hence, to ensure sustained business success, this study argues that companies must be aware of the environmental, social and governance demands of stakeholders and address them accordingly by using suitable business tools. Therefore, the following exploratory literature review on MSs and ESG ratings emphasizes the stakeholder focus inherent in both concepts.

2.2 Management systems

MSs are a set of procedures to be followed to achieve stakeholder satisfaction concerning specific demands, thus a “process of systemizing how things are done” (Mahesh & Kumar, 2016, p. 578). They are implemented to handle stakeholders’ needs systematically in both internal and external organizational contexts (Poltronieri et al., 2018; Rebelo et al., 2016) and are aimed at the continuous improvement of operations and procedures (Robson et al., 2007; Sampaio et al., 2009). MSs can be classified as quality, environmental or occupational health and safety (OHS) systems, among others, depending on their objective (Jørgensen et al., 2006). The core elements of MSs are often defined in management system standards (MSSs), and compliant companies can receive certification if the standard allows it (Oliveira, 2013; Santos et al., 2011). These MSSs are developed and published by national and international bodies, the most famous being the International Organization for Standardization (ISO) (Karapetrovic & Jonker, 2003), and ISO 9001 for QMSs as well as ISO 14001 for EMSs are the most commonly implemented and certified MSSs worldwide (ISO, 2021).

In general, a QMS is the means by which quality management practices, such as quality planning, control, assurance and improvement, are turned into an integral part of an organization that directly affects the way it conducts business (Nanda, 2005). An EMS, on the other hand, seeks to make organizations both more competitive and more environmentally responsible by adopting techniques aimed at reducing environmental impacts, such as waste reduction and process/product redesign (Watson et al., 2004). The implementation of such MSs results in various benefits (e.g. Aba & Badar, 2013; Bernardo et al., 2015; Tarí et al., 2012). For example, QMSs are positively correlated with business performance, as companies improve the efficiency of their processes, provide their customers with added value, enhance customer satisfaction and, ultimately, generate more revenue (Singh, 2008; Tarí et al., 2012; Zaramdini, 2007). Similarly, EMSs positively impact the performance of firms due to savings in resource input and energy consumption, increased efficiency and better profitability (Tarí et al., 2012; Zutshi & Sohal, 2004). However, the adoption benefits depend on the individual circumstances of firms. Operating MSs alongside comparable practices, for example, might be less beneficial for companies’ financial performance due to the redundancy of different processes aimed at similar goals related to stakeholder satisfaction (e.g. Franco et al., 2020).

2.3 ESG ratings and scores

ESG ratings are company assessments based on an evaluation of environmental, social and governance matters whose individual weightings result in an overall score (Clementino & Perkins, 2021). They are provided by specialized rating agencies, whose expertise makes them a key reference point for firms, financial markets and scholars regarding CSP data (Escrig-Olmedo et al., 2019) and which emerged in response to an increased demand for social and environmental information (Avetisyan & Ferrary, 2013). Rating agencies typically use their own research methodologies (Avetisyan & Hockerts, 2017), which are based mainly on publicly available information, third-party research and corporate reports (Drempetic et al., 2020; Jackson et al., 2020).

Applying ESG practices is generally aligned with stakeholder theory (Lee & Isa, 2020), as the concept is linked to numerous stakeholders (La Fuente et al., 2021; Muñoz-Torres et al., 2019). Furthermore, ESG scores play a crucial role “in helping stakeholders apprehend, evaluate and manage the increasingly complex, multi-faceted nature of business ethics and sustainability” (Clementino & Perkins, 2021, p. 381). They serve as a standard for comparison and set benchmarks for further improvement (Rajesh, 2020; Tamayo-Torres et al., 2019). Managing ESG issues responsibly increases companies’ integrity within society and stakeholders’ trust, thus influencing the economic performance of firms (Tarmuji et al., 2016). Therefore, companies with high ESG ratings might enjoy better market and financial performance (e.g. Aboud & Diab, 2019; Kotró & Márkus, 2020; Shakil, 2020), although there is no univocal consensus (Brogi & Lagasio, 2019; Miralles-Quirós et al., 2019; Taliento et al., 2019). Due to increasing public awareness of sustainability issues and the corresponding corporate acknowledgement, the number of firms disclosing ESG data is rapidly increasing (Alsayegh et al., 2020).

However, ESG ratings also face criticism. As the concept has no fixed boundaries, the validity of ratings is questioned, since the various rating agencies view the ESG pillars differently and, moreover, use different weighting strategies to compile the final scores (Chatterji et al., 2016; Saadaoui & Soobaroyen, 2018). Another set of criticism concerns the quality of the data underlying the scores (Clementino & Perkins, 2021; Drempetic et al., 2020). To mitigate these key concerns related to ESG ratings, this study utilizes data from Thomson Reuters, whose ESG database is one of the market leaders and is both used and accepted by fellow scholars (e.g. Burritt et al., 2020; Jeriji & Louhichi, 2021; Rajesh, 2020; Yunus et al., 2020).

2.4 ESG-related benefits of MS implementation

To justify researching the role of QMSs and EMSs as business tools to enhance ESG ratings, this work clusters their adoption benefits by ESG pillar (see Table 1) and, subsequently, derives corresponding hypotheses about their impact on ESG performance.

Table 1 Benefits of QMS and EMS Adoption sorted by ESG Dimensions (source: own elaboration based on Eikon database)

2.4.1 Benefits regarding the environmental pillar

EMS adoption leads to various environmental-related benefits, such as decreased and more efficient use of resources (e.g. Gavronski et al., 2008; Tan, 2005), and facilitates the implementation of environmental management practices regarding green product design, procurement, production, logistics and packaging (e.g. Wong et al., 2020). Furthermore, EMSs enable companies to reduce emissions (e.g. Potoski & Prakash, 2005; Russo, 2009) and the risk of environmental accidents (e.g. Bravi et al., 2020). Environmental innovation capabilities (e.g. M. Bu, Liu, et al., 2020; Bu, Qiao, et al., 2020; Montobbio & Solito, 2018) and enhanced problem solving with regard to technologies and procedures might also evolve (e.g. Ann et al., 2006). With regard to QMSs, these can reduce waste (e.g. Zimon et al., 2021) and, furthermore, positively impact environmental process innovations (e.g. Ziegler, 2015), especially for supply chain management (e.g. Shi et al., 2019), a crucial organizational element of CSP. In addition, quality management “can help support necessary stakeholder management in sustainable development” (Siva et al., 2016, p. 151). In conclusion, the following hypotheses are derived:

H1: Companies operating with QMSs achieve higher performance scores in the environmental pillar than firms without QMSs.

H2: Companies operating with EMSs achieve higher performance scores in the environmental pillar than firms without EMSs.

2.4.2 Benefits regarding the social pillar

Both MSs present several positive effects when it comes to workforce, community and product responsibility. Regarding human rights, no specific academic research was detected. However, EMS implementation increases legal and regulatory compliance (e.g. Bravi et al., 2020), which implies a certain level of conformity with basic human rights. Important benefits related to workforce are increased employee motivation (e.g. Gavronski et al., 2008; Zaramdini, 2007) and better internal communication (e.g. Sampaio et al., 2009; Tan, 2005). With respect to community, both MSs result in improved relationships with suppliers and other key stakeholders, as stated in the standards (e.g. Bernardo et al., 2015; Casadesús & Karapetrovic, 2005; Zeng et al., 2005), among others. Regarding product responsibility, MSs increase customer satisfaction, communication and relationships, as well as product and service quality (e.g. Casadesús & Karapetrovic, 2005; Gotzamani & Tsiotras, 2002; Tarí et al., 2012). Hence, the hypotheses related to this pillar are as follows:

H3: Companies operating with QMSs achieve higher performance scores in the social pillar than firms without QMSs.

H4: Companies operating with EMSs achieve higher performance scores in the social pillar than firms without EMSs.

2.4.3 Benefits regarding the governance pillar

Positive links have been revealed between MSs and the management of organizations. QMSs enhance internal organization and operations (e.g. Sampaio et al., 2009), increase the commitment of management to best quality practices (e.g. Arauz & Suzuki, 2004) and improve management-employee relationships (e.g. Gotzamani & Tsiotras, 2002). EMSs result in better awareness of environmental issues among both management and employees, as well as enhanced internal organization (e.g. Gotzamani & Tsiotras, 2002; Schylander & Martinuzzi, 2007). Regarding corporations’ effectiveness with respect to the equal treatment of shareholders, no academic studies revealing specific relationships were detected. Regarding CSR strategies, EMS adoption leads to improved CSR activities (e.g. Ikram et al., 2019), as incorporating CSR principles is closely related to EMS principles (e.g. Dubravská et al., 2020) and QMSs provide a structural framework that facilitates the adoption of CSR policies, strategies and activities (e.g. Frolova & Lapina, 2015). Thus, hypotheses five and six are deduced:

H5: Companies operating with QMSs achieve higher performance scores in the governance pillar than firms without QMSs.

H6: Companies operating with EMSs achieve higher performance scores in the governance pillar than firms without EMSs.

2.4.4 Benefits of operating both MSs simultaneously

Table 1 reveals that QMSs and EMSs lead to distinct CSP benefits. Consequently, operating both MSs simultaneously should enable firms to cover an even broader range of ESG issues. Moreover, having EMSs alongside QMSs could give rise to synergy effects (e.g. Casadesús et al., 2011; Zimon et al., 2021), and both MSs together could lead to stronger business performance (e.g. Ferrón Vílchez & Darnall, 2016). In addition, the benefits of MSs integration (e.g. Bernardo et al., 2015) might also play a pivotal role. Although the sample used in this study does not reveal information regarding the integration level, integration benefits should be taken into account, as most organizations with multiple MSs do actually integrate them (e.g. Karapetrovic & Casadesús, 2009). ESG-related integration advantages include the improved adoption of cleaner production technologies (e.g. Hernandez-Vivanco et al., 2018), greater motivation among staff (e.g. Abad et al., 2014), better partnerships with key stakeholders (e.g. Rebelo et al., 2014) and improvements in the organizational culture (e.g. Simon et al., 2012). Therefore, the literature makes it possible to hypothesize the following:

H7: Companies operating with both QMSs and EMSs achieve higher performance scores in the environmental pillar than firms with only either QMSs or EMSs.

H8: Companies operating with both QMSs and EMSs achieve higher performance scores in the social pillar than firms with only either QMSs or EMSs.

H9: Companies operating with both QMSs and EMSs achieve higher performance scores in the governance pillar than firms with only either QMSs or EMSs.

Figure 1 offers a graphic summary of the nine hypotheses outlined in Sect. 2 and reveals their connection to the RQs formulated in the introduction. The ESG variables displayed (V1 to V16), as well as the statistical methods used for testing the hypotheses, are further explained in the following section.

Fig. 1 Hypotheses about QMSs and EMSs Adoption on ESG Performance Scores (source: own elaboration)

Fig. 2 Applied Methodology (source: own elaboration)

Fig. 3 Boxplots for the ESG Overall Score and the Three Pillar Scores (source: own elaboration)

3 Methodology

To test the hypotheses, ESG data from companies located in Europe (EU, UK and EFTA states), East Asia (China, Japan and four tiger states) and North America (USA and Canada) are retrieved and analysed. The country clustering considers geographic regions with comparable economic and human development status, shared commercial relationships and common regulatory environments (e.g. Hartmann et al., 2020; Nallari & Griffith, 2013; UNDP, 2019). The analyses consider the nineteen variables listed in Table 2. Sixteen variables aim at measuring ESG performance (V1 to V16) and three serve as control variables (CV1 to CV3), as empirical studies on both ESG ratings and MSs have shown that results are likely to be influenced by industrial sector (e.g. Garcia et al., 2017; Nadae et al., 2019), region (e.g. Tan, 2005; Thanetsunthorn, 2015) and company size (e.g. Arauz & Suzuki, 2004; Drempetic et al., 2020; Wong et al., 2020).

Table 2 Variables used in the Analysis (source: adapted from Refinitiv (2020))

3.1 Sampling process

The first step in the sampling process involves searching for reliable ESG data. Therefore, Thomson Reuters Eikon, also known as Refinitiv Eikon (formerly ASSET4), is used, as it offers one of the largest ESG databases with ratings for over 10000 companies worldwide. Refinitiv Eikon calculates ten ESG category scores, which evaluate the environmental (V5, V6, V7), social (V9, V10, V11, V12) and governance (V14, V15, V16) dimensions. The category scores are based on numerous data points and summarized in the respective pillar scores (V4, V8, V13), which together result in the overall score (V1). In addition, the ESG combined score (V2) takes into account scandals relating to any of Refinitiv Eikon’s twenty-three ESG controversy topics (V3). All scores are expressed in values between 0 (worst) and 100 (best) (Refinitiv, 2020).

The second step consists of retrieving the aforementioned data for companies headquartered in the regions of interest. Refinitiv Eikon allows users to filter by companies that use QMSs and EMS-certified organizations. The third step involves filtering these data for 2015 through to 2019 to ensure that the companies have been running their MSs for at least five consecutive years. This is done to ensure that the sample firms have accumulated experience of working with MSs to avoid distorting the ESG data with short-term influences that might occur straight after implementing MSs (e.g. Casadesús & Karapetrovic, 2005; Testa et al., 2014). In addition, the filtering by time considers the renewal of certified MSs after a three-year period. To ensure data quality, the fourth step consists of removing all companies that lack information, i.e. that present no value for any of the nineteen variables.
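Steps three and four of the sampling process can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (`qms_by_year`, `ems_by_year`, `scores`) and the function names are hypothetical and do not reflect the Refinitiv Eikon export format, and the sketch assumes one reading of step four, namely that a company is dropped as soon as any variable is missing.

```python
YEARS = range(2015, 2020)  # 2015 through 2019 inclusive


def consecutive_flag(yearly_flags):
    """True only if the MS flag is set in every year of the window."""
    return all(yearly_flags.get(year, False) for year in YEARS)


def build_sample(raw_records, variables):
    """Apply steps three and four to hypothetical raw records."""
    sample = []
    for rec in raw_records:
        # Step four (assumed reading): drop records missing any variable.
        if any(rec["scores"].get(v) is None for v in variables):
            continue
        sample.append({
            "name": rec["name"],
            # Step three: the MS must be present in all five years.
            "qms": consecutive_flag(rec["qms_by_year"]),
            "ems": consecutive_flag(rec["ems_by_year"]),
            "scores": rec["scores"],
        })
    return sample


# Made-up records for illustration; these are not data from the study.
raw = [
    {"name": "A",
     "qms_by_year": {y: True for y in YEARS},
     "ems_by_year": {y: y >= 2017 for y in YEARS},  # EMS only since 2017
     "scores": {"V1": 61.0, "V4": 55.0}},
    {"name": "B",
     "qms_by_year": {},
     "ems_by_year": {},
     "scores": {"V1": 40.0, "V4": None}},  # incomplete, so removed
]
sample = build_sample(raw, ["V1", "V4"])
```

In this toy input, company A is kept with a QMS flag but no EMS flag (its EMS covers only three of the five years), and company B is removed because one score is missing.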

3.2 Sample description

The sampling process was performed on 15 November 2020 and resulted in data on 4292 companies, which are classified into the following four sample groups:

Group 1: Companies without a QMS or an EMS.

Group 2: Companies with a QMS but no EMS.

Group 3: Companies with an EMS but no QMS.

Group 4: Companies with both a QMS and an EMS.
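The group definitions above amount to a mapping from two yes/no flags to a group number. The following sketch illustrates that mapping and how group shares could be tabulated; the function names and the toy input are hypothetical and are not taken from the study.

```python
from collections import Counter


def sample_group(has_qms, has_ems):
    """Map the two MS flags to sample groups 1 to 4 as defined above."""
    if has_qms and has_ems:
        return 4
    if has_ems:
        return 3
    if has_qms:
        return 2
    return 1


def group_shares(flag_pairs):
    """Percentage share of each sample group, rounded to one decimal."""
    counts = Counter(sample_group(q, e) for q, e in flag_pairs)
    total = sum(counts.values())
    return {g: round(100 * counts[g] / total, 1) for g in (1, 2, 3, 4)}


# Made-up (has_qms, has_ems) pairs for illustration only.
flags = [(False, False), (False, False), (True, False), (True, True)]
shares = group_shares(flags)
```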

As illustrated in Table 3, most companies in the sample have not been operating any QMS or EMS (74.5%) consecutively between 2015 and 2019. Firms operating both MSs represent the second largest group (17.4%), and corporations with either a QMS (2.9%) or an EMS (5.1%) constitute less than 10% of the sample.

Table 3 Sample clustered by Control Variables (source: own elaboration)

Regarding sectors, most firms are engaged in finance (27.5%), consumer cyclicals (15.2%), industry (13.5%), technology (12.0%) or healthcare (11.0%). The geographical distribution shows that the majority of the companies are from North America (53.8%), while the numbers of European (23.4%) and East Asian (22.8%) enterprises are roughly equal. The percentage shares of the four sample groups per region reveal that, whereas a significant portion of the sample in Europe (45.3%) and East Asia (38.8%) runs MSs, companies in North America are much more likely to operate without them (88.7%). This is consistent with the fact that the ten countries with the most ISO 9001 and ISO 14001 certifications are based predominantly in Europe and East Asia, while neither the USA nor Canada appears in the top ten ranking (ISO, 2021). Furthermore, the sample presents a well-distributed cross section of company sizes, which are measured by market capitalization (e.g. Dang et al., 2018). Small (market capitalization < USD 1 billion), medium (< 5 bn) and large companies (> 5 bn) each make up about one third of the sample.
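The size classes can be expressed as a simple threshold rule. The sketch below uses the two cut-offs stated above; how a market capitalization of exactly USD 1 bn or USD 5 bn is treated is not specified in the text, so assigning such boundary values to the larger class is an assumption, as is the function name.

```python
def size_class(market_cap_usd):
    """Classify a company by market capitalization in US dollars."""
    if market_cap_usd < 1e9:
        return "small"
    if market_cap_usd < 5e9:
        return "medium"
    # Assumption: a value of exactly USD 5 bn is treated as large.
    return "large"


# Made-up market capitalizations for illustration only.
classes = [size_class(cap) for cap in (4.0e8, 2.5e9, 7.2e10)]
```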

3.3 Data analysis

The sample is analysed with IBM SPSS Statistics 25 and Stata/SE 16. First, a descriptive analysis is performed to describe the basic features and characteristics of the dataset (Mishra et al., 2019). This makes it possible to explain and validate the research findings and serves as a basis for further quantitative analysis, which is carried out in the framework of a cluster analysis. The cluster analysis is designed to produce a logical structure concerning ESG performance that is easy to read and interpret so that similarities can be analysed (J. Bu, Liu, et al., 2020; Bu, Qiao, et al., 2020).

The descriptive analysis consists of four steps. First, the full sample is analysed to describe the ESG performance of all four sample groups in comparison. Second, data normality is tested with the Kolmogorov–Smirnov test and the Shapiro–Wilk test. As the sample does not present a normal distribution of data, the nonparametric Kruskal–Wallis test is performed in the third step to evaluate the statistical significance of differences. Moreover, the Dunn–Bonferroni post hoc test is conducted and Cohen’s d is calculated to determine the sample groups between which these statistically significant differences exist and to what extent. Fourth, the Kruskal–Wallis test, the Dunn–Bonferroni test and Cohen’s d are performed and analysed for each control variable separately: each company size, each region and each sector (except for the academic and educational services sector due to the small sample size). This is done to detect possible influences and potential biases of the control variables. The descriptive analysis is presented in Sect. 4.1.
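To illustrate the third step, the Kruskal–Wallis H statistic can be computed from pooled ranks. The study itself relies on SPSS and Stata, so the following is only a minimal standard-library sketch of the textbook formula with tie correction; the function names and the toy scores are hypothetical. With four groups, H is compared against a chi-square distribution with three degrees of freedom, whose 5% critical value is 7.815.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


# Made-up scores for four groups; these are not data from the study.
toy_groups = [
    [22.0, 30.5, 41.0],
    [45.0, 48.5, 52.0],
    [55.0, 60.5, 63.0],
    [66.0, 71.5, 80.0],
]
h_stat = kruskal_wallis_h(toy_groups)
reject_null = h_stat > 7.815  # chi-square critical value, df = 3, alpha = 0.05
```

For these toy scores the statistic is 135/13, roughly 10.38, so the null hypothesis of equal central tendencies would be rejected at the 5% level.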

The cluster analysis considers the ten ESG category scores and is conducted in three consecutive steps. First, the single-linkage method is applied to detect and exclude outliers that might distort the classification; furthermore, hierarchical methods are applied to produce a small number of clusters and distances are measured to evaluate similarities and dissimilarities. To obtain homogeneous groups with minimum variances, the Ward method is used. Such hierarchical clustering is the most widely applied methodology in cluster analysis (J. Bu, Liu, et al., 2020; Bu, Qiao, et al., 2020). This first step results in two clusters. Second, after the Kolmogorov–Smirnov and Shapiro–Wilk tests confirm that the cluster analysis samples are also not normally distributed, the Mann–Whitney U test is performed to verify the clustering. Third, the clusters are analysed. This cluster analysis is presented in Sect. 4.2.
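The Ward step can be illustrated with a naive agglomerative procedure that repeatedly merges the pair of clusters whose union yields the smallest increase in within-cluster sum of squares. The sketch below is a simplified, standard-library illustration rather than the SPSS/Stata routine used in the study: it omits the preceding single-linkage outlier screening, the function name is hypothetical, and the two-dimensional toy points stand in for the ten category scores.

```python
def ward_clusters(points, n_clusters=2):
    """Agglomerate points with Ward's criterion until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    dims = len(points[0])

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members)
                for d in range(dims)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        dist_sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist_sq

    while len(clusters) > n_clusters:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)  # cheapest merge under Ward's criterion
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up two-dimensional scores; these are not data from the study.
toy_points = [(20, 25), (22, 30), (25, 28), (70, 75), (72, 80), (78, 77)]
toy_clusters = ward_clusters(toy_points, n_clusters=2)
```

The toy input separates into a low-scoring and a high-scoring cluster, mirroring the two-cluster outcome described above. The naive pairwise search is slow for large samples and is meant for illustration only.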

Figure 2 summarizes these methodological steps, their application and how they fit into the structure of the paper.

4 Findings

4.1 Descriptive analysis

4.1.1 Step 1: Descriptive analysis of the full sample

The descriptive analysis of the full sample is summarized in Table 4. As shown, group 4 reveals the best performance as measured by the mean and median of the ESG score (V1) and the ESG combined score (V2), whereas group 3 performs second best, group 2 third best and group 1 exhibits the lowest values. With respect to the controversy score (V3), group 1 presents the highest mean. However, this outperformance might be due to the fact that group 1 has the highest percentage of small and medium-sized enterprises (SMEs) (74.7%), which are less likely than their bigger counterparts to feature in the global media. The environmental (V4) and social pillars (V8) show the same performance pattern as the overall score, while group 3 performs best in the governance dimension (V13). The sample groups rank nearly the same for most ESG category scores as for the respective ESG pillar scores. The only exceptions are emissions (V6) and workforce (V9) matters, which are highest in group 3. The overall score and pillar scores are illustrated in Fig. 3 in the form of four box plots.

Table 4 Descriptive Analysis for ESG Performance Variables by Sample Group (source: own elaboration)

4.1.2 Step 2: Test of data normality

Data normality is tested with the Kolmogorov–Smirnov and Shapiro–Wilk tests. Only variables V1, V2 and V13 have an approximately normal distribution for group 2, as assessed by the Kolmogorov–Smirnov test (p > 0.05). However, as assessed by the Shapiro–Wilk test, only V1 and V2 have an approximately normal distribution for group 2 (p > 0.05). When testing data normality for the full sample rather than for the four sample groups, the results of both tests indicate that the data are in fact not normally distributed.

4.1.3 Step 3: Kruskal–Wallis test, Dunn–Bonferroni post hoc test and Cohen’s d

Therefore, the nonparametric Kruskal–Wallis test is used to analyse the statistical significance of the differences between sample groups. As demonstrated in Table 5, there are differences for all sixteen ESG indicators regarding the central tendencies between the four sample groups (p < 0.05).

The Dunn–Bonferroni test is used to reveal the sample groups between which there are statistically significant differences. Table 6 provides an overview of the post hoc test. In addition, the effect size is quantitatively measured by Cohen’s d to evaluate the magnitude of these differences, as shown in Table 7.
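The post hoc comparison and the effect size can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: the Dunn z statistic uses the usual tie-corrected variance of pooled ranks, the Bonferroni adjustment multiplies each two-sided p-value by the number of pairwise comparisons, and Cohen’s d is computed with the pooled standard deviation, which is one common variant (the text does not state which variant is used). Function names and toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, variance


def average_rank_map(pooled):
    """Map each distinct value to the mean of the 1-based ranks it occupies."""
    positions = {}
    for idx, value in enumerate(sorted(pooled), start=1):
        positions.setdefault(value, []).append(idx)
    return {v: sum(p) / len(p) for v, p in positions.items()}


def dunn_bonferroni(groups):
    """Pairwise Dunn z statistics with Bonferroni-adjusted p-values."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    rank_of = average_rank_map(pooled)
    mean_ranks = [mean(rank_of[x] for x in g) for g in groups]
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n - 1))
    n_pairs = len(groups) * (len(groups) - 1) // 2
    results = {}
    for i, j in combinations(range(len(groups)), 2):
        se = sqrt((n * (n + 1) / 12 - tie_term)
                  * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        # Keys are 1-based group numbers; p is capped at 1 after adjustment.
        results[(i + 1, j + 1)] = (z, min(1.0, p * n_pairs))
    return results


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    pooled_var = (((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b))
                  / (len(a) + len(b) - 2))
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Made-up scores for four groups; these are not data from the study.
toy_groups = [
    [22.0, 30.5, 41.0],
    [45.0, 48.5, 52.0],
    [55.0, 60.5, 63.0],
    [66.0, 71.5, 80.0],
]
post_hoc = dunn_bonferroni(toy_groups)
effect_4_vs_1 = cohens_d(toy_groups[3], toy_groups[0])
```

In this toy example only the comparison between groups 1 and 4 remains significant at the 5% level after the Bonferroni adjustment, which shows how conservative the correction becomes with six pairwise comparisons and small groups.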

Table 5 Independent-Samples Kruskal–Wallis Test (source: own elaboration)
Table 6 Post hoc Test for Kruskal–Wallis Test (Dunn–Bonferroni Test) (source: own elaboration)
Table 7 Cohen’s d (source: own elaboration)

The Dunn–Bonferroni test confirms H1 to H6, as companies with QMSs or EMSs achieve statistically significantly higher performance scores in the environmental (V4), social (V8) and governance (V13) pillars than firms without these MSs. Furthermore, groups 2, 3 and 4 present statistically significantly higher overall ESG scores (V1, V2) than group 1, thereby making it possible to answer RQ1 positively. With respect to RQ2, the descriptive analysis of the full data sample reveals that group 2 has significantly higher ratings in nine of the ten ESG category scores (all except V15), while groups 3 and 4 present enhanced performance in all ten ESG category scores, again compared to group 1. The values for Cohen’s d confirm these statements.

Furthermore, group 3 achieves significantly higher ESG scores (V1, V2) than group 2 due to significant outperformance in the environmental (V4) and governance (V13) dimensions; even though the management (V14) and shareholder (V15) scores do not differ significantly, companies with EMSs achieve considerably better values in the CSR strategy category (V16), which causes the outperformance in the pillar’s rating. Although the consolidated social pillar score (V8) is not significantly different between groups 2 and 3, companies with QMSs significantly outperform their counterparts with EMSs in terms of product responsibility (V12), while underperforming in the workforce (V9) and human rights (V10) categories. Thus, to answer RQ1 more precisely, it is concluded that EMSs appear to represent more effective business tools for enhanced ESG performance than QMSs. With respect to RQ2, it is important to mention that both MSs apparently share common strengths (V11, V14, V15), but also possess individual advantages (QMS: V12; EMS: V5, V6, V7, V9, V10, V16).

In terms of RQ3, group 4 statistically significantly outperforms group 2 in the overall (V1, V2) and pillar (V4, V8, V13) scores, thus confirming H7 to H9 with respect to companies with QMSs only. There are no significant differences compared to group 3; nonetheless, the mean and median values for group 4 are higher in the overall scores (V1, V2) as well as the environmental (V4) and social (V8) dimensions, except for emissions (V6) and workforce (V9) matters. However, for the governance categories and pillar score (V13, V14, V15, V16), companies with EMSs alone present the highest mean and median values. In summary, H7 to H9 are confirmed with respect to firms with QMSs only, but not with respect to companies with EMSs only.
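
As a minimal sketch of how the post hoc comparisons and effect sizes reported in this step can be computed, the following Python code implements Dunn's pairwise test on joint ranks with a Bonferroni adjustment, together with Cohen's d based on a pooled standard deviation. The group labels and scores are invented for illustration and are not taken from the study's data. The pooled-standard-deviation variant of Cohen's d is likewise an assumption, as the exact variant applied is not stated here.

```python
import math
import statistics
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def dunn_bonferroni(groups):
    """Pairwise Dunn z-tests on joint ranks with Bonferroni-adjusted p-values."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    # Tie correction: sum of (t^3 - t) over each set of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    base_var = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))
    # Mean joint rank per group (ranks follow the pooled order).
    mean_rank, pos = {}, 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[pos:pos + n]) / n
        pos += n
    n_pairs = len(names) * (len(names) - 1) // 2
    results = {}
    for a, b in combinations(names, 2):
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[(a, b)] = (z, min(1.0, p * n_pairs))  # Bonferroni adjustment
    return results


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


# Hypothetical ESG scores for the four sample groups (illustration only).
groups = {
    "no_ms": [20, 25, 30, 22, 28],
    "qms": [40, 45, 42, 48, 44],
    "ems": [60, 65, 62, 68, 64],
    "both": [61, 66, 63, 69, 67],
}
results = dunn_bonferroni(groups)
for pair, (z, p_adj) in results.items():
    print(pair, round(z, 3), round(p_adj, 4))
print("d(both vs no_ms):", round(cohens_d(groups["both"], groups["no_ms"]), 2))
```

With only five invented observations per group, neighbouring groups do not differ significantly after the adjustment, which illustrates why the Bonferroni correction is conservative when several pairwise comparisons are made.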

4.1.4 Step 4: Descriptive analysis of the control variables' sub-samples

Company size (CV1) appears to affect the magnitude of differences, as the Dunn–Bonferroni test reveals far more statistically significant differences between the four sample groups when it comes to large companies as opposed to SMEs. Furthermore, it is noticeable that large companies on average achieve higher ESG ratings than small firms. Nonetheless, companies with QMSs and/or EMSs significantly outperform firms without MSs in the overall ESG scores (V1, V2), regardless of their size. The same is true for the environmental (V4) and social (V8) dimensions, thus confirming H1 to H4. However, in the governance pillar (V13), small firms with EMSs and medium-sized firms with QMSs lack this statistically significant outperformance, thereby only partially supporting H5 and H6.

On average, European companies achieve higher ESG ratings than East Asian or North American firms, but companies with QMSs or EMSs achieve significantly better ESG performance (V1, V2) than companies without these MSs, regardless of the location (CV2). This outperformance also holds true for the social dimension (V8). However, European firms with QMSs lack this statistically significant outperformance in the governance dimension (V13) and, in East Asia, also in the environmental dimension (V4). For East Asia, the Kruskal–Wallis test even retains its null hypothesis for the shareholders score (V15). Hence, the analysis fully confirms H2, H3, H4 and H6, while only partially supporting H1 and H5.

Moreover, the nature of business operations (CV3) impacts ESG performance per sample group. For basic materials, consumer (non-)cyclicals, energy, industry and telecommunication services, the Kruskal–Wallis test retains its null hypothesis for the shareholders score (V15); for the utilities sector, it also retains the null hypothesis for the management category (V14) and, consequently, for the whole governance pillar score (V13). The statistically significantly higher ESG performance (V1, V2) of companies with MSs holds true for all sectors except for energy, telecommunication and utilities, in which companies with QMSs do not present significantly better performance than companies without MSs. The same pattern appears for the same sectors as well as for basic materials for the environmental (V4) and social (V8) dimensions. For the energy sector, even companies with EMSs fail to outperform in the social pillar (V8). Regarding the governance pillar (V13), there are numerous sectors in which group 2 (consumer (non-)cyclicals, energy, finance, industry, technology, telecommunications, utilities) and group 3 (consumer non-cyclicals, technology, utilities) do not show statistically significantly higher values than group 1. Hence, the analysis fully confirms H2 and only partially supports H1 and H3 to H6.

Although H7 to H9 are confirmed with respect to QMSs in the full sample analysis, the analyses of control variables deliver a mixed picture. While H7 holds true against group 2 for medium and large firms (CV1) and all three regions (CV2), statistically significantly higher scores in the environmental pillar (V4) are revealed only for industrial companies when it comes to business sectors (CV3). H8 does not hold true against group 2 when location is considered (CV2); significant outperformance in the social pillar (V8) is visible only in the analysis of large firms (CV1) and companies classified as industrial (CV3). The same pattern for company size (CV1) and location (CV2) applies to H9, related to the governance dimension (V13), although here the only sector with significant outperformance is technology (CV3). Thus, although the full sample analysis confirms H7 to H9 with respect to firms with QMSs only, the analyses of the control variables reveal numerous exceptions, which calls for more detailed research in the future.

Table 8 shows the sample group with the highest mean value for the overall and pillar scores per control variable. This overview strengthens the tendency observed for group 4 to perform best in terms of the ESG score (V1) and the environmental (V4) and social (V8) pillar scores, regardless of the control variables, while the governance pillar (V13) appears to be affected most by the adoption of EMSs alone. Thus, Table 8 supports the findings of the full dataset analysis.

Table 8 Highest Mean Value by Sample Group for ESG Score and ESG Pillar Scores (source: own elaboration)

To summarize the findings of the descriptive analysis, Table 9 provides an overview of the confirmation status of the nine hypotheses, as well as exceptions detected in relation to the control variables.

Table 9 Findings from the Descriptive Analysis (source: own elaboration)

4.2 Cluster analysis

4.2.1 Step 1: Single-linkage method and Ward method

The cluster analysis considers the ten ESG category scores. To detect outliers, the single-linkage method is applied. As a result, nine data points are eliminated, which reduces the sample size from 4292 to 4283 companies. The outliers excluded are from all three regions and operate across various industries, and seven of them have a large market capitalization. No outlier operates any QMSs or EMSs, and each company presents extremely low values for at least one ESG issue. The Ward method is then applied to obtain homogeneous groups with minimum variance. The resulting dendrogram, shown in Fig. 4, indicates clustering with two groups.
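
The two-stage procedure described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this study: the helper `agglomerate`, the two-dimensional toy scores and the rule for flagging outliers (singleton clusters left over when the single-linkage tree is cut at a chosen number of clusters) are all hypothetical. The Ward step uses the Lance–Williams update on squared Euclidean distances.

```python
import math
from itertools import combinations


def agglomerate(points, method, n_clusters):
    """Naive agglomerative clustering via Lance-Williams updates.

    method is "single" or "ward"; returns clusters as sorted lists of indices.
    """
    members = {i: [i] for i in range(len(points))}
    # Ward operates on squared Euclidean distances, single linkage on plain ones.
    power = 2 if method == "ward" else 1
    dist = {frozenset((i, j)): math.dist(points[i], points[j]) ** power
            for i, j in combinations(members, 2)}
    next_id = len(points)
    while len(members) > n_clusters:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        n_i, n_j = len(members[i]), len(members[j])
        for k in members:
            if k in (i, j):
                continue
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            if method == "single":
                new_d = min(d_ki, d_kj)
            else:
                n_k = len(members[k])
                new_d = ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj
                         - n_k * d_ij) / (n_i + n_j + n_k)
            dist[frozenset((k, next_id))] = new_d
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return sorted(sorted(c) for c in members.values())


# Toy data: two groups of hypothetical score pairs plus one isolated point.
scores = [(10, 10), (11, 10), (10, 11), (11, 11),
          (60, 60), (61, 60), (60, 61), (61, 61),
          (0, 100)]

# Step 1: single linkage; singleton clusters are treated as outliers (assumed rule).
single = agglomerate(scores, "single", n_clusters=3)
outliers = [c[0] for c in single if len(c) == 1]
kept = [i for i in range(len(scores)) if i not in outliers]

# Step 2: Ward method on the remaining observations.
ward = agglomerate([scores[i] for i in kept], "ward", n_clusters=2)
clusters = [[kept[i] for i in c] for c in ward]
print("outliers:", outliers)
print("clusters:", clusters)
```

Single linkage is suited to the first step because isolated observations are the last to be chained to any group, whereas Ward's criterion merges the pair of clusters that least increases within-cluster variance.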

Fig. 4 Retrieved Dendrogram (source: own elaboration)

4.2.2 Step 2: Test of data normality and Mann–Whitney U test

Both the Kolmogorov–Smirnov test and the Shapiro–Wilk test reject data normality for the reduced sample with 4283 companies and for the two clusters. The Mann–Whitney U test is therefore used to verify the clustering. Table 10 illustrates that there are indeed statistically significant differences in the central tendencies of all ESG indicators (p < 0.05).

Table 10 Independent-Samples Mann–Whitney U Test (source: own elaboration)
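
For illustration, a two-sided Mann–Whitney U test can be sketched with the normal approximation as shown below. The function name and the cluster scores are hypothetical, the approximation is tie-corrected but omits the continuity correction, and the normality tests mentioned above are not reproduced; statistical packages may therefore report slightly different p-values.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Tie-corrected, without continuity correction. Returns (U for x, p-value).
    """
    pooled = sorted(x + y)
    # Average 1-based rank of each distinct value in the pooled sample.
    rank_of, i = {}, 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    ties = sum(pooled.count(v) ** 3 - pooled.count(v) for v in rank_of)
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores for one ESG indicator in the two clusters.
cluster_1 = [63.8, 70.1, 58.4, 66.0, 61.2]
cluster_2 = [27.0, 31.5, 22.3, 35.9, 25.4]
u, p = mann_whitney_u(cluster_1, cluster_2)
print("U =", u, "significant at 5%:", p < 0.05)
```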

4.2.3 Step 3: Analysis of clusters

The cluster compositions are shown in Figs. 5 and 6. Cluster 1 contains 1515 companies, i.e. 35.4% of the full sample. The majority of cluster 1 has at least one MS in place. More specifically, 4.9% run QMSs, 12.5% EMSs and 42.0% operate both MSs simultaneously. Although 40.7% of the cluster does not have any MSs, the disproportionately low presence of companies without MSs is more obvious when looking at the horizontal distribution. Only 19.3% of the companies without any MSs make it into cluster 1, whereas the respective figures for companies with QMSs, EMSs and both MSs amount to 59.7%, 85.5% and 85.0%, respectively. Therefore, cluster 1 is clearly dominated by companies operating MSs. Cluster 2, on the other hand, with 2768 organizations, is clearly overpopulated by companies without any MSs (93.0%).

Fig. 5 Description of Cluster 1 (source: own elaboration)

Fig. 6 Description of Cluster 2 (source: own elaboration)

Regarding company size, cluster 1 in particular contains organizations with large market capitalizations (55.7%) and only a few small companies (11.6%). This tendency is underlined by figures from the horizontal analysis. Whereas 59.7% of all large companies are in cluster 1, only 13.3% of the small companies can be found there. This is clearly an anomaly, given that each company size represents approximately one third of the full sample. The vertical (32.7%) and horizontal (32.1%) share of medium-sized companies is reasonable, in light of the fact that cluster 1 makes up only around a third of the full sample. Thus, cluster 1 is dominated by large companies and, in turn, cluster 2 is characterized by small companies (41.6%) and an underrepresentation of large organizations (20.6%). This is in line with the observations and remarks concerning firm size and ESG ratings presented above.

When it comes to geography, North American (29.2%) and East Asian (27.9%) firms have almost the same weight in cluster 1, while companies from Europe are noticeably overrepresented (42.9%). Cluster 2 presents the opposite composition, with more than two thirds of enterprises located in North America (67.4%) and much smaller shares for East Asian (19.9%) and European firms (12.6%). The horizontal analysis reveals that 65.0% of European enterprises make it into cluster 1, whereas the respective figures for East Asia and North America are only 43.4% and 19.1%, respectively. This is consistent with the observations and remarks about location and ESG ratings mentioned above.
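
The relationship between the vertical shares (composition of a cluster) and the horizontal shares (distribution of a region across clusters) can be checked with a short calculation. The sketch below uses only the cluster sizes and percentages reported in the text; the variable and function names are illustrative, and small deviations from the reported horizontal figures are to be expected because the inputs are rounded to one decimal place.

```python
# Cluster sizes and vertical shares (% of each cluster per region) from the text.
cluster_sizes = {1: 1515, 2: 2768}
vertical = {
    "Europe": {1: 42.9, 2: 12.6},
    "East Asia": {1: 27.9, 2: 19.9},
    "North America": {1: 29.2, 2: 67.4},
}
# Horizontal shares (% of each region found in cluster 1) as reported in the text.
reported_horizontal = {"Europe": 65.0, "East Asia": 43.4, "North America": 19.1}


def horizontal_share(region):
    """Share of a region's companies in cluster 1, rebuilt from vertical shares."""
    counts = {c: vertical[region][c] / 100 * cluster_sizes[c] for c in cluster_sizes}
    return 100 * counts[1] / sum(counts.values())


for region in vertical:
    print(region, round(horizontal_share(region), 1), "reported:",
          reported_horizontal[region])
```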

With respect to sectors, most organizations in cluster 1 operate in industry (17.0%), consumer cyclicals (17.2%) or finance (20.9%). Considering that this cluster represents only about one-third of the full sample, it is noticeable that 60.2% of the companies engaged in basic materials, 49.8% in consumer non-cyclicals and 44.6% in industry can be found here. Most organizations in cluster 2 are engaged in healthcare (14.1%), consumer cyclicals (14.2%) or finance (31.1%).

In addition to the numerous contrasts between the compositions of the clusters, there are also major ESG performance differences between clusters 1 and 2. As shown in Fig. 7, the mean values for the ESG indicators (V1 to V16) are higher for cluster 1 than for cluster 2, except for the ESG controversy score (V3). The smallest performance gap between the two clusters is detected in the shareholder score (V15).

Fig. 7 Mean ESG Performance by Cluster (source: own elaboration)

Cluster 1 clearly presents higher ESG performance ratings. The overall ESG score (V1) achieves a mean of 63.73 and a median of 63.80; both values are more than 35 points higher than for cluster 2. The scores are comparably high with respect to the environmental (V4), social (V8) and governance (V13) pillars. At the level of single ESG issues, cluster 1 reveals particularly strong outperformance in terms of resource use (V5) and emissions (V6) in the environmental dimension; workforce (V9) and human rights (V10) in the social pillar; and CSR strategy (V16) in the governance pillar (see Table 11).

Table 11 Descriptive Analysis for ESG Performance Variables by Cluster (source: own elaboration)

Cluster 2 shows relatively low ESG ratings. In concrete terms, the overall score (V1) is only 28.02 on average, with a median value of 27.02. The respective values for the three ESG dimensions are especially low for the environmental (V4) and social (V8) dimensions, while the highest scores are detected in the governance pillar (V13). With respect to the individual ESG issues, cluster 2 presents its highest performance in the management (V14) and shareholder (V15) categories. These two indicators are also those with the smallest performance gap relative to cluster 1 (see Table 11).

In summary, the cluster analysis produces two large clusters; most of the companies with QMSs (59.7%), EMSs (85.5%) or both MSs (85.0%) are grouped in cluster 1, whereas most companies without MSs (80.7%) populate cluster 2. In addition, cluster 1 is characterized by a high percentage of large organizations and European companies. The first cluster shows significantly higher values for the ten ESG category scores, the three ESG pillar scores and the (combined) ESG score than the second cluster. In conclusion, the patterns detected through the cluster analysis support H1 to H6 and make it possible to answer RQ1 positively. The analysis offers insight into RQ2 by showing that cluster 1 outperforms cluster 2 regarding all ESG issues, while revealing the smallest performance gap for the shareholder category (V15). Referring to RQ3, the composition of the clusters supports H7 to H9 with respect to companies with QMSs only.

5 Discussion

The statistically significant outperformance of firms with QMSs and/or EMSs as opposed to companies without such MSs for all ESG category scores (except for V15 for group 2) aligns with previous research that revealed the positive impacts of these MSs on several issues in all three ESG pillars, such as waste reduction (E) and improvements in customer (S) and internal (G) communication for QMSs (e.g. Sampaio et al., 2009; Zimon et al., 2021), and improved resource consumption (E), enhanced stakeholder relationships (S) and better manager involvement (G) for EMSs (e.g. Boiral et al., 2018). Therefore, the results support the literature review summarized in Table 1 and contribute to the debate regarding the positive relationship between QMSs/EMSs and CSP (e.g. Ferreira et al., 2019). Furthermore, it is noteworthy that, although both MSs have comparable benefits for certain areas, such as workforce (V9), product responsibility (V12) and management (V14) (see Table 1), the empirical results reveal varying magnitudes for these benefits as measured by ESG category scores, with group 2 significantly underperforming compared to group 3 for V9, outperforming it for V12 and presenting comparable results for V14. This contributes valuable in-depth information to the existing literature reviews about the benefits of implementing QMSs and EMSs that do not mention data-based, magnitude-related differences between both types of MSs, such as Tarí et al. (2012) and Aba and Badar (2013). Furthermore, in regard to stakeholder theory, this study evidences the MSs’ focus on specific stakeholder groups, such as QMSs’ overperformance in V12 being mainly beneficial for customers and EMSs’ V9 overperformance being favourable for employees.

In addition to discussing the results of the full sample, more light should be shed on the deviations detected in relation to the control variables. The descriptive analysis reveals more statistically significant differences between the four sample groups for large companies than for SMEs. Furthermore, cluster 1 presents strong underrepresentation of small firms, thus demonstrating that large companies are more likely to achieve higher ESG scores. These findings relating to company size are consistent with previous research on ESG ratings (e.g. Drempetic et al., 2020) and might be explained by the fact that SMEs have fewer resources to implement environmental strategies (e.g. Stubblefield Loucks et al., 2010) and that firm size moderates issues such as stakeholder pressure and impacts media coverage (e.g. Darnall et al., 2010; Seroka‐Stolka & Fijorek, 2020), which, in turn, affects quality and environmental disclosure (e.g. Dienes et al., 2016; Junita & Yulianto, 2018; Solikhah & Subowo, 2020). Furthermore, the analyses confirmed that European companies tend to achieve higher ESG ratings than firms from East Asia or North America, a finding that is generally aligned with previous cross-regional sustainability research (e.g. Thanetsunthorn, 2015). The geographic heatmap of ESG performance for 2018 displayed by Daugaard and Ding (2022) visualizes ESG scores around the globe and shows that other providers of ESG data (these authors used Sustainalytics as their data source) also confirm the European ESG-related superiority. Such geographical differences in CSP might be due to different sociocultural systems, legal frameworks and stakeholder pressure for sustainability in the three regions (e.g. Camilleri, 2015; Rosati & Faria, 2019; Singhania & Saini, 2021; Tran & Beddewela, 2020; Yu & Rowe, 2017). Furthermore, it should be noted that such formal and informal institutional frameworks also play a pivotal role in facilitating or obstructing the diffusion of standards (e.g. Delmas & Montes-Sancho, 2011; Orcos et al., 2018), including promotional, informational, financial and legal measures (Pantelitsa et al., 2018), which, in turn, impacts ESG scores, as demonstrated by this study. In this regard, it is worth noting that the European and Asian countries included in the sample experience greater QMS and EMS diffusion rates than North American countries (ISO, 2021).

Comparable normative and coercive pressures might also contribute to the deviations detected regarding sectors. Business sectors have varying levels of competition and stakeholder pressure (e.g. Betts et al., 2015; Yalabik & Fairchild, 2011), as well as varying needs, motivations and barriers regarding MSs implementation. As indicated in ISO (2021), the tendency to adopt QMSs and EMSs does indeed differ among sectors. Moreover, the documented impact of the nature of business operations on ESG scores might be partially explained by the differing degree of ESG transparency among sectors (e.g. Tamimi & Sebastianelli, 2017). The cluster analysis, however, with its two distinctive clusters of ESG performance patterns, clearly reveals that cluster 1 is overpopulated by companies with MSs, which holds true for every control variable (except for the industrial sector). Although even companies without QMSs or EMSs are found in the cluster with the higher ESG scores, this likelihood appears to be connected to the sector type, location and firm size. Future research should seek to gather more data on the variances identified in relation to the control variables, as well as on possible interdependencies among these.

In summary, the cluster composition supports the proposed ESG-related advantages of adopting MSs. Furthermore, companies with EMSs or both MSs are more likely to be in cluster 1 (on average 85.5% and 85.0%, respectively) than firms operating with QMSs only (59.7%) for most control variable inputs. This is in line with both the descriptive analysis of the full sample, which shows that group 3 outperforms group 2 in several ESG categories (see Tables 4 and 6), and the summarized literature review (see Table 1), which reveals ESG-related benefits in some areas, such as emissions (e.g. Russo, 2009) and regulatory compliance (e.g. Bravi et al., 2020; Morrow & Rondinelli, 2002), for EMSs only. Hence, it appears reasonable that combining both MSs is significantly more favourable than operating with QMSs alone (thus confirming H7 to H9 for QMSs). However, this combination leads to a slight decline in performance in the governance dimension as opposed to running EMSs only (thus refuting H7 to H9 for EMSs). This might be due to the duplication of tasks and the suboptimal use of resources when multiple separate MSs are in place (e.g. Lim et al., 2020) or the negative effects of carrying out practices with comparable goals (compare, for example, Franco et al., 2020) outweighing the potential benefits of combining the systems. This contributes to the line of discussion related to complementarities in the capabilities required for QMS and EMS adoption and their impact on business performance (e.g. Allur et al., 2018; Ferrón Vílchez & Darnall, 2016). Moreover, this result calls for more detailed studies on the ESG-related impacts of having multiple MSs, distinguishing whether companies simply add or actually integrate these systems (Sampaio et al., 2012), as integration can lead to a reduction in administrative burdens and progress in the sustainable development of corporations (Jørgensen et al., 2006), among other benefits. Regrettably, it is not possible to draw any conclusions from the study sample about either the integration level (none, partial or full) (Asif et al., 2010; Bernardo et al., 2017) or the corresponding integration strategies (QMS or EMS implemented first or simultaneous implementation) (Karapetrovic & Willborn, 1998). Therefore, addressing the integration maturity level (Domingues et al., 2016), which evidently affects CSP (Poltronieri et al., 2018, 2019), would contribute additional knowledge related to the results of this work.

6 Conclusions

The literature suggests that ESG themes may be anchored in MSs (Schmid et al., 2017), thus leading to increased scores in certain pillars (Broadstock et al., 2021), and this paper aims to empirically prove that quality and environmental MSs are indeed suitable business tools to achieve significantly higher performance in the environmental, social and governance dimensions.

The analysis reveals two major clusters, which demonstrate quite different ESG score patterns for firms with and without the aforementioned MSs. The findings support hypotheses H1 to H6 as well as H7 to H9 for firms with QMSs, while revealing some exceptions related to the control variables. In summary, the work concludes that both QMSs and EMSs enable companies to achieve enhanced ESG performance (RQ1), thus being suitable business tools for addressing sustainability-related stakeholder demands. It is further demonstrated that, despite sharing certain comparable sustainability-related benefits, MSs present varying strengths and weaknesses when it comes to tackling specific ESG categories, while, overall, EMSs achieve a greater impact than QMSs on ESG pillar scores (RQ2). Consequently, combining both MSs leads to statistically significantly improved ESG performance compared to operating QMSs alone, whereas the combination leads to slightly, albeit not significantly, improved scores in the environmental and social pillars and minor performance losses in the governance dimension compared to operating EMSs only (RQ3). Through these conclusions, this work makes three key contributions to the literature and allows several academic, managerial and policy-related implications to be derived, all aimed at satisfying stakeholders’ needs for greater CSP.

First, this paper contributes to the literature on the impact of QMSs and EMSs on companies’ ESG performance (e.g. Chams et al., 2021; Miralles-Quirós et al., 2019) by directly linking the concept of ESG ratings to quality and environmental MSs. In doing so, the focus is on all three pillars simultaneously as opposed to one dimension alone (e.g. Alsayegh et al., 2020; Frolova & Lapina, 2015; Russo, 2009). In this context, sorting the benefits of implementing QMSs and EMSs by a detailed ESG classification, which is broadly used and accepted by practitioners, represents a valuable step. Second, to the best of the authors’ knowledge, this is the first study to quantitatively investigate the relationship between MS implementation and ESG scores. Thus, it contributes to the academic literature by empirically proving the positive impact of QMS and EMS implementation on ESG performance through a large-scale, cross-regional analysis. Third, this study sheds some additional light on the advantages of MSs in the context of stakeholder theory, as it shows that their adoption leads to positive developments in CSP-relevant organization/stakeholder relations such as workforce, customers and community as well as in the environmental dimension.

6.1 Managerial implications

The results show corporate executives that MSs adoption represents a way of successfully responding to the increasing CSP demands of stakeholders in areas such as product responsibility, which is best addressed by QMSs, and resource use and emissions, which are best addressed by EMSs. Decision-makers learn about the individual ESG-related benefits of QMSs and EMSs with respect to the numerous stakeholder issues, as well as how combining them can impact CSP. This enables them to implement MSs in accordance with their firm’s individual sustainability needs. In view of the global green awakening and its influence on business success (e.g. Hoffman, 2018; Weidinger, 2014), such knowledge will likely become a competitive advantage for enterprises and a benefit for their stakeholders (e.g. Cantele & Zardini, 2018; Kahupi et al., 2021; Laszlo & Zhexembayeva, 2017).

6.2 Policy implications

The findings of this work support studies reporting that MSs foster CSP (see Table 1), thus emphasizing the importance of their international diffusion (Heras-Saizarbitoria & Boiral, 2013). Therefore, regulators should take advantage of the fact that companies view regulators as the stakeholder group with the strongest influence on organizations’ environmental sustainability efforts (Deloitte, 2021). The differences detected in ESG scores across regions and company sizes call for greater standardization in sustainability reporting (e.g. Mynhardt et al., 2017). In addition, to encourage CSP across all industries, policymakers must closely monitor which sectors are shifting towards greater sustainability due to pressure from certain stakeholder groups, and which sectors require additional institutional pressure to increase ESG practices, thus allowing coercive and regulatory forces to be balanced to foster the global diffusion of standards (e.g. Braun, 2019; Delmas & Montes-Sancho, 2011).

6.3 Academic implications

The relationship identified makes it possible to deepen research on which MSs can lead to better ESG performance. Thus, the implementation of MSs, as well as their internalization, remains crucial to making companies more efficient and sustainable. In addition, the stakeholder theory framework proves important, as stakeholders can drive the implementation of more sustainable practices, such as MSs.

6.4 Limitations and future research

Future research should be directed at overcoming this study’s limitations as well as enlarging and/or specifying the research scope. Firstly, the chosen database and its ESG classification impact the availability and quality of data, as ESG database providers use their own methodologies (Avetisyan & Hockerts, 2017) and thus conceptualise the ESG dimensions differently (Saadaoui & Soobaroyen, 2018). Hence, subsequent research should consider different databases to support the outcomes. Secondly, the study is intentionally directed at QMSs and EMSs in general, thus leaving room for either restricting this focus to specific MSSs (such as ISO 9001 and ISO 14001) or expanding it to other types of MSs (such as OHS) or related practices. Thirdly, the study’s data sample makes no statements regarding the integration level (e.g. Karapetrovic, 2002) of companies with both MSs or whether other management-related practices are in place (e.g. Franco et al., 2020), which is why future investigations should shed light on the degree of integration, firm-specific circumstances and their impacts. Fourthly, although the country clustering considers common economic, cultural and regulatory features, there are nevertheless likely to be certain MSs-related differences among countries from the same regions (e.g. Pan, 2003), which is why more detailed research is needed for single countries. Fifthly, the chosen methodology implies certain limitations. Despite applying a time filter, this study is not longitudinal but only depicts the year 2019, so the outcomes need to be verified for other time periods (see, e.g. the longitudinal panel data analysis applied by Hernandez-Vivanco et al. (2019) for combinations of MSSs and firm financial performance). Moreover, applying other methodologies such as the aforementioned panel data analysis (Homburg et al., 2017; Yıldırım, 2021) and structural equation modelling (SEM) (Barrett, 2007) might enable researchers to draw additional or adjusted conclusions and give a broader picture of the relationship between MSs implementation and ESG performance.