1 Introduction

The complexity of global supply chain (SC) environments and the lack of relevant information have in the past resulted in unpredictable and uncontrollable SC uncertainties. Existing quantitative research on SC risk management is mostly based on the assumption that the parameters of SC risks are known [7], but does not address how to retrieve and process these values in practice. In today’s big data era, more and more shop floor and SC data are captured by integrating SC actors and external data sources (e.g., traffic data, weather forecasts) as well as by adopting advanced technologies such as sensor, identification, and positioning technologies. SCs become more visible to executives, but also more complex due to information overload. The increasing volume, velocity, and variety of globally available data imply that SC planning requires accessible, on-demand, and near real-time information retrieval techniques and decision support. Big data is defined as a collection of data sets so large and complex that they are difficult or impossible to process with traditional database management tools or data processing applications [23]. In recent years, advanced data processing technologies for handling big data have become available, often applying the well-known divide-and-conquer approach in scalable IT infrastructures. In this context, cloud computing offers novel options to flexibly and economically use scalable technologies and services, forming a basis for cloud-based decision analytics [11]. Thus, low computation times and high-quality solutions of optimization methods, which were mutually exclusive in former research [2, 24], can be reconciled by using cloud computing. While the data and advanced technologies are available, we identify a lack of integrative approaches to facilitate real-time monitoring and more accurate forecasting of SC risks for designing flexible SCs under uncertain environments.
According to Tang [22], a collaborative planning, forecasting, and replenishment (CPFR) strategy improves SC resilience. A CPFR system generates common demand forecasts for SC partners, shares inventory information, and adopts a common ordering rule, all of which can be supported by incorporating big data technologies and analytics. Although more and more companies realize the importance of adopting big data techniques in supply chain management (SCM), research in this area is still lacking.

In this paper, we present a framework for integrating big data into supply chain risk management (SCRM) based on analytic methods, such as multi-stage stochastic optimization techniques, and cloud infrastructures. We focus on handling SC operational risks and low-frequency, high-impact SC disruption risks. Scenario-based analysis [18], which has been successfully used in SC planning problems [25, 27], will be used to support decision making. Thus, the proposed framework provides guidelines for handling SC risks in the era of big data. To the best of our knowledge, this is the first approach to incorporate big data into SCRM. With this framework, global SCs will be able to handle SC risks at low SC costs. The framework also facilitates real-time monitoring, emergency planning, and immediate decision support when incidents happen. As such, the paper is a first step towards a new direction in SCM and in interdisciplinary research combining information systems research and operations research.

The remainder of this paper is organized as follows. Section 2 provides a theoretical background on SCRM, big data, and reasons for using cloud technologies. SCRM-relevant big data is analyzed and classified in Sect. 3. In Sect. 4, the overall framework and its main modules are introduced in detail. Finally, a conclusion is presented in Sect. 5.

2 Background

SC risks arise from SC internal and external uncertainties. SC internal uncertainties are mostly foreseeable based on SC internal data, which is collected by advanced technologies within the production and transportation systems. SC external uncertainties stem from SC external environments, such as social, economic, and natural environments. External uncertainties are more complicated and multifaceted, so advanced analytic methods and decision support systems are required for analyzing them. Due to the increasing complexity of global SCs, decision support systems are becoming indispensable tools for SCRM. In this context, Dadfar et al. present global SC risk mitigation strategies [5]. In order to manage disruptions and mitigate risks in manufacturing SCs, Giannakis and Louis [9] propose a framework for a multi-agent based SC decision support system. An interesting viewpoint is that SC risks come from a lack of confidence in the SC [4]. The two main elements for improving SC confidence are visibility and control. Visibility is strengthened by SC information sharing, whereas control can be enhanced through SC event management, which involves the collection and exchange of data on events from and between SC partners. Both visibility and control require information and communication technologies (ICT) to retrieve relevant information. Several works focus on an efficient use of information in SCs.

A widely used technique for analyzing risks in the context of SCM is scenario analysis. Scenario analysis is regarded as a thinking tool and a communication device that aids the managerial mind rather than replacing it [20]. A scenario is an internally consistent view of what the future might turn out to be: not a forecast, but one possible future outcome [17]. The uncertainty of the future can be appraised through the number of possible scenarios within the field of probables [10]. For instance, SC stochastic scenarios are described by a group of scenario indexes [7], such as possible victim locations (SC nodes or transportation links) and their probabilities, reconstruction times after risk events, and extra times and extra costs for adopting alternative plans after the event. The first step of scenario analysis is scenario design. According to the “iceberg” metaphor in [3], a series of factors should be thought through during scenario building, including resources, culture, information, technology, policy, policy distribution, regulation, demography, legislation, ecology, society, and territory. Since a wide range of unstructured data that changes in real time is incorporated, the scenario design process becomes a big data analysis process. Consequently, these processes must be supported by appropriate big data infrastructures and analytic methods, such as the Progressive Hedging Algorithm (PHA) [18].

As with other methods, the computing time of PHA is high for large amounts of data and correspondingly many scenarios [26]. A cloud-based infrastructure could be used to economically deploy scalable computing clusters when needed, enabling a near real-time computation of large-scale problems [26] as well as on-demand applications that provide decision support for SC planners and decision makers.

Big data technologies are defined as a new generation of technologies and architectures. They are used to economically extract useful values from very large volumes of a wide variety of data. High-velocity and real-time capture, discovery, processing, and/or analysis are supported by big data technologies and analytic methods [8]. Further, an important aspect of big data technologies and analytic methods is a user-oriented presentation and visualization of data and results for supporting decision making.

The number of unpredictable “black swans” [21], a term used to describe low-frequency, high-impact events, is decreasing with big data analytics [19]. The authors of [19] also propose that one of the most significant aspects of big data analytics is to foresee events before they happen by sensing small changes over time. JD.com, one of the most popular Chinese e-commerce companies, forecasts customer demand 28 days in advance through big data technologies, which is one of its critical success factors [14]. Historical records show a remarkable relationship between customer demand for JD.com products and the click rate of their advertisements. Thus, demand can be forecast by tracking the click frequencies of product advertisements. By forecasting demand four weeks in advance, the company maintains a low inventory level as well as a short delivery lead time. Amazon is another example of using advanced technologies and analytics for predicting demand, for instance based on patterns of product searches [19]. These case studies demonstrate that both companies and customers can benefit from big data analytics. The more companies characterize themselves as data-driven, the better they perform on objective measures of financial and operational results [16]. According to an investigation in [1], however, most companies are far from accessing all the available data. Often, companies do not have the expertise and processes to design experiments and extract business value from big data.

Big data technologies and analytics, however, rely heavily on flexible ICT infrastructures. As the computational requirements of analytical systems fluctuate heavily, especially in SCM where those systems are mainly used for planning purposes, a flexible ICT infrastructure can save considerable costs. The effect is increased by a growing amount of data to be processed and analyzed. In recent years, cloud computing has become popular as a way of using ICT infrastructure and services on demand. These scalable cloud services are offered through a network, mostly based on a usage-based pricing model. Cloud services are grouped into three categories: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) [15]. Consequently, computing resources can be automatically adapted to the needs of analytical tasks in SC planning and replanning processes and released afterwards to reduce expenses. Moreover, the accessibility and standardized interfaces of cloud services allow better collaboration and information sharing among SC actors. Although concerns and issues, such as those related to the interoperability of cloud services, are still discussed in practice and research (see, e.g., [12]), we consider cloud computing a promising solution for supporting flexible big data analytical systems in the area of SCM.

Table 1. Examples of SC internal data and big data

3 Big Data in SCRM

Against the background of the rapid development of information technologies, the amount of acquirable SC data increases over time. In our framework, SCRM is improved by collecting, analyzing, and monitoring SC real-time data. SC data that is used to support SCRM is classified into two categories: SC internal data and SC external data. SC internal data refers to data collected from SC partners. This multi-source internal data is unstructured because it is collected from different organizations and different data terminals, such as point of sales (POS) terminals, global positioning system (GPS) devices, and sensors. Table 1 lists applications of SC internal data and big data in SCRM. Potential SC risks, such as product quality issues and transportation delays, can be forecast in advance and traced afterwards based on the analysis of SC internal data. Important data that should be collected from each big data source is given in the column “Description.” The volume of SC internal data is positively related to the scale of a SC, such as the number of SC partners, products, and services. SC internal data grows continuously and needs to be stored for a number of periods, such as several months, in order to trace causes after failures and to predict risks. Thus, SC internal data that is relevant for SCRM may use different structures and increases over time.

Table 2. Examples of SC external big data
Fig. 1. Module overview

SC external data refers to data collected from public news, social media, etc. Table 2 depicts applications of SC external big data in SCRM. SC external data is larger and more complicated than SC internal data. It reveals potential disasters and uncertainties in external environments. For instance, exchange rate movements can be forecast from time series of Tweet counts [13]. Twitter messages can also be used for rapid detection and qualitative assessment of earthquakes [6]. Information from media is diverse not only in its formats and contents, but also in its languages and reliability. Furthermore, media data grows faster than SC internal data. These characteristics make it difficult to filter and analyze the information. Our intention is to identify potential risks and ongoing disasters from SC external big data as early as possible. Due to the requirements of SC external big data collection and analysis, we suggest outsourcing SC external big data processing tasks to professional third-party analysis (3PA) companies.

4 Framework

The focus of our framework is to make a robust SC plan under stochastic environments. Monitoring and planning are the two crucial parts of the framework. Environment monitoring and analysis provide stochastic parameters for the SC planning process. As depicted in Fig. 1, the SCM system in our framework includes three main modules: the SC planning module, the SC internal module, and the SC external module. A SC plan is generated by the SC planning module. The SC internal and external modules monitor the SC internal and external environments, respectively. Once an emerging risk is detected by the internal or external module, a new risk report is generated based on risk analysis and is sent to the SC planning module. The SC planning module is then activated and a renewed SC plan is generated based on the renewed risk report. The whole process runs as a cycle to maintain the flexibility of a SC under stochastic environments. The functions of each module are described in the following.

Fig. 2. SC planning/replanning module

4.1 SC Planning/Replanning Module

As SC flexibility is determined at the planning stage, we first consider the planning tasks before turning to the monitoring of risks. In order to design a flexible SC plan that allows a smooth supply under stochastic environments, SC risks should be taken into account in the SC planning phase (see Fig. 2). The SC planning module and the SC replanning module follow the same process. Scenarios, which represent SC uncertainties, are designed based on SC internal and external risk reports. Then, a stochastic model is built based on the proposed scenarios. After the SC parameters have been entered into the model, analytic methods are applied to solve it. Finally, the solutions of the model are used to support the SC plan. The whole process is described in detail in the following.

SC internal reports should be provided by the companies of the SC since internal data is normally privately owned. The process of generating a SC internal risk report is introduced in detail in Sect. 4.2. SC external reports should be provided by 3PA companies since external data processing is complicated and would be a waste of resources and effort for a single SC or company. The detailed process of generating a SC external report is described in Sect. 4.3.

Scenario Design. Scenario analysis, a common approach to dealing with stochastic problems in practice, is adopted. Scenarios are designed based on SC internal and external risk reports. Each scenario refers to a distinct kind of consequence after disruptions. Probability and costs are the two essential features of each scenario. The costs depend on a series of factors: the geographic areas involved in the scenario, the duration of the scenario, the extra costs incurred if the scenario is realized, etc. In order to calculate the costs, an emergency plan is taken into consideration for each scenario.
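The two scenario features named above, probability and costs, can be illustrated with a minimal sketch. The class name `Scenario`, its fields, the simple cost rule (duration times extra cost per day), and all numbers are hypothetical assumptions made for illustration only; they are not taken from the framework.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Scenario:
    """Hypothetical scenario record; field names are illustrative assumptions."""
    name: str
    probability: float         # likelihood, e.g. derived from the risk reports
    duration_days: float       # how long the scenario lasts
    extra_cost_per_day: float  # extra cost of running the emergency plan

    @property
    def cost(self) -> float:
        # Assumed cost rule: extra cost accrues linearly over the duration.
        return self.duration_days * self.extra_cost_per_day


def expected_extra_cost(scenarios):
    """Probability-weighted extra cost over a complete set of scenarios."""
    total_p = sum(s.probability for s in scenarios)
    if not math.isclose(total_p, 1.0, abs_tol=1e-9):
        raise ValueError(f"scenario probabilities sum to {total_p}, expected 1")
    return sum(s.probability * s.cost for s in scenarios)


# Made-up example values, for illustration only.
scenarios = [
    Scenario("no disruption", 0.90, 0, 0.0),
    Scenario("port closure", 0.08, 5, 2000.0),
    Scenario("supplier failure", 0.02, 30, 5000.0),
]
print(expected_extra_cost(scenarios))
```

In a real setting the cost of a scenario would come from evaluating its emergency plan rather than from a fixed daily rate.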

Modeling. A two-stage multi-scenario model is built based on proposed scenarios and scenario features. The first stage refers to the safe period of a SC without any disruption or catastrophe. The second stage refers to the uncertain period when a SC may suffer any of the proposed scenarios. Values of scenario parameters and other SC parameters are inputs of the model.
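For orientation, a two-stage multi-scenario model of this kind is often written in the following generic linear form. This is the textbook two-stage stochastic program, given here as an assumption about the general shape of the model and not as the specific formulation of the framework. All symbols are introduced for illustration: $x$ denotes the first-stage (safe period) decisions, $y_s$ the second-stage (emergency plan) decisions under scenario $s \in S$, and $p_s$ the scenario probability.

```latex
\begin{aligned}
\min_{x,\,y_s}\quad & c^{\top} x + \sum_{s \in S} p_s\, q_s^{\top} y_s \\
\text{s.t.}\quad & A x = b, \\
& T_s x + W_s y_s = h_s, \quad s \in S, \\
& x \ge 0,\; y_s \ge 0, \quad s \in S.
\end{aligned}
```

Here $c$ and $q_s$ are cost vectors, $A x = b$ collects the first-stage constraints, and $T_s$, $W_s$, $h_s$ hold the scenario-dependent data that link the safe period plan to the emergency plan of scenario $s$.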

Analytic Methods. Analytic methods, such as optimization, simulation, heuristics, and metaheuristics, can be chosen to solve the model. Due to the complexity of multi-scenario models, metaheuristics seem to be good approaches to adopt in our framework. PHA is an option for solving the proposed two-stage multi-scenario model. For models with simple SC structures and a small number of scenarios, solutions are generated in a tolerable amount of time. A cloud platform serves as the underlying computing infrastructure and can be flexibly adapted to the computational requirements of solvers, which is particularly important for complex SC structures with a large number of stochastic scenarios. That is, computational tasks are sent to a cluster of computing nodes in order to accelerate solving the model. The related cloud computing nodes are purchased and released on demand. An SaaS solution provides an interface for setting up the model and presenting results in different views for different stakeholders. A set of two-stage solutions is generated for the two-stage model. The first-stage solution refers to the safe period SC plan. The second-stage solutions refer to the emergency plans, which are related to the realization of scenarios.
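To make the role of PHA more concrete, the following is a minimal sketch of progressive hedging on a deliberately tiny toy problem. It is an illustration under stated assumptions, not the model of the framework: there is a single first-stage quantity `x`, each scenario `s` has a demand `d_s`, and the scenario cost is the quadratic mismatch `0.5 * (x - d_s)**2`, so that every scenario subproblem has a closed-form solution. The function name and parameters are hypothetical. In a realistic setting each subproblem would be handed to a solver, and because the subproblems are independent per scenario they could be distributed over computing nodes.

```python
def progressive_hedging(demands, probs, rho=1.0, tol=1e-8, max_iter=500):
    """Toy progressive hedging for min_x sum_s p_s * 0.5 * (x - d_s)**2.

    Returns the consensus first-stage value and the per-scenario values.
    """
    # Iteration 0: solve every scenario on its own (multipliers are zero).
    w = [0.0] * len(demands)
    x = list(demands)
    x_bar = sum(p * xs for p, xs in zip(probs, x))

    for _ in range(max_iter):
        # Multiplier update: penalize deviation from the current consensus.
        w = [ws + rho * (xs - x_bar) for ws, xs in zip(w, x)]
        # Scenario subproblem, solved in closed form:
        #   argmin_x 0.5*(x - d)**2 + w*x + 0.5*rho*(x - x_bar)**2
        x = [(d - ws + rho * x_bar) / (1.0 + rho) for d, ws in zip(demands, w)]
        # Consensus (probability-weighted average) of the scenario solutions.
        x_bar = sum(p * xs for p, xs in zip(probs, x))
        # Stop once all scenario solutions agree (non-anticipativity holds).
        if max(abs(xs - x_bar) for xs in x) < tol:
            break
    return x_bar, x


# Made-up demands and probabilities, for illustration only.
x_bar, x_s = progressive_hedging([80.0, 100.0, 150.0], [0.5, 0.3, 0.2])
print(x_bar)
```

For this toy cost function the optimal first-stage value is the expected demand, which gives a simple check on the result. The per-scenario loop is the part that would be parallelized when many scenarios are present.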

SC Plan. Based on the solutions of the multi-scenario model, a two-stage SC plan, including a safe period plan and emergency plans for uncertain periods, can be generated.

4.2 SC Internal Module

In the SC internal monitoring and risk analysis framework (see Fig. 3), a SC internal risk is a risk that can be foreseen through the analysis of SC internal data (see Table 1). SC monitoring is used to detect emerging risks. The corresponding emergency plan is adopted immediately after a stochastic event is detected. If the stochastic event is not a short-term event, the SC internal risk report is renewed and sent to the SC replanning module. SC internal data covers purchasing, production, transportation, final demand, etc. (see Table 1). Detailed data is helpful for forecasting and controlling. For example, an Intelligent Maintenance System (IMS) is able to predict and prevent potential machine failures by analyzing data collected from the machinery. In the following, the components of the SC internal module are explained briefly.

Fig. 3. SC internal monitoring and risk analysis

Data Analysis. Data analysis provides a risk report and benchmarks of SC parameters for the SC monitoring process. Data analysis methods, such as data mining and machine learning, can be adopted at this stage.

SC Monitoring. SC real-time monitoring is used for sensing SC changes and foreseeing SC risks. Monitoring helps SC managers to identify sudden events and forecast SC risks as early as possible.
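The interplay of data analysis (which supplies benchmarks) and monitoring (which compares live values against them) can be sketched as follows. The z-score rule, the threshold of 3, the function names, and the lead-time figures are assumptions chosen for illustration; the framework itself does not prescribe a particular detection method.

```python
from statistics import mean, stdev


def build_benchmark(history):
    """Derive a simple benchmark (mean, standard deviation) from past values."""
    return mean(history), stdev(history)


def is_anomalous(value, benchmark, z_threshold=3.0):
    """Flag a monitored value that deviates strongly from its benchmark."""
    mu, sigma = benchmark
    if sigma == 0:
        # No variation in history: any different value counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Made-up transportation lead times in days, for illustration only.
lead_time_benchmark = build_benchmark([4.0, 5.0, 6.0, 5.0, 4.0, 6.0])
print(is_anomalous(5.5, lead_time_benchmark))
print(is_anomalous(9.0, lead_time_benchmark))
```

A production system would more likely use data mining or machine learning models, as mentioned under Data Analysis; the threshold rule only shows where such a model would plug in.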

Adopting the Emergency Plan. Once sudden events or potential uncertainties are detected, the corresponding emergency plan, which is specified in the SC planning module, is applied in order to gain time. For short-term disruption events, the SC returns to the original SC plan after the short disruption period.

Renewing Internal Risk Report. For events with a long-term impact, the SC internal risk report is modified in order to activate the SC replanning module. An internal risk report should at least contain the probabilities and a description of uncertainties for each SC partner as well as the impacts, duration, and costs of each uncertain event.

Request for SC Replanning. Once the internal risk report is renewed, the SC replanning module is activated. A new SC plan will be generated and launched afterwards.
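The decision flow of the SC internal module described above can be summarized in a short sketch. The `Event` class, the `handle_event` function, and in particular the seven-day limit separating short-term from long-term events are hypothetical assumptions; the framework does not specify such a threshold.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical detected event; fields are illustrative assumptions."""
    description: str
    expected_duration_days: float


def handle_event(event, short_term_limit_days=7.0):
    """Return the ordered actions taken after an event is detected."""
    # The emergency plan is adopted immediately in every case.
    actions = ["adopt emergency plan"]
    if event.expected_duration_days <= short_term_limit_days:
        # Short-term disruption: return to the original plan afterwards.
        actions.append("restore original SC plan")
    else:
        # Long-term impact: renew the report, which triggers replanning.
        actions.append("renew internal risk report")
        actions.append("request SC replanning")
    return actions


print(handle_event(Event("machine breakdown", 2)))
print(handle_event(Event("supplier insolvency", 90)))
```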

4.3 SC External Module

SC external big data is mostly unstructured and grows fast since it covers a wide range of aspects from various channels, such as public media, social networks, and professional databases (see Table 2). Professional infrastructures and personnel are needed for big data processing and for forming an external risk report. However, a company’s external risks depend on its geographic location and industry background. This means that companies that are located in the same geographic area and provide similar products face similar external risks. In order to fully utilize resources, the SC external risk analysis in our framework (see Fig. 4) is outsourced to a professional 3PA company that monitors the environment based on data mining and data analysis. The domain of the external environment should first be defined by the 3PA company according to its customers’ background. SC external risk reports are formed based on external data collection and risk analysis. Monitoring and sensing are used to detect emerging risks in the external environment. Risk analysis is carried out again once an emerging risk is captured. The renewed risk reports are sent to customers after risk analysis. The SC replanning module is activated once a new risk report is received. The main processes of the SC external module are explained in the following.

Fig. 4. SC external monitoring and risk analysis

Defining/Updating Monitoring Domain. In order to extract valuable data efficiently, a domain of external environments should be defined first. Only data that will have explicit impacts on a SC is analyzed.

Data Collection and Risk Analysis. External environment data is characterized by large volume, a lack of structure, and growth over time. Thus, data collection techniques, such as web crawling and text mining, are used to extract information from websites and web services. Advanced data analysis technology is required for the analysis of SC external big data. The purpose of external risk analysis is to identify external threats and the parameters of each threat. External threats include bad weather, policy changes, economic changes, social changes, terrorist attacks, etc. Parameters of each threat may refer to its geographic region, probability, and severity.
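How a monitoring domain and the threat parameters might be combined can be sketched as follows. The `Threat` class, the use of region names as the monitoring domain, and the risk score defined as probability times severity are illustrative assumptions rather than part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    """Hypothetical external threat record; fields are assumptions."""
    kind: str
    region: str
    probability: float  # estimated likelihood in [0, 1]
    severity: float     # assumed impact scale, e.g. 0 (none) to 10 (extreme)

    @property
    def risk_score(self) -> float:
        # Assumed scoring rule: probability times severity.
        return self.probability * self.severity


def relevant_threats(threats, monitored_regions, min_score=0.0):
    """Keep threats inside the monitoring domain, highest score first."""
    selected = [
        t for t in threats
        if t.region in monitored_regions and t.risk_score >= min_score
    ]
    return sorted(selected, key=lambda t: t.risk_score, reverse=True)


# Made-up threats, for illustration only.
threats = [
    Threat("bad weather", "Region A", 0.30, 4.0),
    Threat("policy change", "Region B", 0.10, 8.0),
    Threat("strike", "Region C", 0.50, 5.0),
]
for t in relevant_threats(threats, {"Region A", "Region B"}):
    print(t.kind, t.region, t.risk_score)
```

Restricting the analysis to the monitored regions mirrors the idea that only data with explicit impacts on a SC is analyzed.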

Monitoring and Sensing. 3PA companies need real-time monitoring and sensing of external data to detect emerging risks. The risk analysis task is triggered once an emerging risk is discovered.

Risk Analysis. The risk analysis process is used to form new risk reports for customers of 3PA companies. External risk reports should at least include information on uncertainties at the locations of SC partners and during the transportation of products. This encompasses the probabilities, duration, and impacts of uncertain events. The cost of each uncertain event should be determined and calculated by the SC itself since it depends on emergency plans, which should be decided by the companies of the SC.

Sending Reports to Customers. Risk reports are sent to customers of 3PA companies after being renewed. Once a renewed external risk report is received by the SC planning module, the SC planning module is triggered to generate a new SC plan.

5 Conclusion

Big data relevant to SC risks is analyzed and classified into SC internal big data and SC external big data. Based on this classification, a framework that incorporates big data technologies into the SCRM system is proposed. Research on the application of big data in SCM is very rare. This paper provides guidance on utilizing big data to improve SCRM. Big data technologies provide opportunities for predicting and detecting potential SC risks as early as possible so that the SC becomes more visible and flexible. Further research is needed to implement our framework in practice. Technologies and approaches for extracting valuable information efficiently and accurately from big data resources should be applied. In this context, we plan to implement and evaluate a prototype that provides decision support for SCRM by utilizing the proposed framework. For this purpose, cloud technologies should be combined with PHA.