Keywords

1 Introduction

With the continuous development of the Internet and cloud computing, companies use databases to store and perform analyses on large data-sets in cloud environments. These companies demand high performance databases when reading and writing data. In addition, they want to benefit from best practices encoded in the form of cloud patterns [6], which are general and reusable “good” solutions to recurring design problems for cloud-based applications. Design patterns have been applied to all fields of software engineering, including cloud computing. These patterns were refined to take into account the specificities and requirements of the cloud. To the best of our knowledge, none of previous works, that have benchmarked cloud applications [5], investigated the combined impact of databases and cloud patterns on the energy consumption of cloud-based applications. Consequently the benefits and trade-offs of different databases and combinations of cloud patterns are mostly intuitive and not validated. In this paper, we evaluate the impact on energy consumption of three cloud patterns: Local Database Proxy, Local Sharding-Based Router, and Priority Message Queue, with three databases: two relational databases, PostgreSQL and MySQL, and one NoSQL database, MongoDB. To achieve this goal, we use three versions of three cloud-based applications (a RESTful multi-threaded application, DVD Store, and jPETStore) that use respectively MySQL, PostgreSQL, and MongoDB databases. We also implement the three studied patterns in each version of these applications. We choose these databases because they are the most popular relational databases in the last few years [2, 4]. We measure energy consumption using the Power-API [3], which estimates the energy consumed by an application at the process-level. Our results show that MySQL database is the least energy consuming among the three databases and PostgreSQL is the most energy consuming among them. MongoDB consumes more energy than MySQL but less than PostgreSQL. We also show that various combinations of patterns impact energy consumption. The rest of the paper is structured as follows. Section 2 provides the most closely related works to our study. Section 3 presents the cloud-based applications used by our study and the design of our experiments. Section 4 discusses the results of our experiments. Section 5 discusses our results and possible threats to their validity. Section 6 concludes with some future works.

2 Related Work

The most closely related work to ours is by Abtahizadeh et al. [1]. They conducted an empirical study that aimed to compare the energy efficiency of the same three cloud patterns performed in our study. However, they only considered MySQL database in their work. Their results show that cloud patterns can reduce the energy consumption of a cloud-based application but only in some specific cases. In the same direction, Manotas et al. [7] conducted an empirical study in which they investigated the impact of four web servers on the energy consumption of a web application. They showed that the energy consumption of a web application depends on the web server used to handle requests, where each web server can increase or decrease the energy consumption of the web application, depending on the features for which it is executed. Sahin et al. [8] investigated the energy efficiency of 15 structural, behavioral and creational design patterns, implemented in an application. For each pattern, they examined the energy consumption of the versions of the application before and after applying design pattern. Their results show that design patterns have a significant impact on energy consumption, where certain design patterns like Decorator can increase the energy usage of an application by up to 700 %.

3 Study Design

In this section, we introduce our research questions, describe the objects and the design of our study, and analysis method. Our research questions are:

  • RQ1: Does the choice of MySQL, PostgreSQL and MongoDB Databases affect the energy consumption of cloud applications (when no cloud patterns are implemented)?

  • RQ2: Does the implementation of Local Database Proxy, Local Sharding Based Router or Priority Message Queue patterns affect the energy consumption of cloud applications using MySQL, PostgreSQL and MongoDB Databases?

  • RQ3: Do the interactions between Local Database Proxy, Local Sharding Based Router and Priority Message Queue patterns affect the energy consumption of cloud applications using MySQL, PostgreSQL and MongoDB Databases?

3.1 Objects and Design

In our experiments, we use a combination of databases and cloud patterns encoded using a letter and a number. The Local Database Proxy pattern has three implementation strategies: Random Allocation (P1), Round-Robin (P2), and Custom Load Balancing (P3). The Local Sharding Based Router pattern also has three strategies: Modulo Algorithm (P4), Consistent Hashing (P5), and Lookup Algorithm (P6). The Priority Message Queue pattern is called P7. The databases are named: MySQL (D1), PostgreSQL (D2), and MongoDB (D3). We performed each experiment on three different systems, because one system could be intrinsically more complex to understand. We deployed them on 10 virtual machines (2 master nodes and 8 slaves nodes) in a private cloud. At first, for Experiment 1, we implement and deploy a multi-threaded distributed application that communicates through REST calls. The application interacts with one of the three chosen databases. Sakila sample database is used as it contains a large number of records, making it interesting for experiments. We adapted The schema of the Sakila database to PostgreSQL and MongoDB databases. For Experiment 2 and 3, we use DVDStore and JPetStore systems. DVDStoreFootnote 1 is provided with the implementation of MySQL and PostgreSQL databases. We refactor the code of DVD Store to allow it to connect with a MongoDB database. Similarly, we also modified the code of JPetStoreFootnote 2 to implement connections to MySQL, PostgreSQL and MongoDB databases. We perform our experiments using different numbers of clients, which are simulated using a multi-threaded architecture. The number of clients simulated varies from 100 to 1500 clients. Each execution is done using different databases and different cloud patterns.

3.2 Independent and Dependent Variables

MySQL, PostgreSQL and MongoDB databases are the independent variables of our study. Also, the three studied cloud patterns, as well as the strategies of these patterns are considered as independent variables. The application response time (measured in milliseconds) and the energy consumption measured by Power-API (measured in joules) are considered as dependent variables.

3.3 Hypotheses

To answer our research questions, we formulate the following null hypotheses, where P0 is the experiment consisting in comparing the energy consumption and response time of the three versions of each application using respectively MySQL, PostgreSQL, and MongoDB databases. Px (x \(\in \) {1 ... 6}), and P7 are the different patterns. In each experiment we compare two versions of a same application implementing two different databases Dy, Dz (y, z \(\in \) {1, 2, 3} and y \(\ne \) z), with the same (combination) of patterns.

  • \( H^{1}_{0yz}\): There is no difference between the average amount of energy consumed by applications implementing databases \(D_{y}\) and \(D_{z}\) (without any cloud pattern).

  • \(H^{1}_{xyz}\): There is no difference between the average amount of energy consumed by applications implementing databases \(D_{y}\) and \(D_{z}\) in conjunction with patterns Px.

  • \(H^{1}_{xyz7}\): There is no difference between the average amount of energy consumed by applications implementing databases \(D_{y}\) and \(D_{z}\) in conjunction with the combination of patterns Px and P7.

To have more clear comprehension regarding the trade-offs between the energy consumption and the performance of a cloud-based application measured in terms of response time, we also formulate the following null hypotheses:

  • \(H^{2}_{0yz}\): There is no difference between the average response time of databases \(D_{y}\) and \(D_{z}\) by applying the design P0.

  • \(H^{2}_{xyz}\): There is no difference between the average response time of databases \(D_{y}\) and \(D_{z}\) by applying the design Px.

  • \(H^{2}_{xyz7}\): There is no difference between the average response time of databases \(D_{y}\) and \(D_{z}\) by applying the combination of designs Px and P7.

3.4 Analysis Method

To analyze our collected data (i.e., response time and energy consumption measurements), we performed the Mann-Whitney U test [9] to test the aforementioned hypotheses. Mann-Whitney U test is a non-parametric statistical test where its relevance is reflected in the assessment of two independent distributions. We also computed the Cliff’s \(\delta \) effect size because effect sizes are very important to understand the magnitude of the difference between 2 distributions.

4 Study Results

This section presents and discusses the results of our research questions.

4.1 Results and Answers to RQ1

Tables 1 and 2 summarizes the results of Mann-Whitney U test and Cliff’s \(\delta \) effect sizes for the energy consumption and the response time.

Average Amount of Consumed Energy: Results presented in Table 1 show that, without using any pattern (in other words, by applying the design P0), there is a statistically significant difference between the average amount of energy consumed by application using MySQL and application using PostgreSQL. The effect size in this case is medium. Therefore, we reject \( H^{1}_{0yz}\) for \(D_{y}\), \(D_{z}\) (y = 1, z = 2). However, there is not a statistically significant difference between the average amount of energy consumed by application using MySQL and application using MongoDB. Therefore, we cannot reject \( H^{1}_{0yz}\) for \(D_{y}\), \(D_{z}\) (y = 1, z = 3). Similarly, there is not a statistically significant difference between the average amount of energy consumed by application using PostgreSQL database and application using MongoDB database. In these two cases the effect size is small. Therefore, we cannot reject \( H^{1}_{0yz}\) for \(D_{y}\), \(D_{z}\) (y = 2, z = 3).

Table 1. Energy consumption p-value and Cliff’s \(\delta \)
Table 2. Response time p-value and Cliff’s \(\delta \)

Average Response Time: Results presented in Table 1 show that, by applying the design P0, there is not a statistically significant difference between the average response time of application using MySQL database and application using PostgreSQL database. Therefore, we cannot reject \( H^{2}_{0yz}\) for \(D_{y}\), \(D_{z}\) (y = 1, z = 2). However, there is a statistically significant difference between the average response time of application using MySQL database and application using MongoDB database. Similarly, there is a statistically significant difference between the average response time of application using PostgreSQL database and application using MongoDB database. Therefore, we cannot reject \( H^{2}_{0yz}\) for \(D_{y}\), \(D_{z}\) ((y = 1, z = 3), (y = 2, z = 3)).

4.2 Results and Answers to RQ2

Average Amount of Consumed Energy: These results show that by applying the Local Database Proxy pattern, there is a statistically significant difference between the average amount of energy consumed by application using MySQL database and application using PostgreSQL database. Similarly, also, between application using MySQL and application using MongoDB. Similarly also by application using PostgreSQL database and application using MongoDB database (where the effect size is large). But, except for the case where the proxy pattern is implemented using the random strategy, there is not a statistically significant difference between application using PostgreSQL database and application using MongoDB database. Therefore we reject \( H^{1}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {2, 3}, (y = 1, z = 2), (y = 1, z = 3)), but we cannot reject \( H^{1}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x = 1, y = 2, z = 3). By applying the Local Sharding Based Router, there is a statistically significant difference between the average amount of energy consumed by application using MySQL database and application using PostgreSQL database. Similarly also between application using MySQL and application using MongoDB (the effect size is large). But, there is not a significant difference between application using PostgreSQL database and application using MongoDB database. Therefore, we reject \( H^{1}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, (y = 1, z = 2), (y = 1, z = 3)), but we cannot reject \( H^{1}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, y = 2, z = 3).

Average Response Time: Results show that by applying the Local Database Proxy pattern, there is not a statistically significant difference between the average response time of application using MySQL database and application using PostgreSQL database. Therefore, we cannot reject \( H^{2}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {1, 2, 3}, (y = 1, z = 2)). However, there is a statistically significant difference between the average response time of application using MySQL database and application using MongoDB database. Similarly, there is a statistically significant difference between the average response time of application using PostgreSQL database and application using MongoDB database. Therefore, we reject \( H^{2}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {1, 2, 3}, (y = 1, z = 3), (y = 2, z = 3)). Further results, by applying the Local Sharding Based Router, there is not a statistically significant difference between the average response time of application using MySQL database and application using PostgreSQL database. Therefore, we cannot reject \( H^{2}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, (y = 1, z = 2)). However, there is a statistically significant difference between the average response time of application using MySQL database and application using MongoDB database. Similarly, there is a statistically significant difference between the average response time of application using PostgreSQL database and application using MongoDB database. Therefore, we reject \( H^{2}_{xyz}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, (y = 1, z = 3), (y = 2, z = 3)).

4.3 Results and Answers to RQ3

Average Amount of Consumed Energy: When we combine the Local Database Proxy pattern with the priority Message Queue pattern, results show that there is a statistically significant difference between the average amount of energy consumed by application using MySQL database and application using PostgreSQL database. Similarly also between application using MySQL and application using MongoDB (the effect size is large). The same is true for application using PostgreSQL database and application using MongoDB database (where the effect size is large). However, except applying the combination of the custom strategy with the Priority Message Queue pattern, there is not a statistically significant difference between application using PostgreSQL database and application using MongoDB database. Therefore, we reject \( H^{1}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {1, 2, 3}, (y = 1, z = 2), (y = 1, z = 3)), but we cannot reject \( H^{1}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x = 3, y = 2, z = 3). Also, when we combine the Local Sharding Based Router pattern with the priority Message Queue pattern, results show that there is a statistically significant difference between the average amount of energy consumed by application using MySQL database and application using PostgreSQL database. Similarly also between application using MySQL and application using MongoDB (the effect size is large). However, there is no a significant difference between application using PostgreSQL database and application using MongoDB database. Therefore, we reject \( H^{1}_{xyz7}\) for \(P_{x}\) \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, (y = 1, z = 2), (y = 1, z = 3)), but we cannot reject \( H^{1}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, y = 2, z = 3).

Average Response Time: By applying the Local Database Proxy pattern with the priority Message Queue pattern, there is not a statistically significant difference between the average response time of application using MySQL database and application using PostgreSQL database. Therefore, we cannot reject \( H^{2}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {1, 2, 3}, (y = 1, z = 2)). However, there is a statistically significant difference between the average response time of application using MySQL database and application using MongoDB database. Similarly, there is a statistically significant difference between the average response time of application using PostgreSQL database and application using MongoDB database. Therefore, we reject \( H^{2}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {1, 2, 3}, (y = 1, z = 3), (y = 2, z = 3)). Besides that, when we combine the Local Sharding Based Router pattern with the priority Message Queue pattern, results show that there is not a statistically significant difference between the average response time of application using MySQL database and application using PostgreSQL database. Therefore, we cannot reject \( H^{2}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5, 6}, (y = 1, z = 2)). However, there is a statistically significant difference between the average response time of application using MySQL database and application using MongoDB database. The combination of the Lookup strategy and the Priority Message Queue pattern there is not a significant difference. Similarly, there is a statistically significant difference between the average response time of application using PostgreSQL database and application using MongoDB database. The combination of the Lookup strategy and the Priority Message Queue pattern there is not a significant difference. Therefore, we reject \( H^{2}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x \(\in \) {4, 5}, (y = 1, z = 3), (y = 2, z = 3)), and we cannot reject \( H^{2}_{xyz7}\) for \(P_{x}\), \(D_{y}\), \(D_{z}\) (x = 6, (y = 1, z = 3), (y = 2, z = 3)).

5 Discussions and Threats to Validity

We showed that the implementation of the Local Database Proxy pattern does not impact the behavior of the databases but can significantly improve the energy efficiency of MySQL. Concerning the Local Sharding Based Router pattern, the Modulo strategy has a strong effect on the energy consumption of PostgreSQL and MongoDB databases but a small one for MySQL. Moreover, the Consistent strategy has a strong effect on the energy consumption of PostgreSQL but improves slightly the energy efficiency of MySQL and MongoDB. The Lookup strategy can significantly improve the energy efficiency of PostgreSQL and MongoDB. In addition, we showed that combining Local Database Proxy pattern with the Priority Message Queue pattern has no significant impact neither on the application response time nor on the energy consumed by the application, when it interacts with MySQL. This combination only has a small effect on the energy consumption of PostgreSQL and MongoDB. Interestingly, the implementation of the Local Sharding Based Router pattern with the Priority Message Queue pattern has a strong effect on the response time of the three Databases but without a significant impact on the energy consumption.

Our experiments, as any other experiment, are subject to threats to their validity. We now discuss these threats based on the guidelines provided by Wohlin et al. [10].

Construct validity threats concern the relation between theory and observations. In this study, they could be due to measurement errors. These measurements are subject to variation and perturbations depending of hardware and network. For this reason, we did several experiments, we conducted each experiment five times, and computed average values of these measurements.

Internal validity threats concern our selection of subject systems and analysis methods. Despite of using the three studied databases, the three cloud patterns and the two standard cloud applications, some of our findings may still be specific to our studied application which was designed specifically for the experiments. Future studies should consider using different RDBMS and NoSQL databases, and also other cloud applications implementing the cloud patterns.

External validity threats concern the possibility to generalize our findings. Further validation should be done on different cloud applications and with different relational and NoSQL databases and applying different cloud patterns to these databases can extend our understanding of the impact of databases on the energy consumption of cloud applications.

Reliability validity threats concern the possibility of replicating this study. We attempt to provide all the necessary details to replicate our study.

Finally, the conclusion validity threats refer to the relation between the treatment and the outcome. We mainly used non-parametric tests that do not require making assumptions about the distribution of the metrics.

6 Conclusion and Future Work

Nowadays, reducing energy consumption is a challenge for cloud-based applications. We contrasted the performance of various combinations of databases and cloud patterns in terms of energy consumption and response time of the cloud-based applications, with the aim to provide some guidance to software engineers about the usage of databases and cloud patterns for cloud-based applications. We carried on a series of experiments on different versions of a RESTful multi-threaded application implemented with three different databases and three different cloud patterns. We also used two standard cloud applications (DVD Store and JPetStore) because one system could be intrinsically more complex to understand. We showed that MySQL database is the least energy consuming but is the slowest among the three databases. PostgreSQL is the most energy consuming among the three databases, but is faster than MySQL but slower than MongoDB. MongoDB consumes more energy than MySQL but less than PostgreSQL and is the fastest among the three databases. As future work, we plan to examine how a match/mismatch between the selected database and the workload characteristic affects energy efficiency.