Adding long-term availability, obfuscation, and encryption to multi-cloud storage systems

https://doi.org/10.1016/j.jnca.2014.09.021

Abstract

Nowadays, storage services offer a new way for Cloud providers to do business. This trend is evidenced by the number of Cloud storage providers that continue to appear on the market, and using Cloud storage services is becoming common practice for end-users. However, current Cloud storage providers do not offer any guarantees regarding long-term availability and privacy. In fact, data stored in the Cloud could be locked in, lost, or exposed to privacy violations. In this work, we present an innovative system that, on the one hand, allows end-users to rely simultaneously on different Cloud storage providers in a transparent way and, on the other hand, enforces long-term availability, obfuscation, and encryption. Our system is highly reliable: if a provider is temporarily or permanently unavailable, end-users continue to access their data in a secure way. In addition, only the end-users have full control of the overall security of their data, and no sensitive information is disclosed to Cloud storage providers. Several experiments allow us to discuss the performance of our system compared with existing solutions.

Introduction

Nowadays, Cloud Computing and the Internet of Things (IoT) generate a huge amount of data that must be stored in an efficient and secure fashion. Data may include users’ personal files or pervasive information coming from heterogeneous “everyday” devices belonging to Cyber-Physical Systems. Storing data over the Cloud represents a good opportunity for providers to increase their revenues, and the business potential of Cloud storage services is reflected in the increasing number of providers working in this context. The well-known providers include Dropbox, Google Drive, Copy, Amazon S3, and SkyDrive. Dropbox was the first provider to experiment with this business model, followed by Google with its Drive and many others. The service is quite simple. It consists of a Cloud storage space assigned to a user for free or for a reasonable fee. Typically, users can store their data directly on the provider by means of a web 2.0 interface, or can store them in a given directory of their devices (e.g., smartphones, tablets, and personal computers) that is synchronized with the Cloud storage provider through a client application.

Despite the obvious advantages for clients of using Cloud storage services to store any kind of data, several issues need to be addressed. With the current model, files are stored in a single Cloud provider. If the provider becomes unavailable due to software/hardware failures, or becomes overloaded, the clients will temporarily not be able to access their files. A similar situation may occur when the contract established between the client and the provider expires. However, in these cases, the clients have a chance to recover their data. A more critical situation takes place if the Cloud provider suddenly disappears from the market. In this case, the clients will permanently lose their data. For these reasons, users do not have any guarantee regarding the “long-term availability” of their storage service.

Another critical aspect regards “privacy”. In fact, clients do not have any guarantee regarding the privacy of their data. This threat is real, considering that Cloud operators are able to access all the files stored on their servers. Unfortunately, there is no mechanism that prevents possible misuse of sensitive data. The concern becomes more concrete if we consider that PRISM, a US surveillance program (NSA-PRISM, surveillance program, 2013), has claimed direct access to Cloud storage providers including Google, Apple, and Facebook (NSA, 2013). The intrusion was motivated by national security and safety purposes.

In this paper, we propose an approach to make the utilization of Cloud storage services more reliable in terms of both long-term availability and privacy. As depicted in Fig. 1, our basic idea consists in simultaneously using different Cloud storage providers. Unlike the traditional approach, each file is not stored as a whole in a single service provider, but is divided into several pieces that are spread over different providers.

In order to achieve such goals, we introduce a redundancy factor. Using the Redundant Residue Number System (RRNS), we split a file into p+r chunks, where p represents the number of chunks required to reconstruct the original file, and r is the degree of redundancy.

For example, if we consider p=5 and r=4, for a total of 9 chunks, and 3 different Cloud storage providers, i.e., A, B, and C, we can store 3 chunks on each provider. Assuming that provider C is not available anymore, the end-user can retrieve his/her file by downloading the required chunks from providers A and B. The RRNS thus allows us to achieve long-term availability. In addition, the RRNS enables “data obfuscation”, because each provider has only a partial view of each file. In fact, if we consider p=5 and we store q<p chunks on each provider, no provider will be able to reconstruct the original file. However, since each provider still has a partial view of the file, we also encrypt each chunk using a symmetric algorithm in order to improve security. The RRNS data obfuscation combined with data encryption allows our system to guarantee user privacy.
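The example above (p=5, r=4, three providers) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the moduli values, the function names `encode` and `decode`, and the sample value `W` are assumptions chosen for illustration, and a single integer stands in for a block of file data. Reconstruction uses the Chinese Remainder Theorem, which recovers any value smaller than the product of the p smallest moduli from any p or more residues.

```python
from itertools import combinations
from math import prod

P, R = 5, 4  # p chunks suffice to rebuild; r chunks are redundant

# Hypothetical moduli: nine distinct primes in increasing order, hence
# pairwise coprime. The values are illustrative, not taken from the paper.
MODULI = [227, 229, 233, 239, 241, 251, 257, 263, 269]

# Any value below the product of the P smallest moduli can be rebuilt
# from any P (or more) residues.
LEGITIMATE_RANGE = prod(MODULI[:P])


def encode(w, moduli):
    """Return the residue of w for every modulus."""
    return [w % m for m in moduli]


def decode(pairs):
    """Rebuild a value from (residue, modulus) pairs via the CRT."""
    big_m = prod(m for _, m in pairs)
    total = 0
    for w_i, m_i in pairs:
        partial = big_m // m_i
        total += w_i * partial * pow(partial, -1, m_i)  # modular inverse
    return total % big_m


W = 123_456_789  # stands in for one block of the file
assert W < LEGITIMATE_RANGE
chunks = list(zip(encode(W, MODULI), MODULI))

# Three chunks per provider, as in the example above.
providers = {"A": chunks[0:3], "B": chunks[3:6], "C": chunks[6:9]}

# Provider C disappears: A and B together still hold 6 >= P chunks.
recovered = decode(providers["A"] + providers["B"])

# Provider A alone holds 3 < P chunks; its CRT result is only W modulo
# the product of its three moduli, which differs from W here.
view_of_a = decode(providers["A"])

# Every choice of P chunks out of P + R rebuilds the same value.
all_ok = all(decode(list(s)) == W for s in combinations(chunks, P))

print(recovered == W, view_of_a == W, all_ok)
```

In this sketch a real file would be cut into blocks that each fit the legitimate range, and each residue stream would become one chunk. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.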

An interesting feature of our approach is that no Cloud provider can have full access to the stored files. In fact, each file is described by means of an XML metadata Map file indicating where the different chunks are stored and how to access them. The Map file must be safeguarded by the end-user. Thus, only the end-user can reconstruct the original file.

The various chunks are first encoded in BASE-64 and then encrypted in order to be enclosed in an XML file. BASE-64 encoding is typically used to encapsulate binary data in the payload of messages sent through the web. The end-user is able to select the degree of redundancy r according to his/her requirements. Only the end-user will be able to gather all the chunks and reconstruct the original file according to the associated XML metadata file. Furthermore, as our system is able to run upload and download tasks in parallel on different providers, we analyzed the behavior of several Cloud storage providers (i.e., Google Drive, Copy, and Dropbox), considering different file sizes.
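The BASE-64 encapsulation step mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the element name `chunk`, the attribute `id`, and the helper names `wrap_chunk` and `unwrap_chunk` are hypothetical and do not reflect the XML schema used by the system. The encryption step is deliberately left out, since this excerpt does not name the symmetric algorithm and the Python standard library does not provide one.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_chunk(chunk_id, payload):
    """Enclose a binary chunk in a (hypothetical) XML element as Base64 text."""
    elem = ET.Element("chunk", {"id": str(chunk_id)})
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_chunk(xml_text):
    """Return (chunk_id, payload) from the XML produced by wrap_chunk."""
    elem = ET.fromstring(xml_text)
    # An empty payload serializes to an element without text.
    return int(elem.get("id")), base64.b64decode(elem.text or "")


binary_chunk = b"\x00\xff\x10 residue bytes"
xml_text = wrap_chunk(3, binary_chunk)
print(xml_text)
print(unwrap_chunk(xml_text) == (3, binary_chunk))
```

Base64 keeps arbitrary bytes within characters that are safe inside XML text, which is why the binary chunk survives the round trip unchanged.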

The remainder of the paper is organized as follows: Section 2 describes related work, highlighting the lack of reliability mechanisms in terms of both long-term availability and privacy. The reference scenario is presented in Section 3. Section 4 briefly describes the RRNS on which our approach is based. In Section 5, we describe how to achieve both long-term availability and privacy. Experiments are described in Section 6. Conclusions and future directions are summarized in Section 7.

Section snippets

Related work

Many works in the literature deal with data reliability in datacenters and in Cloud Infrastructure as a Service (IaaS). A well-known solution is the Google File System, where file chunk replication is used and modelled (Ghemawat et al., 2013). Specifically, Google aimed to build redundant storage for massive amounts of data on cheap and unreliable computers. The file chunk replication strategy is also at the basis of our solution.

Bhagwat et al. (2006) claim the improvement of file

Reference scenario

Cloud storage services offer end-users the possibility to subscribe to several providers. Considering the possibility of simultaneously accessing several Cloud storage providers, we identified three different approaches: (1) Store Data Over the Cloud, (2) Store Data Over the Cloud with RRNS, and (3) Store Data Over the Cloud with RRNS and Encryption. Figure 2 shows an example of how the three approaches work.

Approach 1 assumes that a file W is entirely stored in a Cloud storage provider. If

How the redundant residue number system works

The Redundant Residue Number System (RRNS) builds on the Residue Number System (RNS). In this section, we first describe the RNS and then the RRNS.

Let us consider $p$ pairwise relatively prime positive integers $m_1, m_2, \ldots, m_p$, called moduli, such that $M = \prod_{i=1}^{p} m_i$ and $m_i > m_{i-1}$ for each $i \in [2, p]$. Given $W \geq 0$, we can define $w_i = W \bmod m_i$, the residue of $W$ modulo $m_i$. The $p$-tuple $(w_1, w_2, \ldots, w_p)$ is named the Residue Representation of $W$ with the given moduli, and each tuple element $w_i$ is

Long-term availability and privacy through an RRNS-based approach with enhanced security

In this section, we focus on approach 3 discussed in Section 3, i.e., “Store Data Over the Cloud with RRNS and Encryption”. More specifically, we discuss how the RRNS algorithm and data encryption have been integrated. Our approach ensures a high level of reliability in terms of both long-term availability and privacy in scenarios including different Cloud storage providers.

Figure 6 shows how our system works. When a user needs to store a file over the Cloud, our system compresses it,

Analytical evaluation of the RRNS overhead in terms of file size

With regard to the storage overhead, both RRNS and BASE-64 encoding cause an increase in file size. In this section, we analyze the impact of the proposed solution in terms of storage overhead and compare it with our experimental results, focusing specifically on approaches 1 and 2.
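The BASE-64 part of this overhead can be checked directly. The short Python sketch below is illustrative only: the helper name `base64_size` is an assumption, and it measures the encoding overhead alone, without modelling the additional growth caused by the RRNS redundancy.

```python
import base64


def base64_size(n_bytes):
    """Return the size in bytes of the Base64 encoding of n_bytes of data."""
    return len(base64.b64encode(bytes(n_bytes)))


# Base64 maps every 3 input bytes to 4 output characters and pads the
# last group, so the encoded size is 4 * ceil(n / 3).
for n in (1, 3, 3000, 1_000_000):
    print(n, base64_size(n), base64_size(n) / n)
```

For inputs that are more than a few bytes long, the ratio settles at about 4/3, i.e., roughly a 33% increase before any redundancy is added.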

Let us consider the following parameters:

  • m is the minimum number of moduli necessary to reconstruct a file (following the syntax in Section 4, it is equal to pd),

  • r is the redundancy degree,

  • 4 is the compression rate due

Conclusion and future works

In this paper, we discussed an approach to improve long-term availability and privacy when using different Cloud storage providers. Using the RRNS, our approach consists in splitting a file into different residue-segments, encrypting them, and uploading them to different providers. The advantage of such an approach is twofold. On the one hand, a user can retrieve his/her file even if a provider is temporarily or permanently unavailable. On the other hand, providers cannot access the files stored within them. In our

References (18)

  • Bhagwat D, Pollack K, Long DDE, Schwarz T, Miller EL, Paris J-F. Providing high reliability in a minimum redundancy...
  • Fan K, Zhao L, Shen X, Li H, Yang Y. Smart-blocking file storage method in cloud computing. In: The 2012 first IEEE...
  • Freed N, Borenstein N. MIME: Multipurpose internet mail extensions, Technical report RFC2045. URL...
  • Ghemawat S, Gobioff H, Leung S-T, The google file system....
  • Hai-Jia W, Peng L, Wei-wei C. The optimization theory of file partition in network storage environment. In: The 2010...
  • Nahar P, Joshi A, Saupp A. Data migration using active cloud engine. In: The 2012 IEEE international conference on...
  • NSA. Prism program taps in to user data of apple, google and others,...
  • NSA-PRISM, surveillance program,...
  • Rahumed A, Chen H, Tang Y, Lee P, Lui J-S. A secure cloud backup system with assured deletion and version control. In:...
There are more references available in the full text version of this article.
