ABSTRACT
Active storage clouds are an attractive platform for executing the large data-intensive workloads found in many fields of science. However, active storage presents new system management challenges. A large system of fault-prone machines with local persistent state can easily degenerate into a mess of unreferenced data and runaway computations. Our solution to this problem is DataLab, a software framework for running data-parallel workloads on active storage clusters. DataLab provides a simple language for expressing workloads, works with legacy application codes, and achieves robustness through the use of distributed transactions. Our prototype implementation scales to 250 nodes on a large biometric image processing workload.
Index Terms
- DataLab: transactional data-parallel computing on an active storage cloud