research-article

BlueDBM: an appliance for big data analytics

Published: 13 June 2015

Abstract

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5 TB to 20 TB. For such a dataset, one would need a cluster with 100 servers, each with 128 GB to 256 GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. In this paper we present BlueDBM, a new system architecture which has flash-based storage with in-store processing capability and a low-latency, high-throughput inter-controller network. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a RAM-cloud system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost-performance trade-off for Big Data analytics.
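The sizing and degradation arguments in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: it uses decimal units (1 TB = 1000 GB), a simple weighted-average access-time model, and placeholder latencies (0.1 µs for DRAM, 100 µs for secondary storage) that are not taken from the paper. The function names are hypothetical.

```python
import math

GB_PER_TB = 1000  # assumption: decimal units


def servers_needed(dataset_tb: float, dram_gb_per_server: float) -> int:
    """Minimum number of servers whose combined DRAM holds the whole dataset."""
    return math.ceil(dataset_tb * GB_PER_TB / dram_gb_per_server)


def mean_access_time_us(miss_fraction: float, dram_us: float, storage_us: float) -> float:
    """Weighted-average access time when miss_fraction of references go to storage."""
    return (1.0 - miss_fraction) * dram_us + miss_fraction * storage_us


def slowdown(miss_fraction: float, dram_us: float, storage_us: float) -> float:
    """Mean access time relative to the all-in-DRAM case."""
    return mean_access_time_us(miss_fraction, dram_us, storage_us) / dram_us


if __name__ == "__main__":
    # Cluster sizes for the dataset and DRAM ranges quoted in the abstract.
    for dataset_tb in (5, 20):
        for dram_gb in (128, 256):
            n = servers_needed(dataset_tb, dram_gb)
            print(f"{dataset_tb} TB at {dram_gb} GB/server: {n} servers")

    # Illustrative latencies only (assumed, not measured).
    DRAM_US, STORAGE_US = 0.1, 100.0
    for miss in (0.0, 0.05, 0.10):
        print(f"{miss:.0%} to storage: {slowdown(miss, DRAM_US, STORAGE_US):.1f}x slower")
```

Under these assumptions a 20 TB dataset needs between 79 and 157 servers depending on DRAM per server, which brackets the abstract's figure of 100. The same model shows why a small fraction of storage references dominates the average: the storage term is weighted by a latency several orders of magnitude larger than DRAM's.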



Published in

ACM SIGARCH Computer Architecture News, Volume 43, Issue 3S (ISCA'15), June 2015, 745 pages. ISSN: 0163-5964. DOI: 10.1145/2872887

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, June 2015, 768 pages. ISBN: 9781450334020. DOI: 10.1145/2749469

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States
