S, R, and data science

Published: 12 June 2020

Abstract

Data science is increasingly important and challenging. It requires computational tools and programming environments that handle big data and difficult computations, while supporting creative, high-quality analysis. The R language and related software play a major role in computing for data science. R is featured in most programs for training in the field. R packages provide tools for a wide range of purposes and users. The description of a new technique, particularly from research in statistics, is frequently accompanied by an R package, greatly increasing the usefulness of the description.

The history of R makes clear its connection to data science. R was consciously designed to replicate in open-source software the contents of the S software. S in turn was written by data analysis researchers at Bell Labs as part of the computing environment for research in data analysis and collaborations to apply that research, rather than as a separate project to create a programming language. The features of S and the design decisions made for it need to be understood in this broader context of supporting effective data analysis (which would now be called data science). These characteristics were all transferred to R and remain central to its effectiveness. Thus, R can be viewed as based historically on a domain-specific language for the domain of data science.

Published in

Proceedings of the ACM on Programming Languages, Volume 4, Issue HOPL
June 2020
1524 pages
EISSN: 2475-1421
DOI: 10.1145/3406494

Copyright © 2020 Owner/Author

This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States
