Abstract
Academic knowledge building has progressed for the past few centuries using small data studies, characterized by sampled data generated to answer specific questions. This strategy has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds. The approach is presently being challenged by the development of big data. We argue, however, that small data studies will continue to be popular and valuable in the future because of their utility in answering targeted queries. Importantly, small data will increasingly be made more like big data through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and reuse, and open them up to combination with big data and to analysis using big data analytics. This paper examines the logic and value of small data studies, their relationship to emerging big data and data science, and the implications of scaling small data into data infrastructures, with a focus on spatial data examples.
Acknowledgments
The research conducted for this paper was made possible with funding from the European Research Council (ERC-2012-AdG-323636) and Science Foundation Ireland.
Cite this article
Kitchin, R., Lauriault, T.P. Small data in the era of big data. GeoJournal 80, 463–475 (2015). https://doi.org/10.1007/s10708-014-9601-7
DOI: https://doi.org/10.1007/s10708-014-9601-7