ABSTRACT
The Archives Unleashed project aims to improve scholarly access to web archives through a multi-pronged strategy involving tool creation, process modeling, and community building, with all three efforts proceeding concurrently and reinforcing one another. As we near the end of our initially conceived three-year project, we report on our progress and share lessons learned along the way. The main contribution articulated in this paper is a process model that decomposes scholarly inquiries into four main activities: filter, extract, aggregate, and visualize. Because these activities can be disaggregated across time, space, and tools, it is possible to use our Archives Unleashed Toolkit to generate "derivative products" that serve as useful starting points for scholarly inquiry. Scholars can download these products from the Archives Unleashed Cloud and manipulate them just like any other dataset, which gives them access to web archives without requiring any specialized knowledge. Over the past few years, our platform has processed over a thousand different collections from over two hundred users, totaling around 300 terabytes of web archives.
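As a minimal sketch of the four activities named above, the following Python example runs filter, extract, aggregate, and visualize over a handful of in-memory records. It does not use the Archives Unleashed Toolkit or its API; the record fields (`url`, `mime`), the sample data, and all function names are assumptions made up for illustration only.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical records standing in for captures in a web archive.
# The field names are illustrative and not taken from any real toolkit.
RECORDS = [
    {"url": "http://example.org/a", "mime": "text/html"},
    {"url": "http://example.org/b", "mime": "text/html"},
    {"url": "http://example.com/", "mime": "text/html"},
    {"url": "http://example.org/logo.png", "mime": "image/png"},
    {"url": "http://example.net/report.pdf", "mime": "application/pdf"},
]


def filter_records(records, mime):
    """Filter: keep only the records with the given MIME type."""
    return [r for r in records if r["mime"] == mime]


def extract_domains(records):
    """Extract: pull the host name out of each record's URL."""
    return [urlparse(r["url"]).netloc for r in records]


def aggregate(domains):
    """Aggregate: count how many records each domain contributed."""
    return Counter(domains)


def visualize(counts):
    """Visualize: render the counts as a plain-text bar chart."""
    return "\n".join(
        f"{domain} {'#' * n}" for domain, n in counts.most_common()
    )


if __name__ == "__main__":
    pages = filter_records(RECORDS, "text/html")
    counts = aggregate(extract_domains(pages))
    print(visualize(counts))
```

In this sketch the aggregated counts are a small table that could be saved and later opened in a different tool, which is one way to picture the disaggregation of activities that the abstract describes.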
Index Terms
- The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives