Published March 24, 2021 | Version v1.0.0
Software Open

DeXTER (DeepTextMiner): A deep learning, critical workflow to contextually enrich digital collections and visualise them

  • 1. University of Luxembourg

Description

Source code.zip

DeXTER (DeepTextMiner) is a deep learning, critical workflow to contextually enrich digital collections and interactively visualise them. It is task-oriented (as opposed to result-oriented) and is designed to be generalisable and interoperable (i.e., it is data-set independent). To make it interoperable, we used language-agnostic algorithms, so that scholars can replicate the methodology with their own data-sets. Currently, DeXTER is based on ChroniclItaly 3.0 (Viola and Fiscarelli 2021b), an open access digital heritage collection of Italian immigrant newspapers published in the United States from 1898 to 1936. Methodologically, 1) we experimented with different state-of-the-art NLP techniques (e.g., deep learning Named Entity Recognition, deep learning sentiment analysis) to enrich the collection with further layers of information (e.g., referential, geographical, emotional, relational); and 2) we developed a Shiny app to visualise the enriched material and facilitate analysis in an intuitive, interactive, and reproducible way. This documentation is the step-by-step description of the project. You can find the code and the tutorial in the code folder of this repository. Part of this documentation is taken from Viola and Fiscarelli (2021a).

Files

lorellav/DeXTER-DeepTextMiner-v1.0.0.zip

Files (2.3 MB)

md5:c9045cdd1fba9bb3905fcf6f634b6ead
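
The MD5 checksum listed for the archive can be used to check that a downloaded copy is intact. The following is a minimal sketch in Python rather than part of the DeXTER code itself; the local file name and the helper names `file_md5` and `matches_published_checksum` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum published on this record for the release archive.
EXPECTED_MD5 = "c9045cdd1fba9bb3905fcf6f634b6ead"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("DeXTER-DeepTextMiner-v1.0.0.zip")
    if archive.exists():
        ok = matches_published_checksum(archive)
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A matching digest only indicates that the file was not corrupted in transit; MD5 is not suitable as a security guarantee.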

Additional details

Related works

Is documented by
Conference paper: http://ceur-ws.org/Vol-2810/paper5.pdf (URL)
Is source of
Dataset: 10.5281/zenodo.4596345 (DOI)
Is supplement to
Software documentation: https://github.com/lorellav/DeXTER-DeepTextMiner/tree/v1.0.0 (URL)

References

  • Viola, L. and Fiscarelli, A. 2021a. "From digitised sources to digital data: Behind the scenes of (critically) enriching a digital heritage collection". In Proceedings of the International Conference Collect and Connect: Archives and Collections in a Digital Age, edited by Weber, A., Heerlien, M., Miracle, E. G. and Wolstencroft, K. CEUR Workshop Proceedings. http://ceur-ws.org/Vol-2810/paper5.pdf
  • Viola, L. and Fiscarelli, A. 2021b. ChroniclItaly 3.0. A deep-learning, contextually enriched digital heritage collection of Italian immigrant newspapers published in the USA, 1898-1936. (Version v3.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4596345