From Original Sources to Linguistic Analysis: Tools and Datasets for the Investigation of Multilingualism in Medieval English

Chapter in: Medieval English in a Multilingual Context

Part of the book series: New Approaches to English Historical Linguistics (NAEHL)

Abstract

This chapter outlines some of the different types of digital datasets and tools that are currently available to help researchers analyse multilingualism and its effects on the development of medieval English. A complete survey of all existing material is infeasible and would rapidly become outdated, so the goal is instead to discuss representative examples that illustrate what is currently possible with the broad range of methodologies and tools at our disposal. Rather than presenting a simple list, the discussion focuses on the different stages of typical workflows and seeks to provide a realistic assessment of their potential and limits, as well as suggestions for where we might usefully go in future.

Notes

  1.

    Some relevant examples include the Cambridge Digital Library (Search > Advanced Search), the Digital Bodleian (Advanced Search) and the joint Bibliothèque Nationale de France–British Library project France Angleterre 700–1200 (Recherche avancée). This is also possible for Parker on the Web, but the option is hard to find. One way is to click on ‘Search’ while leaving the search box empty; depending on the size of the browser window, options to ‘Limit your search’ may or may not then appear. If they do not, click not on the text ‘Limit your search’ but on the box at the far right of the same line, and the options will appear. For full details of websites and software cited in this chapter, see ‘Digital resources and websites’ in the bibliography below.

  2.

    The Biblissima project has recently received a second major grant and so has started its next phase as Biblissima+.

  3.

    See Footnote 1, as well as Gallica (Recherche avancée), and the James Catalogue of Western Manuscripts from the Wrenn Library in Cambridge (Browse Catalogue), among many others.

  4.

    Archetype is the software used for the DigiPal project cited above, as well as for other projects including Exon Domesday, Models of Authority and VisigothicPal.

  5.

    Two examples are Exon Domesday (see Labs > Codicological Visualisation and Labs > Collation Visualisations) and BiblioPhilly, where some digitised manuscripts come with interactive quire diagrams.

  6.

    The distinction between OCR and HTR is not clearly defined: it can refer to the source material (OCR for print and HTR for manuscript), or the underlying process (recognition character-by-character or recognition line-by-line), but in modern systems there is little practical distinction. For this reason, we refer to OCR / HTR throughout.

  7.

    Kraken is an OCR / HTR program that runs from the command line, and eScriptorium is a web-based graphical user interface that interacts with kraken. The two are therefore often used together and are referred to collectively as a single package unless only one of the two parts is intended.

  8.

    These corpora do not fully reflect multilingualism, primarily for technical reasons: for instance, paragraphs written in Latin that are part of the original texts were not included. However, traces remain in the form of foreign-language sequences tagged FW (foreign word).

  9.

    This project was funded by the Deutsche Forschungsgemeinschaft (DFG) under research grants TRI555/6-1 and TRI555/6-2 (Sachbeihilfe). For further information, see https://tinyurl.com/dfgbasics/ [accessed 29 September 2021].

  10.

    Although the verb-lemmatised version of the PPCME2 cannot be distributed for licensing reasons, the lemmatiser script can be downloaded from the BASICS website, so that anyone who owns a copy of the corpus can perform the lemmatisation themselves.

  11.

    The video tutorial on the BASICS Toolkit website provides an introduction to all resources and tools.

  12.

    Since the term ‘copying’ stresses that copied material is not identical to its model, Percillier adopts Johanson’s (2002) term in place of ‘borrowing’.

  13.

    For a study of the copying of -able in Middle English, see Trips and Stein (2008).

  14.

    This distinction is also called ‘reflexive’ and ‘inherent reflexive’ (cf. Steinbach 2002; see also Trips 2020).

  15.

    The query for argument reflexives using the ‘simple strategy’ is limited to subject pronouns. For details about the limitations of querying these patterns, see Percillier and Trips (2020).

  16.

    The Programming Historian > ‘Lessons’. At the time of writing, the categories have thirty-three, twenty, twenty-one, twelve and two lessons respectively in English; other languages have different numbers of lessons but follow the same broad pattern.
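
Note 8 observes that foreign-language material survives in these corpora only as sequences tagged FW. As a minimal sketch, assuming Penn-style labelled bracketing in which every word is a terminal node of the form (TAG word) and foreign words carry exactly the tag FW, the Python below collects runs of directly adjacent FW words from a tree. The names leaves and foreign_sequences are hypothetical helpers, not part of the BASICS tools or of any corpus query software, and the sample tree is invented for illustration rather than taken from a corpus.

```python
import re
from itertools import groupby

# A terminal node "(TAG word)". Assumption: neither the tag nor the word
# contains whitespace or parentheses, as in Penn-style labelled bracketing.
LEAF = re.compile(r"\(([^()\s]+)\s+([^()\s]+)\)")


def leaves(tree_text: str) -> list[tuple[str, str]]:
    """Return every (tag, word) terminal node in order of occurrence."""
    return LEAF.findall(tree_text)


def foreign_sequences(tree_text: str, fw_tag: str = "FW") -> list[list[str]]:
    """Group directly adjacent FW-tagged words into foreign-language runs."""
    runs = []
    # groupby splits the leaf sequence into maximal runs of FW / non-FW leaves.
    for is_foreign, group in groupby(leaves(tree_text), key=lambda leaf: leaf[0] == fw_tag):
        if is_foreign:
            runs.append([word for _, word in group])
    return runs


if __name__ == "__main__":
    # Invented toy tree, not a real corpus sentence.
    toy = "( (IP-MAT (NP-SBJ (PRO he)) (VBD seide) (NP-OB1 (FW dominus) (FW vobiscum)) (. .)) (ID TOY,1))"
    print(foreign_sequences(toy))  # [['dominus', 'vobiscum']]
```

Because the pattern inspects only terminal nodes, phrase structure is ignored: a foreign run interrupted by any non-FW leaf, including punctuation, is split into two runs, which may or may not be what a given study wants.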

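Note 15 states that the query for argument reflexives using the ‘simple strategy’ is limited to subject pronouns. A plausible reason, offered here as our assumption rather than as the authors’ stated rationale, is that with a plain object pronoun the main formal cue to coreference is agreement with the subject, and that cue is readily checkable only when the subject is itself a pronoun. The Python below is a minimal sketch of such a check. It assumes a single flat clause in Penn-style bracketing with unindexed labels NP-SBJ, NP-OB1 / NP-OB2 and PRO; the pronoun table is a deliberately tiny, illustrative sample rather than a full Middle English paradigm; and simple_reflexive_candidates is a hypothetical name. It is not the query used by Percillier and Trips (2020).

```python
import re

# Assumed labels: a subject or object NP consisting of a single PRO leaf.
SUBJECT_PRONOUN = re.compile(r"\(NP-SBJ\s+\(PRO\s+([^()\s]+)\)\)")
OBJECT_PRONOUN = re.compile(r"\(NP-OB[12]\s+\(PRO\s+([^()\s]+)\)\)")

# Illustrative and incomplete: subject pronoun -> object forms agreeing with
# it in person, number and gender. A real query needs the full paradigms.
AGREEING_OBJECT_FORMS = {
    "ich": {"me"},
    "i": {"me"},
    "he": {"him", "hym"},
    "we": {"us"},
}


def simple_reflexive_candidates(clause: str) -> list[str]:
    """Return object pronouns that agree with a pronominal subject.

    Agreement is only a cue: every hit still has to be checked by hand,
    since 'he ... him' can also refer to two different people.
    Clauses with a nominal subject never yield candidates.
    """
    subject = SUBJECT_PRONOUN.search(clause)
    if subject is None:
        return []  # no pronominal subject: nothing to match against
    forms = AGREEING_OBJECT_FORMS.get(subject.group(1).lower(), set())
    return [m.group(1) for m in OBJECT_PRONOUN.finditer(clause)
            if m.group(1).lower() in forms]


if __name__ == "__main__":
    # Invented toy clauses, not corpus data.
    print(simple_reflexive_candidates(
        "(IP-MAT (NP-SBJ (PRO he)) (NP-OB1 (PRO him)) (VBD bithoughte))"))  # ['him']
    print(simple_reflexive_candidates(
        "(IP-MAT (NP-SBJ (D the) (N king)) (NP-OB1 (PRO him)) (VBD saugh))"))  # []
```

The second toy clause illustrates the limitation: with a nominal subject such as ‘the king’ the sketch returns nothing, even though ‘him’ could in principle be coreferent with it. Indexed labels such as NP-SBJ-1 and embedded clauses are not handled.
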
References

  • Allen, Cynthia. 2003. Deflexion and the Development of the Genitive in English. English Language and Linguistics 7: 1–28.

  • ———. 2008. Genitives in Early English: Typology and Evidence. Oxford: Oxford University Press.

  • Andrist, Patrick, Paul Canart, and Marilena Maniaci. 2013. La syntaxe du codex: Essai de codicologie structurale, Bibliologia, 36. Turnhout: Brepols.

  • Baayen, R. Harald. 1992. Quantitative Aspects of Morphological Productivity. In Yearbook of Morphology 1991, ed. Geert Booij and Jaap van Marle, 109–149. Dordrecht: Kluwer Academic Publishers.

  • ———. 1993. On Frequency, Transparency, and Productivity. In Yearbook of Morphology 1992, ed. Geert Booij and Jaap van Marle, 181–208. Dordrecht: Kluwer Academic Publishers.

  • Blockley, Marion. 1982. Further Addenda and Corrigenda to N.R. Ker’s Catalogue. Notes and Queries 228 (n.s. 29): 1–3.

  • Cameron, Angus, Roberta Frank, and John Leyerle, eds. 1970. Computers and Old English Concordances. Toronto: University of Toronto Press.

  • Clérice, Thibault. 2020. Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin. Journal of Data Mining and Digital Humanities [Online] 5581. https://doi.org/10.46298/jdmdh.5581.

  • Da Rold, Orietta, Takako Kato, Mary Swan, and Elaine Treharne, eds. 2010. The Production and Use of English Manuscripts 1060 to 1220. Leicester. https://www.le.ac.uk/english/em1060to1220/index.html.

  • Davis, Godfrey R. C. 2010. Medieval Cartularies of Great Britain and Ireland, revised by Claire Breay, Julian Harrison, and David M. Smith. London: British Library.

  • Einenkel, Eugen. 1916. Geschichte der englischen Sprache, II: Historische Syntax. Strasbourg: Trübner.

  • van Gelderen, Elly. 2000. A History of the English Reflexive Pronouns: Person, Self, Interpretability, Linguistics Today, 39. Amsterdam: John Benjamins.

  • Gneuss, Helmut. 2001. Handlist of Anglo-Saxon Manuscripts: A List of Manuscripts and Manuscript Fragments Written or Owned in England up to 1100, Medieval and Renaissance Texts and Studies, 241. Tempe, AZ: Arizona Center for Medieval and Renaissance Studies.

  • ———. 2003. Addenda and Corrigenda to the Handlist of Anglo-Saxon Manuscripts. Anglo-Saxon England 32: 293–305.

  • ———. 2011. Second Addenda and Corrigenda to the Handlist of Anglo-Saxon Manuscripts. Anglo-Saxon England 40: 293–306.

  • Gneuss, Helmut, and Malcolm Lapidge. 2014. Anglo-Saxon Manuscripts: A Bibliographical Handlist of Manuscripts and Manuscript Fragments Written or Owned in England up to 1100, Toronto Anglo-Saxon Series, 15. Toronto: University of Toronto Press.

  • Haeberli, Eric. 2018. Syntactic Effects of Contact in Translations: Evidence from Object Pronoun Placement in Middle English. English Language and Linguistics 22: 301–321.

  • Ingham, Richard. 2012. The Transmission of Anglo-Norman: Language History and Language Acquisition. Amsterdam: John Benjamins.

  • Jefferson, Judith A., and Ad Putter, eds. 2013. Multilingualism in Medieval Britain (c. 1066-1520): Sources and Analysis. Turnhout: Brepols.

  • Jenkyns, Joy. 1991. The Toronto Dictionary of Old English Resources: A User’s View. Review of English Studies 42 (167): 380–416.

  • Johanson, Lars. 2002. Contact-Induced Change in a Code-Copying Framework. In Language Change: The Interplay of Internal, External and Extra-Linguistic Factors, ed. Mari C. Jones and Edith Esch, 285–313. Berlin: de Gruyter.

  • Keenan, Edward L. 2009. Linguistic Theory and the Historical Creation of English Reflexives. In Historical Syntax and Linguistic Theory, ed. Paola Crisma and Giuseppe Longobardi, 17–40. Oxford: Oxford University Press.

  • Ker, Neil R. 1957. Catalogue of Manuscripts Containing Anglo-Saxon. Oxford: Clarendon Press.

  • ———. 1976. A Supplement to Catalogue of Manuscripts Containing Anglo-Saxon. Anglo-Saxon England 5: 121–131.

  • Kestemont, Mike, Vincent Christlein, and Dominique Stutzmann. 2017. Artificial Paleography: Computational Approaches to Identifying Script Types in Medieval Manuscripts. Speculum 92: 86–109.

  • König, Ekkehard, and Pieter Siemund. 2000. The Development of Complex Reflexives and Intensifiers in English. Diachronica 17: 39–84.

  • Kroch, Anthony, and Ann Taylor. 1997. Verb Movement in Old and Middle English: Dialect Variation and Language Contact. In Parameters of Morphosyntactic Change, ed. Ans van Kemenade and Nigel Vincent, 297–325. Cambridge: Cambridge University Press.

  • Levin, Beth. 1993. English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.

  • Mustanoja, Tauno F. 1960. A Middle English Syntax, Part I: Parts of Speech, Mémoires de la Société Néophilologique de Helsinki, 23. Helsinki: Société Néophilologique.

  • Myers, Sara Mae. 2009. The Evolution of the Genitive Noun Phrase in Early Middle English. Unpublished MPhil thesis, University of Glasgow.

  • Page, Raymond I. 1992. On the Feasibility of a Corpus of Anglo-Saxon Glosses: The View from the Library. In Anglo-Saxon Glossography: Papers Read at the International Conference Held in the Koninklijke Academie voor Wetenschappen Letteren en Schone Kunsten van België, Brussels, 8 and 9 September 1986, ed. René Derolez, 77–96. Brussels: Paleis der Academiën.

  • Parkes, Malcolm B. 1969. English Cursive Bookhands 1250–1500. Oxford: Clarendon Press.

  • Peitsara, Kirsti. 1997. The Development of Reflexive Strategies in English. In Grammaticalization at Work: Studies of Long-Term Developments in English, ed. Matti Rissanen, Merja Kytö, and Kirsi Heikkonen, 277–370. Berlin: Mouton de Gruyter.

  • Pelteret, David A.E. 1990. Catalogue of English Post-Conquest Vernacular Documents. Woodbridge: Boydell.

  • Percillier, Michael. 2018. A Toolkit for Lemmatising, Analysing and Visualising Middle English Data. In Proceedings of the Second Workshop on Corpus-Based Research in the Humanities CRH-2, ed. Andrew U. Frank, Christine Ivanovic, Francesco Mambrini, Marco Passarotti, and Caroline I. Sporleder, 153–160. Vienna: TU Wien.

  • ———. 2019. Dynamic Modelling of Medieval Language Contact: The Case of Anglo-Norman and Middle English. In Diachrone Migrationslinguistik: Mehrsprachigkeit in historischen Sprachkontaktsituationen, ed. R. Schöntag and S. Massicot, 79–99. Berlin: Peter Lang.

  • ———. 2020. Allostructions, Homostructions or a Constructional Family?: Changes in the Network of Secondary Predicate Constructions in Middle English. In Nodes and Networks in Diachronic Construction Grammar, ed. L. Sommerer and E. Smirnova, 214–242. Amsterdam: John Benjamins. https://doi.org/10.1075/cal.27.06per

  • ———. 2022. Adapting the Dynamic Model to Historical Linguistics: Case Studies on the Middle English and Anglo-Norman Contact Situation. In English Historical Linguistics: Historical English in Contact, ed. Bettelou Los, Chris Cummins, Lisa Gotthard, Alpo Honkapohja, and Benjamin Molineaux, 5–33. Amsterdam: John Benjamins.

  • Percillier, Michael, and Carola Trips. 2020. Lemmatising Verbs in Middle English Corpora: The Benefit of Enriching the Penn-Helsinki Parsed Corpus of Middle English 2 (PPCME2), the Parsed Corpus of Middle English Poetry (PCMEP), and A Parsed Linguistic Atlas of Early Middle English (PLAEME). In Proceedings of LREC2020, 7170–7178. https://www.aclweb.org/anthology/2020.lrec-1.886. Accessed 29 September 2021.

  • Perek, Florent, and Martin Hilpert. 2017. A Distributional Semantic Approach to the Periodization of Change in the Productivity of Constructions. International Journal of Corpus Linguistics 22: 490–520.

  • Pierazzo, Elena, and Peter A. Stokes. 2022. Old Books, New Books and Digital Publishing. In The Bloomsbury Handbook to the Digital Humanities, ed. James O’Sullivan, 233–244. London: Bloomsbury Academic.

  • Robinson, Peter, and Elizabeth Solopova. 1993. Guidelines for Transcription of the Manuscripts of the Wife of Bath’s Prologue. In The Canterbury Tales Project: Occasional Papers, ed. Norman F. Blake and Peter Robinson, 19–52. Oxford: Oxford University Computing Services, Office for Humanities Communication.

  • Sawyer, Peter H., ed. 1968. Anglo-Saxon Charters: An Annotated List and Bibliography, Royal Historical Society Guides and Handbooks, 8. London: Royal Historical Society.

  • Schauwecker, Yela. 2019. Le faus françeis d’Angleterre en tant que langue seconde? Quelques phénomènes syntaxiques indicatifs: The faus franceis d’Angleterre as an L2? – Some Distinctive Syntactic Features. Revue des Langues Romanes 123: 45–68.

  • ———. 2022. Anglo-Französisch als Zweitsprache: Eine quantitative Untersuchung von Sprachkontakteinflüssen anhand der Lexikalisierung von Bewegungsereignissen. Unpublished Habilitation thesis, University of Stuttgart.

  • Schneider, Edgar W. 2003. The Dynamics of New Englishes: From Identity Construction to Dialect Birth. Language 79: 233–281.

  • ———. 2007. Postcolonial English: Varieties around the World. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511618901

  • Scragg, Donald G. 2012. A Conspectus of Scribal Hands Writing English 960–1100, Publications of the Manchester Centre for Anglo-Saxon Studies, 11. Cambridge: D. S. Brewer.

  • Smithies, James, Carina Westling, Anna-Maria Sichani, Pamela Mellen, and Arianna Ciula. 2019. Managing 100 Digital Humanities Projects: Digital Scholarship and Archiving in King’s Digital Lab. Digital Humanities Quarterly 13/1. http://www.digitalhumanities.org/dhq/vol/13/1/000411/000411.html. Accessed 14 December 2021.

  • Steinbach, Markus. 2002. Middle Voice: A Comparative Study in the Syntax-Semantics Interface of German, Linguistics Today, 50. Amsterdam: John Benjamins.

  • Stokes, Peter A. 2009. The Digital Dictionary. Florilegium 26: 37–69.

  • ———. 2011. The Problem of Grade in Post-Conquest Vernacular Minuscule. New Medieval Literatures 13: 23–47.

  • ———. 2020a. Cambridge, Corpus Christi College, 367 Part II: A Study in (Digital) Codicology. In Medieval Manuscripts in the Digital Age, ed. Benjamin Albritton, Georgia Henley, and Elaine Treharne, 64–73. Abingdon: Routledge.

  • ———. 2020b. Palaeography, Codicology and Stemmatology. In Handbook of Stemmatology, ed. Philipp Roelli, 46–56. Berlin: De Gruyter.

  • Stokes, Peter A., and Geoffroy Noël. 2019. Exon Domesday: Méthodes numériques appliquées à la codicologie pour l’étude d’un manuscrit anglo-normand. Tabularia [Online]. https://doi.org/10.4000/tabularia.4118.

  • Stokes, Peter A., and Elena Pierazzo. 2009. Encoding the Language of Landscape: XML and Databases at the Service of Anglo-Saxon Lexicography. In Perspectives on Lexicography in Italy and Europe, ed. Silvia Bruti, Roberta Cella, and Marina F. Albert, 203–239. Newcastle upon Tyne: Cambridge Scholars.

  • Stutzmann, Dominique. 2016. Clustering of Medieval Scripts through Computer Image Analysis: Towards an Evaluation Protocol. Digital Medievalist [Online] 10. https://doi.org/10.16995/dm.61.

  • Taylor, Ann. 2008. Contact Effects of Translation: Distinguishing Two Kinds of Influence in Old English. Language Variation and Change 20: 341–365.

  • Trips, Carola. 2002. From OV to VO in Early Middle English. Amsterdam: John Benjamins.

  • ———. 2020. Impersonal and Reflexive Uses of Middle English Psych Verbs under Contact Influence with Old French. Linguistics Vanguard 6 (Special Issue on Language Contact, ed. N. Lavidas and A. Bergs). https://doi.org/10.1515/lingvan-2019-0016.

  • Trips, Carola, and Achim Stein. 2008. Was French -able Borrowable? A Diachronic Study of Word-Formation Processes Due to Language Contact. In English Historical Linguistics 2006, II: Lexical and Semantic Change, ed. Richard Dury and Maurizio Gotti, 217–241. Amsterdam/Philadelphia: John Benjamins.

  • Trotter, David, ed. 2000. Multilingualism in Later Medieval Britain. Cambridge: D.S. Brewer.

  • Truswell, Robert, Rhona Alcorn, James Donaldson, and Joel Wallenberg. 2019. A Parsed Linguistic Atlas of Early Middle English. In Historical Dialectology in the Digital Age, ed. Rhona Alcorn, Johanna Kopaczyk, Bettelou Los, and Benjamin Molineaux, 19–38. Edinburgh: Edinburgh University Press.

  • Visser, Fredericus T. 1963. An Historical Syntax of the English Language. Leiden: Brill.

  • ———. 2002. An Historical Syntax of the English Language, 4th edn. Leiden: Brill.

Digital Resources and Websites (Last Verified 28 February 2022)

Author information

Corresponding author: Carola Trips.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Trips, C., Stokes, P.A. (2023). From Original Sources to Linguistic Analysis: Tools and Datasets for the Investigation of Multilingualism in Medieval English. In: Pons-Sanz, S.M., Sylvester, L. (eds) Medieval English in a Multilingual Context. New Approaches to English Historical Linguistics. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-031-30947-2_3

  • DOI: https://doi.org/10.1007/978-3-031-30947-2_3

  • Publisher Name: Palgrave Macmillan, Cham

  • Print ISBN: 978-3-031-30946-5

  • Online ISBN: 978-3-031-30947-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
