Published March 4, 2019 | Version v1
Dataset Open

Dataset of Middle Dutch lexical stress patterns and syllabifications

  • 1. University of Antwerp

Contributors

  • 1. University of Antwerp

Description

This dataset consists of 48,219 Middle Dutch words taken from a total of 205 rhymed texts in the Cd-rom Middelnederlands (1998). Each of these words has been assigned a syllabification and a lexical stress pattern.

For example, proevede is syllabified as proe-ve-de and has a stress index of -3: counting from the rightmost syllable, the third syllable (proe) receives stress.
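The negative stress index maps directly onto Python's negative sequence indexing, as the minimal sketch below shows. It assumes that syllables are separated by "-" exactly as in the proe-ve-de example and that the index always counts from the right (-1 = final syllable). The function name `stressed_syllable` is hypothetical and is not taken from the dataset or the accompanying code.

```python
def stressed_syllable(syllabified: str, stress_index: int) -> str:
    """Return the syllable that carries stress (hypothetical helper).

    Assumes syllables are separated by '-' and that stress_index is a
    negative offset counted from the rightmost syllable (-1 = last).
    """
    syllables = syllabified.split("-")
    # Reject indices that point past the first syllable or are non-negative.
    if not -len(syllables) <= stress_index <= -1:
        raise ValueError(
            f"stress index {stress_index} out of range for {syllabified!r}"
        )
    # Python's negative indexing also counts from the right, so the
    # dataset's convention can be applied without any conversion.
    return syllables[stress_index]


print(stressed_syllable("proe-ve-de", -3))  # proe
```

Because the index is relative to the end of the word, the same value (-3 here, for antepenultimate stress) can be applied to words of any length without knowing how many syllables they contain.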

This upload contains the following files:

  • The JSON file (compressed), which was used as input data for a machine-learning algorithm trained for the automatic syllabification and stress assignment of Middle Dutch polysyllabic words (for the code of this experiment, see GitHub)
  • An Excel file containing the same data as the JSON file (for more convenient reference)
  • A split file (compressed), used in the training process of the above-mentioned experiment
  • A PDF file with illustrations of the contents of the dataset
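This page does not document the JSON schema or the model used in the experiment. The sketch below therefore shows only one common way to turn a hyphenated syllabification into per-character labels for a sequence-labelling model, together with the inverse step. It is not necessarily the encoding the authors used, and both function names are hypothetical.

```python
from __future__ import annotations


def to_boundary_labels(syllabified: str) -> tuple[str, list[int]]:
    """Encode 'proe-ve-de' as ('proevede', per-character labels).

    Hypothetical encoding: labels[i] == 1 if a syllable boundary follows
    character i, else 0. Assumes no empty syllables (no '--').
    """
    word = syllabified.replace("-", "")
    labels: list[int] = []
    for syllable in syllabified.split("-"):
        # Every syllable ends in a boundary...
        labels.extend([0] * (len(syllable) - 1) + [1])
    # ...except the last one, which ends the word.
    labels[-1] = 0
    return word, labels


def from_boundary_labels(word: str, labels: list[int]) -> str:
    """Inverse of to_boundary_labels: reinsert '-' after labelled characters."""
    out = []
    for char, label in zip(word, labels):
        out.append(char)
        if label == 1:
            out.append("-")
    return "".join(out)


word, labels = to_boundary_labels("proe-ve-de")
print(word, labels)  # proevede [0, 0, 0, 1, 0, 1, 0, 0]
print(from_boundary_labels(word, labels))  # proe-ve-de
```

A per-character encoding like this gives every input position exactly one target, so syllabification can be trained as ordinary sequence labelling. The stress index could then be predicted as a separate, word-level target.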

This dataset is part of the research of Wouter Haverals (FWO, University of Antwerp), carried out under the supervision of Prof. Mike Kestemont and Prof. Emeritus Frank Willaert.

Files

data_insights.pdf

Total size: 6.7 MB

  • 106.4 kB (md5:4d1605f51b2bcab3f5ac0b2275aef5a8)
  • 697.0 kB (md5:5ab58aa1fb7e3e7498ac1d2049a7cab4)
  • 520.4 kB (md5:868f3fd2de2865ee276e4b60c39b3ac3)
  • 5.4 MB (md5:c0d7683dce0f3c7e4493626b65217624)
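The MD5 checksums listed for the files can be used to check that a download arrived intact. The minimal sketch below uses only Python's standard library. Because the file names were not preserved next to their checksums on this page, it matches a file against the set of listed digests rather than against a specific name. The helper names and the example directory are hypothetical.

```python
import hashlib
from pathlib import Path

# Digests as listed on this page (file names not preserved alongside them).
EXPECTED_MD5 = {
    "4d1605f51b2bcab3f5ac0b2275aef5a8",
    "5ab58aa1fb7e3e7498ac1d2049a7cab4",
    "868f3fd2de2865ee276e4b60c39b3ac3",
    "c0d7683dce0f3c7e4493626b65217624",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path: Path) -> bool:
    """True if the file's MD5 matches one of the checksums listed above."""
    return md5_of(path) in EXPECTED_MD5


if __name__ == "__main__":
    # Hypothetical download location; adjust to wherever the files were saved.
    download_dir = Path("downloads")
    if download_dir.is_dir():
        for p in sorted(download_dir.iterdir()):
            if p.is_file():
                print(p.name, "OK" if is_listed(p) else "MISMATCH")
```

Reading in chunks keeps memory use constant, which matters little at these sizes but lets the same helper work on larger archives.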

Additional details

References

  • Cd-rom Middelnederlands (Sdu, 1998)