The role of semantic class in English base/-ly pairs: A distributional analysis
The scripts and data here accompany the paper "Splitting -ly's: using word embeddings to distinguish derivation and inflection". The Python scripts calculate the cosine similarities; the R scripts run the statistical analyses and produce the figures. More information on the individual files is provided in the README file.
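As a rough illustration of the computation the Python scripts perform, the sketch below reads a word-per-line vector file and computes the cosine similarity for one base/-ly pair. It is a minimal sketch under stated assumptions, not the scripts' actual code: the file path and the example pair ("quick"/"quickly") are placeholders.

    # Minimal sketch: cosine similarity for a base/-ly pair.
    # Assumes a plain-text vector file with one word per line,
    # "word v1 v2 ... vn" (the format of the deps.words/bow5.words files).
    import numpy as np

    def load_vectors(path, vocab):
        """Collect the vectors for the words in `vocab` from a .words file."""
        vectors = {}
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                parts = line.rstrip().split(" ")
                if parts[0] in vocab:
                    vectors[parts[0]] = np.array(parts[1:], dtype=float)
        return vectors

    def cosine(u, v):
        """Cosine similarity: dot product of the length-normalized vectors."""
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    pair = ("quick", "quickly")                    # placeholder pair
    vecs = load_vectors("deps.words", set(pair))   # path is an assumption
    print(pair, cosine(vecs[pair[0]], vecs[pair[1]]))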
The scripts build on the pretrained vector spaces published with Levy & Goldberg (2014) and Mikolov et al. (2017); see the links below.
Required pretrained vector spaces [links last checked 2022-03-03]:
bow and deps vector spaces (Study 1 and Study 2):
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
On that page, download:
Dependency-Based [words]:
http://u.cs.biu.ac.il/~yogo/data/syntemb/deps.words.bz2
Bag of Words (k = 5) [words]:
http://u.cs.biu.ac.il/~yogo/data/syntemb/bow5.words.bz2
fastText vector space with subword information (see the loading sketch after this list):
File "wiki-news-300d-1M-subword.vec.zip" from
https://fasttext.cc/docs/en/english-vectors.html
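The files above are plain-text vector files, but the fastText .vec format differs from a bare word-per-line file in starting with a header line "<number of words> <dimensions>". A minimal sketch of reading it, assuming the archive has been unzipped into the working directory:

    # Minimal sketch: reading the fastText .vec text format.
    import numpy as np

    def load_fasttext_vec(path, vocab):
        """Collect the vectors for the words in `vocab` from a .vec file."""
        vectors = {}
        with open(path, encoding="utf-8") as handle:
            next(handle)  # skip the "<n_words> <dimensions>" header line
            for line in handle:
                parts = line.rstrip().split(" ")
                if parts[0] in vocab:
                    vectors[parts[0]] = np.array(parts[1:], dtype=float)
        return vectors

    # Example (file name as published on the fastText page):
    vecs = load_fasttext_vec("wiki-news-300d-1M-subword.vec", {"quick", "quickly"})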
References
Levy, O. and Y. Goldberg (2014, June). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland, pp. 302–308. Association for Computational Linguistics.
Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2017). Advances in pre-training distributed word representations. CoRR abs/1712.09405.
Schäfer, Martin (accepted). Splitting -ly's: using word embeddings to distinguish derivation and inflection. To appear in: Kotowski, S. & I. Plag (eds.), Semantics of derivation. Linguistische Arbeiten. Berlin, New York: de Gruyter.