figshare

The role of semantic class in English base/-ly pairs. A distributional analysis

Version 2 2022-08-30, 15:45
Version 1 2022-04-09, 16:19
online resource
posted on 2022-08-30, 15:45 authored by Martin Schäfer

The scripts and data here accompany the paper "Splitting ‐ly's: using word embeddings to distinguish derivation and inflection". The Python scripts calculate the cosine similarities; the R scripts perform the statistical analysis and produce the figures. The README file describes the individual files in more detail.
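The README documents the actual scripts. Purely as an illustration of the core measure, the sketch below computes the cosine similarity between a base and its -ly form. It assumes the vectors have already been loaded into a dictionary mapping words to lists of floats; the function names `cosine_similarity` and `pair_similarity` are hypothetical and are not taken from the published scripts.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimensionality."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def pair_similarity(
    vectors: Mapping[str, Sequence[float]], base: str, derived: str
) -> Optional[float]:
    """Similarity of a base/-ly pair, or None if either word lacks a vector.

    Hypothetical helper: the pair is passed explicitly rather than built by
    appending "ly", because spelling can change (happy -> happily).
    """
    if base not in vectors or derived not in vectors:
        return None
    return cosine_similarity(vectors[base], vectors[derived])
```

Returning `None` for out-of-vocabulary words, instead of raising, lets a loop over many base/-ly pairs simply skip pairs that one of the vector spaces does not cover.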

The scripts build on pretrained vector spaces published with Levy & Goldberg (2014) and Mikolov et al. (2017); see the links below.


Required pretrained vector spaces [links last checked 2022-03-03]:

bow and deps vector spaces (Study 1 and Study 2):
https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/
On that page:
Dependency-Based [words]
http://u.cs.biu.ac.il/~yogo/data/syntemb/deps.words.bz2

Bag of Words (k = 5) [words]
http://u.cs.biu.ac.il/~yogo/data/syntemb/bow5.words.bz2

fastText vector space with subword information:
File "wiki-news-300d-1M-subword.vec.zip" from
https://fasttext.cc/docs/en/english-vectors.html
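For orientation only, the sketch below reads vectors from files in the plain-text format these downloads use (one word per line, followed by its space-separated vector components). It assumes that a fastText `.vec` file may start with a header line giving the vocabulary size and dimensionality (skipped if present), that the `.bz2` files can be read in compressed form, and that the fastText zip archive has been extracted beforehand. The names `parse_vectors` and `load_vectors` are hypothetical and do not come from the published scripts.

```python
import bz2
from typing import Dict, Iterable, List, Optional


def parse_vectors(
    lines: Iterable[str], vocabulary: Optional[Iterable[str]] = None
) -> Dict[str, List[float]]:
    """Parse text-format embeddings with one 'word v1 v2 ... vn' entry per line.

    A first line consisting of exactly two integers (vocabulary size and
    dimensionality, as in fastText .vec files) is treated as a header and
    skipped. If `vocabulary` is given, only those words are kept.
    """
    wanted = set(vocabulary) if vocabulary is not None else None
    vectors: Dict[str, List[float]] = {}
    for line_number, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if line_number == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # header line: "<vocabulary size> <dimensions>"
        word, values = parts[0], parts[1:]
        if wanted is not None and word not in wanted:
            continue  # not needed for the word pairs under study
        vectors[word] = [float(x) for x in values]
    return vectors


def load_vectors(
    path: str, vocabulary: Optional[Iterable[str]] = None
) -> Dict[str, List[float]]:
    """Load a plain or .bz2-compressed text-format vector file."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return parse_vectors(handle, vocabulary)
```

A call such as `load_vectors("deps.words.bz2", vocabulary={"quick", "quickly"})` keeps only the words of interest. Filtering while reading matters for the fastText file, which (as its name indicates) holds about one million words, far more than a set of base/-ly pairs requires.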


References

Levy, O. and Y. Goldberg (2014, June). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, Maryland, pp. 302–308. Association for Computational Linguistics.

Mikolov, T., E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2017). Advances in pre-training distributed word representations. CoRR abs/1712.09405.

Schäfer, M. (accepted). Splitting ‐ly's: using word embeddings to distinguish derivation and inflection. To appear in: S. Kotowski and I. Plag (eds.), Semantics of derivation. Linguistische Arbeiten. Berlin, New York: de Gruyter.

Funding

This work was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – SFB 833 – Project ID 75650358
