Abstract
This article describes general regularities in the frequency dynamics of syntactic bigrams and the method used to analyse them. The objective is to estimate quantitatively the typical rate of change in the frequency of syntactic bigrams in English and Russian. Two factors drive this rate: changes in the frequency of the words that make up a bigram, and changes in how often those words co-occur. The contribution of each factor to the total rate was estimated by decomposing the symmetrized Kullback-Leibler divergence. The study also determined the extent to which the frequencies of syntactic bigrams respond to major social events. Frequency data for syntactic bigrams were taken from the English and Russian sub-corpora of Google Books Ngram. The regularities of syntactic bigram usage proved to be similar in English and Russian. The proposed approach can be used in other fields of science.
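The abstract does not give the decomposition itself, so the following is only a minimal sketch of one way the symmetrized Kullback-Leibler divergence between two bigram distributions could be split into a word-frequency part and a co-occurrence part. It assumes each year is represented by a strictly positive joint probability matrix over (head word, dependent word) pairs, and it uses pointwise mutual information to isolate co-occurrence. The function names `symmetrized_kl` and `decompose`, and the toy matrices, are hypothetical and not taken from the paper.

```python
import numpy as np


def symmetrized_kl(p, q):
    """Symmetrized KL divergence: sum over all cells of (p - q) * log(p / q)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p - q) * np.log(p / q)))


def decompose(p_joint, q_joint):
    """Split the divergence between two joint bigram distributions.

    Assumption (not from the paper): both matrices are strictly positive
    and each sums to 1. Rows index head words, columns index dependents.
    Returns (word_term, cooc_term); their sum equals symmetrized_kl
    of the two joint matrices.
    """
    p = np.asarray(p_joint, dtype=float)
    q = np.asarray(q_joint, dtype=float)

    # Marginal word distributions in each year.
    p_head, p_dep = p.sum(axis=1), p.sum(axis=0)
    q_head, q_dep = q.sum(axis=1), q.sum(axis=0)

    # Pointwise mutual information: log p(x, y) / (p(x) p(y)).
    pmi_p = np.log(p / np.outer(p_head, p_dep))
    pmi_q = np.log(q / np.outer(q_head, q_dep))

    # Part explained by changes in the frequencies of the individual words.
    word_term = symmetrized_kl(p_head, q_head) + symmetrized_kl(p_dep, q_dep)
    # Part explained by changes in how strongly the words co-occur.
    cooc_term = float(np.sum((p - q) * (pmi_p - pmi_q)))
    return word_term, cooc_term


# Toy 2x2 bigram distributions for two years (illustrative values only).
p_a = np.array([[0.40, 0.10], [0.20, 0.30]])
p_b = np.array([[0.30, 0.20], [0.25, 0.25]])

word_term, cooc_term = decompose(p_a, p_b)
total = symmetrized_kl(p_a, p_b)
print(total, word_term, cooc_term)
```

The split is exact because log(p/q) for a bigram equals the log-ratios of the two marginal word probabilities plus the difference in pointwise mutual information. Real corpus data contains zero counts, so some smoothing or restriction to bigrams present in both years would be needed before applying it.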
Acknowledgements
This research was financially supported by the Russian Government Program of Competitive Growth of Kazan Federal University, by the state assignment of the Ministry of Education and Science, grant agreement No. 34.5517.2017/6.7, and by RFBR, grant No. 17-29-09163.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bochkarev, V., Solovyev, V., Shevlyakova, A. (2019). A Corpus-Based Study of the Rate of Changes in Frequency of Syntactic Bigrams in English and Russian. In: Martínez-Villaseñor, L., Batyrshin, I., Marín-Hernández, A. (eds) Advances in Soft Computing. MICAI 2019. Lecture Notes in Computer Science, vol 11835. Springer, Cham. https://doi.org/10.1007/978-3-030-33749-0_37
DOI: https://doi.org/10.1007/978-3-030-33749-0_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33748-3
Online ISBN: 978-3-030-33749-0
eBook Packages: Computer Science, Computer Science (R0)