A Corpus-Based Study of the Rate of Changes in Frequency of Syntactic Bigrams in English and Russian

  • Conference paper
Advances in Soft Computing (MICAI 2019)

Abstract

This article describes general regularities in the frequency dynamics of syntactic bigrams and the method used to analyse them. The objective is to estimate quantitatively the typical rate of change in the frequency of syntactic bigrams in English and Russian. Two factors drive the total rate of change: changes in the frequency of the words that make up the bigrams, and changes in how often those words co-occur. The contribution of each factor was estimated by decomposing the symmetrized Kullback-Leibler divergence. The study also determined the extent to which syntactic bigram frequencies respond to major social events. Frequency data for syntactic bigrams were taken from the English and Russian sub-corpora of Google Books Ngram. The regularities of syntactic bigram usage were found to be similar in English and Russian. The proposed approach can be used in other fields of science.
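
The abstract does not spell out the decomposition, so the following Python sketch shows only one natural way to split a symmetrized Kullback-Leibler divergence between two bigram distributions into a word-frequency part and a co-occurrence part. It is not necessarily the authors' formulation. The sketch assumes the symmetrized divergence is D(p||q) + D(q||p) (some authors use half of this), that both periods contain the same bigrams with positive counts, and that co-occurrence is measured by pointwise mutual information. The function names and the toy counts are hypothetical.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def marginals(joint):
    """Marginal distributions of the first and second word of each bigram."""
    first, second = defaultdict(float), defaultdict(float)
    for (w1, w2), prob in joint.items():
        first[w1] += prob
        second[w2] += prob
    return dict(first), dict(second)


def symmetrized_kl(p, q):
    """D(p||q) + D(q||p), assuming p and q have the same positive support."""
    return sum((p[k] - q[k]) * math.log(p[k] / q[k]) for k in p)


def decompose(counts_a, counts_b):
    """Return (total, word_part, cooccurrence_part) for two periods.

    Assumption: log p(w1, w2) = log p(w1) + log p(w2) + PMI(w1, w2),
    so the total splits exactly into marginal and PMI terms.
    """
    if counts_a.keys() != counts_b.keys():
        raise ValueError("both periods must contain the same bigrams")
    p, q = normalize(counts_a), normalize(counts_b)
    p1, p2 = marginals(p)
    q1, q2 = marginals(q)
    # Part explained by changes in the frequencies of the individual words.
    word_part = symmetrized_kl(p1, q1) + symmetrized_kl(p2, q2)
    # Part explained by changes in how strongly the words are associated.
    cooc_part = 0.0
    for (w1, w2) in p:
        pmi_p = math.log(p[(w1, w2)] / (p1[w1] * p2[w2]))
        pmi_q = math.log(q[(w1, w2)] / (q1[w1] * q2[w2]))
        cooc_part += (p[(w1, w2)] - q[(w1, w2)]) * (pmi_p - pmi_q)
    return symmetrized_kl(p, q), word_part, cooc_part


# Hypothetical bigram counts for two time periods (illustrative only).
period_a = {("strong", "tea"): 30, ("strong", "wind"): 20,
            ("cold", "tea"): 10, ("cold", "wind"): 40}
period_b = {("strong", "tea"): 20, ("strong", "wind"): 35,
            ("cold", "tea"): 25, ("cold", "wind"): 20}

total, word_part, cooc_part = decompose(period_a, period_b)
print(f"total={total:.4f} words={word_part:.4f} co-occurrence={cooc_part:.4f}")
```

Under these assumptions the two parts sum exactly to the total. Real corpus data would also need smoothing or filtering, since bigrams absent from one period make the divergence undefined.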

Acknowledgements

This research was financially supported by the Russian Government Program of Competitive Growth of Kazan Federal University, state assignment of Ministry of Education and Science, grant agreement No. 34.5517.2017/6.7, and by RFBR, grant No. 17-29-09163.

Author information

Corresponding author

Correspondence to Vladimir Bochkarev.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bochkarev, V., Solovyev, V., Shevlyakova, A. (2019). A Corpus-Based Study of the Rate of Changes in Frequency of Syntactic Bigrams in English and Russian. In: Martínez-Villaseñor, L., Batyrshin, I., Marín-Hernández, A. (eds) Advances in Soft Computing. MICAI 2019. Lecture Notes in Computer Science, vol 11835. Springer, Cham. https://doi.org/10.1007/978-3-030-33749-0_37

  • DOI: https://doi.org/10.1007/978-3-030-33749-0_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33748-3

  • Online ISBN: 978-3-030-33749-0

  • eBook Packages: Computer Science, Computer Science (R0)
