High-frequency words have higher frequencies in Turkish social sciences article

Quality & Quantity

Abstract

Words, sentences, and paragraphs are the building blocks of texts. When texts are treated as data, word frequencies offer one way to link qualitative and quantitative perspectives. We examine the extent to which the relative frequencies of words differ between Turkish and English scientific articles. Using R, we analyzed the word frequencies of 120 randomly sampled articles in four categories: Turkish-Social Sciences, Turkish-Science, English-Social Sciences, and English-Science. The analysis was based on the relative frequencies of the 20 most frequently used words. In all four categories, the word frequencies declined in a similar way, sloping downward from top left to bottom right. However, the average relative frequencies of the first four words in the Turkish-Social Sciences category were very different from those of the other three groups. In addition, English articles contain more words than Turkish articles on average, and their word counts are more variable. These findings indicate an excessive focusing problem in Turkish-Social Sciences articles: the articles tend to concentrate on a single topic and often lack a broader perspective. This can lead to several problems, including a limited understanding of the issues at hand and an inability to see the bigger picture. It can also reduce objectivity and critical thinking.
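
The core quantity in the abstract, the relative frequency of the most frequently used words, can be illustrated with a short sketch. The study itself used R; the Python below is only an illustration, and the function name `top_relative_frequencies`, the regex tokenizer, and the absence of stop-word filtering are assumptions, not the authors' procedure.

```python
import re
from collections import Counter


def top_relative_frequencies(text, k=20):
    """Return the k most frequent words with their relative frequencies.

    Relative frequency = occurrences of the word / total number of tokens.
    """
    # Assumed tokenizer: lowercase the text and keep runs of Unicode letters.
    # Caveat: str.lower() does not apply Turkish casing rules (I/ı, İ/i),
    # so a real Turkish corpus would need locale-aware normalization.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    total = len(tokens)
    if total == 0:
        return []
    counts = Counter(tokens)
    return [(word, count / total) for word, count in counts.most_common(k)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for word, freq in top_relative_frequencies(sample, k=3):
        print(f"{word}\t{freq:.3f}")
```

Averaging these per-article values by rank within each category would give figures comparable in spirit to the category averages the abstract describes, again under the assumptions stated above.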

Notes

  1. Many phenomena can be explained by power-law distributions, one of the most famous being the 80:20 rule. Also known as the Pareto principle, this rule was conceptualized by J. M. Juran and states that 20% of causes lead to 80% of effects. Note that the 80:20 ratio corresponds to only one specific value of the power-law exponent, a = 2.16 (Milojević 2010).

  2. The p-value was 0.702, so there is no statistically significant difference at the 0.05 significance level.
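
The exponent quoted in note 1 can be checked with a small calculation. The sketch below assumes a continuous power-law density p(x) ∝ x^(-a) with a > 2, for which the share W of the total held by the top fraction P satisfies W = P^((a-2)/(a-1)). That this is the derivation behind the cited value is an assumption, and the function name `power_law_exponent` is hypothetical.

```python
import math


def power_law_exponent(top_fraction, total_share):
    """Solve W = P**((a - 2) / (a - 1)) for the exponent a.

    top_fraction (P): fraction of causes, e.g. 0.2
    total_share  (W): fraction of effects they account for, e.g. 0.8
    Assumes a continuous power law with a > 2 (finite mean).
    """
    # h = (a - 2) / (a - 1), obtained by taking logarithms of both sides
    h = math.log(total_share) / math.log(top_fraction)
    # Rearranging h * (a - 1) = a - 2 gives a = (2 - h) / (1 - h)
    return (2 - h) / (1 - h)


if __name__ == "__main__":
    print(round(power_law_exponent(0.2, 0.8), 2))
```

For P = 0.2 and W = 0.8 this evaluates to roughly 2.16, in line with the value in note 1; other splits (for example 90:10) give a different exponent, which is the point the note makes.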

References

  • Alhawarat, M., Hegazi, M., Hilal, A.: Processing the text of the holy Quran: a text mining study. Int. J. Adv. Comput. Sci. Appl. 6(2), 262–267 (2015)

  • Aureli, S.: A comparison of content analysis usage and text mining in csr corporate disclosure. Int. J. Digit. Account. Res. 17, 1–32 (2017). https://doi.org/10.4192/1577-8517-v17_1

  • Azam, N., Yao, J.: Comparison of term frequency and document frequency based feature selection metrics in text categorization. Expert Syst. Appl. 39(5), 4760–4768 (2012). https://doi.org/10.1016/j.eswa.2011.09.160

  • Baus, C., Strijkers, K., Costa, A.: When does word frequency influence written production? Front. Psychol. (2013). https://doi.org/10.3389/fpsyg.2013.00963

  • Bozkurt, O., Nida: Academics’ opinions regarding the quality of scientific publications and their quality problems. J. High. Educ. Sci. 11(1), 128–137 (2021). https://doi.org/10.5961/jhes.2021.435

  • Brysbaert, M., Mandera, P., Keuleers, E.: The word frequency effect in word processing: an updated review. Curr. Dir. Psychol. Sci. 27(1), 45–50 (2018). https://doi.org/10.1177/0963721417727521

  • Çelik, S.: Metin Madenciliği Ile Shakespeare Külliyatının Incelenmesi. MANAS Sosyal Araştırmalar Dergisi (2020). https://doi.org/10.33206/mjss.561919

  • Corral, Á., Boleda, G., Ferrer-i-Cancho, R.: Zipf’s law for word frequencies: word forms versus lemmas in long texts. PLOS ONE 10(7), e0129031 (2015). https://doi.org/10.1371/journal.pone.0129031

  • Coşkun, R.: “Türkçe Nitel Araştırmalarda Nitelik Sorunu: Nitel Araştırmalar Ne Kadar Bilimsel?” In 6th International Congress of Multidisciplinary Studies: Multicongress. Hasan Kalyoncu University, Gaziantep (2019)

  • Dale, R.: GPT-3: what’s it good for? Nat. Lang. Eng. 27(1), 113–118 (2021). https://doi.org/10.1017/S1351324920000601

  • Day, A., Peters, J.: Quality indicators in academic publishing. Lib. Rev. 43(7), 4–72 (1994). https://doi.org/10.1108/00242539410068015

  • Demircioğlu, M.Y.: İdari Yargı Kararları Çerçevesinde Bilimsel Yayın Etiği Soruşturmaları. Ankara Barosu Dergisi 1, 145–218 (2014)

  • Feldman, R., Dagan, I., Hirsh, H.: Mining text using keyword distributions. J. Intell. Inf. Syst. 10, 281–300 (1998)

  • Gentzkow, M., Kelly, B., Taddy, M.: Text as data. J. Econ. Lit. 57(3), 535–574 (2019). https://doi.org/10.1257/JEL.20181020

  • Haight, W.L., Taylor, E.H.: Human behavior for social work practice, 2nd edn. Oxford University Press (2013)

  • Halpern, M., O’Rourke, M.: Power in science communication collaborations. J. Sci. Commun. 19(04), C02 (2020). https://doi.org/10.22323/2.19040302

  • Hao, Qi.: Relation-ontology driven topic classification. (2020)

  • Jacobs, C., Tullis, J.G., Undorf, M., Cao, L., Li, W., Jia, X., Li, P., Li, X., Zhang, Y., Cao, W.: The effect of word frequency on judgments of learning: contributions of beliefs and processing fluency. Front. Psychol (2016). https://doi.org/10.3389/fpsyg.2015.01995

  • Kettunen, K.: Can type-token ratio be used to show morphological complexity of languages? J. Quant. Linguist. 21(3), 223–245 (2014). https://doi.org/10.1080/09296174.2014.911506

  • Kim, S.-W., Gil, J.-M.: Research paper classification systems based on TF-IDF and LDA schemes. HCIS 9(1), 30 (2019). https://doi.org/10.1186/s13673-019-0192-7

  • Mendes, P.S., Luna Pedro, K., Albuquerque, B.B.: Word frequency effects on judgments of learning: more than just beliefs. J. Gen. Psychol. 148(2), 124–148 (2021). https://doi.org/10.1080/00221309.2019.1706073

  • Milojević, S.: Power law distributions in information science: making the case for logarithmic binning. J. Am. Soc. Inform. Sci. Technol. 61(12), 2417–2425 (2010). https://doi.org/10.1002/asi.21426

  • Özdoğan, A.G., Turan, M.: English document classification using text mining. J. Technol. Applied Sci. 2(1), 37–46 (2019)

  • Özlem, E.: Metin Madenciliği Yaklaşımıyla Işverenlerin Nitelik Taleplerinin Incelenmesi. İstanbul Ticaret Üniversitesi Sosyal Bilimler Dergisi 20(40), 138–157 (2021). https://doi.org/10.46928/iticusbe.763191

  • Öztoprak, N.: Türkiye’de Sosyal Bilimlerde Akademik Dergiciliğin Meseleleri Çalıştayi Raporu. Türk Kültürü İncelemeleri Dergisi 43, 475–488 (2020)

  • Piantadosi, S.T.: Zipf’s word frequency law in natural language: a critical review and future directions. Psychonomic Bull. Rev. 21(5), 1112–1130 (2014). https://doi.org/10.3758/s13423-014-0585-6

  • Pournia, Y.: A study on the most frequent academic words in high impact factor english nursing journals: a corpus-based study. Iran. J. Nurs. Midwifery Res. 24(1), 11 (2019). https://doi.org/10.4103/ijnmr.IJNMR_190_17

  • Riviere, M., Duprez, V., Dufoort, H., van Hecke, A., Beeckman, D., Verhaeghe, S., Deschodt, M.: The interpersonal care relationship between nurses and older patients: a cross-sectional study in three hospitals. J. Adv. Nurs. (2022). https://doi.org/10.1111/JAN.15182

  • Sinha, D.: The social sciences in a global age: decoding knowledge politics. Routledge India (2020). https://doi.org/10.4324/9781003110316

  • Slavec, A., Vehovar, V.: The role of word frequencies in detecting unfamiliar terms and their effect on response quality. Psihologija 48(4), 327–344 (2015). https://doi.org/10.2298/PSI1504327S

  • Sonkaya, Z.Z.: The examination of scientific language in academic manuscripts on the fields of social and physical sciences. J. Turk. Lang. Lit. Surv. 5(2), 233–241 (2020)

  • Vongpumivitch, V., yu Huang, J., Chang, Y.C.: Frequency analysis of the words in the academic word list (AWL) and non-awl content words in applied linguistics research papers. Engl. Specif. Purp. 28(1), 33–41 (2009). https://doi.org/10.1016/J.ESP.2008.08.003

  • Yusupova, N.I., Bogdanova, D.R., Komendantova, N.P.: Artificial intelligence tools for analyzing emotionally colored information from customer reviews in the service sector. IOP Conf. Series Mater. Sci. Eng. 1069(1), 012013 (2021). https://doi.org/10.1088/1757-899X/1069/1/012013

  • Zipf, G.K.: Selected studies of the principle of relative frequency in language. Sel. Stud. Princ. Relat. Freq. Lang. (1932). https://doi.org/10.4159/HARVARD.9780674434929/HTML

  • Zou, J.B., Hudson, J.L., Rapee, R.M.: The effect of attentional focus on social anxiety. Behav. Res. Ther. 45(10), 2326–2333 (2007). https://doi.org/10.1016/J.BRAT.2007.03.014

  • Adamic, Lada A. 2000. “Zipf, Power-Laws, and Pareto-a Ranking Tutorial.” Information Dynamics Lab, HP Labs Palo Alto, CA 94304

  • Chen, Xiaobin, D Meurers: “Characterizing Text Difficulty with Word Frequencies.” Pp. 84–94 in Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications. California (2016)

  • Chung, Cindy, J Pennebaker: “The Psychological Functions of Function Words.” Pp. 343–59 in Social Communication , edited by K. Fiedler. Psychology Press (2011)

  • Çimen, Hacer, E Çimen: “International Academic Publications and Turkey’s Scientific Productivity.” Pp. 145–62 in, edited by A. Yıldızeli and H. Bahşişoğlu. ÜNAK, (2006)

  • Coats, Steven. 2020. “Comparing Word Frequencies and Lexical Diversity with the ZipfExplorer Tool.”

  • GeeksforGeeks. 2021. “Natural Language Processing - Overview.” Retrieved March 21, 2022 (https://origin.geeksforgeeks.org/natural-language-processing-overview/).

  • Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2021. “HistWords: Word Embeddings for Historical Text.” Retrieved January 1, 2022 (https://nlp.stanford.edu/projects/histwords/).

  • İlhan, Sevinç, Nevcihan Duru, Şenol Karagöz, and Merve Sağır. 2008. “Metin Madenciliği Ile Soru Cevaplama Sistemi.” Pp. 26–30 in Elektronik ve Bilgisayar Mühendisliği Sempozyumu (ELECO). Bursa.

  • İnci, Osman. 2009. “Bilimsel Yayın Etiği İlkeleri, Yanıltmalar Yanıltmaları Önlemeye Yönelik Öneriler.” Sağlık Bilimlerinde Süreli Yayıncılık 69–89.

  • Kadhim, Ammar Ismael, Yu-N. Cheah, and Nurul Hashimah Ahamed. 2014. “Text Document Preprocessing and Dimension Reduction Techniques for Text Document Clustering.” Pp. 69–73 in 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology. IEEE.

  • Maximova, Alina. 2020. “Cool Is the New Black»: An Investigation of Some Drivers and Outcomes of Brand Coolness in Luxury Fashion Realm and Analysis of the Influence of Power Distance on the Perception of Coolness across Three Cultural Identities: Anglo-Saxon.” Lisboa.

  • Nakov, Preslav, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2019. SemEval-2016 Task 4: Sentiment Analysis in Twitter. https://doi.org/10.48550/arXiv.1912.01973.

  • Olkun, Sinan. 2006. “Eğitim ile İlgili Uluslararası Bilimsel Dergilerde Yayın Yapma Süreci: Fırsatlar, Sorunlar ve Çözüm Önerileri.” I. Ulusal Sosyal Bilimlerde Süreli Yayıncılık Kurultayı.

  • Sebe, N., Ira Cohen, Ashutosh Garg, and Thomas S. Huang. 2005. Machine Learning in Computer Vision. Springer.

  • Seker, Sadi Evren. 2015. “Metin Madenciliği (Text Mining).” YBS Ansiklopedi 2(2).

  • Stefaner, Moritz, Laraine Daston, and Jen Christiansen. 2020. “The Language of Science - Scientific American.” Retrieved January 1, 2022 (https://www.scientificamerican.com/article/the-language-of-science/).

  • Yaping, Lei, ed. 2020. International Journal of Advanced Network, Monitoring and Controls. Vol. 5. Xi’an Technological University.

Funding

The authors received no specific funding for this work.

Author information

Contributions

All authors contributed to the study conception and design. NG: Conceptualization, Writing—review and editing. SÖ: Data curation, Software, Formal Analysis, Visualization. SÇ: Conceptualization, Data curation, Writing—review and editing, Methodology.

Corresponding author

Correspondence to Sadullah Çelik.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All ethical standards were followed in this research. No formal approval was required.

Consent to participate

The research does not involve human or animal subjects.

Consent to publication

We are willing to publish the research paper in Quality & Quantity.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gürsakal, N., Çelik, S. & Özdemir, S. High-frequency words have higher frequencies in Turkish social sciences article. Qual Quant 57, 1865–1887 (2023). https://doi.org/10.1007/s11135-022-01444-3

  • DOI: https://doi.org/10.1007/s11135-022-01444-3
