Abstract
Words, sentences, and paragraphs are the building blocks of texts. When texts are treated as data, word frequencies offer one way to connect qualitative and quantitative perspectives. We aim to examine to what extent the relative frequencies of words differ between Turkish and English scientific articles. Using R, we analyzed the word frequencies of 120 randomly sampled articles in four categories: Turkish-Social Sciences, Turkish-Science, English-Social Sciences, and English-Science. The analysis was based on the relative frequencies of the 20 most frequently used words. In all four categories, the word frequencies decreased in a similar pattern from the top left to the bottom right of the rank-frequency graphs. However, the average relative frequencies of the first four words in the Turkish-Social Sciences category were very different from those of the other three groups. In addition, English articles have a higher mean word count than Turkish articles, as well as greater variability in word count. This indicates an excessive-focus problem in Turkish-Social Sciences articles. Such excessive focus is visible in the tendency of these articles to concentrate on a single topic, often without a broader perspective. This can lead to several problems: an incomplete understanding of the issues at hand, an inability to see the bigger picture, and reduced objectivity and critical thinking.
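The central measure in the abstract, the relative frequency of the most frequently used words, can be sketched in a few lines. The authors worked in R; the Python below is only an illustration under stated assumptions: tokens are lowercased runs of letters, with no stemming, lemmatization, or stopword removal, and the function names (`tokenize`, `top_relative_frequencies`, `mean_rank_profile`, `length_summary`) are hypothetical, not the paper's code. `mean_rank_profile` averages the rank-1 to rank-n relative frequencies over a group of articles, which is one way to compare categories such as Turkish-Social Sciences and English-Science; `length_summary` gives the mean and spread of article length in words.

```python
import re
from collections import Counter
from statistics import mean, stdev


def tokenize(text, turkish=False):
    """Lowercase the text and keep runs of word characters other than digits
    and underscores (in practice, letters). Assumption: no stemming or stopword removal."""
    if turkish:
        # str.lower() is not locale aware, so apply Turkish casing first:
        # dotted capital I -> i, dotless capital I -> dotless small i
        text = text.replace("İ", "i").replace("I", "ı")
    return re.findall(r"[^\W\d_]+", text.lower())


def top_relative_frequencies(text, n=20, turkish=False):
    """Return the n most frequent words of one text as (word, count / total tokens) pairs."""
    tokens = tokenize(text, turkish)
    if not tokens:
        return []
    total = len(tokens)
    return [(word, count / total) for word, count in Counter(tokens).most_common(n)]


def mean_rank_profile(texts, n=20, turkish=False):
    """Average the relative frequency at each rank 1..n over a group of texts.

    Texts with fewer than n distinct words are padded with zeros at the missing ranks.
    """
    profiles = []
    for text in texts:
        freqs = [f for _, f in top_relative_frequencies(text, n, turkish)]
        profiles.append(freqs + [0.0] * (n - len(freqs)))
    return [mean(rank_values) for rank_values in zip(*profiles)]


def length_summary(texts, turkish=False):
    """Mean and sample standard deviation of the token count per text (needs >= 2 texts)."""
    lengths = [len(tokenize(t, turkish)) for t in texts]
    return mean(lengths), stdev(lengths)


if __name__ == "__main__":
    article = "the cat and the dog and the bird"
    print(top_relative_frequencies(article, n=2))  # [('the', 0.375), ('and', 0.25)]
```

The optional `turkish=True` flag exists because Python's `str.lower()` applies language-neutral case mapping: without it, dotted `İ` becomes `i` plus a combining dot (which the regex then splits off) and a word such as `IŞIK` is lowercased with a dotted `i` instead of `ı`.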
Notes
Many phenomena can be explained by power-law distributions, one of the most famous being the 80:20 rule. Also known as the Pareto principle, this rule was conceptualized by J. M. Juran and states that roughly 80% of effects come from 20% of causes. Note that the 80:20 ratio corresponds to only one specific value of the power-law exponent, a = 2.16 (Milojević 2010).
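The link between the 80:20 ratio and a = 2.16 can be checked numerically. This sketch assumes a continuous power-law density proportional to x^(-a) with a > 2, and assumes that the exponent in this note follows that density convention. Under it, the share W of the total held by the top fraction P of the population is W = P^((a - 2)/(a - 1)); solving 0.8 = 0.2^((a - 2)/(a - 1)) for a gives approximately 2.16. The function names are hypothetical.

```python
import math


def top_share(p, a):
    """Fraction of the total held by the top fraction p of a population whose values
    follow a continuous power-law density proportional to x**(-a), with a > 2
    (assumed convention; a > 2 keeps the mean finite)."""
    return p ** ((a - 2) / (a - 1))


def exponent_for_rule(p, share):
    """Invert top_share: find the exponent a for which the top fraction p holds `share`."""
    k = math.log(share) / math.log(p)  # k = (a - 2) / (a - 1)
    return (2 - k) / (1 - k)           # solve k = (a - 2) / (a - 1) for a


if __name__ == "__main__":
    a = exponent_for_rule(0.2, 0.8)
    print(round(a, 2))                  # 2.16
    print(round(top_share(0.2, a), 6))  # 0.8
```

Any other pair of percentages gives a different exponent, which is why the 80:20 ratio pins down one specific value of a.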
The p-value was 0.702; at a significance level of 0.05, the difference is therefore not statistically significant.
References
Alhawarat, M., Hegazi, M., Hilal, A.: Processing the text of the holy Quran: a text mining study. Int. J. Adv. Comput. Sci. Appl. 6(2), 262–267 (2015)
Aureli, S.: A comparison of content analysis usage and text mining in CSR corporate disclosure. Int. J. Digit. Account. Res. 17, 1–32 (2017). https://doi.org/10.4192/1577-8517-v17_1
Azam, N., Yao, J.: Comparison of term frequency and document frequency based feature selection metrics in text categorization. Expert Syst. Appl. 39(5), 4760–4768 (2012). https://doi.org/10.1016/j.eswa.2011.09.160
Baus, C., Strijkers, K., Costa, A.: When does word frequency influence written production? Front. Psychol. (2013). https://doi.org/10.3389/fpsyg.2013.00963
Bozkurt, O., Nida: Academics’ opinions regarding the quality of scientific publications and their quality problems. J. High. Educ. Sci. 11(1), 128–137 (2021). https://doi.org/10.5961/jhes.2021.435
Brysbaert, M., Mandera, P., Keuleers, E.: The word frequency effect in word processing: an updated review. Curr. Dir. Psychol. Sci. 27(1), 45–50 (2018). https://doi.org/10.1177/0963721417727521
Çelik, S.: Metin Madenciliği ile Shakespeare Külliyatının İncelenmesi [Examining the Shakespeare corpus with text mining]. MANAS Sosyal Araştırmalar Dergisi (2020). https://doi.org/10.33206/mjss.561919
Corral, Á., Boleda, G., Ferrer-i-Cancho, R.: Zipf's law for word frequencies: word forms versus lemmas in long texts. PLOS ONE 10(7), e0129031 (2015). https://doi.org/10.1371/journal.pone.0129031
Coşkun, R.: Türkçe Nitel Araştırmalarda Nitelik Sorunu: Nitel Araştırmalar Ne Kadar Bilimsel? [The quality problem in Turkish qualitative research: how scientific are qualitative studies?]. In: 6th International Congress of Multidisciplinary Studies: Multicongress. Hasan Kalyoncu University, Gaziantep (2019)
Dale, R.: GPT-3: what’s it good for? Nat. Lang. Eng. 27(1), 113–118 (2021). https://doi.org/10.1017/S1351324920000601
Day, A., Peters, J.: Quality indicators in academic publishing. Lib. Rev. 43(7), 4–72 (1994). https://doi.org/10.1108/00242539410068015
Demircioğlu, M.Y.: İdari Yargı Kararları Çerçevesinde Bilimsel Yayın Etiği Soruşturmaları [Scientific publication ethics investigations in light of administrative court decisions]. Ankara Barosu Dergisi 1, 145–218 (2014)
Feldman, R., Dagan, I., Hirsh, H.: Mining text using keyword distributions. J. Intell. Inf. Syst. 10, 281–300 (1998)
Gentzkow, M., Kelly, B., Taddy, M.: Text as data. J. Econ. Lit. 57(3), 535–574 (2019). https://doi.org/10.1257/JEL.20181020
Haight, W.L., Taylor, E.H.: Human behavior for social work practice, 2nd edn. Oxford University Press (2013)
Halpern, M., O’Rourke, M.: Power in science communication collaborations. J. Sci. Commun. 19(04), C02 (2020). https://doi.org/10.22323/2.19040302
Hao, Q.: Relation-ontology driven topic classification (2020)
Jacobs, C., Tullis, J.G., Undorf, M., Cao, L., Li, W., Jia, X., Li, P., Li, X., Zhang, Y., Cao, W.: The effect of word frequency on judgments of learning: contributions of beliefs and processing fluency. Front. Psychol. (2016). https://doi.org/10.3389/fpsyg.2015.01995
Kettunen, K.: Can type-token ratio be used to show morphological complexity of languages? J. Quant. Linguist. 21(3), 223–245 (2014). https://doi.org/10.1080/09296174.2014.911506
Kim, S.-W., Gil, J.-M.: Research paper classification systems based on TF-IDF and LDA schemes. HCIS 9(1), 30 (2019). https://doi.org/10.1186/s13673-019-0192-7
Mendes, P.S., Luna Pedro, K., Albuquerque, B.B.: Word frequency effects on judgments of learning: more than just beliefs. J. Gen. Psychol. 148(2), 124–148 (2021). https://doi.org/10.1080/00221309.2019.1706073
Milojević, S.: Power law distributions in information science: making the case for logarithmic binning. J. Am. Soc. Inform. Sci. Technol. 61(12), 2417–2425 (2010). https://doi.org/10.1002/asi.21426
Özdoğan, A.G., Turan, M.: English document classification using text mining. J. Technol. Applied Sci. 2(1), 37–46 (2019)
Özlem, E.: Metin Madenciliği Yaklaşımıyla İşverenlerin Nitelik Taleplerinin İncelenmesi [Examining employers' qualification demands with a text mining approach]. İstanbul Ticaret Üniversitesi Sosyal Bilimler Dergisi 20(40), 138–157 (2021). https://doi.org/10.46928/iticusbe.763191
Öztoprak, N.: Türkiye'de Sosyal Bilimlerde Akademik Dergiciliğin Meseleleri Çalıştayı Raporu [Report of the workshop on the issues of academic journal publishing in the social sciences in Turkey]. Türk Kültürü İncelemeleri Dergisi 43, 475–488 (2020)
Piantadosi, S.T.: Zipf's word frequency law in natural language: a critical review and future directions. Psychonomic Bull. Rev. 21(5), 1112–1130 (2014). https://doi.org/10.3758/s13423-014-0585-6
Pournia, Y.: A study on the most frequent academic words in high impact factor English nursing journals: a corpus-based study. Iran. J. Nurs. Midwifery Res. 24(1), 11 (2019). https://doi.org/10.4103/ijnmr.IJNMR_190_17
Riviere, M., Duprez, V., Dufoort, H., van Hecke, A., Beeckman, D., Verhaeghe, S., Deschodt, M.: The interpersonal care relationship between nurses and older patients: a cross-sectional study in three hospitals. J. Adv. Nurs. (2022). https://doi.org/10.1111/JAN.15182
Sinha, D.: The social sciences in a global age: decoding knowledge politics. Routledge India (2020). https://doi.org/10.4324/9781003110316
Slavec, A., Vehovar, V.: The role of word frequencies in detecting unfamiliar terms and their effect on response quality. Psihologija 48(4), 327–344 (2015). https://doi.org/10.2298/PSI1504327S
Sonkaya, Z.Z.: The examination of scientific language in academic manuscripts on the fields of social and physical sciences. J. Turk. Lang. Lit. Surv. 5(2), 233–241 (2020)
Vongpumivitch, V., Huang, J.-y., Chang, Y.C.: Frequency analysis of the words in the academic word list (AWL) and non-AWL content words in applied linguistics research papers. Engl. Specif. Purp. 28(1), 33–41 (2009). https://doi.org/10.1016/J.ESP.2008.08.003
Yusupova, N.I., Bogdanova, D.R., Komendantova, N.P.: Artificial intelligence tools for analyzing emotionally colored information from customer reviews in the service sector. IOP Conf. Series Mater. Sci. Eng. 1069(1), 012013 (2021). https://doi.org/10.1088/1757-899X/1069/1/012013
Zipf, G.K.: Selected Studies of the Principle of Relative Frequency in Language. Harvard University Press, Cambridge, MA (1932). https://doi.org/10.4159/HARVARD.9780674434929/HTML
Zou, J.B., Hudson, J.L., Rapee, R.M.: The effect of attentional focus on social anxiety. Behav. Res. Ther. 45(10), 2326–2333 (2007). https://doi.org/10.1016/J.BRAT.2007.03.014
Adamic, L.A.: Zipf, power-laws, and Pareto: a ranking tutorial. Information Dynamics Lab, HP Labs, Palo Alto, CA (2000)
Chen, X., Meurers, D.: Characterizing text difficulty with word frequencies. In: Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 84–94. California (2016)
Chung, C., Pennebaker, J.: The psychological functions of function words. In: Fiedler, K. (ed.) Social Communication, pp. 343–359. Psychology Press (2011)
Çimen, H., Çimen, E.: International academic publications and Turkey's scientific productivity. In: Yıldızeli, A., Bahşişoğlu, H. (eds.), pp. 145–162. ÜNAK (2006)
Coats, S.: Comparing word frequencies and lexical diversity with the ZipfExplorer tool (2020)
GeeksforGeeks: Natural language processing – overview (2021). https://origin.geeksforgeeks.org/natural-language-processing-overview/. Accessed 21 March 2022
Hamilton, W.L., Leskovec, J., Jurafsky, D.: HistWords: word embeddings for historical text (2021). https://nlp.stanford.edu/projects/histwords/. Accessed 1 January 2022
İlhan, S., Duru, N., Karagöz, Ş., Sağır, M.: Metin Madenciliği ile Soru Cevaplama Sistemi [A question answering system using text mining]. In: Elektronik ve Bilgisayar Mühendisliği Sempozyumu (ELECO), pp. 26–30. Bursa (2008)
İnci, O.: Bilimsel Yayın Etiği İlkeleri, Yanıltmalar Yanıltmaları Önlemeye Yönelik Öneriler [Principles of scientific publication ethics, misconduct, and recommendations for preventing it]. Sağlık Bilimlerinde Süreli Yayıncılık, 69–89 (2009)
Kadhim, A.I., Cheah, Y.-N., Ahamed, N.H.: Text document preprocessing and dimension reduction techniques for text document clustering. In: 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology, pp. 69–73. IEEE (2014)
Maximova, A.: "Cool is the new black": an investigation of some drivers and outcomes of brand coolness in luxury fashion realm and analysis of the influence of power distance on the perception of coolness across three cultural identities: Anglo-Saxon. Lisboa (2020)
Nakov, P., Ritter, A., Rosenthal, S., Sebastiani, F., Stoyanov, V.: SemEval-2016 task 4: sentiment analysis in Twitter (2019). https://doi.org/10.48550/arXiv.1912.01973
Olkun, S.: Eğitim ile İlgili Uluslararası Bilimsel Dergilerde Yayın Yapma Süreci: Fırsatlar, Sorunlar ve Çözüm Önerileri [The process of publishing in international scientific journals on education: opportunities, problems, and suggested solutions]. In: I. Ulusal Sosyal Bilimlerde Süreli Yayıncılık Kurultayı (2006)
Sebe, N., Cohen, I., Garg, A., Huang, T.S.: Machine Learning in Computer Vision. Springer (2005)
Seker, S.E.: Metin Madenciliği (Text Mining). YBS Ansiklopedi 2(2) (2015)
Stefaner, M., Daston, L., Christiansen, J.: The language of science. Scientific American (2020). https://www.scientificamerican.com/article/the-language-of-science/. Accessed 1 January 2022
Yaping, L. (ed.): International Journal of Advanced Network, Monitoring and Controls, vol. 5. Xi'an Technological University (2020)
Funding
The authors received no specific funding for this work.
Author information
Contributions
All authors contributed to the study conception and design. NG: Conceptualization, Writing—review and editing. SÖ: Data curation, Software, Formal Analysis, Visualization. SÇ: Conceptualization, Data curation, Writing—review and editing, Methodology.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All ethical standards were followed in this research. No formal approval was required.
Consent to participate
The research did not involve human or animal subjects.
Consent to publication
The authors consent to the publication of this research paper in Quality & Quantity.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gürsakal, N., Çelik, S. & Özdemir, S. High-frequency words have higher frequencies in Turkish social sciences article. Qual Quant 57, 1865–1887 (2023). https://doi.org/10.1007/s11135-022-01444-3