ABSTRACT
Generative language models such as ChatGPT have garnered attention for their ability to generate human-like writing in various fields, including academic research. The rapid proliferation of generated text has increased the need for automatic identification to uphold transparency and trust in information. However, generated text closely resembles human writing and differs from it only subtly in grammatical structure, tone, and patterns, which makes systematic scrutiny challenging. In this work, we attempt to detect abstracts generated by ChatGPT, which are much shorter and bounded in length. We extract the semantic and lexical properties of the text and observe that traditional machine learning models can confidently detect these abstracts.
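As a rough illustration of what extracting lexical properties from an abstract can look like, the sketch below computes three simple statistics with the Python standard library. The abstract does not list the features used, so the function name `lexical_features` and the three statistics (type-token ratio, average sentence length, average word length) are assumptions chosen for illustration, not the authors' feature set.

```python
import re


def lexical_features(text: str) -> dict:
    """Compute a few simple lexical statistics for one abstract.

    Hypothetical feature set, for illustration only.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {"type_token_ratio": 0.0, "avg_sentence_len": 0.0, "avg_word_len": 0.0}
    return {
        # Share of distinct words among all words (vocabulary diversity).
        "type_token_ratio": len(set(words)) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
    }


sample = "We study short texts. We study them carefully."
features = lexical_features(sample)
print(features)
```

Feature vectors of this kind, one per abstract, could then be passed to a conventional classifier together with human/generated labels.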
Index Terms
- Deep dive into language traits of AI-generated Abstracts