DOI: 10.1145/3632410.3632471
short-paper

Deep dive into language traits of AI-generated Abstracts

Published: 04 January 2024

ABSTRACT

Generative language models such as ChatGPT have garnered attention for their ability to produce human-like writing in many fields, including academic research. The rapid proliferation of generated text has increased the need for automatic identification to uphold transparency and trust in information. However, generated text closely resembles human writing and differs from it only subtly in grammatical structure, tone, and patterns, which makes systematic scrutiny challenging. In this work, we attempt to detect Abstracts generated by ChatGPT, which are much shorter and bounded in length. We extract semantic and lexical properties of the text and observe that traditional machine learning models can confidently detect these Abstracts.
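As a rough illustration of what extracting lexical properties from an abstract could look like, the sketch below computes three simple statistics in Python. The function name `lexical_features` and the chosen features (type-token ratio, mean word length, mean sentence length) are assumptions made for illustration only; they are not the feature set used in the paper, which this page does not list.

```python
import re


def lexical_features(text):
    """Compute a few simple lexical statistics for one abstract.

    The feature choice is hypothetical and purely illustrative.
    """
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase word tokens made of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())

    if not words or not sentences:
        return {
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
        }

    return {
        # Share of distinct words: a crude measure of vocabulary diversity.
        "type_token_ratio": len(set(words)) / len(words),
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    # Toy input, not taken from any dataset.
    print(lexical_features("The cat sat. The cat ran."))
```

Feature vectors of this kind, computed for human-written and generated abstracts, could then be passed to any standard classifier, for example logistic regression or a support vector machine.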

Published in

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024
627 pages

Copyright © 2024 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• short-paper
• Research
• Refereed limited