short-paper

Deep dive into language traits of AI-generated Abstracts

Authors:
Vikas Kumar, University of Delhi, India (ORCID: 0000-0001-9882-7310)
Amisha Bharti, University of Delhi, India (ORCID: 0009-0008-4018-6117)
Devanshu Verma, University of Delhi, India (ORCID: 0009-0009-3526-2338)
Vasudha Bhatnagar, University of Delhi, India (ORCID: 0000-0002-9706-9340)

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), January 2024, Pages 237–241. https://doi.org/10.1145/3632410.3632471

Published: 04 January 2024
ABSTRACT

Generative language models such as ChatGPT have garnered attention for their ability to produce human-like writing in various fields, including academic research. The rapid proliferation of generated text has heightened the need for automatic identification to uphold transparency and trust in information. However, generated text closely resembles human writing and often differs only subtly in grammatical structure, tone, and patterns, which makes systematic scrutiny challenging. In this work, we attempt to detect Abstracts generated by ChatGPT, which are much shorter and bounded in length. We extract semantic and lexical properties of the text and observe that traditional machine learning models can confidently detect these Abstracts.
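
As a rough illustration of what extracting lexical properties from an abstract can look like, the sketch below computes a handful of generic surface statistics using only the Python standard library. The function name `lexical_features` and the specific features chosen (word count, sentence count, average sentence length, average word length, type-token ratio) are assumptions made for illustration; the abstract does not list the features the authors actually used, and semantic features are not covered here.

```python
import re
from statistics import mean


def lexical_features(text: str) -> dict:
    """Compute a few simple lexical statistics for one abstract.

    Hypothetical helper for illustration only; not the authors' feature set.
    """
    # Crude sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens: letter runs, allowing internal apostrophes/hyphens.
    words = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    if not words:
        # Empty or non-alphabetic input: return zeros instead of dividing by zero.
        return {
            "n_words": 0.0,
            "n_sentences": 0.0,
            "avg_sentence_len": 0.0,
            "avg_word_len": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        "n_words": float(len(words)),
        "n_sentences": float(len(sentences)),
        # Average number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Average number of characters per word.
        "avg_word_len": mean(len(w) for w in words),
        # Distinct words divided by total words (vocabulary diversity).
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    sample = "We test the model. The model works well."
    for name, value in lexical_features(sample).items():
        print(f"{name}: {value}")
```

Feature dictionaries of this kind, one per abstract, could then be stacked into a numeric matrix and passed to any conventional classifier; which features and models work best is an empirical question that this sketch does not address.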

Published in
CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024
627 pages
ISBN: 9798400716348
DOI: 10.1145/3632410
Editors: Sriraam Natarajan, Indrajit Bhattacharya, Richa Singh, Arun Kumar, Sayan Ranu, Kalika Bali, Abinaya K
Copyright © 2024 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
AI-Generated Abstracts
ChatGPT
Linguistic features
Semantic features
Qualifiers
- short-paper
- Research
- Refereed limited