ABSTRACT
Automated assessment is becoming increasingly common in Computer Science, and with it, so is automated plagiarism detection. However, little attention has been paid to SQL assessment, where submissions are much shorter and necessarily less varied than in imperative languages. This raises the challenge of avoiding high false-positive rates, which require manual inspection and undermine the usefulness of automated detection.
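To illustrate why short SQL answers tend to collide, here is a minimal sketch of a generic token-shingle similarity measure (Jaccard similarity over contiguous token trigrams). This is an assumption chosen for illustration and is not one of the detection algorithms evaluated in the paper. The table and column names (`students`, `name`, `grade`) are hypothetical. Two students who independently write the obvious answer to the same question differ only in case, whitespace and a trailing semicolon, yet score highly.

```python
import re


def tokens(sql: str) -> list[str]:
    # Lower-case (treating unquoted identifiers and keywords case-insensitively)
    # and split into word tokens and single punctuation/operator characters.
    return re.findall(r"\w+|[^\w\s]", sql.lower())


def shingles(toks: list[str], n: int = 3) -> set[tuple[str, ...]]:
    # Contiguous n-token windows; a query shorter than n becomes a single shingle.
    if len(toks) < n:
        return {tuple(toks)}
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    # Jaccard similarity of the two shingle sets, in [0, 1].
    sa, sb = shingles(tokens(a), n), shingles(tokens(b), n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Two plausibly independent answers to the same (hypothetical) exercise.
q1 = "SELECT name FROM students WHERE grade > 70;"
q2 = "select name\nfrom Students\nwhere grade>70"
print(round(similarity(q1, q2), 3))  # 0.857 (6 shared trigrams out of 7)
```

Because the space of sensible short answers is small, any threshold low enough to catch lightly disguised copying will also flag pairs like this one.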
In this paper we investigate the false-positive rate of various automated plagiarism detection algorithms. We find a significant false-positive rate of between 15% and 64%. These results call into question the usefulness of automated detection for SQL, since they imply that substantial manual inspection will still be needed.
However, our results suggest that the false-positive rate may be restricted to shorter queries (e.g. under 200 characters). Further research is needed, because our datasets consist mostly of short queries and the results for longer queries are based on only a small subset of the data.
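The length-dependent analysis can be sketched as follows. This is a hypothetical illustration, not the paper's procedure: it assumes the false-positive rate is the fraction of pairs known to be written independently that a detector flags at a given similarity threshold, and it assumes a pair counts as "short" when both queries are under the 200-character cutoff. The `Pair` type, the threshold of 0.8 and the toy values are made up for illustration and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    len_a: int          # length of the first submission, in characters
    len_b: int          # length of the second submission, in characters
    score: float        # detector similarity score in [0, 1]
    independent: bool   # ground truth: True if written independently


def false_positive_rate(pairs: list[Pair], threshold: float) -> float:
    # Fraction of independently written pairs that the detector flags.
    innocent = [p for p in pairs if p.independent]
    if not innocent:
        return float("nan")  # undefined when a bucket has no innocent pairs
    flagged = sum(1 for p in innocent if p.score >= threshold)
    return flagged / len(innocent)


def fpr_by_length(pairs: list[Pair], threshold: float,
                  cutoff: int = 200) -> tuple[float, float]:
    # Split on the longer query of each pair: "short" only if both are under cutoff.
    short = [p for p in pairs if max(p.len_a, p.len_b) < cutoff]
    long_ = [p for p in pairs if max(p.len_a, p.len_b) >= cutoff]
    return false_positive_rate(short, threshold), false_positive_rate(long_, threshold)


# Made-up toy pairs, for illustration only.
pairs = [
    Pair(45, 48, 0.86, True),
    Pair(60, 52, 0.91, True),
    Pair(80, 95, 0.40, True),
    Pair(350, 410, 0.35, True),
    Pair(300, 280, 0.20, True),
    Pair(320, 330, 0.95, False),  # a genuine copy: not counted as a false positive
]
short_fpr, long_fpr = fpr_by_length(pairs, threshold=0.8)
print(round(short_fpr, 3), long_fpr)  # 0.667 0.0
```

Reporting the rate per length bucket makes the caveat above concrete: with few long queries, the long-query estimate rests on very few innocent pairs and can swing widely.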
Index Terms
- The False-Positive Rate of Automated Plagiarism Detection for SQL Assessments