Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8468)

Abstract

The paper focuses on improving intrinsic plagiarism detection. It investigates and improves the performance of the character n-gram profiles method proposed by Stamatatos by tuning its parameter settings and by proposing new modifications and rich feature sets. Results are reported on the PAN-PC09 and PAN-PC11 corpora, which are especially well suited for this task and were previously used in the Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN) competitions. We raised the overall plagdet score from 24.67% to 33.41% on the PAN-PC09 corpus and from 18.83% to 26.66% on the PAN-PC11 corpus.
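For readers unfamiliar with the baseline, here is a minimal Python sketch of the character n-gram profiles approach as we understand it from Stamatatos (reference 7). It is not the optimised variant studied in this paper. The whole document and each sliding window are reduced to profiles of normalised character n-gram frequencies. A window's dissimilarity from the document, nd1, sums the squared relative differences 2(f_W(g) - f_D(g)) / (f_W(g) + f_D(g)) over the window's n-grams and divides by 4|P(W)|; this sequence of scores acts as a style-change function. Windows scoring more than a standard deviations above the mean are flagged. The function names, the defaults for n, window length, step and a, and the synthetic demo text are illustrative assumptions, not the tuned settings reported by the authors.

```python
from collections import Counter
from statistics import mean, pstdev


def ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Map each character n-gram of `text` to its relative frequency."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    return {g: c / len(grams) for g, c in Counter(grams).items()}


def nd1(window: dict[str, float], doc: dict[str, float]) -> float:
    """Normalised dissimilarity of a window profile from the document profile.

    Sums over the window's n-grams only; each squared term lies in [0, 4],
    so dividing by 4 * |P(window)| keeps the score in [0, 1].
    """
    if not window:
        return 0.0
    total = 0.0
    for g, fa in window.items():
        fb = doc.get(g, 0.0)  # 0.0 only if the window is not taken from doc
        total += (2 * (fa - fb) / (fa + fb)) ** 2
    return total / (4 * len(window))


def style_change(text: str, n: int = 3, window_len: int = 1000,
                 step: int = 200) -> list[tuple[int, float]]:
    """Slide a window over `text` and return (start offset, nd1) pairs.

    n, window_len and step are illustrative defaults (assumptions).
    """
    doc = ngram_profile(text, n)
    last_start = max(len(text) - window_len, 0)
    return [(start, nd1(ngram_profile(text[start:start + window_len], n), doc))
            for start in range(0, last_start + 1, step)]


def suspicious_windows(scores: list[tuple[int, float]],
                       a: float = 2.0) -> list[int]:
    """Flag windows scoring above mean + a * std (a is a hypothetical default)."""
    values = [s for _, s in scores]
    if len(values) < 2:
        return []
    threshold = mean(values) + a * pstdev(values)
    return [start for start, s in scores if s > threshold]


if __name__ == "__main__":
    # Synthetic demo: a block in a "foreign" style inside uniform text.
    plain = "the cat sat on the mat. " * 200
    odd = "ZQXJ KVWP " * 50
    print(suspicious_windows(style_change(plain + odd + plain)))
```

Because the document profile is dominated by the author's own style, windows that overlap an inserted passage in a different style score noticeably higher than the rest, which is what the mean-plus-a-std threshold exploits.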

References

  1. McCabe, D.: Levels of cheating and plagiarism remain high. Technical report, Duke University, Center for Academic Integrity (2005)

  2. Sheard, J., Dick, M., Markham, S., MacDonald, I., Walsh, M.: Cheating and plagiarism: Perceptions and practices of first year IT students. In: Caspersen, M.E., Joyce, D., Goelman, D., Utting, I. (eds.) Seventh Annual Conference on Innovation and Technology in Computer Science Education, pp. 183–187 (2002)

  3. Stein, B., Lipka, N., Prettenhofer, P.: Intrinsic plagiarism analysis. Language Resources and Evaluation 45(1), 63–82 (2011)

  4. Oberreuter, G., L’Huillier, G., Ríos, S.A., Velásquez, J.D.: Approaches for intrinsic and external plagiarism detection - Notebook for PAN at CLEF 2011. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)

  5. Potthast, M., Eiselt, A., Barrón-Cedeño, A., Stein, B., Rosso, P.: Overview of the 3rd international competition on plagiarism detection. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)

  6. Kestemont, M., Luyckx, K., Daelemans, W.: Intrinsic plagiarism detection using character trigram distance scores - Notebook for PAN at CLEF 2011. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)

  7. Stamatatos, E.: Intrinsic plagiarism detection using character n-gram profiles. In: Stein, B., Rosso, P., Stamatatos, E., Koppel, M., Agirre, E. (eds.) SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009), pp. 38–46 (2009)

  8. Akiva, N.: Using clustering to identify outlier chunks of text - Notebook for PAN at CLEF 2011. In: Petras, V., Forner, P., Clough, P.D. (eds.) Notebook Papers of CLEF 2011 LABs and Workshops (2011)

  9. Seaward, L., Matwin, S.: Intrinsic plagiarism detection using complexity analysis. In: Stein, B., Rosso, P., Stamatatos, E., Koppel, M., Agirre, E. (eds.) SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009), pp. 56–61 (2009)

  10. Potthast, M., Eiselt, A., Stein, B., Barrón-Cedeño, A., Rosso, P.: Plagiarism Corpus PAN-PC 2009 (2009), http://www.webis.de/research/corpora

  11. Potthast, M., Stein, B., Barrón-Cedeño, A., Rosso, P.: An Evaluation Framework for Plagiarism Detection. In: Huang, C.R., Jurafsky, D. (eds.) 23rd International Conference on Computational Linguistics (COLING 2010), pp. 997–1005. Association for Computational Linguistics (2010)

  12. Barrón-Cedeño, A., Potthast, M., Rosso, P., Stein, B., Eiselt, A.: Corpus and Evaluation Measures for Automatic Plagiarism Detection. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) 7th Conference on International Language Resources and Evaluation (LREC 2010). European Language Resources Association (ELRA) (2010)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kuta, M., Kitowski, J. (2014). Optimisation of Character n-gram Profiles Method for Intrinsic Plagiarism Detection. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2014. Lecture Notes in Computer Science, vol 8468. Springer, Cham. https://doi.org/10.1007/978-3-319-07176-3_44

  • DOI: https://doi.org/10.1007/978-3-319-07176-3_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07175-6

  • Online ISBN: 978-3-319-07176-3

  • eBook Packages: Computer Science, Computer Science (R0)
