DOI: 10.1145/3323771.3323795
research-article

Classification of Programming Problems based on Topic Modeling

Published: 29 March 2019

ABSTRACT

Programming is one of the most important and demanding skills of the current generation. Many Online Judge (OJ) systems exist to let learners and programmers practice programming and gain problem-solving skills. Most of these OJ systems are operated solely by students and learners. These students and novice programmers sometimes compete against each other or solve the programming problems by themselves in offline mode. However, most OJ systems simply arrange their problems into volumes and contest events, an arrangement that gives no clear indication of a problem's difficulty or category. In this paper, we therefore study reliable techniques for extracting keywords and features that can categorize OJ programming problems by type and required skill. We leveraged two popular topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), to extract relevant features. Six classifiers were then trained on these topic modeling features and on naive TF-IDF features. We found that the topic modeling features had lower dimensionality, yet classifiers trained on them matched the performance of classifiers trained on the high-dimensional naive TF-IDF features. Our main goal was to understand the precise trade-off between accuracy and dimensionality for the textual data of programming problem statements. This experiment has enabled us to obtain important tags, hints, and classifications of Online Judge programming problems.
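The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy problem statements, the labels, the number of topics, and the choice of logistic regression as a stand-in classifier are all assumptions (the abstract does not name the six classifiers or the topic counts used).

```python
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy problem statements and category labels (not from the paper).
problems = [
    "find the shortest path between two nodes in a weighted graph",
    "count connected components of an undirected graph using search",
    "compute the shortest route in a graph with weighted edges",
    "sort the array of integers and print the median value",
    "sort the strings in the array by length and print them",
    "print the largest integer in the sorted array",
]
labels = ["graph", "graph", "graph", "sorting", "sorting", "sorting"]

N_TOPICS = 2  # assumed for illustration only


def topic_features(texts, n_topics=N_TOPICS, seed=0):
    """Return (TF-IDF matrix, NMF topic features, LDA topic features)."""
    # Naive TF-IDF: one dimension per vocabulary term.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # LDA is a model over word counts, so it gets raw term counts.
    counts = CountVectorizer(stop_words="english").fit_transform(texts)
    nmf = NMF(n_components=n_topics, init="nndsvda",
              random_state=seed, max_iter=500)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return tfidf, nmf.fit_transform(tfidf), lda.fit_transform(counts)


tfidf, nmf_feats, lda_feats = topic_features(problems)

# Compare dimensionality: vocabulary size versus number of topics.
print("TF-IDF dims:", tfidf.shape[1])
print("NMF dims:", nmf_feats.shape[1], "LDA dims:", lda_feats.shape[1])

# Stand-in classifier trained on the low-dimensional topic features.
clf = LogisticRegression().fit(nmf_feats, labels)
predicted = clf.predict(nmf_feats)
```

On a real corpus the TF-IDF matrix would have thousands of columns, while the topic features keep one column per topic, which is the accuracy versus dimensionality trade-off the abstract refers to. Any comparison of classifiers would need a held-out test set, which this toy sketch omits.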

    • Published in

      ICIET 2019: Proceedings of the 2019 7th International Conference on Information and Education Technology
      March 2019
      338 pages
      ISBN:9781450366397
      DOI:10.1145/3323771

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited
