ABSTRACT
Programming is one of the most important and in-demand skills of the current generation. Many Online Judge (OJ) systems exist so that learners and programmers can practice programming and build problem-solving skills. Most of these systems are used by students and learners on their own. These students and novice programmers either compete against each other or solve programming problems by themselves in offline mode. However, most OJ systems simply arrange their problems into volumes and contest events, and this arrangement gives no clear indication of each problem's difficulty or category. In this paper, we therefore study reliable techniques for extracting keywords and features that can categorize OJ programming problems by type and required skill. We leveraged two popular topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), to extract relevant features. We then trained six classifiers on these topic-model features and on naive TF-IDF features. We found that the topic-model features matched the performance of classifiers trained on the high-dimensional naive TF-IDF features, despite having lower dimensionality. Our main goal was to understand the precise trade-off between accuracy and dimensionality for the textual data of programming problem statements. This experiment enabled us to obtain important tags, hints, and classifications for OJ programming problems.
Index Terms
- Classification of Programming Problems based on Topic Modeling