research-article

Classification of Programming Problems based on Topic Modeling

Authors:
Chowdhury Md Intisar, Yutaka Watanobe, Manoj Poudel, Subhash Bhalla

Affiliation (all authors): Graduate Department of Information Systems, University of Aizu, Fukushima, Japan

ICIET 2019: Proceedings of the 2019 7th International Conference on Information and Education Technology, March 2019, pages 275–283
https://doi.org/10.1145/3323771.3323795

Published: 29 March 2019

ABSTRACT

Programming is one of the most important and in-demand skills today. To help learners and programmers practice programming and build problem-solving skills, many Online Judge (OJ) systems exist. Most of these OJ systems are used by students and learners on their own: novice programmers either compete against each other in contests or solve the problems individually in offline mode. However, most OJ systems simply arrange their problems into volumes and contest events, and this arrangement gives no clear indication of a problem's difficulty or category. In this paper, we therefore study reliable techniques for extracting keywords and features that can categorize OJ programming problems by type and required skill. We leveraged two popular topic modeling algorithms, Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), to extract relevant features. Six classifiers were then trained on these topic modeling features and on naive TF-IDF features. Our studies showed that the topic modeling features were considerably smaller in dimensionality, yet matched the performance of classifiers trained on the high-dimensional naive TF-IDF features. Our main goal was to understand the precise trade-off between accuracy and dimensionality for the textual data of programming problem statements. This experiment has enabled us to obtain important tags, hints, and classifications for Online Judge programming problems.
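The abstract contrasts vocabulary-sized TF-IDF features with low-dimensional topic features. The following is a minimal, dependency-free sketch of that contrast, not the authors' pipeline: the problem statements in PROBLEMS are hypothetical toy examples, the smoothed idf weighting is one common variant chosen as an assumption, and NMF is fitted with Lee and Seung's multiplicative updates for the Frobenius objective. The LDA step and the six classifiers described in the abstract are omitted.

```python
import math
import random
import re

# Hypothetical toy problem statements (not taken from any real OJ dataset).
PROBLEMS = [
    "Find the shortest path between two nodes in a weighted graph.",
    "Compute the shortest distance from a source vertex to every vertex of a graph.",
    "Count the ways to climb n stairs using dynamic programming.",
    "Use dynamic programming to find the longest common subsequence of two strings.",
    "Sort the array of integers and print the median value.",
    "Sort a list of integers in ascending order and print it.",
]


def tfidf_matrix(docs):
    """Return (vocab, rows): one L2-normalised TF-IDF row per document."""
    tokenized = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    df = [0] * len(vocab)
    for toks in tokenized:
        for t in set(toks):
            df[index[t]] += 1
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    rows = []
    for toks in tokenized:
        counts = [0.0] * len(vocab)
        for t in toks:
            counts[index[t]] += 1.0
        row = [c * w for c, w in zip(counts, idf)]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m) ~= W (n x k) H (k x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); both operands are k x m.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T); both operands are n x k.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return math.sqrt(sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)))


if __name__ == "__main__":
    vocab, tfidf = tfidf_matrix(PROBLEMS)
    w, h = nmf(tfidf, k=3)
    print("TF-IDF features per problem:", len(vocab))
    print("NMF topic features per problem:", len(w[0]))
    for t, topic in enumerate(h):
        top = sorted(range(len(vocab)), key=lambda j: topic[j], reverse=True)[:4]
        print(f"topic {t}:", [vocab[j] for j in top])
```

Each row of w is a k-dimensional topic representation of one problem, whereas each TF-IDF row has one entry per vocabulary word; either representation could be passed to a classifier. The choice k=3 is arbitrary here and would normally be tuned against classification accuracy, which is the trade-off the abstract describes.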

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in

ICIET 2019: Proceedings of the 2019 7th International Conference on Information and Education Technology
March 2019
338 pages
ISBN:9781450366397
DOI:10.1145/3323771

Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Online Judge Systems
feature extraction
novice programmer
text classification
topic modeling