Abstract
Top Down Induction of Decision Trees (TDIDT) is the most commonly used method of constructing a model from a dataset in the form of classification rules, which are then used to classify previously unseen data. Alternative algorithms have been developed, such as Prism. Prism induces modular rules, which are qualitatively better than the rules induced by TDIDT. However, as databases have grown in size, many existing rule learning algorithms have proved to be computationally expensive on large datasets. To tackle this scalability problem, parallel classification rule induction algorithms have been introduced. Because TDIDT is the most popular classifier, most parallel approaches to inducing classification rules are based on it, even though there are strongly competitive alternative algorithms. In this paper we describe work on a distributed classifier, based on Prism, that induces classification rules in parallel.
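To make the idea of modular rule induction concrete, the following is a minimal sketch of a basic, serial Prism-style covering loop for a single target class, as the algorithm is commonly described. It is not the distributed classifier this paper reports. The function name, the `class_key` parameter, the tie-breaking by coverage, and the toy `rows` dataset are all assumptions made for illustration.

```python
def induce_rules_for_class(rows, target, class_key="class"):
    """Induce rules for one class with a basic Prism-style covering loop.

    rows: list of dicts mapping attribute names to categorical values.
    Returns a list of rules, each a dict of attribute -> required value.
    """
    rules = []
    remaining = list(rows)
    # Keep inducing rules while instances of the target class are uncovered.
    while any(r[class_key] == target for r in remaining):
        subset = remaining
        rule = {}
        # Specialise the rule until it covers only target-class instances.
        while any(r[class_key] != target for r in subset):
            best = None
            for attr in subset[0]:
                if attr == class_key or attr in rule:
                    continue
                for value in sorted({r[attr] for r in subset}):
                    covered = [r for r in subset if r[attr] == value]
                    positives = sum(r[class_key] == target for r in covered)
                    # Highest p(target | attr = value); ties broken by
                    # coverage (an illustrative assumption).
                    score = (positives / len(covered), positives)
                    if best is None or score > best[0]:
                        best = (score, attr, value)
            if best is None:
                break  # no attributes left: instances clash, stop here
            _, attr, value = best
            rule[attr] = value
            subset = [r for r in subset if r[attr] == value]
        rules.append(rule)
        # Remove every instance the finished rule covers.
        remaining = [
            r for r in remaining
            if not all(r[a] == v for a, v in rule.items())
        ]
    return rules


# Hypothetical toy dataset, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rainy", "windy": "no", "class": "stay"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
play_rules = induce_rules_for_class(rows, "play")
stay_rules = induce_rules_for_class(rows, "stay")
```

Each rule is induced independently of the others and mentions only the attributes it needs, which is the sense in which the rules are modular. A decision tree, by contrast, forces every rule to share the attribute tested at the root.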
Copyright information
© 2008 International Federation for Information Processing
About this paper
Cite this paper
Stahl, F.T., Bramer, M.A., Adda, M. (2008). P-Prism: A Computationally Efficient Approach to Scaling up Classification Rule Induction. In: Bramer, M. (eds) Artificial Intelligence in Theory and Practice II. IFIP AI 2008. IFIP – The International Federation for Information Processing, vol 276. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09695-7_8
DOI: https://doi.org/10.1007/978-0-387-09695-7_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09694-0
Online ISBN: 978-0-387-09695-7
eBook Packages: Computer Science, Computer Science (R0)