Abstract
Data preprocessing is one of the important tasks in Knowledge Discovery in Databases, or data mining. It is a complex and tedious task, especially when large datasets are involved. A data miner must be able to determine the appropriate preprocessing techniques for a particular dataset, because doing so saves processing time and retains the quality of the data for mining. Current data mining researchers use agents as tools to assist the data mining process, but very little research focuses on using agents in data preprocessing. Applying agents that are autonomous, flexible, and intelligent reduces the cost of obtaining high-quality, precise, and up-to-date data or knowledge. The most important part of having an agent perform a data mining task, particularly data preprocessing, is generating the agent’s knowledge. The data preprocessing agent’s knowledge allows the agent to decide which preprocessing technique is appropriate for a particular dataset. Therefore, in this paper we propose a methodology for creating the data preprocessing agent’s knowledge using rough set theory. The experimental results show that the generated knowledge is significant enough to be used for automated selection of data preprocessing techniques.
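To make the idea concrete, the following is a minimal sketch of how rough set theory can turn a decision table into selection rules. It is not the authors' implementation. The decision table, the attribute names (`missing`, `numeric`), the technique labels, and all function names are hypothetical and chosen only for illustration. The sketch groups datasets into indiscernibility classes, computes the lower and upper approximations of one decision, and keeps only the rules whose class agrees on a single technique.

```python
from collections import defaultdict

# Hypothetical decision table: each row describes a dataset by two made-up
# condition attributes and the preprocessing technique (the decision) assumed
# to suit it. None of these values come from the paper.
TABLE = [
    {"missing": "high", "numeric": "yes", "technique": "imputation"},
    {"missing": "high", "numeric": "yes", "technique": "imputation"},
    {"missing": "low", "numeric": "yes", "technique": "discretization"},
    {"missing": "low", "numeric": "yes", "technique": "normalization"},
    {"missing": "low", "numeric": "no", "technique": "none"},
]
CONDITIONS = ("missing", "numeric")


def indiscernibility_classes(table, attrs):
    """Group row indices that share the same values on every attribute in attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(table, attrs, decision, value):
    """Return (lower, upper) approximations of the rows whose decision equals value."""
    target = {i for i, row in enumerate(table) if row[decision] == value}
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


def certain_rules(table, attrs, decision):
    """Emit one rule per indiscernibility class whose rows agree on the decision."""
    rules = []
    for cls in indiscernibility_classes(table, attrs):
        decisions = {table[i][decision] for i in cls}
        if len(decisions) == 1:
            row = table[min(cls)]
            rules.append(({a: row[a] for a in attrs}, decisions.pop()))
    return rules


if __name__ == "__main__":
    print(approximations(TABLE, CONDITIONS, "technique", "discretization"))
    for condition, technique in certain_rules(TABLE, CONDITIONS, "technique"):
        print(condition, "->", technique)
```

In this toy table, the two datasets with low missingness and numeric attributes are indiscernible yet carry different decisions, so they yield no certain rule. An agent relying on such rules would need either more attributes or a fallback policy for that boundary region.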
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Othman, Z.A., Bakar, A.A., Othman, Z., Rosli, S. (2009). Development of the Data Preprocessing Agent’s Knowledge for Data Mining Using Rough Set Theory. In: Wen, P., Li, Y., Polkowski, L., Yao, Y., Tsumoto, S., Wang, G. (eds) Rough Sets and Knowledge Technology. RSKT 2009. Lecture Notes in Computer Science, vol 5589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02962-2_11
DOI: https://doi.org/10.1007/978-3-642-02962-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02961-5
Online ISBN: 978-3-642-02962-2
eBook Packages: Computer Science (R0)