Abstract
The rapid growth of technology and its accessibility to the general public produce voluminous, heterogeneous and unstructured data, which has led to the emergence of new concepts, viz. Big Data and Big Data Analytics. The high dimensionality, variability and uncertainty of such data, and the speed at which it is generated, pose new challenges for data analysis with standard statistical methods, especially when Big Data contains redundant as well as important information. Intelligent methods are therefore needed to extract meaningful information from Big Data. This chapter focuses on computational tools that are often applied to analyse such data, namely rough-set theory, fuzzy-set theory, fuzzy-rough sets and genetic algorithms. Because a genetic algorithm sometimes converges prematurely to a locally optimal solution, the hybridization of genetic algorithms with local search methods is also discussed. The genetic algorithm, a well-proven global optimization algorithm, is extended to search the fitness space more efficiently in order to select the globally optimal feature subset. Real-life data is often vague, so fuzzy logic and rough-set theory are applied to handle uncertainty and maintain consistency in the data sets. The aim of the fuzzy-rough-based method is to generate the optimal variation in the ranges of the membership functions of linguistic variables. As a next step, dimensionality reduction is performed to select the features used for discovering knowledge from the given data set. The search for the most informative features may terminate at a local optimum, whereas the global optimum may lie elsewhere in the search space. To avoid being trapped in local optima, an algorithm is proposed that combines the fuzzy-rough-set concept with a genetic algorithm. The proposed algorithm searches for the most informative attribute set using the optimal ranges of membership values, which are used to design the objective function.
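To make the fuzzy-rough objective more concrete, one widely used measure is the fuzzy-rough dependency degree of Jensen and Shen. In this measure, each object's membership in the fuzzy positive region is the supremum, over fuzzy equivalence classes F and decision concepts X, of min(mu_F(x), inf_y max(1 - mu_F(y), mu_X(y))), and the degree is the average of these memberships. The Python sketch below is illustrative only and rests on several assumptions: attributes are scaled to [0, 1], each has three fixed triangular terms (low, medium, high), attributes are combined with the min t-norm, and decision classes are crisp. The names `TERMS`, `triangular`, `fuzzy_partition` and `dependency`, as well as the toy table, are hypothetical. The membership ranges, which the chapter optimizes, are held fixed here.

```python
from itertools import product

# Assumed linguistic terms (low, medium, high) for attributes scaled to [0, 1].
# Each term is a triangular membership function (left foot, peak, right foot).
# These ranges are fixed here for illustration only.
TERMS = [(-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]


def triangular(x, a, b, c):
    """Membership of x in the triangular fuzzy set (a, b, c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def fuzzy_partition(rows, subset):
    """Fuzzy equivalence classes U/P: one class per combination of terms over
    the attributes in `subset`, combined with the min t-norm."""
    return [
        [min((triangular(row[a], *term) for a, term in zip(subset, combo)), default=1.0)
         for row in rows]
        for combo in product(TERMS, repeat=len(subset))
    ]


def dependency(rows, decisions, subset):
    """Fuzzy-rough dependency degree of the (crisp) decision on `subset`."""
    n = len(rows)
    concepts = [[1.0 if d == c else 0.0 for d in decisions] for c in sorted(set(decisions))]
    pos = [0.0] * n  # membership of each object in the fuzzy positive region
    partition = fuzzy_partition(rows, subset)
    for concept in concepts:
        for fclass in partition:
            # Degree to which the fuzzy class is included in the concept:
            # inf over y of max(1 - mu_F(y), mu_X(y))
            inclusion = min(max(1.0 - f, x) for f, x in zip(fclass, concept))
            for i in range(n):
                # Lower approximation of the concept, then sup over concepts
                pos[i] = max(pos[i], min(fclass[i], inclusion))
    return sum(pos) / n


# Hypothetical toy table: attribute 0 separates the classes, attribute 1 does not.
rows = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.1]]
decisions = ["A", "A", "B", "B"]
print(round(dependency(rows, decisions, [0]), 3))  # 0.9
print(round(dependency(rows, decisions, [1]), 3))  # 0.5
```

Attribute 0 receives the higher degree because its fuzzy classes fall almost entirely inside a single decision class. A subset search could use this value, possibly combined with a size penalty, as its fitness.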
Finally, a case study is presented in which the dimension reduction techniques are applied to agricultural science, a real-life application domain. Rice plant diseases infect the leaves, stems, roots and other parts of the plant and degrade production. Identifying diseases and taking precautions is a very important data analytics task in agriculture. The case study shows how images are collected from the fields, how disease-related features are extracted and preprocessed and, finally, how the important features are selected using a genetic algorithm-based local search technique and fuzzy-rough-set theory. These features are important for developing a decision support system that predicts diseases, so that methods can be devised accordingly to protect the most important crops.
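The selection step described above pairs a genetic algorithm with local search. A common way to hybridize the two is a memetic scheme, in which every individual is refined by hill climbing before selection. The Python sketch below illustrates that idea under stated assumptions and is not the chapter's algorithm. A bit mask encodes a feature subset. Fitness is the crisp rough-set dependency degree minus a small subset-size penalty, used here as a simpler stand-in for the fuzzy-rough measure. The operators (truncation selection, one-point crossover, bit-flip mutation and first-improvement bit-flip hill climbing), the weight `ALPHA` and all function names are hypothetical choices. The synthetic table is not rice-disease data: its decision depends only on attributes 0 and 2.

```python
import random
from collections import defaultdict
from itertools import product

ALPHA = 0.05  # assumed weight of the subset-size penalty


def rough_dependency(rows, decisions, mask):
    """Crisp rough-set dependency: fraction of objects whose indiscernibility
    class (on the selected attributes) carries a single decision."""
    keys = [tuple(v for v, keep in zip(row, mask) if keep) for row in rows]
    outcomes = defaultdict(set)
    for key, d in zip(keys, decisions):
        outcomes[key].add(d)
    return sum(len(outcomes[key]) == 1 for key in keys) / len(rows)


def local_search(mask, fit):
    """First-improvement bit-flip hill climbing on a feature mask."""
    best, best_fit = mask[:], fit(mask)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best[:]
            cand[i] = 1 - cand[i]
            cand_fit = fit(cand)
            if cand_fit > best_fit:
                best, best_fit, improved = cand, cand_fit, True
    return best


def memetic_select(rows, decisions, pop_size=20, generations=30, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    n = len(rows[0])

    def fit(mask):
        # Reward consistency, lightly penalise larger subsets
        return rough_dependency(rows, decisions, mask) - ALPHA * sum(mask) / n

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop = [local_search(m, fit) for m in pop]  # refine every individual
        pop.sort(key=fit, reverse=True)
        elite = pop[: pop_size // 2]               # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    return local_search(max(pop, key=fit), fit)


# Synthetic table: four attributes with values 0..2; the decision depends
# only on attributes 0 and 2.
rows = [list(r) for r in product(range(3), repeat=4)]
decisions = [int(r[0] + r[2] >= 3) for r in rows]
print(memetic_select(rows, decisions))  # [1, 0, 1, 0]
```

On this small, noise-free table, hill climbing alone already reaches the mask `[1, 0, 1, 0]`. On larger and noisier data, the intent of the population-level recombination is to help the search leave local optima at which hill climbing by itself would stop.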
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sil, J., Das, A.K. (2016). Feature Selection for Adaptive Decision Making in Big Data Analytics. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_12
DOI: https://doi.org/10.1007/978-3-319-31861-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science, Computer Science (R0)