Feature Selection for Adaptive Decision Making in Big Data Analytics

Sil, Jaya; Das, Asit Kumar

doi:10.1007/978-3-319-31861-5_12

Abstract

Rapid growth in technology and its accessibility to the general public have produced voluminous, heterogeneous and unstructured data, resulting in the emergence of new concepts, viz. Big Data and Big Data Analytics. The high dimensionality, variability, uncertainty and generation speed of such data pose new challenges for data analysis using standard statistical methods, especially when Big Data contains redundant as well as important information. Intelligent methods are needed to extract meaningful information from Big Data. This chapter focuses on computational tools that are often applied to analyse such data: rough-set theory, fuzzy-set theory, fuzzy-rough sets and genetic algorithms. Because a genetic algorithm sometimes reaches only a locally optimal solution owing to premature convergence, hybridization of genetic algorithms with local search methods is also discussed. The genetic algorithm, a well-proven global optimization algorithm, is extended to search the fitness space more efficiently in order to select a globally optimal feature subset. Real-life data is often vague, so fuzzy logic and rough-set theory are applied to handle uncertainty and maintain consistency in the data sets. The aim of the fuzzy-rough-based method is to generate the optimum variation in the range of membership functions of linguistic variables. As a next step, dimensionality reduction is performed to search the selected features for discovering knowledge from the given data set. The search for the most informative features may terminate at a local optimum, whereas the global optimum may lie elsewhere in the search space. To avoid becoming trapped in local optima, an algorithm is proposed that combines the fuzzy-rough-set concept with a genetic algorithm. The proposed algorithm searches for the most informative attribute set by utilising the optimal range of membership values used to design the objective function.
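As an illustration of how rough-set theory can score a feature subset, a quantity commonly used for this purpose is Pawlak's dependency degree: the fraction of objects whose equivalence class under the chosen features maps to a single decision value. The sketch below computes it for a crisp (non-fuzzy) decision table. It does not reproduce the chapter's fuzzy-rough objective function; the function name `dependency_degree`, the attribute names and the toy table are assumptions made only for this example.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Crisp rough-set dependency of `decision` on `features`.

    Returns |POS_B(D)| / |U|: the share of objects lying in equivalence
    classes (w.r.t. the chosen features) that are decision-consistent.
    """
    if not rows:
        return 0.0
    # Group objects that are indiscernible on the chosen features.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].append(row[decision])
    # A class is in the positive region when all its objects share one decision.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


# Hypothetical toy decision table (values invented for illustration only).
rows = [
    {"colour": "brown", "shape": "oval", "size": "small", "disease": "blast"},
    {"colour": "brown", "shape": "oval", "size": "large", "disease": "blast"},
    {"colour": "brown", "shape": "round", "size": "small", "disease": "spot"},
    {"colour": "yellow", "shape": "round", "size": "small", "disease": "blight"},
    {"colour": "yellow", "shape": "oval", "size": "large", "disease": "blight"},
]

for subset in (["colour"], ["size"], ["colour", "shape"]):
    print(subset, dependency_degree(rows, subset, "disease"))
```

In this toy table, `colour` and `shape` together determine the decision for every object, so a subset search driven by this score could discard `size` without losing consistency.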
Finally, a case study is presented in which the dimension reduction techniques are applied to agricultural science, a real-life application domain. Rice plant diseases infect leaves, stems, roots and other parts, degrading production. Identifying diseases and taking precautions is therefore a very important data analytics task in agriculture. The case study demonstrates how images are collected from the fields, how features of the diseased regions are extracted and preprocessed, and finally how important features are selected using a genetic algorithm-based local search technique and fuzzy-rough-set theory. These features are important for developing a decision support system that predicts the diseases so that methods can be devised to protect the most important crops.
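The general idea of hybridizing a genetic algorithm with local search (often called a memetic algorithm) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's algorithm: feature subsets are encoded as bit masks, each individual is refined by bit-flip hill climbing, and the toy `fitness` function is a hypothetical stand-in for a fuzzy-rough objective. All names and parameter values (`memetic_select`, `pop_size`, `p_mut` and so on) are invented for this example.

```python
import random


def local_search(mask, fitness):
    """Bit-flip hill climbing: keep any single flip that improves fitness."""
    best, best_fit = list(mask), fitness(mask)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            best[i] ^= 1                      # try flipping feature i
            f = fitness(best)
            if f > best_fit:
                best_fit, improved = f, True  # keep the flip
            else:
                best[i] ^= 1                  # undo the flip
    return best, best_fit


def memetic_select(n_features, fitness, pop_size=20, generations=30,
                   p_mut=0.05, seed=0):
    """Genetic algorithm whose individuals are refined by local search."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Local refinement of every individual (the "memetic" step).
        scored = [local_search(ind, fitness) for ind in pop]
        for ind, fit in scored:
            if fit > best_fit:
                best, best_fit = list(ind), fit

        def pick():
            # Binary tournament selection.
            a, b = rng.sample(scored, 2)
            return a[0] if a[1] >= b[1] else b[0]

        children = [list(best)]               # elitism: keep the best so far
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    return best, best_fit


def fitness(mask):
    """Hypothetical objective: features 0 and 3 help only together,
    feature 5 helps alone, and every selected feature costs 0.1."""
    gain = (2.0 if mask[0] and mask[3] else 0.0) + (1.0 if mask[5] else 0.0)
    return gain - 0.1 * sum(mask)


best, best_fit = memetic_select(8, fitness)
print(best, best_fit)
```

The toy objective is deliberately non-separable: a mask selecting only feature 5 is a local optimum for single bit flips, because adding feature 0 or feature 3 alone only incurs the size penalty. Crossover and mutation give the search a chance to jump to masks from which local search can reach the better subset, which is the motivation for the hybrid described in the abstract.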

Author information

Authors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Jaya Sil & Asit Kumar Das

Corresponding author

Correspondence to Jaya Sil.

Editor information

Editors and Affiliations

Department of Computing and Mathematics, University of Derby, Derby, United Kingdom
Zaigham Mahmood

About this chapter

Cite this chapter

Sil, J., Das, A.K. (2016). Feature Selection for Adaptive Decision Making in Big Data Analytics. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_12

DOI: https://doi.org/10.1007/978-3-319-31861-5_12
Published: 06 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31859-2
Online ISBN: 978-3-319-31861-5
eBook Packages: Computer Science, Computer Science (R0)
