Feature Selection for Adaptive Decision Making in Big Data Analytics

Chapter in: Data Science and Big Data Computing

Abstract

Rapid growth in technology and its accessibility to the general public produce voluminous, heterogeneous and unstructured data, which has led to the emergence of new concepts, viz. Big Data and Big Data Analytics. The high dimensionality, variability, uncertainty and generation speed of such data pose new challenges for analysis with standard statistical methods, especially when Big Data contains redundant as well as important information. Intelligent methods are urgently needed to extract meaningful information from Big Data. This chapter focuses on computational tools that are often applied to analyse this kind of data, such as rough-set theory, fuzzy-set theory, fuzzy-rough sets and genetic algorithms. Because a genetic algorithm can converge prematurely to a locally optimal solution, its hybridization with local search methods is also discussed. The genetic algorithm, a well-proven global optimization algorithm, is extended to search the fitness space more efficiently in order to select the globally optimal feature subset. Real-life data are often vague, so fuzzy logic and rough-set theory are applied to handle uncertainty and maintain consistency in the data sets. The aim of the fuzzy-rough-based method is to find optimal ranges for the membership functions of the linguistic variables. Dimensionality reduction is then performed, searching for the features from which knowledge can be discovered in the given data set. The search for the most informative features may terminate at a local optimum, whereas the global optimum may lie elsewhere in the search space. To escape such local optima, an algorithm based on fuzzy-rough-set concepts and the genetic algorithm is proposed. The proposed algorithm searches for the most informative attribute set, using the optimal range of membership values to design its objective function.
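The hybrid of a genetic algorithm and local search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's algorithm: for brevity it uses the crisp rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U| (the chapter's objective is fuzzy-rough), adds a hypothetical size penalty `alpha`, and refines the elite individual of each generation with greedy single-bit-flip hill climbing. All function names and parameter values are hypothetical.

```python
import random
from collections import defaultdict


def dependency(data, labels, subset):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    Objects are grouped into equivalence classes by their values on the
    attributes in `subset`; a class belongs to the positive region when all
    of its objects share a single decision label.
    """
    label_sets = defaultdict(set)
    sizes = defaultdict(int)
    for row, label in zip(data, labels):
        key = tuple(row[a] for a in subset)
        label_sets[key].add(label)
        sizes[key] += 1
    positive = sum(sizes[k] for k, labs in label_sets.items() if len(labs) == 1)
    return positive / len(data)


def fitness(mask, data, labels, alpha=0.05):
    """Dependency minus a hypothetical penalty on the fraction of features kept."""
    subset = [i for i, bit in enumerate(mask) if bit]
    return dependency(data, labels, subset) - alpha * len(subset) / len(mask)


def local_search(mask, data, labels):
    """Greedy single-bit-flip hill climbing; stops when no flip improves fitness."""
    best, best_fit = list(mask), fitness(mask, data, labels)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best.copy()
            cand[i] ^= 1
            cand_fit = fitness(cand, data, labels)
            if cand_fit > best_fit:
                best, best_fit, improved = cand, cand_fit, True
    return best, best_fit


def memetic_select(data, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """GA over feature bit masks, with local search applied to the elite."""
    rng = random.Random(seed)
    n = len(data[0])

    def score(mask):
        return fitness(mask, data, labels)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = [ranked[0]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            # binary tournament selection of two parents
            a, b = (max(rng.sample(ranked, 2), key=score) for _ in range(2))
            # uniform crossover followed by bit-flip mutation
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            next_pop.append(child)
        # local search moves the elite to a nearby local optimum of the fitness
        next_pop[0], _ = local_search(next_pop[0], data, labels)
        pop = next_pop
    best = max(pop, key=score)
    return [i for i, bit in enumerate(best) if bit]


# Toy decision table (made-up values): feature 0 determines the class,
# feature 1 is noise, feature 2 is a redundant copy of feature 0.
data = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (2, 0, 2), (2, 1, 2)]
labels = ["h", "h", "a", "a", "b", "b"]
print(memetic_select(data, labels))  # a one-feature subset: [0] or [2]
```

The size penalty makes the search prefer a single informative feature over redundant pairs; without it, every superset of an informative feature would tie at gamma = 1.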
Finally, a case study is given in which the dimensionality reduction techniques are applied to agricultural science, a real-life application domain. Rice plant diseases infect leaves, stems, roots and other parts of the plant, degrading production. Identifying diseases and taking precautions is a very important data-analytic task in agriculture. The case study demonstrates how images are collected from the fields, how disease features are extracted and preprocessed, and how the important features are finally selected using a genetic algorithm-based local search technique and fuzzy-rough-set theory. These features are important for developing a decision support system that predicts diseases and, accordingly, for devising methods to protect the most important crops.
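Fuzzy-rough-set theory lets continuous image features be evaluated without crisp discretization. As an illustration only, the sketch below computes the fuzzy-rough dependency degree in Jensen and Shen's fuzzy-partition formulation, gamma'_P(Q) = (1/|U|) * sum_x mu_POS(x), where mu_POS(x) = sup_X sup_F min(mu_F(x), inf_y max(1 - mu_F(y), mu_X(y))), F ranges over the fuzzy equivalence classes of P and X over the crisp decision classes. It assumes three hypothetical triangular membership functions (low, medium, high) spread evenly over each feature's observed range; the chapter instead tunes the membership ranges, which this sketch does not attempt.

```python
import itertools


def triangular(x, left, centre, right):
    """Triangular membership; a degenerate side (left == centre or
    centre == right) yields 0 beyond the peak, avoiding division by zero."""
    if x == centre:
        return 1.0
    if x < centre:
        return 0.0 if x <= left else (x - left) / (centre - left)
    return 0.0 if x >= right else (right - x) / (right - centre)


def fuzzy_partition(column):
    """Hypothetical low/medium/high labels spread evenly over the observed range."""
    lo, hi = min(column), max(column)
    mid = (lo + hi) / 2
    params = [(lo, lo, mid), (lo, mid, hi), (mid, hi, hi)]
    return [[triangular(v, *p) for v in column] for p in params]


def fuzzy_rough_dependency(data, labels, subset):
    """Fuzzy-rough dependency gamma'_P(Q): mean positive-region membership."""
    n = len(data)
    if not subset:
        return 0.0
    partitions = [fuzzy_partition([row[a] for row in data]) for a in subset]
    pos = [0.0] * n
    # each fuzzy equivalence class F picks one label per attribute (min t-norm)
    for combo in itertools.product(*partitions):
        mu_f = [min(label[i] for label in combo) for i in range(n)]
        for cls in set(labels):
            # inf_y max(1 - mu_F(y), mu_X(y)); mu_X is crisp (1 inside the class)
            inner = min(1.0 if labels[y] == cls else 1.0 - mu_f[y] for y in range(n))
            for x in range(n):
                pos[x] = max(pos[x], min(mu_f[x], inner))
    return sum(pos) / n


# Toy continuous features (made-up values): feature 0 separates the classes,
# feature 1 does not.
data = [(0.0, 0.0), (0.1, 1.0), (0.9, 0.0), (1.0, 1.0)]
labels = ["healthy", "healthy", "diseased", "diseased"]
for a in range(2):
    print(a, round(fuzzy_rough_dependency(data, labels, [a]), 3))
# prints "0 0.9" then "1 0.0"
```

Because a subset of k features yields 3^k fuzzy equivalence classes under this partition, direct evaluation is practical only for small subsets; the value can serve as the objective a search procedure evaluates for each candidate subset.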

Author information

Correspondence to Jaya Sil.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Sil, J., Das, A.K. (2016). Feature Selection for Adaptive Decision Making in Big Data Analytics. In: Mahmood, Z. (eds) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_12

  • DOI: https://doi.org/10.1007/978-3-319-31861-5_12
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-319-31859-2
  • Online ISBN: 978-3-319-31861-5
  • eBook Packages: Computer Science, Computer Science (R0)
