
Data and Dimensionality Reduction in Data Analysis and System Modeling

Reference work entry

Definition of the Subject

Data and dimensionality reduction are fundamental pursuits of data analysis and system modeling. With the rapid growth of the sizes of data sets and the diversity of the data themselves, the use of some reduction mechanisms becomes a necessity. Data reduction is concerned with reducing the size of a data set in terms of the number of data points. This helps reveal an underlying structure in the data by presenting the collection of groups present in it. Since the number of groups is very limited, clustering mechanisms become effective in terms of data reduction. Dimensionality reduction is aimed at reducing the number of attributes (features) of the data: it either leads to a typically small subset of the original features or brings the data from a highly dimensional feature space to a new one of far lower dimensionality. A joint reduction process involves both data and feature reduction.

Introduction

In the information age, we are continuously flooded by enormous amounts of...


Abbreviations

Reduction process:

A suite of activities leading to the reduction of available data and/or reduction of features.

Feature selection:

An algorithmic process in which a large set of features (attributes) is reduced by choosing a relatively small subset of them. The reduction is a combinatorial optimization task that is NP-hard; given this, it is quite often realized in a suboptimal way.
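As a brief illustration of such a suboptimal strategy, the following Python sketch performs greedy forward selection; the synthetic data set and the cross-validated nearest-neighbor score used to rank candidate subsets are illustrative assumptions, not part of the definition above.

# A minimal sketch of greedy forward feature selection (one common
# suboptimal strategy); the data and the scoring model are placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_selected):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_selected:
        scores = []
        for j in remaining:
            cols = selected + [j]
            acc = cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=5).mean()
            scores.append((acc, j))
        best_acc, best_j = max(scores)          # best candidate at this step
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy usage on synthetic data (placeholder for a real data set)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 2] + X[:, 7] > 0).astype(int)        # only features 2 and 7 matter
print(forward_select(X, y, n_selected=2))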

Feature transformation:

A process of transforming a highly dimensional feature space into a low-dimensional counterpart. These transformations are linear or nonlinear and can be guided by some optimization criterion. The most commonly encountered method is Principal Component Analysis (PCA), an example of a linear feature transformation.
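A minimal Python sketch of this kind of linear transformation, with PCA computed directly from the SVD of the centered data matrix (the random data serve only as a placeholder):

# Linear feature transformation via PCA, implemented with the SVD
# of the centered data matrix.
import numpy as np

def pca_transform(X, n_components):
    """Project data onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates in the reduced space

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))                   # 50-dimensional data
Z = pca_transform(X, n_components=3)             # mapped to 3 dimensions
print(Z.shape)                                   # (100, 3)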

Data reduction:

A way of converting a large data set into a representative subset. Typically, the data are grouped into clusters whose prototypes are representatives of the overall data set.
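A minimal Python sketch of this idea, with k-means used as one concrete, merely illustrative choice of clustering method and the resulting cluster centers acting as prototypes:

# Data reduction by clustering: a small set of prototypes (cluster
# centers) represents the full data set.
import numpy as np

def kmeans_prototypes(X, n_prototypes, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), n_prototypes, replace=False)]
    for _ in range(n_iter):
        # assign each data point to its nearest prototype
        d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each prototype to the mean of its assigned points
        for k in range(n_prototypes):
            if np.any(labels == k):
                prototypes[k] = X[labels == k].mean(axis=0)
    return prototypes, labels

X = np.random.default_rng(2).normal(size=(1000, 4))
prototypes, labels = kmeans_prototypes(X, n_prototypes=5)
print(prototypes.shape)   # 5 prototypes summarize 1000 data points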

Curse of dimensionality:

A phenomenon in which the required computing grows exponentially with the dimensionality of the problem (say, the number of data or the number of features), which prevents us from achieving an optimal solution. For instance, discretizing each of d features into only 10 levels already produces 10^d cells to examine. The curse of dimensionality leads to the construction of sub-optimal solutions.

Data mining:

A host of activities aimed at discovery of easily interpretable and experimentally sound findings in huge data sets.

Biologically‐inspired optimization:

An array of optimization techniques realizing searches in highly‐dimensional spaces where the search itself is guided by a collection of mechanisms (operators) inspired by biological search processes. Genetic algorithms, evolutionary methods, particle swarm optimization, and ant colonies are examples of biologically‐inspired search techniques.

Bibliography

Primary Literature

  1. Bargiela A, Pedrycz W (2003) Granular Computing: An Introduction. Kluwer, Dordrecht
  2. Bezdek JC (1992) On the relationship between neural networks, pattern recognition and intelligence. Int J Approx Reason 6(2):85–107
  3. Duda RO, Hart PE, Stork DG (2001) Pattern Classification, 2nd edn. Wiley, New York
  4. Cheng Y, Church GM (2000) Biclustering of expression data. In: Proc 8th Int Conf on Intelligent Systems for Molecular Biology, pp 93–103
  5. Gersho A, Gray RM (1992) Vector Quantization and Signal Compression. Kluwer, Boston
  6. Gottwald S (2005) Mathematical fuzzy logic as a tool for the treatment of vague information. Inf Sci 172(1–2):41–71
  7. Gray RM (1984) Vector quantization. IEEE Acoust Speech Signal Process 1:4–29
  8. Hansen E (1975) A generalized interval arithmetic. Lecture Notes in Computer Science, vol 29. Springer, Berlin, pp 7–18
  9. Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129
  10. Jaulin L, Kieffer M, Didrit O, Walter E (2001) Applied Interval Analysis. Springer, London
  11. Jolliffe IT (1986) Principal Component Analysis. Springer, New York
  12. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proc IEEE Int Conf on Neural Networks, vol 4. IEEE Press, Piscataway, pp 1942–1948
  13. Kohonen T (1989) Self-Organization and Associative Memory, 3rd edn. Springer, Berlin
  14. Mitchell M (1996) An Introduction to Genetic Algorithms. MIT Press, Cambridge
  15. Moore R (1966) Interval Analysis. Prentice Hall, Englewood Cliffs
  16. Pawlak Z (1982) Rough sets. Int J Comput Inform Sci 11:341–356
  17. Pawlak Z (1991) Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht
  18. Pawlak Z, Skowron A (2007) Rough sets: some extensions. Inf Sci 177(1):28–40
  19. Pedrycz W (ed) (2001) Granular Computing: An Emerging Paradigm. Physica, Heidelberg
  20. Pedrycz W (2005) Knowledge-based Clustering. Wiley, Hoboken
  21. Warmus M (1956) Calculus of approximations. Bull Acad Pol Sci 4(5):253–259
  22. Zadeh LA (1979) Fuzzy sets and information granularity. In: Gupta MM, Ragade RK, Yager RR (eds) Advances in Fuzzy Set Theory and Applications. North Holland, Amsterdam, pp 3–18
  23. Zadeh LA (1996) Fuzzy logic = Computing with words. IEEE Trans Fuzzy Syst 4:103–111
  24. Zadeh LA (1997) Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst 90:111–117
  25. Zadeh LA (1999) From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. IEEE Trans Circ Syst 45:105–119
  26. Zadeh LA (2005) Toward a generalized theory of uncertainty (GTU) – an outline. Inf Sci 172:1–40
  27. Zimmermann HJ (1996) Fuzzy Set Theory and Its Applications, 3rd edn. Kluwer, Norwell

Books and Reviews

  1. Baldi P, Hornik K (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Netw 2:53–58
  2. Bortolan G, Pedrycz W (2002) Fuzzy descriptive models: an interactive framework of information granulation. IEEE Trans Fuzzy Syst 10(6):743–755
  3. Busygin S, Prokopyev O, Pardalos PM (2008) Biclustering in data mining. Comput Oper Res 35(9):2964–2987
  4. Fukunaga K (1990) Introduction to Statistical Pattern Recognition. Academic Press, Boston
  5. Gifi A (1990) Nonlinear Multivariate Analysis. Wiley, Chichester
  6. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
  7. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
  8. Lauro C, Palumbo F (2000) Principal component analysis of interval data: a symbolic data analysis approach. Comput Stat 15:73–87
  9. Manly BFJ (1986) Multivariate Statistical Methods: A Primer. Chapman and Hall, London
  10. Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312
  11. Monahan AH (2000) Nonlinear principal component analysis by neural networks: theory and applications to the Lorenz system. J Clim 13:821–835
  12. Muni DP, Pal NR, Das J (2006) Genetic programming for simultaneous feature selection and classifier design. IEEE Trans Syst Man Cybern Part B 36(1):106–117
  13. Pawlak Z (1991) Rough Sets – Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
  14. Pedrycz W, Vukovich G (2002) Feature analysis through information granulation and fuzzy sets. Pattern Recognit 35:825–834
  15. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
  16. Setiono R, Liu H (1997) Neural-network feature selector. IEEE Trans Neural Netw 8(3):654–662
  17. Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
  18. Watada J, Yabuuchi Y (1997) Fuzzy principal component analysis and its application. Biomed Fuzzy Human Sci 3:83–92


Acknowledgments

Support from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Canada Research Chair (CRC) is gratefully acknowledged.


Appendix: Particle Swarm Optimization (PSO)

In the studies on dimensionality and data reduction, we consider the use of Particle Swarm Optimization (PSO). PSO is an example of a biologically-inspired and population-driven optimization. The algorithm was originally introduced by Kennedy and Eberhart [12], who strongly emphasized the inspiration coming from the swarming behavior of animals as well as from human social behavior, cf. also [19]. In essence, a particle swarm is a population of particles – possible solutions in the multidimensional search space. Each particle explores the search space and during this search adheres to some intuitively appealing guidelines navigating the search process: (a) it tries to follow its previous direction, and (b) it looks back at the best performance both at the level of the individual particle and of the entire population. In this sense there is a collective search of the problem space along with a component of memory incorporated as an integral part of the search mechanism.

The performance of each particle during its movement is assessed by means of some performance index. The position of a particle in the search space is described by a vector \( { \boldsymbol{z}(t) } \), where “t” denotes consecutive discrete time moments. The next position of the particle is governed by the following update expressions concerning its position, \( { \boldsymbol{z}(t+1) } \), and its speed, \( { \boldsymbol{v}(t+1) } \):

$$ \begin{aligned} & \boldsymbol{z}(t+1) = \boldsymbol{z}(t) + \boldsymbol{v}(t+1)\\ & \qquad\qquad\qquad\quad \text{// update of position of the particle}\\ & \boldsymbol{v}(t+1) = \xi \boldsymbol{v}(t) + \phi_{1}(\boldsymbol{p}- \boldsymbol{z}(t)) + \phi_{2} (\boldsymbol{p}_\mathrm{total}-\boldsymbol{z}(t))\\ & \qquad\qquad\qquad\quad \text{// update of speed of the particle} \end{aligned} $$

where \( { \boldsymbol{p} } \) denotes the best position (the lowest value of the performance index) reported so far for this particle, and \( { \boldsymbol{p}_\text{total} } \) is the best position found so far across the whole population. ϕ1 and ϕ2 are random numbers drawn from the uniform distribution \( { U[0,2] } \) that help build a proper mix of the components of the speed; different random numbers affect the individual coordinates of the speed. The second expression, governing the change in the speed of the particle, is particularly interesting as it captures the relationships between the particle and its own history as well as the history of the overall population in terms of the performance reported so far.

There are three components determining the updated speed of the particle. First, the current speed \( { \boldsymbol{v}(t) } \) is scaled by the inertia weight (ξ), smaller than 1, whose role is to damp any tendency toward a drastic change of the current speed. Second, we rely on the memory of the particle by recalling the best position it has achieved so far. Third, there is some reliance on the best performance reported across the whole population (which is captured by the last component of the expression governing the speed adjustment).
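A minimal Python sketch of these update rules, applied to a simple quadratic performance index; the swarm size, number of iterations, inertia weight ξ = 0.7, and the test function itself are illustrative assumptions rather than values prescribed here:

# PSO following the update rules above: positions z(t), speeds v(t),
# per-particle best p, and population-wide best p_total.
import numpy as np

def pso_minimize(f, dim, n_particles=30, n_iter=200, xi=0.7, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.uniform(-5.0, 5.0, size=(n_particles, dim))    # positions z(t)
    v = np.zeros_like(z)                                    # speeds v(t)
    p = z.copy()                                            # best position per particle
    p_cost = np.array([f(zi) for zi in z])
    p_total = p[p_cost.argmin()].copy()                     # best position overall
    for _ in range(n_iter):
        # phi1, phi2 ~ U[0, 2], drawn independently for each coordinate
        phi1 = rng.uniform(0.0, 2.0, size=z.shape)
        phi2 = rng.uniform(0.0, 2.0, size=z.shape)
        v = xi * v + phi1 * (p - z) + phi2 * (p_total - z)  # speed update
        z = z + v                                           # position update
        cost = np.array([f(zi) for zi in z])
        improved = cost < p_cost                            # refresh particle memories
        p[improved], p_cost[improved] = z[improved], cost[improved]
        p_total = p[p_cost.argmin()].copy()
    return p_total, p_cost.min()

best, best_cost = pso_minimize(lambda x: np.sum(x ** 2), dim=5)
print(best_cost)   # close to zero for this convex test function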


Copyright information

© 2009 Springer-Verlag

About this entry

Cite this entry

Pedrycz, W. (2009). Data and Dimensionality Reduction in Data Analysis and System Modeling. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30440-3_113
