Definition of the Subject
Data and dimensionality reduction are fundamental pursuits of data analysis and system modeling. With the rapid growth of sizes of data sets and the diversity of data themselves, the use of some reduction mechanisms becomes a necessity. Data reduction is concerned with a reduction of sizes of data sets in terms of the number of data points. This helps reveal an underlying structure in data by presenting a collection of groups present in data. Given a number of groups which is very limited, the clustering mechanisms become effective in terms of data reduction. Dimensionality reduction is aimed at the reduction of the number of attributes (features) of the data which leads to a typically small subset of features or brings the data from a highly dimensional feature space to a new one of a far lower dimensionality. A joint reduction process involves data and feature reduction.
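As a brief illustrative sketch (an addition of this rewrite, not part of the original entry), data reduction through clustering can be realized by a plain k-means procedure, where the resulting prototypes serve as representatives of the overall data set. The toy data and all parameter values below are hypothetical:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: reduce a data set to k prototype points."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)           # initial prototypes drawn from the data
    for _ in range(iters):
        # assignment step: attach each point to its nearest prototype
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # update step: move each prototype to the mean of its group
        for j, g in enumerate(groups):
            if g:
                centers[j] = tuple(sum(col) / len(g) for col in zip(*g))
    return centers

# two well-separated groups of 2-D points, reduced to two prototypes
data = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
prototypes = sorted(kmeans(data, 2))
print(prototypes)
```

The six data points are reduced to two prototypes, one per group, which is the sense in which clustering acts as a data reduction mechanism.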
Introduction
In the information age, we are continuously flooded by enormous amounts of...
Abbreviations
- Reduction process: A suite of activities leading to the reduction of the available data and/or the reduction of features.
- Feature selection: An algorithmic process in which a large set of features (attributes) is reduced by choosing a relatively small subset of them. The selection is a combinatorial optimization task which is NP-complete; given this, it is quite often realized in a suboptimal way.
- Feature transformation: A process of transforming a highly dimensional feature space into a low-dimensional counterpart. These transformations are linear or nonlinear and can be guided by some optimization criterion. A commonly encountered method is Principal Component Analysis (PCA), an example of a linear feature transformation.
- Data reduction: A way of converting a large data set into a representative subset. Typically, the data are grouped into clusters whose prototypes serve as representatives of the overall data set.
- Curse of dimensionality: A phenomenon of rapid, exponential growth of computational effort with the dimensionality of the problem (say, the number of data or the number of features), which prevents us from achieving an optimal solution. The curse of dimensionality leads to the construction of sub-optimal solutions.
- Data mining: A host of activities aimed at the discovery of easily interpretable and experimentally sound findings in huge data sets.
- Biologically-inspired optimization: An array of optimization techniques realizing searches in highly-dimensional spaces where the search itself is guided by a collection of mechanisms (operators) inspired by biological search processes. Genetic algorithms, evolutionary methods, particle swarm optimization, and ant colonies are examples of biologically-inspired search techniques.
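As a brief illustration of the linear feature transformation mentioned above, the following minimal PCA sketch (an assumption of this rewrite, using NumPy on synthetic data) projects a 10-dimensional data set onto its two leading principal components:

```python
import numpy as np

def pca_reduce(X, k):
    """Project data X of shape (n_samples, n_features) onto the
    first k principal components, returning shape (n_samples, k)."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort: largest variance first
    W = eigvecs[:, order[:k]]               # top-k principal directions
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # synthetic 10-dimensional data
Z = pca_reduce(X, 2)                        # reduced to 2 dimensions
print(Z.shape)
```

By construction, the first output coordinate carries at least as much variance as the second, reflecting the optimization criterion (variance maximization) that guides this linear transformation.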
Bibliography
Primary Literature
Bargiela A, Pedrycz W (2003) Granular Computing: An Introduction. Kluwer, Dordrecht
Bezdek JC (1992) On the relationship between neural networks, pattern recognition and intelligence. Int J Approx Reason 6(2):85–107
Duda RO, Hart PE, Stork DG (2001) Pattern Classification, 2nd edn. Wiley, New York
Cheng Y, Church GM (2000) Biclustering of expression data. Proc 8th Int Conf on Intelligent Systems for Molecular Biology, pp 93–103
Gersho A, Gray RM (1992) Vector Quantization and Signal Compression. Kluwer, Boston
Gottwald S (2005) Mathematical fuzzy logic as a tool for the treatment of vague information. Inf Sci 172(1–2):41–71
Gray RM (1984) Vector quantization. IEEE ASSP Mag 1(2):4–29
Hansen E (1975) A generalized interval arithmetic. Lecture Notes in Computer Science, vol 29. Springer, Berlin, pp 7–18
Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129
Jaulin L, Kieffer M, Didrit O, Walter E (2001) Applied Interval Analysis. Springer, London
Jolliffe IT (1986) Principal Component Analysis. Springer, New York
Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proc IEEE Int Conf on Neural Networks, vol 4. IEEE Press, Piscataway, pp 1942–1948
Kohonen T (1989) Self Organization and Associative Memory, 3rd edn. Springer, Berlin
Mitchell M (1996) An Introduction to Genetic Algorithms. MIT Press, Cambridge
Moore R (1966) Interval Analysis. Prentice Hall, Englewood Cliffs
Pawlak Z (1982) Rough sets. Int J Comput Inform Sci 11:341–356
Pawlak Z (1991) Rough Sets. Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht
Pawlak Z, Skowron A (2007) Rough sets: some extensions. Inf Sci 177(1):28–40
Pedrycz W (ed) (2001) Granular Computing: An Emerging Paradigm. Physica, Heidelberg
Pedrycz W (2005) Knowledge‐based Clustering. Wiley, Hoboken
Warmus M (1956) Calculus of approximations. Bull Acad Pol Sci 4(5):253–259
Zadeh LA (1979) Fuzzy sets and information granularity. In: Gupta MM, Ragade RK, Yager RR (eds) Advances in Fuzzy Set Theory and Applications. North Holland, Amsterdam, pp 3–18
Zadeh LA (1996) Fuzzy logic = Computing with words. IEEE Trans Fuzzy Syst 4:103–111
Zadeh LA (1997) Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst 90:111–117
Zadeh LA (1999) From computing with numbers to computing with words-from manipulation of measurements to manipulation of perceptions. IEEE Trans Circ Syst 45:105–119
Zadeh LA (2005) Toward a generalized theory of uncertainty (GTU) – an outline. Inf Sci 172:1–40
Zimmermann HJ (1996) Fuzzy Set Theory and Its Applications, 3rd edn. Kluwer, Norwell
Books and Reviews
Baldi P, Hornik K (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Netw 2:53–58
Bortolan G, Pedrycz W (2002) Fuzzy descriptive models: an interactive framework of information granulation. IEEE Trans Fuzzy Syst 10(6):743–755
Busygin S, Prokopyev O, Pardalos PM (2008) Biclustering in data mining. Comput Oper Res 35(9):2964–2987
Fukunaga K (1990) Introduction to Statistical Pattern Recognition. Academic Press, Boston
Gifi A (1990) Nonlinear Multivariate Analysis. Wiley, Chichester
Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Lauro C, Palumbo F (2000) Principal component analysis of interval data: a symbolic data analysis approach. Comput Stat 15:73–87
Manly BFJ (1986) Multivariate Statistical Methods: A Primer. Chapman and Hall, London
Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312
Monahan AH (2000) Nonlinear principal component analysis by neural networks: Theory and applications to the Lorenz system. J Clim 13:821–835
Muni DP, Pal NR, Das J (2006) Genetic programming for simultaneous feature selection and classifier design. IEEE Trans Syst Man Cybern Part B 36(1):106–117
Pedrycz W, Vukovich G (2002) Feature analysis through information granulation and fuzzy sets. Pattern Recognit 35:825–834
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Setiono R, Liu H (1997) Neural-network feature selector. IEEE Trans Neural Netw 8(3):654–662
Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Watada J, Yabuuchi Y (1997) Fuzzy principal component analysis and its application. Biomed Fuzzy Human Sci 3:83–92
Acknowledgments
Support from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Canada Research Chair (CRC) is gratefully acknowledged.
Appendix: Particle Swarm Optimization (PSO)
In the studies on dimensionality and data reduction, we consider the use of Particle Swarm Optimization (PSO). PSO is an example of a biologically-inspired and population-driven optimization. The algorithm was originally introduced by Kennedy and Eberhart [12], who strongly emphasized an inspiration coming from the swarming behavior of animals as well as from human social behavior, cf. also [19]. In essence, a particle swarm is a population of particles, that is, possible solutions in the multidimensional search space. Each particle explores the search space and during this search adheres to some quite intuitively appealing guidelines navigating the search process: (a) it tries to follow its previous direction, and (b) it looks back at the best performance both at the level of the individual particle and the entire population. In this sense there is some collective search of the problem space along with some component of memory incorporated as an integral part of the search mechanism.
The performance of each particle during its movement is assessed by means of some performance index. The position of a particle in the search space is described by a vector \( { \boldsymbol{z}(t) } \), where "t" denotes consecutive discrete time moments. The next position of the particle is governed by the following update expressions for its speed, \( { \boldsymbol{v}(t+1) } \), and its position, \( { \boldsymbol{z}(t+1) } \):

\( { \boldsymbol{v}(t+1) = \xi \boldsymbol{v}(t) + \phi_1 (\boldsymbol{p} - \boldsymbol{z}(t)) + \phi_2 (\boldsymbol{p}_\text{total} - \boldsymbol{z}(t)) } \)

\( { \boldsymbol{z}(t+1) = \boldsymbol{z}(t) + \boldsymbol{v}(t+1) } \)
where \( { \boldsymbol{p} } \) denotes the best position (the lowest value of the performance index) reported so far for this particle, and \( { \boldsymbol{p}_\text{total} } \) is the best position found so far across the whole population. ϕ1 and ϕ2 are random numbers drawn from the uniform distribution \( { U[0,2] } \) that help build a proper mix of the components of the speed; different random numbers affect the individual coordinates of the speed. The expression governing the change in the velocity of the particle is particularly interesting, as it nicely captures the relationships between the particle and its own history as well as the history of the overall population in terms of the performance reported so far.
There are three components determining the updated speed of the particle. First, the current speed \( { \boldsymbol{v}(t) } \) is scaled by an inertia weight (ξ) smaller than 1, whose role is to dampen the previous speed and thus avoid drastic changes of the particle's velocity. Second, we refer to the memory of the particle by recalling its best position achieved so far. Third, there is some reliance on the best performance reported across the whole population (which is captured by the last component of the expression governing the speed adjustment).
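The update rules above can be sketched in a few lines of Python. This is an illustrative implementation assumed by this rewrite, not part of the original entry; the sphere objective, the inertia weight of 0.7, and the swarm and iteration sizes are all hypothetical choices:

```python
import random

def pso(f, dim, n_particles=30, iters=300, xi=0.7, seed=1):
    """Minimize f over R^dim with the PSO speed/position updates:
    v(t+1) = xi*v(t) + phi1*(p - z(t)) + phi2*(p_total - z(t))
    z(t+1) = z(t) + v(t+1), with phi1, phi2 ~ U[0, 2] per coordinate."""
    rnd = random.Random(seed)
    z = [[rnd.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [zi[:] for zi in z]              # each particle's best position p
    p_val = [f(zi) for zi in z]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]    # best position p_total of the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # different random numbers for each coordinate of the speed
                phi1, phi2 = rnd.uniform(0, 2), rnd.uniform(0, 2)
                v[i][d] = (xi * v[i][d]
                           + phi1 * (p_best[i][d] - z[i][d])
                           + phi2 * (g_best[d] - z[i][d]))
                z[i][d] += v[i][d]
            val = f(z[i])
            if val < p_val[i]:                # update the particle's memory
                p_best[i], p_val[i] = z[i][:], val
                if val < g_val:               # and the population-wide memory
                    g_best, g_val = z[i][:], val
    return g_best, g_val

# sphere function: minimum value 0 attained at the origin
best, val = pso(lambda x: sum(t * t for t in x), dim=3)
print(val)
```

The three components of the speed update appear explicitly in the inner loop: the damped previous speed, the pull toward the particle's own best position, and the pull toward the best position of the whole swarm.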
© 2009 Springer-Verlag

Cite this entry: Pedrycz, W. (2009). Data and Dimensionality Reduction in Data Analysis and System Modeling. In: Meyers, R. (ed) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30440-3_113

Print ISBN: 978-0-387-75888-6
Online ISBN: 978-0-387-30440-3