
Data and Dimensionality Reduction in Data Analysis and System Modeling

Reference work entry

Definition of the Subject

Data and dimensionality reduction are fundamental pursuits of data analysis and system modeling. With the rapid growth of the sizes of data sets and the diversity of the data themselves, the use of some reduction mechanisms becomes a necessity. Data reduction is concerned with reducing the size of a data set in terms of the number of data points. This helps reveal an underlying structure in the data by presenting the collection of groups present in it. Since the number of groups is very limited, clustering mechanisms become effective in terms of data reduction. Dimensionality reduction is aimed at reducing the number of attributes (features) of the data: it either leads to a typically small subset of the original features or brings the data from a highly dimensional feature space to a new one of far lower dimensionality. A joint reduction process involves both data and feature reduction.

Introduction

In the information age, we are continuously flooded by enormous amounts of...


Abbreviations

Reduction process:

A suite of activities leading to the reduction of available data and/or reduction of features.

Feature selection:

An algorithmic process in which a large set of features (attributes) is reduced by choosing a relatively small subset of them. The reduction is a combinatorial optimization task that is NP-hard; given this, it is quite often realized in a suboptimal way.
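As a brief illustration of such a suboptimal strategy, the following Python sketch performs greedy forward selection; the synthetic data set and the cross-validated nearest-neighbor score used to rank candidate subsets are illustrative assumptions, not part of the definition above.

# A minimal sketch of greedy forward feature selection (one common
# suboptimal strategy); the data and the scoring model are placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_selected):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_selected:
        scores = []
        for j in remaining:
            cols = selected + [j]
            acc = cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=5).mean()
            scores.append((acc, j))
        best_acc, best_j = max(scores)          # best candidate at this step
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy usage on synthetic data (placeholder for a real data set)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 2] + X[:, 7] > 0).astype(int)        # only features 2 and 7 matter
print(forward_select(X, y, n_selected=2))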

Feature transformation:

A process of transforming a highly dimensional feature space into a low-dimensional counterpart. These transformations are linear or nonlinear and can be guided by some optimization criterion. The most commonly encountered method is Principal Component Analysis (PCA), an example of a linear feature transformation.
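A minimal Python sketch of this kind of linear transformation, with PCA computed directly from the SVD of the centered data matrix (the random data serve only as a placeholder):

# Linear feature transformation via PCA, implemented with the SVD
# of the centered data matrix.
import numpy as np

def pca_transform(X, n_components):
    """Project data onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates in the reduced space

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))                   # 50-dimensional data
Z = pca_transform(X, n_components=3)             # mapped to 3 dimensions
print(Z.shape)                                   # (100, 3)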

Data reduction:

A way of converting a large data set into a representative subset. Typically, the data are grouped into clusters whose prototypes are representatives of the overall data set.
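A minimal Python sketch of this idea, with k-means used as one concrete, merely illustrative choice of clustering method and the resulting cluster centers acting as prototypes:

# Data reduction by clustering: a small set of prototypes (cluster
# centers) represents the full data set.
import numpy as np

def kmeans_prototypes(X, n_prototypes, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), n_prototypes, replace=False)]
    for _ in range(n_iter):
        # assign each data point to its nearest prototype
        d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each prototype to the mean of its assigned points
        for k in range(n_prototypes):
            if np.any(labels == k):
                prototypes[k] = X[labels == k].mean(axis=0)
    return prototypes, labels

X = np.random.default_rng(2).normal(size=(1000, 4))
prototypes, labels = kmeans_prototypes(X, n_prototypes=5)
print(prototypes.shape)   # 5 prototypes summarize 1000 data points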

Curse of dimensionality:

A phenomenon in which the required computing grows exponentially with the dimensionality of the problem (say, the number of data or the number of features), which prevents us from achieving an optimal solution. For instance, discretizing each of d features into only 10 levels already produces 10^d cells to examine. The curse of dimensionality leads to the construction of sub-optimal solutions.

Data mining:

A host of activities aimed at discovery of easily interpretable and experimentally sound findings in huge data sets.

Biologically‐inspired optimization:

An array of optimization techniques realizing searches in highly‐dimensional spaces where the search itself is guided by a collection of mechanisms (operators) inspired by biological search processes. Genetic algorithms, evolutionary methods, particle swarm optimization, and ant colonies are examples of biologically‐inspired search techniques.

Bibliography

Primary Literature

  1. Bargiela A, Pedrycz W (2003) Granular Computing: An Introduction. Kluwer, Dordrecht
  2. Bezdek JC (1992) On the relationship between neural networks, pattern recognition and intelligence. Int J Approx Reason 6(2):85–107
  3. Duda RO, Hart PE, Stork DG (2001) Pattern Classification, 2nd edn. Wiley, New York
  4. Cheng Y, Church GM (2000) Biclustering of expression data. In: Proc 8th Int Conf on Intelligent Systems for Molecular Biology, pp 93–103
  5. Gersho A, Gray RM (1992) Vector Quantization and Signal Compression. Kluwer, Boston
  6. Gottwald S (2005) Mathematical fuzzy logic as a tool for the treatment of vague information. Inf Sci 172(1–2):41–71
  7. Gray RM (1984) Vector quantization. IEEE Acoust Speech Signal Process 1:4–29
  8. Hansen E (1975) A generalized interval arithmetic. Lecture Notes in Computer Science, vol 29. Springer, Berlin, pp 7–18
  9. Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67:123–129
  10. Jaulin L, Kieffer M, Didrit O, Walter E (2001) Applied Interval Analysis. Springer, London
  11. Jolliffe IT (1986) Principal Component Analysis. Springer, New York
  12. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proc IEEE Int Conf on Neural Networks, vol 4. IEEE Press, Piscataway, pp 1942–1948
  13. Kohonen T (1989) Self-Organization and Associative Memory, 3rd edn. Springer, Berlin
  14. Mitchell M (1996) An Introduction to Genetic Algorithms. MIT Press, Cambridge
  15. Moore R (1966) Interval Analysis. Prentice Hall, Englewood Cliffs
  16. Pawlak Z (1982) Rough sets. Int J Comput Inform Sci 11:341–356
  17. Pawlak Z (1991) Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht
  18. Pawlak Z, Skowron A (2007) Rough sets: some extensions. Inf Sci 177(1):28–40
  19. Pedrycz W (ed) (2001) Granular Computing: An Emerging Paradigm. Physica, Heidelberg
  20. Pedrycz W (2005) Knowledge-based Clustering. Wiley, Hoboken
  21. Warmus M (1956) Calculus of approximations. Bull Acad Pol Sci 4(5):253–259
  22. Zadeh LA (1979) Fuzzy sets and information granularity. In: Gupta MM, Ragade RK, Yager RR (eds) Advances in Fuzzy Set Theory and Applications. North Holland, Amsterdam, pp 3–18
  23. Zadeh LA (1996) Fuzzy logic = Computing with words. IEEE Trans Fuzzy Syst 4:103–111
  24. Zadeh LA (1997) Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst 90:111–117
  25. Zadeh LA (1999) From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. IEEE Trans Circ Syst 45:105–119
  26. Zadeh LA (2005) Toward a generalized theory of uncertainty (GTU) – an outline. Inf Sci 172:1–40
  27. Zimmermann HJ (1996) Fuzzy Set Theory and Its Applications, 3rd edn. Kluwer, Norwell

Books and Reviews

  1. Baldi P, Hornik K (1989) Neural networks and principal component analysis: learning from examples without local minima. Neural Netw 2:53–58
  2. Bortolan G, Pedrycz W (2002) Fuzzy descriptive models: an interactive framework of information granulation. IEEE Trans Fuzzy Syst 10(6):743–755
  3. Busygin S, Prokopyev O, Pardalos PM (2008) Biclustering in data mining. Comput Oper Res 35(9):2964–2987
  4. Fukunaga K (1990) Introduction to Statistical Pattern Recognition. Academic Press, Boston
  5. Gifi A (1990) Nonlinear Multivariate Analysis. Wiley, Chichester
  6. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
  7. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
  8. Lauro C, Palumbo F (2000) Principal component analysis of interval data: a symbolic data analysis approach. Comput Stat 15:73–87
  9. Manly BFJ (1986) Multivariate Statistical Methods: A Primer. Chapman and Hall, London
  10. Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312
  11. Monahan AH (2000) Nonlinear principal component analysis by neural networks: theory and applications to the Lorenz system. J Clim 13:821–835
  12. Muni DP, Pal NR, Das J (2006) Genetic programming for simultaneous feature selection and classifier design. IEEE Trans Syst Man Cybern Part B 36(1):106–117
  13. Pawlak Z (1991) Rough Sets – Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
  14. Pedrycz W, Vukovich G (2002) Feature analysis through information granulation and fuzzy sets. Pattern Recognit 35:825–834
  15. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
  16. Setiono R, Liu H (1997) Neural-network feature selector. IEEE Trans Neural Netw 8(3):654–662
  17. Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
  18. Watada J, Yabuuchi Y (1997) Fuzzy principal component analysis and its application. Biomed Fuzzy Human Sci 3:83–92


Acknowledgments

Support from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Canada Research Chair (CRC) is gratefully acknowledged.


Appendix: Particle Swarm Optimization (PSO)

In the studies on dimensionality and data reduction, we consider the use of Particle Swarm Optimization (PSO). PSO is an example of a biologically-inspired and population-driven optimization. The algorithm was originally introduced by Kennedy and Eberhart [12], who strongly emphasized the inspiration coming from the swarming behavior of animals as well as from human social behavior, cf. also [19]. In essence, a particle swarm is a population of particles – possible solutions in the multidimensional search space. Each particle explores the search space and during this search adheres to some intuitively appealing guidelines navigating the search process: (a) it tries to follow its previous direction, and (b) it looks back at the best performance both at the level of the individual particle and of the entire population. In this sense there is a collective search of the problem space along with a component of memory incorporated as an integral part of the search mechanism.

The performance of each particle during its movement is assessed by means of some performance index. The position of a particle in the search space is described by a vector \( { \boldsymbol{z}(t) } \), where “t” denotes consecutive discrete time moments. The next position of the particle is governed by the following update expressions concerning its position, \( { \boldsymbol{z}(t+1) } \), and its speed, \( { \boldsymbol{v}(t+1) } \):

$$ \begin{aligned} & \boldsymbol{z}(t+1) = \boldsymbol{z}(t) + \boldsymbol{v}(t+1)\\ & \qquad\qquad\qquad\quad \text{// update of position of the particle}\\ & \boldsymbol{v}(t+1) = \xi \boldsymbol{v}(t) + \phi_{1}(\boldsymbol{p}- \boldsymbol{z}(t)) + \phi_{2} (\boldsymbol{p}_\mathrm{total}-\boldsymbol{z}(t))\\ & \qquad\qquad\qquad\quad \text{// update of speed of the particle} \end{aligned} $$

where \( { \boldsymbol{p} } \) denotes the best position (the lowest value of the performance index) reported so far for this particle, and \( { \boldsymbol{p}_\text{total} } \) is the best position found so far across the whole population. ϕ1 and ϕ2 are random numbers drawn from the uniform distribution \( { U[0,2] } \) that help build a proper mix of the components of the speed; different random numbers affect the individual coordinates of the speed. The second expression, governing the change in the speed of the particle, is particularly interesting as it captures the relationships between the particle and its own history as well as the history of the overall population in terms of the performance reported so far.

There are three components determining the updated speed of the particle. First, the current speed \( { \boldsymbol{v}(t) } \) is scaled by the inertia weight (ξ), smaller than 1, whose role is to damp any tendency toward a drastic change of the current speed. Second, we rely on the memory of the particle by recalling the best position it has achieved so far. Third, there is some reliance on the best performance reported across the whole population (which is captured by the last component of the expression governing the speed adjustment).
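A minimal Python sketch of these update rules, applied to a simple quadratic performance index; the swarm size, number of iterations, inertia weight ξ = 0.7, and the test function itself are illustrative assumptions rather than values prescribed here:

# PSO following the update rules above: positions z(t), speeds v(t),
# per-particle best p, and population-wide best p_total.
import numpy as np

def pso_minimize(f, dim, n_particles=30, n_iter=200, xi=0.7, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.uniform(-5.0, 5.0, size=(n_particles, dim))    # positions z(t)
    v = np.zeros_like(z)                                    # speeds v(t)
    p = z.copy()                                            # best position per particle
    p_cost = np.array([f(zi) for zi in z])
    p_total = p[p_cost.argmin()].copy()                     # best position overall
    for _ in range(n_iter):
        # phi1, phi2 ~ U[0, 2], drawn independently for each coordinate
        phi1 = rng.uniform(0.0, 2.0, size=z.shape)
        phi2 = rng.uniform(0.0, 2.0, size=z.shape)
        v = xi * v + phi1 * (p - z) + phi2 * (p_total - z)  # speed update
        z = z + v                                           # position update
        cost = np.array([f(zi) for zi in z])
        improved = cost < p_cost                            # refresh particle memories
        p[improved], p_cost[improved] = z[improved], cost[improved]
        p_total = p[p_cost.argmin()].copy()
    return p_total, p_cost.min()

best, best_cost = pso_minimize(lambda x: np.sum(x ** 2), dim=5)
print(best_cost)   # close to zero for this convex test function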


Copyright information

© 2009 Springer-Verlag

About this entry

Cite this entry

Pedrycz, W. (2009). Data and Dimensionality Reduction in Data Analysis and System Modeling. In: Meyers, R. (eds) Encyclopedia of Complexity and Systems Science. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30440-3_113
