Data Reduction

García, Salvador; Luengo, Julián; Herrera, Francisco

doi:10.1007/978-3-319-10247-4_6

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 72)

Abstract

The most common data reduction tasks in Data Mining consist of removing or grouping data along its two main dimensions, examples and attributes, and of simplifying the domain of the data. A global overview of these tasks is given in Sect. 6.1. One of the well-known problems in Data Mining is the “curse of dimensionality”, which is related to the typically large number of attributes in data; Sect. 6.2 deals with this problem. Data sampling and data simplification are introduced in Sects. 6.3 and 6.4, respectively, providing the basic notions on these topics for further analysis and explanation in subsequent chapters of the book.

Author information

Authors and Affiliations

Department of Computer Science, University of Jaén, Jaén, Spain
Salvador García
Department of Civil Engineering, University of Burgos, Burgos, Spain
Julián Luengo
Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
Francisco Herrera

Corresponding author

Correspondence to Salvador García.

About this chapter

Cite this chapter

García, S., Luengo, J., Herrera, F. (2015). Data Reduction. In: Data Preprocessing in Data Mining. Intelligent Systems Reference Library, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-319-10247-4_6

DOI: https://doi.org/10.1007/978-3-319-10247-4_6
Published: 31 August 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10246-7
Online ISBN: 978-3-319-10247-4
eBook Packages: Engineering, Engineering (R0)
