Data preprocessing and intelligent data analysis

https://doi.org/10.1016/S1088-467X(98)00007-9

Abstract

This paper first provides an overview of data preprocessing, focusing on problems of real-world data. These are primarily problems that must be carefully understood and solved before any data analysis process can start. The paper discusses in detail two main reasons for performing data preprocessing: (i) problems with the data and (ii) preparation for data analysis. It then details the data preprocessing techniques that address each of these objectives; a total of 14 techniques are discussed. Two examples of data preprocessing applications follow, drawn from two of the most data-rich domains, semiconductor manufacturing and aerospace, where large amounts of fairly reliable data are available. The paper closes with future directions and some challenges.
