Improving SVM classification on imbalanced time series data sets with ghost points

Köknar-Tezel, Suzan; Latecki, Longin Jan

doi:10.1007/s10115-010-0310-3

Regular Paper
Published: 16 June 2010

Volume 28, pages 1–23, (2011)

Knowledge and Information Systems

Suzan Köknar-Tezel¹ & Longin Jan Latecki¹

Abstract

Imbalanced data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest, and the cost of misclassifying the rare event is higher than that of misclassifying the usual event. When the data is highly skewed toward the usual event, it can be very difficult for a learning system to accurately detect the rare event. Many approaches have been proposed in recent years for handling imbalanced data sets, from under-sampling the majority class to adding synthetic points to the minority class in feature space. However, distances between time series are known to be non-Euclidean and non-metric, since comparing time series requires warping in time. This fact makes it impossible to apply standard methods like SMOTE to insert synthetic data points in feature spaces. We present an innovative approach that augments the minority class by adding synthetic points in distance spaces. We then use Support Vector Machines (SVMs) for classification. Our experimental results on standard time series show that our synthetic points significantly improve the classification rate of the rare events and, in most cases, also improve the overall accuracy of SVMs. We also show how adding our synthetic points can aid in the visualization of time series data sets.
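
The claim that time-warping distances are non-metric can be illustrated with a small example. The sketch below is not taken from the article: it assumes the textbook unconstrained dynamic time warping (DTW) recurrence with an absolute-difference local cost, and the three toy series `a`, `b`, and `c` are invented for illustration. It shows one triple for which the triangle inequality fails, which is why interpolating between points in a vector feature space, as SMOTE does, has no direct counterpart here.

```python
import math


def dtw_distance(s, t):
    """Unconstrained DTW cost between two numeric sequences.

    Assumption: absolute difference as the local cost and the standard
    three-way recurrence (insertion, deletion, match). Other DTW variants
    (windowed, normalised) would give different numbers.
    """
    n, m = len(s), len(t)
    # cost[i][j] holds the minimal cumulative cost of aligning s[:i] with t[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # s[i-1] is matched again to t[j-1]'s predecessor step
                cost[i][j - 1],      # t[j-1] is matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical toy series chosen only to expose the violation.
a = [0.0]
b = [1.0, 2.0]
c = [2.0, 3.0, 3.0]

d_ab = dtw_distance(a, b)
d_bc = dtw_distance(b, c)
d_ac = dtw_distance(a, c)

# A metric would require d_ac <= d_ab + d_bc; this triple breaks that.
violates_triangle = d_ac > d_ab + d_bc
print(d_ab, d_bc, d_ac, violates_triangle)
```

Because only pairwise distances are meaningful in such a setting, any synthetic minority point has to be described by its distances to the existing series rather than by coordinates, which is the setting the abstract refers to as a distance space.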

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
Suzan Köknar-Tezel & Longin Jan Latecki

Corresponding author

Correspondence to Suzan Köknar-Tezel.

About this article

Cite this article

Köknar-Tezel, S., Latecki, L.J. Improving SVM classification on imbalanced time series data sets with ghost points. Knowl Inf Syst 28, 1–23 (2011). https://doi.org/10.1007/s10115-010-0310-3

Received: 07 March 2010
Revised: 07 April 2010
Accepted: 15 May 2010
Published: 16 June 2010
Issue Date: July 2011
DOI: https://doi.org/10.1007/s10115-010-0310-3
