Abstract
We study the strongly NP-hard problem of finding, in a finite set of vectors in Euclidean space, a subset of a given size that minimizes the sum of squared distances from the elements of the subset to its center. The center of a subset is defined as the mean of all its elements. We prove that, unless P=NP, the general case of the problem admits no fully polynomial-time approximation scheme (FPTAS). We provide such a scheme for the case in which the dimension of the space is fixed.
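To make the problem statement concrete, the following is a minimal sketch of the objective and an exhaustive search over all subsets of the required size. It is not the approximation scheme of the paper: it runs in exponential time and only illustrates what is being minimized. The function names (`centroid`, `cost`, `best_subset_bruteforce`) and the sample data are assumptions introduced for illustration.

```python
from itertools import combinations


def centroid(points):
    """Return the mean vector of a non-empty sequence of equal-length tuples."""
    m = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / m for i in range(dim))


def cost(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - ci) ** 2 for x, ci in zip(p, c)) for p in points)


def best_subset_bruteforce(vectors, m):
    """Try every subset of size m and keep one with the smallest cost.

    Illustrative only: the number of subsets grows as C(n, m).
    """
    return min(combinations(vectors, m), key=cost)


# Hypothetical input: three points near the origin and two far away.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
subset = best_subset_bruteforce(vectors, 3)
print(subset, cost(subset))
```

On this sample input the search selects the three points near the origin, since any subset containing a distant point has a much larger sum of squared distances to its mean.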
Additional information
Original Russian Text © A.V. Kel’manov, S. M. Romanchenko, 2014, published in Diskretnyi Analiz i Issledovanie Operatsii, 2014, Vol. 21, No. 3, pp. 41–52.
Cite this article
Kel’manov, A.V., Romanchenko, S.M. An FPTAS for a vector subset search problem. J. Appl. Ind. Math. 8, 329–336 (2014). https://doi.org/10.1134/S1990478914030041