Abstract
K-Nearest Neighbors (KNN) is a crucial tool for many applications, e.g. recommender systems, image classification and web-related applications. However, KNN is a resource greedy operation particularly for large datasets. We focus on the challenge of KNN computation over large datasets on a single commodity PC with limited memory. We propose a novel approach to compute KNN on large datasets by leveraging both disk and main memory efficiently. The main rationale of our approach is to minimize random accesses to disk, maximize sequential accesses to data and efficient usage of only the available memory.
We evaluate our approach on large datasets, in terms of performance and memory consumption. The evaluation shows that our approach requires only 7 % of the time needed by an in-memory baseline to compute a KNN graph.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The term ‘pons’ is Latin for ‘bridge’.
- 2.
Twitter dataset: http://konect.uni-koblenz.de/networks/twitter_mpi.
References
Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: ICML (2006)
Boiman, O., Shechtman, E., Irani, M.: In defense of nearest-neighbor based image classification. In: CVPR (2008)
Boutet, A., Frey, D., Guerraoui, R., Jegou, A., Kermarrec, A.M.: WHATSUP: a decentralized instant news recommender. In: IPDPS (2013)
Boutet, A., Frey, D., Guerraoui, R., Jegou, A., Kermarrec, A.M.: Privacy-preserving distributed collaborative filtering. In: Noubir, G., Raynal, M. (eds.) Networked Systems. LNCS, vol. 8593, pp. 169–184. Springer, Heidelberg (2014)
Boutet, A., Frey, D., Guerraoui, R., Kermarrec, A.M., Patra, R.: HyRec: Leveraging browsers for scalable recommenders. In: Middleware (2014)
Chen, J., Fang, H.R., Saad, Y.: Fast approximate KNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009)
Chiluka, N., Kermarrec, A.M., Olivares, J.: Scaling KNN computation over large graphs on a PC. In: Middleware (2014)
Debatty, T., Michiardi, P., Thonnard, O., Mees, W.: Building k-nn graphs from large text data. In: Big Data (2014)
Dong, W., Moses, C., Li, K.: Efficient k-nearest neighbor graph construction for generic similarity measures. In: WWW (2011)
Fukunaga, K., Narendra, P.M.: A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. C–24(7), 750–753 (1975)
Han, W.S., Lee, S., Park, K., Lee, J.H., Kim, M.S., Kim, J., Yu, H.: TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: SIGKDD (2013)
Jégou, H., Tavenard, R., Douze, M., Amsaleg, L.: Searching in one billion vectors: re-rank with source coding. In: ICASSP (2011)
Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. In: SIGMOD, vol. 26, pp. 369–380. ACM (1997)
Kyrola, A., Blelloch, G.E., Guestrin, C.: GraphChi: Large-scale graph computation on just a PC. In: OSDI (2012)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection (2014). http://snap.stanford.edu/data
Lin, Z., Kahng, M., Sabrin, K., Chau, D., Lee, H., Kang, U.: MMAP: fast billion-scale graph computation on a PC via memory mapping. In: Big Data (2014)
McRoberts, R.E., Nelson, M.D., Wendt, D.G.: Stratified estimation of forest area using satellite imagery, inventory data, and the k-nearest neighbors technique. Remote Sens. Environ. 82(2), 457–468 (2002)
Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: SIGMOD (1995)
Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: SOSP (2013)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR (2010)
Wong, W.K., Cheung, D.W.l., Kao, B., Mamoulis, N.: Secure KNN computation on encrypted databases. In: SIGMOD (2009)
Zhu, X., Han, W., Chen, W.: GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In: USENIX ATC (2015)
Acknowledgments
This work was partially funded by Conicyt/Beca Doctorado en el Extranjero Folio 72140173 and Google Focused Award Web Alter-Ego.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Chiluka, N., Kermarrec, AM., Olivares, J. (2016). The Out-of-core KNN Awakens:. In: Abdulla, P., Delporte-Gallet, C. (eds) Networked Systems. NETYS 2016. Lecture Notes in Computer Science(), vol 9944. Springer, Cham. https://doi.org/10.1007/978-3-319-46140-3_24
Download citation
DOI: https://doi.org/10.1007/978-3-319-46140-3_24
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46139-7
Online ISBN: 978-3-319-46140-3
eBook Packages: Computer ScienceComputer Science (R0)