Abstract
Rare Category Exploration (RCE) discovers the remaining data examples of a rare category from a seed. Existing approaches often have high time complexity and apply only to rare categories with compact, spherical shapes rather than arbitrary shapes. In this paper, we present FREE, an effective and efficient RCE solution that explores rare categories of arbitrary shapes in time linear in the data set size. FREE first decomposes a data set into equal-sized cells, on which it performs a wavelet transform and data density analysis to find the coarse shape of a rare category, and then refines the coarse shape via an M\(k\)NN-based metric. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.
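The two-stage idea in the abstract (a coarse shape from cell densities, then a neighbor-based refinement) can be illustrated with a minimal sketch. This is not the FREE algorithm: the wavelet transform is omitted, M\(k\)NN is assumed here to mean mutual k-nearest neighbors, the refinement below is quadratic in the number of candidate points rather than linear, and all function names and parameters (`cell_of`, `coarse_region`, `explore`, `cell_size`, `min_count`, `k`) are hypothetical.

```python
from collections import Counter, deque
import math


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its equal-sized grid cell."""
    return tuple(math.floor(x / cell_size) for x in point)


def coarse_region(points, seed, cell_size, min_count):
    """Flood-fill from the seed's cell over axis-adjacent dense cells.

    A cell counts as dense when it holds at least min_count points
    (an assumed stand-in for the paper's density analysis).
    """
    counts = Counter(cell_of(p, cell_size) for p in points)  # one linear pass
    start = cell_of(seed, cell_size)
    region, queue = {start}, deque([start])
    while queue:
        cell = queue.popleft()
        for axis in range(len(cell)):
            for step in (-1, 1):
                nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                if nb not in region and counts.get(nb, 0) >= min_count:
                    region.add(nb)
                    queue.append(nb)
    return region


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force)."""
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: math.dist(points[i], points[j]))
    return set(order[:k])


def explore(points, seed_index, cell_size, min_count, k):
    """Return sorted indices of points reachable from the seed.

    Stage 1 keeps only points inside the coarse region of dense cells.
    Stage 2 follows mutual k-nearest-neighbor links among those points.
    """
    region = coarse_region(points, points[seed_index], cell_size, min_count)
    candidates = [i for i, p in enumerate(points)
                  if cell_of(p, cell_size) in region]
    sub = [points[i] for i in candidates]
    neighbours = [knn(sub, a, k) for a in range(len(sub))]
    start = candidates.index(seed_index)
    seen, queue = {start}, deque([start])
    while queue:
        a = queue.popleft()
        for b in neighbours[a]:
            # Keep the link only if a and b list each other as neighbors.
            if a in neighbours[b] and b not in seen:
                seen.add(b)
                queue.append(b)
    return sorted(candidates[a] for a in seen)
```

Calling `explore(points, seed_index, cell_size, min_count, k)` on a list of coordinate tuples returns the indices of the points linked to the seed. The cell-counting pass touches each point once, which is the part of the sketch that mirrors the linear-time motivation; the brute-force neighbor search is kept only for brevity.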
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, Z., Huang, H., He, Q., Chiew, K., Gao, Y. (2015). Rare Category Exploration on Linear Time Complexity. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_3
DOI: https://doi.org/10.1007/978-3-319-18123-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18122-6
Online ISBN: 978-3-319-18123-3
eBook Packages: Computer Science, Computer Science (R0)