Rare Category Exploration on Linear Time Complexity

Liu, Zhenguang; Huang, Hao; He, Qinming; Chiew, Kevin; Gao, Yunjun

doi:10.1007/978-3-319-18123-3_3

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9050)

Abstract

Rare Category Exploration (RCE) discovers the remaining data examples of a rare category from a seed. Approaches to this problem often have high time complexity and are applicable to rare categories with compact, spherical shapes rather than arbitrary shapes. In this paper, we present FREE, an effective and efficient RCE solution that explores rare categories of arbitrary shapes with time complexity linear in the data set size. FREE first decomposes a data set into equal-sized cells, on which it performs a wavelet transform and data density analysis to find the coarse shape of a rare category, and then refines the coarse shape via an M\(k\)NN-based metric. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.
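The cell decomposition step can be illustrated with a rough sketch. The code below is not the FREE algorithm: it only bins points into equal-sized cells in one pass and then grows a region of adjacent dense cells outward from the seed's cell. The wavelet transform and the M\(k\)NN-based refinement are omitted, and the names `cell_of`, `coarse_region`, `cell_width`, and `min_count`, along with the simple count threshold used as a density test, are assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import product


def cell_of(point, cell_width):
    """Map a point to the integer index of the equal-sized cell containing it."""
    return tuple(int(x // cell_width) for x in point)


def coarse_region(points, seed, cell_width, min_count):
    """Grow a set of adjacent dense cells outward from the seed's cell.

    Assumption (not from the paper): a cell counts as dense when it
    holds at least min_count points.
    """
    # Single pass over the data: per-cell point counts.
    counts = Counter(cell_of(p, cell_width) for p in points)
    start = cell_of(seed, cell_width)
    if counts[start] < min_count:
        return set()
    dim = len(start)
    # All neighbouring offsets in the grid, excluding the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    region, queue = {start}, deque([start])
    while queue:
        cell = queue.popleft()
        for off in offsets:
            nb = tuple(c + o for c, o in zip(cell, off))
            if nb not in region and counts.get(nb, 0) >= min_count:
                region.add(nb)
                queue.append(nb)
    return region


# Hypothetical toy data: an L-shaped dense group plus unrelated points.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.5, 0.5),
          (2.2, 0.1), (2.4, 0.6), (2.3, 1.2), (2.7, 1.5),
          (9.1, 9.1), (9.2, 9.3), (5.5, 5.5)]
region = coarse_region(points, seed=(0.15, 0.2), cell_width=1.0, min_count=2)
```

For a fixed dimensionality, counting takes one pass over the data and the region growth visits each occupied cell at most once, so this sketch runs in time linear in the number of points. Because the region follows cell adjacency rather than distance to a centre, it can trace non-spherical shapes such as the L-shaped group in the toy data.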

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Zhenguang Liu, Hao Huang, Qinming He & Yunjun Gao
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Hao Huang
Singapore Branch, Handal Indah Sdn Bhd, Johor Bahru, Malaysia
Kevin Chiew

Corresponding author

Correspondence to Hao Huang.

Editor information

Editors and Affiliations

Universität München, München, Germany
Matthias Renz
University of Southern California, Los Angeles, USA
Cyrus Shahabi
University of Queensland, Brisbane, Australia
Xiaofang Zhou
Monash University, Clayton, Australia
Muhammad Aamir Cheema

About this paper

Cite this paper

Liu, Z., Huang, H., He, Q., Chiew, K., Gao, Y. (2015). Rare Category Exploration on Linear Time Complexity. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_3

DOI: https://doi.org/10.1007/978-3-319-18123-3_3
Published: 09 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18122-6
Online ISBN: 978-3-319-18123-3
eBook Packages: Computer Science, Computer Science (R0)