Rare Category Exploration on Linear Time Complexity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9050)

Abstract

Rare Category Exploration (RCE) discovers the remaining data examples of a rare category from a seed. Existing approaches to this problem often have high time complexity and apply only to rare categories with compact, spherical shapes rather than arbitrary shapes. In this paper, we present FREE, an effective and efficient RCE solution that explores rare categories of arbitrary shapes in time linear in the data set size. FREE first decomposes a data set into equal-sized cells, on which it performs a wavelet transform and data density analysis to find the coarse shape of a rare category, and then refines the coarse shape via an M\(k\)NN-based metric. Experimental results on both synthetic and real data sets verify the effectiveness and efficiency of our approach.
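
The three building blocks named in the abstract can be illustrated with a minimal Python sketch. This is not the FREE algorithm itself, whose details are not given here; it only shows generic versions of each ingredient. The function names, the choice of the Haar wavelet, the brute-force neighbour search, and the reading of M\(k\)NN as "mutual k-nearest neighbours" are all assumptions made for illustration.

```python
from collections import Counter
from math import dist, floor


def cell_counts(points, cell_width):
    """Assign each point to an equal-sized grid cell and count points per cell.

    A single pass over the data, so the cost is linear in the number of points.
    """
    counts = Counter()
    for p in points:
        counts[tuple(floor(x / cell_width) for x in p)] += 1
    return counts


def haar_step(values):
    """One level of a 1-D Haar transform (assumed wavelet, for illustration).

    Returns pairwise averages (a smoothed signal) and pairwise half-differences
    (the detail coefficients). A trailing unpaired value is ignored.
    """
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def mutual_knn(points, i, j, k):
    """True if points i and j are each among the other's k nearest neighbours.

    Brute-force search, quadratic in the number of points; shown only to
    illustrate the mutual-neighbour relation, not an efficient refinement step.
    """
    def knn(a):
        others = sorted(
            (b for b in range(len(points)) if b != a),
            key=lambda b: dist(points[a], points[b]),
        )
        return set(others[:k])

    return j in knn(i) and i in knn(j)
```

For example, `cell_counts([(0.1, 0.1), (0.2, 0.3), (5.0, 5.0)], 1.0)` places the first two points in cell `(0, 0)` and the third in cell `(5, 5)`; dense cells of this kind are the raw material for a density analysis.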

Corresponding author

Correspondence to Hao Huang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, Z., Huang, H., He, Q., Chiew, K., Gao, Y. (2015). Rare Category Exploration on Linear Time Complexity. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_3

  • DOI: https://doi.org/10.1007/978-3-319-18123-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18122-6

  • Online ISBN: 978-3-319-18123-3

  • eBook Packages: Computer Science, Computer Science (R0)
