
Redhyte: Towards a Self-diagnosing, Self-correcting, and Helpful Analytic Platform

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9622)

Abstract

We present Redhyte, an interactive platform for “Rapid exploration of data and hypothesis testing”. Redhyte aims to augment the conventional statistical hypothesis-testing framework with data-mining techniques so that hypothesis testing becomes more thorough and efficient. The platform is self-diagnosing (it can detect whether the user is performing a valid statistical test), self-correcting (it can propose and make corrections to the user’s statistical test), and helpful (it can search for promising or interesting hypotheses related to the user’s initial hypothesis). In Redhyte, hypothesis mining consists of four steps: context mining, mined-hypothesis formulation, scoring of mined hypotheses on interestingness, and statistical adjustments. To capture and evaluate specific aspects of interestingness, we developed and implemented various hypothesis-mining metrics. Redhyte is an R Shiny web application; it is available online at https://tohweizhong.shinyapps.io/redhyte, and its source code is hosted in a GitHub repository at https://github.com/tohweizhong/redhyte.
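Redhyte itself is an R Shiny application, and the abstract does not spell out its metrics. As a rough illustration of what context mining around an initial hypothesis can involve, the Python sketch below takes a hypothesis of the form "group A and group B differ in a binary outcome", stratifies the data by each candidate context attribute, and flags attributes under which the direction of the difference reverses in every stratum (a Simpson's-paradox-style reversal). This is a minimal sketch under stated assumptions, not Redhyte's method: the function names (`success_rates`, `rate_difference`, `find_reversals`, `make_rows`), the column names, and the toy counts are all hypothetical, and it compares raw cure rates without any significance test.

```python
from collections import defaultdict


def success_rates(rows, group_key, outcome_key):
    """Fraction of rows whose boolean outcome is True, per value of group_key."""
    counts = defaultdict(lambda: [0, 0])  # group value -> [successes, total]
    for row in rows:
        tally = counts[row[group_key]]
        tally[0] += int(bool(row[outcome_key]))
        tally[1] += 1
    return {group: s / n for group, (s, n) in counts.items()}


def rate_difference(rows, group_key, outcome_key, a, b):
    """Success rate of group a minus success rate of group b."""
    rates = success_rates(rows, group_key, outcome_key)
    return rates[a] - rates[b]


def find_reversals(rows, group_key, outcome_key, a, b, context_keys):
    """Flag context attributes whose strata all reverse the pooled a-vs-b difference.

    Strata that do not contain both groups are skipped, since no
    within-stratum comparison is possible there.
    """
    overall = rate_difference(rows, group_key, outcome_key, a, b)
    flagged = []
    for ctx in context_keys:
        strata = defaultdict(list)
        for row in rows:
            strata[row[ctx]].append(row)
        diffs = [
            rate_difference(subset, group_key, outcome_key, a, b)
            for subset in strata.values()
            if {a, b} <= {r[group_key] for r in subset}
        ]
        # Reversal: every comparable stratum has the opposite sign to the pooled data.
        if diffs and all(d * overall < 0 for d in diffs):
            flagged.append(ctx)
    return overall, flagged


def make_rows(group, severity, cured, total):
    """Hypothetical helper: `total` patients, of whom the first `cured` are cured."""
    return [
        {"group": group, "severity": severity, "site": "north", "cured": i < cured}
        for i in range(total)
    ]


# Hypothetical toy data (not from the paper): A beats B within each severity level,
# yet B looks better overall because A mostly treats severe cases.
TOY_ROWS = (
    make_rows("A", "mild", 8, 10)
    + make_rows("A", "severe", 20, 40)
    + make_rows("B", "mild", 30, 40)
    + make_rows("B", "severe", 4, 10)
)

if __name__ == "__main__":
    overall, flagged = find_reversals(
        TOY_ROWS, "group", "cured", "A", "B", ["severity", "site"]
    )
    print(f"pooled A - B cure rate: {overall:+.2f}")
    print("context attributes with a reversal:", flagged)
```

Run as a script, the toy data give a pooled A minus B cure rate of -0.12, while A does better within both severity levels, so `severity` is flagged and the constant `site` column is not. A fuller treatment would attach a significance test to each stratified comparison and adjust for the number of hypotheses mined, in the spirit of the statistical-adjustment step listed in the abstract.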

Acknowledgements

This work was supported in part by a Singapore Ministry of Education tier-2 grant (MOE2012-T2-1-061) and by NCS Pte Ltd, Singapore.

Author information

Corresponding author

Correspondence to Limsoon Wong.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Toh, W.Z., Choi, K.P., Wong, L. (2016). Redhyte: Towards a Self-diagnosing, Self-correcting, and Helpful Analytic Platform. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9622. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49390-8_1

  • DOI: https://doi.org/10.1007/978-3-662-49390-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49389-2

  • Online ISBN: 978-3-662-49390-8

  • eBook Packages: Computer Science, Computer Science (R0)