Redhyte: Towards a Self-diagnosing, Self-correcting, and Helpful Analytic Platform

Toh, Wei Zhong; Choi, Kwok Pui; Wong, Limsoon

doi:10.1007/978-3-662-49390-8_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9622)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

We present Redhyte, an interactive platform whose name is short for “Rapid exploration of data and hypothesis testing”. Redhyte aims to augment the conventional statistical hypothesis testing framework with data-mining techniques, in a bid for more wholesome and efficient hypothesis testing. The platform is self-diagnosing (it can detect whether the user is doing a valid statistical test), self-correcting (it can propose and make corrections to the user’s statistical test), and helpful (it can search for promising or interesting hypotheses related to the initial user-specified hypothesis). In Redhyte, hypothesis mining consists of several steps: context mining, mined-hypothesis formulation, scoring of mined hypotheses on interestingness, and statistical adjustments. To capture and evaluate specific aspects of interestingness, we developed and implemented various hypothesis-mining metrics. Redhyte is an R Shiny web application available online at https://tohweizhong.shinyapps.io/redhyte, and the source code is hosted in a GitHub repository at https://github.com/tohweizhong/redhyte.
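
The abstract does not say which checks Redhyte performs, so the following is only a minimal sketch of what a "self-diagnosing" step could look like. It assumes the user has requested a chi-square test on a contingency table and applies the common rule of thumb that every expected cell count should be at least 5. The function names, the threshold, and the choice of Python (Redhyte itself is written in R) are assumptions made for illustration and are not taken from Redhyte's source code.

```python
from typing import List, Tuple


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("contingency table is empty")
    return [[r * c / n for c in col_totals] for r in row_totals]


def diagnose_chi_square(
    table: List[List[int]], min_expected: float = 5.0
) -> Tuple[bool, str]:
    """Hypothetical diagnostic: flag the test when any expected count is small."""
    smallest = min(min(row) for row in expected_counts(table))
    if smallest >= min_expected:
        return True, "all expected counts meet the threshold"
    return False, "smallest expected count is %.2f; consider an exact test" % smallest


if __name__ == "__main__":
    # Illustrative counts only; these are made-up numbers, not data from the paper.
    print(diagnose_chi_square([[30, 20], [25, 25]]))
    print(diagnose_chi_square([[3, 1], [2, 14]]))
```

A "self-correcting" step in the same spirit would act on the second element of the returned tuple, for instance by switching to an exact test when the diagnostic fails. Whether Redhyte does exactly this is not stated in the abstract.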

Acknowledgements

This work was supported in part by a Singapore Ministry of Education tier-2 grant (MOE2012-T2-1-061) and by NCS Pte Ltd, Singapore.

Author information

Authors and Affiliations

National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
Wei Zhong Toh, Kwok Pui Choi & Limsoon Wong
NCS Pte Ltd, 5 Ang Mo Kio Street 62, Singapore, 569141, Singapore
Wei Zhong Toh

Corresponding author

Correspondence to Limsoon Wong.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński
Iwate Prefectural University, Takizawa, Japan
Hamido Fujita
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong

About this paper

Cite this paper

Toh, W.Z., Choi, K.P., Wong, L. (2016). Redhyte: Towards a Self-diagnosing, Self-correcting, and Helpful Analytic Platform. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, T.-P. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9622. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49390-8_1

DOI: https://doi.org/10.1007/978-3-662-49390-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49389-2
Online ISBN: 978-3-662-49390-8
eBook Packages: Computer Science, Computer Science (R0)
