ABSTRACT
Democratizing Data Science requires a fundamental rethinking of the way data analytics and model discovery are done. Available tools for analyzing massive data sets and curating machine learning models are limited in three fundamental ways. First, existing tools require well-trained data scientists to select the appropriate techniques to build models and to evaluate their outcomes. Second, existing tools require heavy data preparation steps and are often too slow to give interactive feedback to domain experts during the model building process, severely limiting the possible interactions. Third, current tools do not provide adequate analysis of statistical risk factors in model development. In this work, we present the first iteration of QuIC-M (pronounced quick-m), an interactive human-in-the-loop data exploration and model building suite. The goal is to enable domain experts to build machine learning pipelines an order of magnitude faster than machine learning experts while achieving model quality comparable to expert solutions.
- Towards Interactive Curation & Automatic Tuning of ML Pipelines