Abstract
Given the complexity of data science projects and related demand for human expertise, automation has the potential to transform the data science process.
- Amershi, S. et al. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conf. on Human Factors in Computing Systems, 2019, 1--13.Google Scholar
- Bjorkman, A. et al. Plant functional trait change across a warming tundra biome. Nature 562, 7725 (2018), 57.Google Scholar
- Blockeel, H., Calders, T., Fromont, É., Goethals, B., Prado, A., and Robardet, C. An inductive database system based on virtual mining views. Data Mining and Knowledge Discovery 24, 1 (2012), 247--287.Google ScholarDigital Library
- Brazdil, P., Carrier, C., Soares, C., and Vilalta, R. Metalearning: Applications to Data Mining. Springer Science & Business Media, 2008.Google Scholar
- Chapman, P. et al. CRISP-DM 1.0 Step-by-step data mining guide, 2000.Google Scholar
- Chen, J., Jimenez-Ruiz, E., Horrocks, I., and Sutton, C. ColNet: Embedding the semantics of Web tables for column type prediction. In Proceedings of the 33rd AAAI Conf. on Artificial Intelligence, 2019.Google ScholarDigital Library
- Dasu, T. and Johnson, T. Exploratory Data Mining and Data Cleaning. Wiley, 2003.Google ScholarDigital Library
- De Bie, T. Subjective interestingness in exploratory data mining. In Proceedings of the Intern. Symp. Intelligent Data Analysis. Springer, 2013, 19--31.Google ScholarDigital Library
- De Raedt, L., Kersting, K., Natarajan, S., and Poole, D. Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning 10, 2 (2016), 1--189.Google ScholarCross Ref
- Donoho, D. 50 years of data science. J. Computational and Graphical Statistics 26, 4 (2017), 745--766.Google ScholarCross Ref
- Elbattah, M. and Molloy, O. Analytics using machine learning-guided simulations with application to healthcare scenarios. Analytics and Knowledge Mgmt. Auerbach Publications, 2018, 277--324.Google Scholar
- Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., and Hutter, F. Efficient and robust automated machine learning. Advances in Neural Information Processing Systems 28, 2015, 2962--2970.Google ScholarDigital Library
- Gale, W. Statistical applications of artificial intelligence and knowledge engineering. The Knowledge Engineering Rev. 2, 4 (1987), 227--247.Google ScholarCross Ref
- Geng, L. and Hamilton, H. Interestingness measures for data mining: A survey. ACM Computing Surveys 38, 3 (2006), 9.Google ScholarDigital Library
- Gordon, A., Graepel, T., Rolland, N., Russo, C., Borgstrom, J., and Guiver, J. Tabular: A schema driven probabilistic programming language. ACM SIGPLAN Notices 49, 1 (2014), 321--334.Google ScholarDigital Library
- Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., and Pedreschi, D. A survey of methods for explaining black box models. ACM Computing Surveys 51, 5 (2018). 93.Google Scholar
- Guyon, I., et al. A brief review of the ChaLearn AutoML Challenge: Any-time any-dataset learning without human intervention. In Proceedings of the Workshop on Automatic Machine Learning 64 (2016), 21--30. F. Hutter, L. Kotthoff, and J. Vanschoren, Eds.Google Scholar
- Heer, J., Hellerstein, J., and Kandel, S. Predictive interaction for data transformation. In Proceedings of Conf. on Innovative Data Systems Research, 2015.Google Scholar
- Heer, J., Hellerstein, J., and Kandel, S. Data Wrangling. Encyclopedia of Big Data Technologies. S. Sakr and A. Zomaya, Eds. Springer, 2019.Google Scholar
- Hernández-Orallo, J., et al. Reframing in context: A systematic approach for model reuse in machine learning. AI Commun. 29, 5 (2016), 551--566.Google ScholarCross Ref
- Hutter, F., Hoos, H., and Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. In Proceedings of the Intern. Conf. on Learning and Intelligent Optimization. Springer, 2011, 507--523.Google ScholarDigital Library
- Hutter, F., Kotthoff, L., and Vanschoren, J., Eds. Automated Machine Learning---Methods, Systems, Challenges. Springer, 2019.Google Scholar
- Keim, D., Andrienko, G., Fekete, J., Görg, C., Kohlhammer, J., and Melançon, G. Visual analytics: Definition, process, and challenges. Information Visualization. Springer, 2008, 154--175.Google Scholar
- King, R., et al. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427, 6971 (2004, 247.Google Scholar
- Kotthoff, L., Thornton, C., Hoos, H., Hutter, F., and Leyton-Brown, K. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. J. Machine Learning Research 18, 1 (2017), 826--830.Google Scholar
- Langley, P., Simon, H., Bradshaw, G., and Zytkow, J. Scientific Discovery: Computational Explorations of the Creative Processes. MIT Press, 1987.Google ScholarCross Ref
- Le, V. and Gulwani, S. FlashExtract: A Framework for data extraction by examples. In Proceedings of the 35th ACM SIGPLAN Conf. on Programming Language Design and Implementation, 2014, 542--553.Google ScholarDigital Library
- Liu, C., et al. Progressive neural architecture search. In Proceedings of the European Conf. on Computer Vision, 2018, 19--34.Google ScholarDigital Library
- Lloyd, J., Duvenaud, D., Grosse, R., Tenenbaum, J., and Ghahramani, Z. Automatic construction and natural-language description of nonparametric regression models. In Proceedings of the 28th AAAI Conf. on Artificial Intelligence, 2014.Google ScholarCross Ref
- Mansinghka, V., Tibbetts, R., Baxter, J., Shafto, P., and Eaves, B. BayesDB: A probabilistic programming system for querying the probable implications of data. 2015; arXiv:1512.05006.Google Scholar
- Martínez-Plumed, F. et al. CRISP-DM twenty years later: From data mining processes to data science trajectories. IEEE Trans. Knowledge and Data Engineering (2020), 1 Google ScholarCross Ref
- Nazabal, A., Williams, C., Colavizza, G., Smith, C., and Williams, A. Data engineering for data analytics: A classification of the issues, and case studies. 2020; arXiv:2004.12929.Google Scholar
- Rakotoarison, H., Schoenauer, M., and Sebag, M. Automated machine learning with Monte-Carlo tree search. In Proceedings of the 28th Intern. Joint Conf. on Artificial Intelligence 7, (2019); https://doi.org/10.24963/ijcai.2019/457. Google ScholarCross Ref
- Ratner, A., De Sa, C., Wu, S., Selsam, D., and Ré, C. Data programming: Creating large training sets, quickly. In Proceedings of the 30th Intern. Conf. on Neural Information Processing Systems, 2016, 3574--3582.Google Scholar
- Ruotsalo, T., Jacucci, G., Myllymäki, P., and Kaski, S. Interactive intent modeling: Information discovery beyond search. Commun. ACM 58, 1 (Jan. 2014), 86--92.Google Scholar
- Sculley, D. et al. Hidden technical debt in machine learning systems. Advances in Neural Info. Processing Systems 28, (2015), 2503--2511.Google Scholar
- Serban, F., Vanschoren, J., Kietz, J., and Bernstein, A. A survey of intelligent assistants for data analysis. ACM Computing Surveys 45, 3 (2013), 1--35.Google ScholarDigital Library
- St. Amant, R. and Cohen, P. Intelligent support for exploratory data analysis. J. Computational and Graphical Statistics 7, 4 (1998), 545--558.Google Scholar
- Sutton, C., Hobson, T., Geddes, J., and Caruana, R. Data diff: Interpretable, executable summaries of changes in distributions for data wrangling. In Proceedings of the 24th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, 2018.Google ScholarDigital Library
- Tao, F. and Qi, Q. Make more digital twins. Nature 573 (2019), 490--491.Google ScholarCross Ref
- Thornton, C., Hutter, F., Hoos, H., and Leyton-Brown, K. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD Intern. Conf. on Knowledge Discovery and Data Mining, 2013, 847--855.Google ScholarDigital Library
- Tukey, J. Exploratory Data Analysis. Pearson, 1977.Google Scholar
- Tukey, J. and Wilk, M. Data analysis and statistics: An expository overview. In Proceedings of 1966 Fall Joint Computer Conf. (Nov. 7--10, 1966), 695--709.Google ScholarDigital Library
- Vanschoren, J., Van Rijn, J., Bischl, B., and Torgo, L. OpenML: Networked science in machine learning. ACM SIGKDD Explorations Newsletter 15, 2 (2014), 49--60.Google ScholarDigital Library
- Vartak, M., Rahman, S., Madden, S., Parameswaran, A., and Polyzotis, N. SeeDB: Efficient data-driven visualization recommendations to support visual analytics. In Proceedings of the Intern. Conf. on Very Large Data Bases 8 (2015), 2182.Google Scholar
- Vartak, M., et al. ModelDB: A system for machine learning model management. In Proceedings of the ACM Workshop on Human-In-the-Loop Data Analytics, 2016, 14.Google ScholarDigital Library
- Wang, D., et al. Human-AI collaboration in data science: Exploring data scientists' perceptions of automated AI. In Proceedings of the ACM Conf. on Human-Computer Interaction 3, 2019, 1--24.Google ScholarDigital Library
- Wasay, A., Athanassoulis, M., and Idreos, S. Queriosity: Automated data exploration. In Proceedings of the 2015 IEEE Intern. Congress on Big Data, 716--719.Google Scholar
- Wongsuphasawat, K., Moritz, D., Anand, A., Mackinlay, J., Howe, B., and Heer, J. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Trans. Visualization and Computer Graphics 22, 1 (2015), 649--658.Google Scholar
- Wulder, M., Coops, N., Roy, D., White, J., and Hermosilla, T. Land cover 2.0. Intern. J. Remote Sensing 39, 12 (2018), 4254--4284.Google ScholarCross Ref
- Zgraggen, E., Zhao, Z., Zeleznik, R., and Kraska, T. Investigating the effect of the multiple comparisons problem in visual analysis. In Proceedings of the 2018 CHI Conf. on Human Factors in Computing Systems, 1--12.Google Scholar
Index Terms
- Automating data science
Recommendations
Automating the loading of business process data warehouses
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database TechnologyBusiness processes drive the operations of an enterprise. In the past, the focus was primarily on business process design, modeling, and automation. Recently, enterprises have realized that they can benefit tremendously from analyzing the behavior of ...
Towards Automating Data Narratives
IUI '17: Proceedings of the 22nd International Conference on Intelligent User InterfacesWe propose a new area of research on automating data narratives. Data narratives are containers of information about computationally generated research findings. They have three major components: 1) A record of events, that describe a new result through ...
Dirty Data in the Newsroom: Comparing Data Preparation in Journalism and Data Science
CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing SystemsThe work involved in gathering, wrangling, cleaning, and otherwise preparing data for analysis is often the most time consuming and tedious aspect of data work. Although many studies describe data preparation within the context of data science workflows,...
Comments