ABSTRACT
Current data governance techniques are labor-intensive: teams of data stewards typically rely on best practices to translate business policies into governance rules. As data plays an increasingly central role in the enterprise, these approaches do not scale to the complexity and variety of the enterprise data ecosystem, where a growing number of data requirements, use cases, applications, tools, and systems come into play. We believe techniques from artificial intelligence and machine learning have the potential to improve discoverability, quality, and compliance in data governance. In this paper, we propose a framework for 'contextual intelligence', in which we argue for (1) collecting and integrating contextual metadata from a variety of sources to establish a trusted, unified repository of contextual data use across users and applications, and (2) applying machine learning and artificial intelligence techniques over this rich contextual metadata to improve discoverability, quality, and compliance in governance practices. We propose an architecture that unifies governance across several systems, with a graph serving as the core repository of contextual metadata, accurately representing data usage across the enterprise and facilitating machine learning. We demonstrate how our approach can enable ML-based recommendations in support of governance best practices.
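As a rough illustration of the idea of a graph of contextual metadata feeding recommendations, the following minimal sketch stores usage facts as labeled edges and suggests datasets based on what users with overlapping usage have queried. Everything in it is an assumption made for illustration: the `ContextGraph` class, the `queried` relation, the `user:` and `dataset:` node naming, and the simple co-usage count, which stands in for whatever ML technique the paper actually applies.

```python
from collections import defaultdict


class ContextGraph:
    """Hypothetical in-memory graph of contextual metadata (illustrative only)."""

    def __init__(self):
        # node -> set of (relation, neighbor) pairs
        self.edges = defaultdict(set)

    def add(self, src, relation, dst):
        # Store the edge in both directions so usage can be traversed either way.
        self.edges[src].add((relation, dst))
        self.edges[dst].add(("inverse:" + relation, src))

    def neighbors(self, node, relation):
        return {dst for rel, dst in self.edges[node] if rel == relation}


def recommend_datasets(graph, user):
    """Rank datasets the user has not queried by co-usage with similar users."""
    used = graph.neighbors(user, "queried")
    scores = defaultdict(int)
    for dataset in used:
        for peer in graph.neighbors(dataset, "inverse:queried"):
            if peer == user:
                continue
            for other in graph.neighbors(peer, "queried"):
                if other not in used:
                    # One point per shared dataset that links the peer to the user.
                    scores[other] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def build_example_graph():
    # Assumed sample usage facts, not data from the paper.
    g = ContextGraph()
    g.add("user:alice", "queried", "dataset:sales")
    g.add("user:alice", "queried", "dataset:customers")
    g.add("user:bob", "queried", "dataset:sales")
    g.add("user:bob", "queried", "dataset:customers")
    g.add("user:bob", "queried", "dataset:churn")
    g.add("user:carol", "queried", "dataset:sales")
    g.add("user:carol", "queried", "dataset:returns")
    return g


if __name__ == "__main__":
    print(recommend_datasets(build_example_graph(), "user:alice"))
```

In this toy setup, a dataset ranks higher when it is used by a peer who shares more datasets with the target user. A real deployment would presumably draw on richer context (applications, policies, lineage) than a single usage relation.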
Index Terms
- Contextual Intelligence for Unified Data Governance