Metadata management: past, present and future

doi:10.1016/S0167-9236(02)00208-7

Decision Support Systems

Volume 37, Issue 1, April 2004, Pages 151-173

https://doi.org/10.1016/S0167-9236(02)00208-7

Abstract

In the past, metadata has always been a second-class citizen in the world of databases and data warehouses. Its main purpose has been to define the data. However, the current emphasis on metadata in the data warehouse and software repository communities has elevated it to a new prominence. The organization now needs metadata for tool integration, data integration and change management. The paper presents a chronological account of this evolution—both from conceptual and management perspectives.

Repository concepts are currently being used to manage metadata for tool integration and data integration. As a final chapter in this evolution, we point out the need for a concept called the “metadata warehouse.” A real-life data warehouse project called the TAMUS Information Portal (TIP) is used to describe the types of metadata needed in a data warehouse and the changes that the metadata goes through. We propose that a metadata warehouse be designed to store the metadata and manage its changes, and we propose several architectures that can be used to develop one.

Introduction

Metadata is a term that has been used and misused many times in the past. Webster defines “meta” as a more comprehensive term needed to “describe a new and related discipline designed to deal critically with the original one.” Metadata consequently describes a discipline that fosters the study of data about data.

The origin of metadata can be traced back to how we use measurement units. The purpose of a unit is to describe a property of an object. For example, the length (a physical property) of a stick (an object) is 5 ft (a value expressed in a measurement unit). For one object, this example uses one data item (the number 5) and two metadata items (the property, length, and the measurement unit, feet).
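The stick example can be written down as a small record to make the split between data and metadata explicit. This is only an illustrative sketch: the `Measurement` class, its field names, and the `describe` helper are assumptions introduced here, not constructs from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One observation about an object (hypothetical structure)."""
    value: float    # the data item
    prop: str       # metadata: which property the value describes
    unit: str       # metadata: the unit the value is expressed in


def describe(m: Measurement) -> str:
    """Render a measurement so the metadata gives the number its meaning."""
    return f"{m.prop} = {m.value} {m.unit}"


# The stick from the text: one data item (5), two metadata items.
stick_length = Measurement(value=5, prop="length", unit="ft")
summary = describe(stick_length)
```

Without `prop` and `unit`, the bare value 5 cannot be interpreted, which is the point the example in the text makes.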

In the past, metadata has often been treated as a second-class citizen. With the advent of computers and our incessant need for data, we have introduced techniques to store data permanently on secondary storage. These data can then be retrieved and used by application programs. File managers are used to store and retrieve data from secondary storage. To accomplish their job, file managers use metadata such as field names and filenames. This use of metadata, along with the actual data, is now deeply ingrained in database management technology.
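As a loose modern analogy for the file-manager case, a delimited file keeps its field names (metadata) apart from its records (data). The sketch below uses Python's standard `csv` module on an in-memory string; the filename `sticks.csv` and the field names are hypothetical and stand in for whatever a real file manager would track.

```python
import csv
import io

# Hypothetical file: the filename itself is one piece of metadata.
FILENAME = "sticks.csv"

# In-memory stand-in for the file contents: a header line, then records.
raw = "stick_id,length_ft\n1,5\n2,7\n"

reader = csv.DictReader(io.StringIO(raw))
field_names = reader.fieldnames   # metadata: names that define each column
rows = list(reader)               # data: the stored values, keyed by field name

# The metadata is what lets a program ask for a value by name.
second_length = rows[1]["length_ft"]
```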

As a result, in the last 30 years, we have witnessed tremendous growth in the use of metadata in developing information systems. The purpose of this paper is to study this field and see how it helps us in decision support. To do this, we first describe a 40-year chronological development of the metadata concept (see Section 2). Several management tools were also designed to manage metadata over the same 40 years. In Section 3, we categorize these developments. We argue that the most neglected area in metadata management is the management of changes in metadata. Section 4 emphasizes these changes and describes a real-life case study where they are of utmost importance. To manage them, we propose a new management tool called a metadata warehouse. Unlike other tools that focus on tool integration and data integration, this tool manages the changes in metadata for organizational decision support. We conclude the paper in Section 5.
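To make the idea of managing changes in metadata concrete, here is a minimal sketch of an append-only history of metadata definitions with an "as of" lookup. It is not one of the architectures proposed in the paper; the class names, the `student_status` element, and its definitions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class MetadataVersion:
    element: str       # name of the described element (hypothetical)
    definition: str    # what the element meant from valid_from onward
    valid_from: date


class MetadataHistory:
    """Append-only log: old definitions are kept, never overwritten."""

    def __init__(self) -> None:
        self._versions: List[MetadataVersion] = []

    def record(self, element: str, definition: str, valid_from: date) -> None:
        self._versions.append(MetadataVersion(element, definition, valid_from))

    def changes(self, element: str) -> List[MetadataVersion]:
        """All versions of one element, oldest first."""
        matching = [v for v in self._versions if v.element == element]
        return sorted(matching, key=lambda v: v.valid_from)

    def as_of(self, element: str, when: date) -> Optional[str]:
        """Definition in force on a given date, or None if none existed yet."""
        in_force = [v for v in self.changes(element) if v.valid_from <= when]
        return in_force[-1].definition if in_force else None


history = MetadataHistory()
history.record("student_status", "full-time or part-time", date(1999, 9, 1))
history.record("student_status", "full-time, part-time or exchange", date(2001, 9, 1))

old_meaning = history.as_of("student_status", date(2000, 1, 1))
new_meaning = history.as_of("student_status", date(2002, 1, 1))
```

Keeping every version lets a report built on older data be read against the definition that was in force at the time, rather than the current one.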

Section snippets

Evolution of the metadata concept

To describe the evolution of the concept of metadata, we look at each decade starting with the 1960s.

Evolution of metadata management

Although metadata started as information describing data or an asset, it now competes for the same attention as the data or asset it defines. With the phenomenal use of metadata in software development, database management, and data warehouse design and implementation, it is important to examine the techniques used to manage it.

To properly narrate the evolution of metadata managers, we start with a set of tools that implicitly manage data with metadata. From this stage, we

Proposing metadata warehouse—the final frontier

In the earlier sections, we have seen how the concept of metadata has evolved and how the tools that manage it have matured from simple library managers to techniques that help integrate tool metadata. The integration is accomplished either through a repository manager that supports a common model of metadata under a single environment, or with a diverse set of metadata that follows a metamodel called CWM.

The study of CWM reveals that it does not only support integration of tool metadata, it also supports

Conclusion and future research

While the TIP Data Warehouse is a specific data warehouse, the need for a metadata warehouse is universal. We also see this need in web data warehouses. In fact, changes are much more prevalent in the web world than in the regular data warehouse world [21]. The current research provides a window to show how metadata has come of age and now demands its own warehouse. Several conclusions and directions for future research can be drawn from the Metadata Warehouse Proposal for the TIP project.

First, the major

Acknowledgements

The research was partially funded by a grant from Teradata, a division of NCR. The author acknowledges the help of Debbie Doran from the Texas A&M University System, and David Riegel and Mary Gros from Teradata.

Arun Sen is a full professor and Mays Fellow in the Department of Information and Operations Management at Texas A&M University. Before joining Texas A&M University in 1986, he was an assistant and a tenured associate professor in the Department of Management Science, University of South Carolina. He holds an MTech in Electronics (from Calcutta University, India in 1971), an MS in Computer Science (from Penn State University in 1976) and a PhD in Information Systems (from Penn State

References (23)

F. Banchilhon et al., Building an Object-Oriented Database System (1992)
P.A. Bernstein, Repositories and object-oriented databases, SIGMOD Record (March 1998)
C.J. Date, An Introduction to Database Systems (1995)
A. Deshpande, D. Van Gucht, An implementation for nested relational databases, Tech Report Number 234, Department of...
D. Doran, TIP management change repository, An Internal Report, Texas A&M University System,...
E. Gamma et al., Design Patterns: Elements of Object-Oriented Software (1995)
W.H. Inmon, Building Data Warehouse (1992 and 1996)
R. Kimball et al., The Data Warehouse Lifecycle Toolkit (1998)
M. Leiter et al., Support for maintaining object-oriented programs, Transactions on Software Engineering (1992)
P.K. Linos et al., A toolset for maintaining hybrid C++ programs, Software Maintenance Research and Practice (1996)
D. Marco, Building and Managing Meta Data Repository: A Full Life Cycle Guide (2000)


He has published 42 research papers in journals such as MIS Quarterly, Information Systems Research, IEEE Transactions on Systems, Man and Cybernetics, IEEE Transactions on Software Engineering, IEEE Transactions on Engineering Management, Decision Sciences, Communications of the ACM, Information Systems, Computers and OR, Omega, European Journal of Operational Research, Decision Support Systems, Journal of MIS, Information and Management, and others. His research interests include decision support systems, database management, repository management and software reuse, case-based reasoning, and technical and behavioral aspects of data warehouses and e-commerce.

He was an associate editor of the Journal of Database Management. He was a special issue editor for Decision Support Systems, Communications of the ACM, Database and Expert Systems with Applications. He was the chair of the INFORMS College on Information Systems, a program chair for the 1996 Workshop on Information Technology and Systems (WITS) Conference and a track chair (Decision Support Systems and AI track) for the 1996 National DSI Conference.
