Metadata management: past, present and future

https://doi.org/10.1016/S0167-9236(02)00208-7

Abstract

In the past, metadata has always been a second-class citizen in the world of databases and data warehouses. Its main purpose has been to define the data. However, the current emphasis on metadata in the data warehouse and software repository communities has elevated it to a new prominence. The organization now needs metadata for tool integration, data integration and change management. The paper presents a chronological account of this evolution from both conceptual and management perspectives.

Repository concepts are currently being used to manage metadata for tool integration and data integration. As a final chapter in this evolution process, we point out the need for a concept called a “metadata warehouse.” A real-life data warehouse project called TAMUS Information Portal (TIP) is used to describe the types of metadata needed in a data warehouse and the changes that the metadata goes through. We propose that a metadata warehouse be designed to store the metadata and manage its changes, and we propose several architectures that can be used to develop one.

Introduction

Metadata is a term that has been used and misused many times in the past. Webster defines “meta” as a more comprehensive term needed to “describe a new and related discipline designed to deal critically with the original one.” Metadata, consequently, describes a discipline that fosters the study of data about data.

The origin of metadata can be traced back to how we use measurement units. The purpose of a unit is to describe a property of an object. For example, the length (a physical property) of a stick (an object) is 5 ft (a value expressed in a measurement unit). For this one object, the example uses one data item (the number 5) and two metadata items (the property, length, and the measurement unit, feet).
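The stick example can be written down as a small data structure. This is only an illustrative sketch: the names `Measurement`, `property_name` and `unit` are assumptions introduced here and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One data item together with the two metadata items that give it meaning."""

    value: float        # data item: the number 5
    property_name: str  # metadata item: the property being measured (length)
    unit: str           # metadata item: the measurement unit (ft)

    def describe(self) -> str:
        # The bare value cannot be interpreted without its metadata.
        return f"{self.property_name} = {self.value:g} {self.unit}"


stick_length = Measurement(value=5, property_name="length", unit="ft")
print(stick_length.describe())
```

Dropping either metadata field leaves the number 5 ambiguous, which is the point the example makes.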

In the past, metadata has often been treated as a second-class citizen. With the advent of computers and our incessant need for data, we have introduced techniques to store data permanently on secondary storage. These data can then be retrieved and used by application programs. File managers are used to store and retrieve data from secondary storage. To accomplish their job, file managers use metadata such as field names and filenames. This use of metadata, along with the actual data, is now deeply ingrained in database management technology.
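A rough modern analogue of field names acting as metadata is a delimited file whose header row names its fields. The sketch below uses Python's standard `csv` module; the file contents and the field names `name` and `length_ft` are hypothetical and chosen only for illustration.

```python
import csv
import io

# An in-memory stand-in for a file on secondary storage (contents are hypothetical).
raw_file = io.StringIO("name,length_ft\nstick,5\n")

# DictReader treats the first row as field names (metadata) and
# uses them to interpret every following row (data).
reader = csv.DictReader(raw_file)
field_names = reader.fieldnames
rows = list(reader)

print(field_names)
print(rows[0]["length_ft"])
```

The header row carries no data of its own; it exists only so that the values in later rows can be retrieved by name.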

As a result, in the last 30 years, we have witnessed a tremendous growth in the use of metadata in developing information systems. The purpose of this paper is to study this field and see how it helps us in decision support. To do this, we first describe a 40-year chronological development of the metadata concept (see Section 2). Several tools were also designed to manage metadata over the same 40 years. In Section 3, we categorize these developments. We argue that the most neglected area in metadata management is the management of changes in metadata. Section 4 emphasizes the changes in metadata and describes a real-life case study where the changes are of utmost importance. To manage these changes, we propose a new management tool called a metadata warehouse. Unlike other tools that focus on tool integration and data integration, this tool manages the changes in metadata for organizational decision support. We conclude the paper in Section 5.
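To make the idea of managing changes in metadata concrete, the following minimal sketch keeps every version of a metadata definition and answers the question "what did this element mean on a given date?". It is not one of the architectures proposed in the paper; the class names, the `enrollment` element and its definitions are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MetadataVersion:
    element: str      # name of the data element being described
    definition: str   # the definition in force from valid_from onward
    valid_from: date


@dataclass
class MetadataHistory:
    """Append-only store of metadata versions, keyed by element name."""

    _versions: Dict[str, List[MetadataVersion]] = field(default_factory=dict)

    def record(self, element: str, definition: str, valid_from: date) -> None:
        versions = self._versions.setdefault(element, [])
        versions.append(MetadataVersion(element, definition, valid_from))
        versions.sort(key=lambda v: v.valid_from)  # keep chronological order

    def as_of(self, element: str, when: date) -> Optional[str]:
        """Return the definition in force on `when`, or None if none existed yet."""
        current = None
        for version in self._versions.get(element, []):
            if version.valid_from <= when:
                current = version.definition
        return current


history = MetadataHistory()
history.record("enrollment", "head count of students", date(1999, 9, 1))
history.record("enrollment", "full-time-equivalent students", date(2001, 9, 1))
print(history.as_of("enrollment", date(2000, 1, 1)))
```

Keeping old versions instead of overwriting them is what lets a report built on earlier data be interpreted with the definition that applied at the time.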

Evolution of the metadata concept

To describe the evolution of the concept of metadata, we look at each decade starting with the 1960s.

Evolution of metadata management

Metadata, although it started as information that describes data or an asset, now competes for the same attention as the asset or data it defines. With the phenomenal use of metadata in software development, in database management and in data warehouse design and implementation, it is important to look into the techniques used to manage it.

To properly narrate the evolution of metadata managers, we start with a set of tools that implicitly manage data with metadata. From this stage, we

Proposing metadata warehouse—the final frontier

In the earlier sections, we have seen how the concept of metadata has evolved and how the tools to manage it have matured from simple library managers to techniques that help integrate tool metadata. The integration is accomplished either through a repository manager that supports a common model of metadata under a single environment, or with a diverse set of metadata that follows a metamodel called CWM.

The study of CWM reveals that it does not only support integration of tool metadata, it also supports

Conclusion and future research

While the TIP Data Warehouse is a specific data warehouse, the need for a metadata warehouse is universal. We also see this need in web data warehouses. In fact, changes are much more prevalent in the web world than in the regular data warehouse world [21]. The current research provides a window to show how metadata has come of age and now demands its own warehouse. Several conclusions and directions for future research can be drawn from the metadata warehouse proposal for the TIP project.

First, the major

Acknowledgements

The research was partially funded by a grant from Teradata, a division of NCR. The author acknowledges the help of Debbie Doran from the Texas A&M University System, and David Riegel and Mary Gros from Teradata.

References (23)

  • F. Bancilhon et al., Building an Object-Oriented Database System (1992)
  • P.A. Bernstein, Repositories and object-oriented databases, SIGMOD Record (March 1998)
  • C.J. Date, An Introduction to Database Systems (1995)
  • A. Deshpande, D. Van Gucht, An implementation for nested relational databases, Tech Report Number 234, Department of...
  • D. Doran, TIP management change repository, An Internal Report, Texas A&M University System,...
  • E. Gamma et al., Design Patterns: Elements of Object-Oriented Software (1995)
  • W.H. Inmon, Building the Data Warehouse (1992 and 1996)
  • R. Kimball et al., The Data Warehouse Lifecycle Toolkit (1998)
  • M. Leiter et al., Support for maintaining object-oriented programs, Transactions on Software Engineering (1992)
  • P.K. Linos et al., A toolset for maintaining hybrid C++ programs, Software Maintenance Research and Practice (1996)
  • D. Marco, Building and Managing Meta Data Repository: A Full Life Cycle Guide (2000)
Arun Sen is a full professor and Mays Fellow in the Department of Information and Operations Management at Texas A&M University. Before joining Texas A&M University in 1986, he was an assistant and a tenured associate professor in the Department of Management Science, University of South Carolina. He holds an MTech in Electronics (from Calcutta University, India in 1971), an MS in Computer Science (from Penn State University in 1976) and a PhD in Information Systems (from Penn State University in 1979).

He has published 42 research papers in journals such as MIS Quarterly, Information Systems Research, IEEE Transactions on Systems, Man and Cybernetics, IEEE Transactions on Software Engineering, IEEE Transactions on Engineering Management, Decision Sciences, Communications of the ACM, Information Systems, Computers and OR, Omega, European Journal of Operations Research, Decision Support Systems, Journal of MIS, Information and Management, and others. His research interests include decision support systems, database management, repository management and software reuse, case-based reasoning, and the technical and behavioral aspects of data warehousing and e-commerce.

He was an associate editor of Journal of Database Management. He was a special issue editor for Decision Support Systems, Communications of the ACM, Database, and Expert Systems with Applications. He was the chair of the INFORMS College on Information Systems, a program chair for the 1996 Workshop on Information Technology and Systems (WITS) Conference and a track chair (Decision Support Systems and AI track) for the 1996 National DSI Conference.