Metadata management: past, present and future
Introduction
Metadata is a term that has been used and misused many times in the past. Webster defines “meta” as a more comprehensive term needed to “describe a new and related discipline designed to deal critically with the original one.” Metadata, consequently, describes a discipline that fosters the study of data about data.
The origin of metadata can be traced back to how we use measurement units. The purpose of a unit is to describe a property of an object. For example, the length (a physical property) of a stick (an object) is 5 ft (a number together with a measurement unit). For this one object, the example uses one data item (the number 5) and two metadata items (the property, length, and the measurement unit, feet).
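The stick example can be written down as a small record to make the split between data and metadata explicit. This is only an illustrative sketch: the `Measurement` class, its field names, and the `describe` helper are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One observation about an object (hypothetical structure)."""
    value: float  # data item: the number itself
    prop: str     # metadata item: the property being measured
    unit: str     # metadata item: the measurement unit


def describe(m: Measurement) -> str:
    """Render the data item together with the metadata that gives it meaning."""
    return f"{m.prop} = {m.value:g} {m.unit}"


# The stick from the text: one data item (5) and two metadata items.
stick = Measurement(value=5, prop="length", unit="ft")
print(describe(stick))
```

Without `prop` and `unit`, the bare number 5 cannot be interpreted, which is the sense in which the metadata describes the data.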
In the past, metadata has often been treated as a second-class citizen. With the advent of computers and our incessant need for data, we have introduced techniques to store data permanently on secondary storage. These data can then be retrieved and used by application programs. File managers are used to store and retrieve data from secondary storage. To accomplish their job, file managers use metadata such as field names and filenames. This use of metadata, along with the actual data, is now deeply ingrained in database management technology.
As a result, in the last 30 years, we have witnessed a tremendous growth in the use of metadata in developing information systems. The purpose of this paper is to study this field and see how it helps us in decision support. To do this, we first describe the chronological development of the metadata concept over 40 years (see Section 2). Several tools were also designed to manage metadata over the same 40 years. In Section 3, we categorize these developments. We argue that the most neglected area in metadata management is the management of changes in metadata. Section 4 emphasizes the changes in metadata and describes a real-life case study where the changes are of utmost importance. To manage these changes, we propose a new management tool called a metadata warehouse. Unlike other tools that focus on tool integration and data integration, this tool manages the changes in metadata for organizational decision support. We conclude the paper in Section 5.
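To make the idea of managing changes in metadata more concrete, the sketch below keeps every version of a metadata definition with the date from which it applies, so that one can ask what an element meant at a given time. It is an assumption-laden illustration, not the metadata warehouse design proposed in this paper: the class names, the `region` element, its definitions, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class MetadataVersion:
    """One dated definition of a metadata element (hypothetical structure)."""
    element: str
    definition: str
    valid_from: date


class MetadataHistory:
    """Append-only log of metadata definitions, queryable by date."""

    def __init__(self) -> None:
        self._versions: List[MetadataVersion] = []

    def record(self, element: str, definition: str, valid_from: date) -> None:
        # Old versions are never overwritten; each change adds a new entry.
        self._versions.append(MetadataVersion(element, definition, valid_from))

    def as_of(self, element: str, when: date) -> Optional[str]:
        # Pick the latest definition that was already in effect on `when`.
        candidates = [
            v for v in self._versions
            if v.element == element and v.valid_from <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda v: v.valid_from).definition


# Made-up example: the meaning of a "region" field changes over time.
history = MetadataHistory()
history.record("region", "sales territory by state", date(1998, 1, 1))
history.record("region", "sales territory by postal code", date(2001, 7, 1))
print(history.as_of("region", date(2000, 6, 1)))
```

Keeping the old definitions alongside the new one is what lets a decision maker interpret historical data under the meaning that applied when the data was recorded.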
Section snippets
Evolution of the metadata concept
To describe the evolution of the concept of metadata, we look at each decade starting with the 1960s.
Evolution of metadata management
Although metadata started as information describing data or an asset, it now competes for the same attention as the asset or data it defines. With the phenomenal use of metadata in software development, in database management, and in data warehouse design and implementation, it is important to look into the techniques used to manage metadata.
To properly narrate the evolution of metadata managers, we start with a set of tools that implicitly manage data with metadata. From this stage, we
Proposing metadata warehouse—the final frontier
In the earlier sections, we have seen how the concept of metadata has evolved and how the tools to manage it have matured from simple library managers to techniques that help integrate tool metadata. The integration is accomplished either through a repository manager that supports a common model of metadata under a single environment, or with a diverse set of metadata that follows a metamodel called CWM (the Common Warehouse Metamodel).
The study of CWM reveals that it not only supports integration of tool metadata, it also supports
Conclusion and future research
While the TIP Data Warehouse is a specific data warehouse, the need for a metadata warehouse is universal. We also see this need in web data warehouses. In fact, changes are much more prevalent in the web world than in the regular data warehouse world [21]. The current research provides a window to show how metadata has come of age and now demands its own warehouse. Several conclusions and directions for future research can be drawn from the Metadata Warehouse Proposal for the TIP project.
First, the major
Acknowledgements
The research was partially funded by a grant from Teradata, a division of NCR. The author acknowledges the help of Debbie Doran from the Texas A&M University System, and David Riegel and Mary Gros from Teradata.
References (23)
- et al., Building an Object-Oriented Database System (1992)
- Repositories and object-oriented databases, SIGMOD Record (March 1998)
- An Introduction to Database Systems (1995)
- A. Deshpande, D. Van Gucht, An implementation for nested relational databases, Tech Report Number 234, Department of...
- D. Doran, TIP management change repository, An Internal Report, Texas A&M University System,...
- et al., Design Patterns: Elements of Object-Oriented Software (1995)
- Building Data Warehouse (1992 and 1996)
- et al., The Data Warehouse Lifecycle Toolkit (1998)
- et al., Support for maintaining object-oriented programs, Transactions on Software Engineering (1992)
- et al., A toolset for maintaining hybrid C++ programs, Software Maintenance Research and Practice (1996)
- Building and Managing Meta Data Repository: A Full Life Cycle Guide
Arun Sen is a full professor and Mays Fellow in the Department of Information and Operations Management at Texas A&M University. Before joining Texas A&M University in 1986, he was an assistant professor and then a tenured associate professor in the Department of Management Science, University of South Carolina. He holds an MTech in Electronics (from Calcutta University, India in 1971), an MS in Computer Science (from Penn State University in 1976) and a PhD in Information Systems (from Penn State University in 1979).
He has published 42 research papers in journals such as MIS Quarterly, Information Systems Research, IEEE Transactions on Systems, Man and Cybernetics, IEEE Transactions on Software Engineering, IEEE Transactions on Engineering Management, Decision Sciences, Communications of the ACM, Information Systems, Computers and OR, Omega, European Journal of Operations Research, Decision Support Systems, Journal of MIS, Information and Management, and others. His research interests include decision support systems, database management, repository management and software reuse, case-based reasoning, and the technical and behavioral aspects of data warehousing and e-commerce.
He was an associate editor of the Journal of Database Management. He was a special issue editor for Decision Support Systems, Communications of the ACM, Database, and Expert Systems with Applications. He was the chair of the INFORMS College on Information Systems, a program chair for the 1996 Workshop on Information Technology and Systems (WITS) and a track chair (Decision Support Systems and AI track) for the 1996 National DSI Conference.