Metadata lifecycle management guideline

Document type:
Guideline
Version:
Final v1.0.0
Status:
Current
Owner:
QGCIO
Effective:
October 2019–current
Security classification:
OFFICIAL-Public


Introduction

Purpose

A Queensland Government Enterprise Architecture (QGEA) guideline provides information for Queensland Government agencies on the recommended practices for a given topic area. Guidelines are generally for information only and agencies are not required to comply. They are intended to help agencies understand the appropriate approach to addressing a particular issue or doing a particular task.

This document provides guidance to Queensland Government agencies which rely on metadata to facilitate effective data management, discovery and use. The guideline is structured around the typical phases of the information lifecycle and provides a recommended approach for managing metadata throughout each phase. As well as defining what metadata is and why its effective management is important, this guideline describes details of typical activities, expected outcomes and relevant stakeholders associated with each lifecycle phase to assist agencies to better manage their metadata holdings.

The intent of this document is to establish a point of reference from which agencies can formally develop specific policies, standards and procedures which meet agency business requirements for managing metadata throughout the lifecycle.

Audience

This document is primarily intended for:

  • Metadata creators
  • Information asset custodians
  • Information management specialists
  • Record keepers and archivists
  • Enterprise architects
  • Information architects
  • Business analysts
  • System administrators
  • Data analysts and researchers
  • Website developers
  • Metadata consumers

Scope

In scope

All metadata currently produced, collected, managed, stored, published or shared by Queensland Government departments, either by manual or automated processes.

Out of scope

The following are out of scope of the current guideline:

  • specific guidance on the selection of appropriate metadata schemas for particular subject domains
  • recommendations regarding the expected useful life of metadata.

Background

Along with other data and information, it is important that agencies actively manage metadata throughout its lifecycle in order to maintain relevance, accuracy and currency. As increasingly large amounts of data and information are produced and collected, effective metadata management helps to ensure the ongoing ability of agencies to understand, process, integrate, maintain and manage their data, systems and workflows. Metadata documents agency knowledge about its data and provides a consistent reference source to help users understand what data the agency holds, what that data represents and how it can be used.

Because metadata plays a vital role in relation to data and information discovery and use, it is important agencies put strategies in place to ensure metadata is managed throughout its lifecycle. Without metadata management, there can be no data management, which will negatively impact the ability of an agency to effectively use and reuse its data and information.

As Queensland Government agencies create, collect, use and store increasingly large amounts of data, the role of metadata, and its effective management, becomes increasingly important.

Metadata fundamentals

Metadata is often defined as data about data. However, in an increasingly complex data landscape, taking such a narrow approach to defining metadata risks reducing its perceived value to both business and technical users.

In addition to simply describing data, metadata also assists users (either human or machine) to understand business and technical processes, constraints on data usage, data quality, data security and data lineage. According to the Data Management Body of Knowledge (DMBoK) metadata describes:

  • the data itself (e.g. data elements, data models)
  • the concepts represented by the data (e.g. business processes, technological infrastructure)
  • the connections between the concepts and the data (e.g. relationships)

Therefore, metadata has the potential to provide an agency not only with an understanding of its data, but also its systems and its workflows, which is essential for both effective data management and use.

Without reliable metadata, an agency will struggle to effectively manage its information. Metadata provides the means for an agency to identify what data it has, where the data originated and how it flows through systems. It helps to define data quality, determine access rights and provides the organisational context to enable data to be located from a variety of starting points.

A metadata schema defines a comprehensive set of metadata elements for a dataset including any required fields, field types, definitions and data structures. Schemas provide procedural rules which ensure a standardised approach to both metadata creation and use, which in turn facilitates discoverability, interoperability and access.
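As a minimal sketch of how a schema's required fields and field types might be enforced, the Python below validates a metadata record against a small schema. The element names, the `Element` class and the `validate_record` helper are hypothetical and chosen only for illustration; they are not drawn from any endorsed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One element of a hypothetical metadata schema."""
    name: str
    field_type: type
    required: bool
    definition: str


# Illustrative schema only; the element names are assumptions.
SCHEMA = [
    Element("title", str, True, "Name by which the dataset is known"),
    Element("custodian", str, True, "Role responsible for the dataset"),
    Element("keywords", list, False, "Terms that aid discovery"),
    Element("update_frequency", str, False, "How often the data is refreshed"),
]


def validate_record(record: dict, schema=SCHEMA) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    known = {element.name for element in schema}
    for element in schema:
        if element.name not in record:
            # Only required elements are reported when absent.
            if element.required:
                problems.append(f"missing required element: {element.name}")
            continue
        if not isinstance(record[element.name], element.field_type):
            problems.append(
                f"{element.name}: expected {element.field_type.__name__}"
            )
    # Elements outside the schema are flagged rather than silently accepted.
    for name in record:
        if name not in known:
            problems.append(f"unknown element: {name}")
    return problems


if __name__ == "__main__":
    record = {"title": "Road crash locations", "keywords": "transport"}
    for problem in validate_record(record):
        print(problem)
```

Returning a list of problems, rather than stopping at the first one, lets a metadata creator correct a record in a single pass.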

There are many well established schemas which have been endorsed as standards for certain types of data and disciplines such as library science, education, archiving, e-commerce and the arts. Some examples of standards include DCAT for Open Data or ANZLIC for geospatial data.

A data dictionary contains field level definitions of the data elements of a database or metadata schema. It provides both users and creators additional context about the information each field can and should contain and is useful to maintain both system integrity and consistency. A data dictionary may also contain information such as data relationships, origins, usage and formats.
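One possible shape for such a data dictionary is sketched below in Python. The field names, definitions and permitted values are invented for illustration, as are the `describe` and `check_value` helpers; an agency's own dictionary would carry whatever attributes its business requirements call for.

```python
# Hypothetical data dictionary: one entry per field of an illustrative dataset.
DATA_DICTIONARY = {
    "crash_severity": {
        "definition": "Most serious injury outcome recorded for the crash",
        "format": "text",
        "allowed_values": ["Fatal", "Hospitalisation", "Minor injury"],
        "origin": "Source incident report",
    },
    "crash_year": {
        "definition": "Calendar year in which the crash occurred",
        "format": "integer (YYYY)",
        "allowed_values": None,  # no controlled list for this field
        "origin": "Derived from crash date",
    },
}


def describe(field: str) -> str:
    """Give a user the documented context for a field."""
    entry = DATA_DICTIONARY[field]
    return f"{field}: {entry['definition']} [{entry['format']}]"


def check_value(field: str, value) -> bool:
    """True if the value is permitted by the field's controlled list, if any."""
    allowed = DATA_DICTIONARY[field]["allowed_values"]
    return allowed is None or value in allowed
```

Here `describe` serves data consumers who need context, while `check_value` serves creators by rejecting values outside a controlled list.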

Metadata can be applied at a number of different levels. For example, metadata may be applied to:

  • Component parts of a data object (sub-item level) such as scenes in a movie, images on a webpage, chapters in a book or tables in a relational database.
  • A data object (item level) such as a book, a spreadsheet, a record or a webpage.
  • A group of data objects (collection level) such as a library or archival collection, a database or an information asset.

Agencies should determine the most appropriate level/s for the application of metadata, in line with their business requirements, user needs and technical capabilities. This should be performed with an understanding of agency responsibilities to share and publish metadata about their services, data and information assets.

Metadata is often categorised into various types, which can help users understand the nature of the information that metadata contains and the functions it serves. Some types of metadata are categorised according to where the metadata originates (e.g. business, technical, operational), whereas some metadata is categorised according to how it is used (e.g. descriptive, structural, administrative).

Business metadata

Business metadata provides a business context to other data and therefore typically uses non-technical language to define data concepts, subjects, entities and attributes. Business metadata describes, in plain English, the content of a data asset and how to locate it, and tends to be less structured than technical metadata. This category of metadata may include information which is useful for business decision making (such as business requirements, process flows and business operations) framed in terms that are relevant to the business. Business metadata is of particular value to business users but may also be used by technical and operational staff. Business metadata may include:

  • Business rules
  • Data quality rules
  • Business terms dictionary
  • Data governance and data lineage
  • Data stewards/owners
  • Value constraints
  • Security/privacy constraints
  • Data usage notes

Technical metadata

Technical metadata provides context around the technical (or internal) details of data including systems, processes and data movements: essentially, the digital characteristics of a data asset or how systems function. This category of metadata provides additional information about data structure and storage as well as the applications and processes used to manipulate the data. Technical metadata is useful for digital object management and operability because it describes the form of a data asset, including its size and structure. Technical metadata may include:

  • Database column and table names
  • Column properties
  • Access permissions
  • File format schema definitions
  • Data lineage (upstream and downstream) documentation
  • Recovery and backup rules
  • Data access rights, groups and roles
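Upstream and downstream lineage, listed above, can be pictured as a directed graph of data movements. The Python below is a sketch under that assumption: the asset names and the `build_index` and `trace` helpers are hypothetical, and a real lineage tool would typically harvest these edges from systems rather than list them by hand.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, target) means data flows source -> target.
EDGES = [
    ("source_system.customers", "staging.customers"),
    ("staging.customers", "warehouse.dim_customer"),
    ("source_system.orders", "staging.orders"),
    ("staging.orders", "warehouse.fact_orders"),
    ("warehouse.dim_customer", "report.sales_summary"),
    ("warehouse.fact_orders", "report.sales_summary"),
]


def build_index(edges):
    """Build lookup tables for direct upstream and downstream neighbours."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def trace(start, index):
    """Return every asset reachable from start by following the index."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in index.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


UPSTREAM, DOWNSTREAM = build_index(EDGES)
```

Calling `trace("staging.customers", DOWNSTREAM)` lists the assets affected by a change to that table, while `trace("report.sales_summary", UPSTREAM)` lists everything the report depends on.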

Operational metadata

Operational metadata contains information regarding the processing and accessing of data, including data lineage, quality and provenance. This type of metadata describes the processes and events that occur within operational systems and the data objects which are affected. It is useful for tracking access and use of the data and may also identify how often the data is updated or refreshed. Operational metadata may include:

  • Details of batch programs or job logs
  • Results of audits
  • Error logs
  • Patches and version maintenance
  • Data archiving and retention rules
  • Technical roles and responsibilities
  • Data sharing rules and agreements

Descriptive metadata

Descriptive metadata describes an asset for the purpose of identification and retrieval and is therefore useful for discovery, assessment and identification. This type of metadata underpins the ability of users to browse, search, sort and filter information, and is typically produced by content creators using standardised attributes (e.g. title, abstract, author, keywords, unique identifiers) to describe assets. Descriptive metadata may include:

  • Catalogue records
  • Finding aids
  • Differentiations between versions
  • Specialised indexes
  • Curatorial information
  • Hyperlinked relationships between resources
  • Annotations by creators and users

Structural metadata

Structural metadata describes how the content of an asset can be used, reused and combined to form new assets and is therefore useful to match content with the precise needs of users. It describes how objects are organised and the relationships within and among resources and their component parts (e.g. whether an asset is part of a single collection or multiple collections, number of pages per chapter, the structure of database objects). Structural metadata facilitates navigation and display of digital objects as well as helping to describe the relationship between two objects. Examples of structural metadata include:

  • Table of contents
  • Chapters and parts
  • Indexes
  • Page numbers
  • HTML Tagging

Administrative metadata

Administrative metadata is used to help manage a resource throughout its lifecycle and may include technical information (e.g. file type, format, encryption keys, passwords), preservation information (e.g. data refresh details, documentation of physical condition) and rights information (e.g. related to intellectual property and licensing). Examples of administrative metadata may include:

  • Creative commons licence
  • Permissions management
  • Acquisition information
  • Rights and reproduction tracking
  • Documentation of legal access requirements
  • Location information
  • Selection criteria for digitisation

Metadata management

To be data driven, an organisation should manage its data with metadata, and that metadata itself should in turn be managed. Regardless of types and uses of metadata, metadata management decisions should be driven by business requirements. All agencies will have different business drivers for managing metadata, which may vary across the organisation and across data repositories. Some common business drivers for implementing a consistent and structured approach to metadata management may include:

Understanding and describing existing data holdings

As well as providing a rich source of information about the context, history and origin of a data asset, metadata can also be used as a tool to help identify, locate and catalogue existing agency data holdings. Metadata is also useful in helping people from different parts of the organisation to identify differences and similarities between data assets. Understanding what data your agency holds is crucial to being able to use it efficiently and is also the first step to implementing an effective data governance strategy.

Facilitating data use and re-use

In order to use data appropriately and effectively, it must first be understood. Because metadata should accurately and consistently represent the content of data, it provides users with a level of confidence and understanding regarding both what the data is, and what it can be used for. Metadata can also help identify and enable multiple uses for the same data, such as strategic information within an agency, or information sharing between agencies.

Effective data governance

Data governance is concerned with maximising the value of data by exercising authority and control over data management practices. Effective data governance is underpinned by a consistent approach to metadata which promotes efficiency as well as knowledge of where data is located, what it means and what protections it requires. Metadata plays a critical role in relation to data governance, because it is the key to describing an organisation's data and business processes, as well as their relationship to each other.

Increased confidence in data quality

Data quality is highly dependent on data governance, which in turn depends on effective metadata management. Because metadata describes data elements in terms of a controlled vocabulary (or data dictionary), it provides structure and consistency to those creating metadata as well as confidence to consumers regarding how the data can be used and whether it is fit for the intended purpose.

Enhanced discoverability

Metadata management can aid discoverability of data both within and between agencies by ensuring that it is described accurately, consistently and completely. This allows potential users, whether internal to the agency, external to the agency or a member of the community to discover, understand and request access to the data they require. If compiled into a data catalogue at the dataset level, metadata can act in a similar way to a library catalogue, allowing potential users to understand all relevant information (update frequency, security classification, licensing conditions etc) required to access and use the desired information.
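To illustrate the library catalogue analogy, the Python sketch below searches a small dataset-level catalogue by keyword. The entries, their attribute names and the `search` helper are hypothetical and exist only to show how consistently described metadata supports discovery.

```python
# Hypothetical dataset-level catalogue entries; every value is illustrative.
CATALOGUE = [
    {
        "title": "School locations",
        "description": "Point locations of schools",
        "keywords": ["education", "schools"],
        "update_frequency": "annual",
        "security_classification": "OFFICIAL",
        "licence": "CC BY 4.0",
    },
    {
        "title": "Bus stop locations",
        "description": "Point locations of bus stops",
        "keywords": ["transport"],
        "update_frequency": "monthly",
        "security_classification": "OFFICIAL",
        "licence": "CC BY 4.0",
    },
]


def search(catalogue, term):
    """Case-insensitive match on title, description and keywords."""
    needle = term.lower()
    hits = []
    for entry in catalogue:
        haystack = " ".join(
            [entry["title"], entry["description"], *entry["keywords"]]
        ).lower()
        if needle in haystack:
            hits.append(entry)
    return hits
```

Because each hit carries its update frequency, security classification and licence, a potential user can judge whether a dataset suits their purpose before requesting access.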

Supporting data analytics

Precise data analytics relies on data which is both accurate and appropriate for the task at hand. Reliable and complete metadata, including consistent definitions of data elements, provides a level of confidence to those undertaking data analytics activities that the data they are analysing is fit for the intended purpose. Metadata provides a level of assurance to data analysts that the data they are using is not incorrect, out of date or unreliable.

Enabling compliance

Reliable and well managed metadata can help to ensure regulatory compliance in relation to agency specific legislation as well as the Information Privacy and the Right to Information Acts. Metadata can help to ensure that private data is adequately protected, and that information requested through the RTI process can be readily located within the designated timeframes. Effective metadata management can also assist agencies to meet the requirements of a range of other QGEA policies such as Information access and use (IS33), Information security policy (IS18:2018), the Queensland Government Information Security Classification Framework (QGISCF) and the Records governance policy.

Improving operational efficiency

Ensuring the effective management of metadata has the potential to produce a range of operational efficiencies such as streamlined workflows and improved communication, particularly between data consumers and IT professionals. In addition, metadata management may facilitate the identification of redundant data and processes, reduce the amount of money spent on data storage and support better data-driven decision making within agency business units.

As well as facilitating the realisation of the business benefits outlined above, effective metadata management can also help agencies avoid some of the risks associated with poor data management. These may include:

  • Errors in judgement due to incorrect or incomplete knowledge about the data
  • The inadvertent exposure of sensitive data or data misuse
  • Loss of organisational knowledge about agency data due to lack of documentation
  • Reliance on old or obsolete data
  • Increased cost of data storage and management due to duplicated or redundant data
  • Lack of consumer confidence in data due to incomplete or conflicting metadata
  • Doubt about the reliability of data and/or metadata
  • Poor decision making or increased time for decision making

As with other types of data and information, it is important that metadata is appropriately managed to ensure its ongoing relevancy and usefulness. Because it makes sense to manage metadata in association with the data and information it describes, the phases of the information asset lifecycle will be used as the basis for outlining metadata lifecycle management activities.

The objective of information asset lifecycle management is to optimise information asset acquisition, maximise the use of the information asset and reduce associated service and operational costs. Similarly, the objective of metadata lifecycle management is to optimise understanding of data and information, maximise its appropriate use and reuse and reduce agency time and effort in relation to managing, locating and understanding data and information holdings.

The lifecycle demonstrates typical activities and key business objectives of metadata management as it relates to the information lifecycle, from defining a metadata strategy through to archiving or disposal of metadata as required.

An overview of the metadata management activities conducted in each phase of the lifecycle is described in Table 1. These activities can be applied to all types of metadata, however as stated in the Metadata management principles, not all metadata is of equal value, and therefore agencies are encouraged to take a value-based approach to metadata management. This allows effort and energy to be focused on the management of metadata associated with data and information which has the most business value to the agency, the Queensland Government and ultimately the people of Queensland.

Table 1 contains typical activities required to manage metadata throughout the information lifecycle. These activities are outlined in association with expected outcomes of the management activities and details of the potential stakeholders who may be involved. Agencies should develop formal processes, procedures and training to ensure that metadata lifecycle management activities can be effectively executed and controlled to meet business requirements.

Table 1: Metadata lifecycle management activities, outcomes and stakeholders by lifecycle phase

Lifecycle phase: Plan

Activities:

  • Understand metadata requirements (both business and technical) including any metadata-specific security considerations.
  • Establish baseline / assess current maturity levels.
  • Identify key stakeholders.
  • Develop phased implementation plan.
  • Document business problem and vision.
  • Plan for organisational and cultural change.
  • Establish training needs.
  • Secure funding (if required).
  • Review established metadata schemas and determine suitability.
  • Review established governance roles, process and bodies and determine suitability for incorporating metadata governance.

Outcomes:

  • Understanding of what the organisation needs metadata for (e.g. create new data, understand existing data, enable data movement, access data, share data etc.)
  • Business requirements understood
  • Stakeholders identified
  • Metadata sources identified
  • Agreement on future state and how to get there
  • Implementation plan developed

Stakeholders:

  • Business users/managers
  • Custodians
  • Information management specialists
  • Information technology specialists
  • Information security specialists
  • Data architects
  • Database administrators
  • Data governance bodies
  • Executive sponsor/champion
  • Communication and change managers

Lifecycle phase: Construct, create, acquire

Activities:

  • Select and acquire appropriate existing metadata schema which meets business and technical requirements.
  • Avoid in-house development of bespoke metadata schemas.
  • Acquire or develop data dictionary / glossary if required.
  • Undertake assessment of metadata related security requirements/classification.
  • Develop suitable metadata architecture.
  • Create data model for metadata repository.
  • Develop specific data governance processes or incorporate into existing governance structure.
  • Leverage existing metadata sources, information architecture and expertise.
  • Identify and consider risks and issues surrounding sharing and publishing metadata.

Outcomes:

  • Organisational commitment secured
  • Agreement on how metadata will be created, maintained, integrated and accessed
  • Organisational understanding of business terms and usage
  • Organisation's business concepts and terminology, definitions and the relationship between terms documented
  • Metadata management incorporated into overall data governance processes OR new governance processes established if none currently exist

Stakeholders:

  • Business users/managers
  • Custodians
  • Information management specialists
  • Information technology specialists
  • Information security specialists
  • Data architects
  • Database administrators
  • Data governance bodies
  • Executive sponsor/champion
  • External subject/domain specialists

Lifecycle phase: Commission, organise, store

Activities:

  • Test the metadata schema against business and technical requirements.
  • Assign formal roles and responsibilities.
  • Collect and integrate metadata from diverse sources.
  • Communicate the necessity and value of metadata to stakeholders.
  • Document policies, procedures and work instructions.
  • Document metadata solutions.
  • Review metadata security classification and ensure appropriate security controls are in place.
  • Ensure that applied metadata enables information to be discovered easily and efficiently by users.

Outcomes:

  • Similarities and differences between data understood
  • Metadata quality, consistency, currency and security ensured
  • Business engaged and willing to contribute
  • Roles and responsibilities allocated and understood. Appropriate training developed.

Stakeholders:

  • Business users/managers
  • Custodians
  • Information management specialists
  • Information technology specialists
  • Information security specialists
  • Data architects
  • Business analysts
  • Systems analysts
  • Data governance bodies
  • Project managers

Lifecycle phase: Access

Activities:

  • Ensure authorised users and applications can access the required metadata.
  • Ensure both quality and security requirements are met and monitored.
  • Establish processes to support the appropriate sharing, harvesting, indexing, publishing or re-use of metadata.

Outcomes:

  • A standard way to access metadata is provided
  • Metadata is delivered to data consumers and applications or tools that need it
  • Those responsible for interpreting metadata have the tools to do so
  • Integration of technical metadata with relevant business, process and stewardship metadata

Stakeholders:

  • Business users/managers
  • Custodians
  • Information management specialists
  • Information technology specialists
  • Information security specialists
  • Open data specialists
  • Data governance bodies

Lifecycle phase: Use

Activities:

  • Provide support and training to staff on how to create, access and use metadata.
  • Regularly review metadata creator and consumer skills as well as supporting policies, procedures and work instructions.
  • Distribute and deliver metadata to authorised users and applications.
  • Query, report and analyse metadata.
  • Harvest and integrate existing sources of metadata.
  • Derive insights from multiple data sources.
  • Share, re-use and publish metadata in accordance with agency policies.
  • Monitor growth in use of the metadata.

Outcomes:

  • Streamlined workflows
  • Effective information use and re-use
  • Data content understood
  • Data used consistently
  • Data lineage documented as it moves between systems
  • Visibility enabled through end to end lineage
  • More effective decision making
  • Improved productivity
  • Improved risk management
  • Metadata appropriately shared and published to facilitate information/data discovery

Stakeholders:

  • Business users
  • Technical users
  • Application developers
  • Data analysts
  • Data scientists
  • Archivists
  • Customers

Lifecycle phase: Assess

Activities:

In conjunction with usage and in accordance with your data and information management strategy, assess:

  • Business impact:
    • Ongoing business value and operational impact of the data supported by the metadata schema
    • Frequency and scope of use
    • Fit for current purpose
    • Supports operational and service delivery
  • Future business value:
    • Opportunities to support additional use cases
    • Delivery of measurable benefits
    • Potential for risk reduction
    • Options for growth and enhanced service delivery
  • Condition of metadata:
    • Current metadata management and associated governance strategies
    • Appropriate security classification of metadata
    • Opportunities to standardise and comply with existing standards

Outcomes:

  • Metadata assessed (by both business and technical users) to examine its potential to deliver current and future benefits and satisfy business requirements, to ensure:
    • Sensitive information is protected
    • Data governance is supported
    • Process owners are held responsible for the quality of metadata
    • Process owners advise when metadata is incorrect or out of date
    • Standards for metadata quality are audited and enforced
    • Management strategies considered in the Maintain lifecycle phase are based on subjective assessment

Stakeholders:

  • Business users
  • Technical users
  • Information management specialists
  • Information technology specialists
  • Information security specialists
  • Data governance bodies

Lifecycle phase: Maintain

Activities:

Based on the assessment phase, apply appropriate management strategies. These may include:

  • Optimise: investigate options to improve the metadata quality to better support current and future business needs.
  • Rationalise: identify options to improve the condition of metadata or reduce associated costs.
  • Enhance: ensure sufficient funding to maintain or improve the condition of metadata. Promote the leverage or re-use of the metadata to maximise its future value.
  • Replace: maintain the current state of metadata in the short term until an alternative more suitable solution is identified.
  • Research and explore: assess suitability against business and technical requirements and identify potential to deliver additional business value.
  • Decommission: metadata schema no longer fit for purpose OR data no longer required.

Outcomes:

  • Disjointed systems connected. Existing metadata systems harmonised resulting in improved business value and usage.
  • Options for automation (tagging, profiling, semantic reconciliation, harvesting) explored. Consistency and data quality improved by implementing data dictionary/glossary.
  • Enhanced data quality and integration of metadata with relevant business processes. Feedback mechanisms for users created. Improved metadata accessibility and discoverability.
  • Policies, procedures, training, business glossaries and data dictionaries reviewed to ensure they are fit for current purpose while research and explore activity underway.
  • Market monitored for new tools and/or approaches and/or additional use cases identified.
  • Retire (see following lifecycle phase)

Stakeholders:

  • Business users
  • Technical users
  • Information management specialists
  • Information technology specialists
  • Information security specialists
  • Data governance bodies

Lifecycle phase: Retire

Activities:

  • Liaise with business users and other stakeholders when considering retiring/migrating schemas or disposing of metadata because:
    • The business requirement for using the selected metadata schema has significantly altered or no longer exists.
    • The data which the metadata describes has reached the end of its useful life and is being archived or destroyed.

Stakeholders:

  • Business users
  • Technical users
  • Records management specialists
  • Information management specialists
  • Information technology specialists
  • Information security specialists