What is Metadata?
Metadata, commonly defined as “data about data,” is a structured summary
of information that describes data. The term, however, is not restricted
to descriptions of data. More broadly defined, metadata is descriptive
information about any object or resource, as diverse as geospatial and
non-geospatial datasets, data analysis tools, computer models, Web sites,
graphics and textual information. At a minimum, metadata consists of the
standard bibliographic information that supports resource discovery. However,
it generally contains information that supports a wider range of operations,
such as management, evaluation, access, and use.
Because metadata serves a diversity of uses, a number of standards
exist. These standards differ greatly in the level of information
they support. Essentially, one can view both the uses of metadata
and the various standards along a continuum of complexity. The
most basic record enables data and resource discovery, much like records
in a library catalog, whereas the most complex provides essential information
for processing and interpreting data, much like a user manual. Metadata
facilitates comparisons between data sets from different sources and,
when placed in a searchable index, enables searching of domain-specific
information, such as geographic location, title, or data type. Metadata
may also serve as a tool for organizing and maintaining an organization’s
investment in its data, by providing a systematic way of recording information
about the data it produces. Metadata may even provide
protection for the producing organization if a conflict arises over
the misuse of data. In essence, metadata is documentation that can answer
the who, what, when, where, why, and how questions, describing every facet
of the data or resource being documented—its content, quality, accessibility,
collection methods, processing, and availability.
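The idea of a searchable index can be illustrated with a minimal sketch. It assumes a hypothetical discovery-level record holding only a title, a data type, and a bounding box; the record layout, the function names, and the three sample entries are made up for illustration and do not come from any of the standards discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical, minimal discovery-level metadata record."""
    title: str
    data_type: str
    # Bounding box as (west, south, east, north) in decimal degrees.
    bbox: tuple[float, float, float, float]


def boxes_overlap(a, b):
    """Return True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, title_word=None, data_type=None, bbox=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if title_word and title_word.lower() not in rec.title.lower():
            continue
        if data_type and rec.data_type != data_type:
            continue
        if bbox and not boxes_overlap(rec.bbox, bbox):
            continue
        hits.append(rec)
    return hits


# Made-up sample entries, used only to exercise the search function.
CATALOG = [
    Record("Population density grid", "raster", (-180.0, -60.0, 180.0, 85.0)),
    Record("Coastal road network", "vector", (-80.0, 25.0, -66.0, 45.0)),
    Record("Household survey tables", "tabular", (2.0, 4.0, 15.0, 14.0)),
]

if __name__ == "__main__":
    for rec in search(CATALOG, bbox=(0.0, 0.0, 10.0, 10.0)):
        print(rec.title)
```

A real catalog would index far more elements and would handle cases such as bounding boxes that cross the antimeridian, which this sketch ignores.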
What standard do I use?
Whether you are convinced of the importance of documenting data with metadata
or are simply required to do so (there is a federal mandate for federal agencies
and organizations receiving federal funds to document their data), you must select a standard. There are
several standards available to the user. The standard chosen may
depend on what is being documented (e.g., geospatial data, non-geospatial
data, Earth science data, non-data resources), why the documentation is being undertaken (e.g., discovery or processing), who is requiring the document (once
again, the federal mandate), or how and where the metadata will be made available (e.g., catalogs or clearinghouses).
As a creator and distributor of geospatial and global change data
and data-related products, CIESIN participates in two major clearinghouse and cataloging initiatives: the Federal Geographic
Data Committee (FGDC) Clearinghouse, and the NASA Global Change Master
Directory (GCMD).
CIESIN therefore creates metadata records using both the FGDC’s Content Standard
for Digital Geospatial Metadata (CSDGM), and the GCMD’s
Directory Interchange Format (DIF) and Service Entry Resource
Format (SERF), enabling interoperability with each of these catalog systems. These
standards, along with a few others, are introduced in the following
section. For more detailed information about any of these standards,
see the links to numerous tutorials and guides, as well as CIESIN’s “Guide to FGDC-Compliant Metadata.”
Standards
The Content Standard for Digital Geospatial Metadata (CSDGM), more
commonly referred to as the FGDC standard (both terms are interchangeable), was developed specifically to provide a common
set of terminology and definitions for documenting digital geospatial
data. It has been the federally endorsed metadata standard for geospatial
data in the United States since 1994, when President Clinton signed Executive Order
No. 12906, mandating that all federal agencies and organizations receiving
federal funds document their geospatial data using this standard. It
is currently used in more than 200 national and international catalogs
and clearinghouses.
The standard is large and fairly complex and specifies the content
of some 330 metadata elements, though only a subset of the 330 elements
is mandatory. Elements are specified as either “mandatory,” “mandatory
if applicable,” or “optional.” Using only the minimum number of mandatory
elements may be sufficient for data discovery. It may also be an appropriate
set for initial documentation. However, the minimum required elements
may not be sufficient to support data transfer and processing.
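The three obligation levels lend themselves to a simple completeness check. The sketch below is illustrative only: the four element names and their assigned levels are hypothetical stand-ins, not the actual CSDGM element set, and the decision about which “mandatory if applicable” elements apply is left to the person documenting the resource.

```python
from enum import Enum


class Obligation(Enum):
    MANDATORY = "mandatory"
    MANDATORY_IF_APPLICABLE = "mandatory if applicable"
    OPTIONAL = "optional"


# Hypothetical, heavily simplified element list; not the CSDGM element set.
ELEMENTS = {
    "title": Obligation.MANDATORY,
    "abstract": Obligation.MANDATORY,
    "bounding_coordinates": Obligation.MANDATORY_IF_APPLICABLE,
    "browse_graphic": Obligation.OPTIONAL,
}


def missing_elements(record, applicable=()):
    """List required element names that are absent or empty in `record`.

    `applicable` names the "mandatory if applicable" elements that apply
    to this particular resource, a judgment the documenter must make.
    """
    missing = []
    for name, obligation in ELEMENTS.items():
        required = obligation is Obligation.MANDATORY or (
            obligation is Obligation.MANDATORY_IF_APPLICABLE
            and name in applicable
        )
        if required and not record.get(name):
            missing.append(name)
    return missing


if __name__ == "__main__":
    draft = {"title": "Example data set"}
    print(missing_elements(draft, applicable={"bounding_coordinates"}))
```

A record that passes this kind of check contains every required element, which may be enough for discovery but says nothing about whether the content is adequate for transfer or processing.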
Despite the size and level of complexity of this standard, it is
fairly flexible. It can be modified for specific data types
by employing an endorsed “profile” or “extension,” which is simply an
addition or a simplification to the standard. Two profiles, the Biological
Data Profile and the Metadata Profile for Shoreline Data, and one extension,
the Extensions for Remote Sensing, have been endorsed by the FGDC. Although the FGDC standard was developed for geospatial data,
the Biological Data Profile, developed for the National
Biological Information Infrastructure (NBII) of the U.S.
Geological Survey (USGS), is a prime example of how the CSDGM can be used for
data that are not geospatial in nature (such as results from laboratory-based
research). The CSDGM is used in the NBII Clearinghouse initiative to provide
metadata-based descriptions of non-data resources, such as software
tools, data applications, reports, Web sites and other information products.
The Directory Interchange Format (DIF) is a de facto standard established
by NASA’s Global Change Master
Directory (GCMD) in collaboration with the Committee on Earth Observation
Satellites (CEOS) International Directory Network (IDN). It was developed
to describe and catalog Earth science and global change-related data
sets from a wide variety of disciplines, and serves the user community
in the discovery of Earth science data. It is used in a number of national
and international catalogs, most notably the GCMD.
The DIF consists of the set of attributes that are essential
for a user to determine whether a dataset meets their needs. Unlike
the FGDC standard, it does not contain information concerning the transfer
and processing of data. The DIF is compliant with ISO 19115.
The Service Entry Resource Format (SERF) is another de facto standard
established by NASA’s Global Change Master Directory (GCMD). It is used
for directory entries describing a service, rather than data. The SERF
was established for describing services directly related to the “processing,
viewing, analysis, archival, retrieval, production, interpretation,
acquisition, formatting, or indexing of Earth science data,” that is, the tools
a GCMD user may need for manipulating data. Like the DIF, it is
intended to serve the user community in discovery. It is a very simple
standard, requiring only seven elements.
The International Organization for Standardization (ISO) is a non-governmental
organization that establishes standards to facilitate the international
exchange of goods and services. ISO/TC 211
is a technical committee that works on standards for digital
geographic information, including the ISO 19115 standard for geospatial
metadata. In February 2001, the standard was approved for publication
as a “Draft International Standard (DIS).” Its development was influenced
by several standards, including the FGDC and ANZLIC standards, but it is more comprehensive
than any of them. The standard was finalized in 2003, and FGDC and ANZLIC are establishing
“profiles” consisting of metadata elements unique to the ISO
19115 standard. All the major standards will be interoperable with this
standard.
The Australia New Zealand Land Information Council (ANZLIC) develops
policies, standards, and procedures to manage the capture, storage, maintenance,
and transfer of land and geographic information in Australia and New
Zealand. The ANZLIC metadata standard contains fewer elements than the FGDC standard,
but is consistent with FGDC’s guidelines on geospatial metadata.
The Dublin Core Metadata Initiative (DCMI) is “an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems.” Dublin Core was developed specifically
as a simple and flexible metadata format for cataloging and facilitating
the discovery of electronic resources and document-like objects on the
Internet. It is much simpler than any of the previously discussed standards,
as it was specifically intended for resource discovery. Although CIESIN
does not use the Dublin Core standard, it is worth mentioning for two
reasons: it is widely used in the library community, and there are several
initiatives for mapping the Dublin Core standard to FGDC and DIF. Successful
mapping between these standards allows for interoperability with catalogs
using the Dublin Core standard.
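A mapping of this kind, often called a crosswalk, can be sketched as a lookup table. The example below is a rough illustration, not an endorsed or complete crosswalk. The source keys are Dublin Core element names. The target names follow the short tag names commonly used for CSDGM elements and should be treated as assumptions to verify against the standard. The function name and the sample record are hypothetical.

```python
# Illustrative pairs only; not an endorsed or complete crosswalk.
# Keys are Dublin Core elements; values are assumed CSDGM short tag names.
DC_TO_CSDGM = {
    "title": "title",
    "creator": "origin",
    "description": "abstract",
    "date": "pubdate",
    "subject": "themekey",
    "publisher": "publish",
}


def dc_to_csdgm(dc_record):
    """Translate a flat Dublin Core dict; return (mapped, unmapped).

    Elements with no entry in the table are returned separately so that
    nothing is silently dropped during the translation.
    """
    mapped, unmapped = {}, {}
    for element, value in dc_record.items():
        target = DC_TO_CSDGM.get(element)
        if target is None:
            unmapped[element] = value
        else:
            mapped[target] = value
    return mapped, unmapped


if __name__ == "__main__":
    # Made-up sample record for demonstration.
    sample = {
        "title": "Example data set",
        "creator": "Example Organization",
        "language": "en",
    }
    mapped, unmapped = dc_to_csdgm(sample)
    print(mapped)
    print(unmapped)
```

Because Dublin Core is much simpler than the FGDC standard, a translation in this direction can populate only a small part of an FGDC record; the unmapped elements show where a one-to-one correspondence was not assumed.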
The Data Documentation Initiative (DDI) is an effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences. The DDI metadata specification originated in the Inter-University Consortium for Political and Social Research (ICPSR) and is now the project of an Alliance of about 25 institutions in North America and Europe. Together, the member institutions include many of the largest data producers and data archives in the world. Virtually every kind of data can be found in one or more of these archives.
How do I make my metadata accessible?
A catalog or clearinghouse is generally the destination of the metadata
record, providing access to one’s metadata records, and therefore to
one’s data or resource. At CIESIN, we have several different catalogs
as targets for our records; see our catalogs page.