What is Metadata?

Metadata, commonly defined as “data about data,” is a structured summary of information that describes data. The term, however, is not restricted to descriptions of data. More broadly defined, metadata is descriptive information about any object or resource, as diverse as geospatial and non-geospatial datasets, data analysis tools, computer models, Web sites, graphics and textual information. At a minimum, metadata consists of the standard bibliographic information that supports resource discovery. However, it generally contains information that supports a wider range of operations, such as management, evaluation, access, and use.

Because metadata serves a diversity of uses, a number of standards exist. These standards differ greatly in the level of information they support. Essentially, one can place both the uses of metadata and the various standards along a continuum of complexity. The most basic record enables data and resource discovery, much like a record in a library catalog, whereas the most complex provides essential information for processing and interpreting data, much like a user manual. Metadata facilitates comparisons between data sets from different sources and, when placed in a searchable index, enables searching of domain-specific information, such as geographic location, title, or data type. Metadata may also serve as a tool for organizing and maintaining an organization’s investment in its data, by providing a systematic way of recording information about the data it produces. Metadata may even provide protection for the producing organization if a conflict arises over the misuse of data. In essence, metadata is documentation that can answer the who, what, when, where, why, and how questions, describing every facet of the data or resource being documented: its content, quality, accessibility, collection methods, processing, and availability.
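To make the idea of a searchable index concrete, here is a minimal sketch in Python. It assumes each metadata record carries only a title, a data type, and a geographic bounding box. The Record class, the search function, and the two sample records are hypothetical illustrations and are not drawn from any of the standards discussed here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Record:
    """A deliberately tiny metadata record (hypothetical fields)."""
    title: str
    data_type: str
    # Bounding box as (west, south, east, north) in decimal degrees.
    bbox: Tuple[float, float, float, float]


def overlaps(a, b):
    """Return True when two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, keyword=None, data_type=None, bbox=None):
    """Return the records that match every criterion that was supplied."""
    hits = []
    for rec in records:
        if keyword and keyword.lower() not in rec.title.lower():
            continue
        if data_type and rec.data_type != data_type:
            continue
        if bbox and not overlaps(rec.bbox, bbox):
            continue
        hits.append(rec)
    return hits


# Invented sample records, for illustration only.
catalog = [
    Record("Gridded population estimates", "raster", (-180.0, -90.0, 180.0, 90.0)),
    Record("Coastal survey points", "vector", (-75.0, 38.0, -70.0, 42.0)),
]

# Search by title keyword, by data type, or by geographic location.
by_title = search(catalog, keyword="population")
by_type = search(catalog, data_type="vector")
by_place = search(catalog, bbox=(0.0, 0.0, 10.0, 10.0))
```

A real catalog would index many more fields and would handle details such as bounding boxes that cross the antimeridian, which this sketch ignores.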

What standard do I use?

Whether you are convinced of the importance of documenting data with metadata or are simply required to do so (there is a federal mandate for federal agencies and organizations receiving federal funds to document their data), you must select a standard. Several standards are available. The standard chosen may depend on what is being documented (e.g., geospatial data, non-geospatial data, Earth science data, non-data resources), why the documentation is being undertaken (e.g., discovery or processing), who is requiring the documentation (once again, the federal mandate), or how and where the metadata will be made available (e.g., catalogs and clearinghouses).

As a creator and distributor of geospatial and global change data and data-related products, CIESIN participates in two major clearinghouse and cataloging initiatives: the Federal Geographic Data Committee (FGDC) Clearinghouse, and the NASA Global Change Master Directory (GCMD). CIESIN therefore creates metadata records using both the FGDC’s Content Standard for Digital Geospatial Metadata (CSDGM), and the GCMD’s Directory Interchange Format (DIF) and Service Entry Resource Format (SERF), enabling interoperability with each of these catalog systems. These standards, along with a few others, are introduced in the following section. For more detailed information about any of these standards, there are links to numerous tutorials and guides, as well as CIESIN’s “Guide to FGDC-Compliant Metadata.”


The Content Standard for Digital Geospatial Metadata (CSDGM), more commonly referred to as the FGDC standard (the two terms are interchangeable), was developed specifically to provide a common set of terminology and definitions for documenting digital geospatial data. It has been the federally endorsed metadata standard for geospatial data in the United States since 1994, when President Clinton signed Executive Order 12906, mandating that all federal agencies and organizations receiving federal funds document their geospatial data using this standard. It is currently used in more than 200 national and international catalogs and clearinghouses.

The standard is large and fairly complex, specifying the content of some 330 metadata elements, though only a subset of these is mandatory. Each element is specified as “mandatory,” “mandatory if applicable,” or “optional.” Completing only the mandatory elements may be sufficient for data discovery, and may also be an appropriate starting point for initial documentation. However, the mandatory elements alone may not be sufficient to support data transfer and processing.
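The three obligation levels can be thought of as a simple completeness check on a record. The following Python sketch illustrates the idea under stated assumptions: the element names and the obligation assigned to each are placeholders chosen for illustration and are not taken from the CSDGM itself, and the missing_elements function is hypothetical.

```python
MANDATORY = "mandatory"
MANDATORY_IF_APPLICABLE = "mandatory if applicable"
OPTIONAL = "optional"

# Placeholder element names and obligations, NOT the actual CSDGM table.
OBLIGATIONS = {
    "title": MANDATORY,
    "abstract": MANDATORY,
    "map_projection": MANDATORY_IF_APPLICABLE,
    "browse_graphic": OPTIONAL,
}


def missing_elements(record, applicable=()):
    """List the required elements that are absent or empty in a record.

    `applicable` names the "mandatory if applicable" elements that
    actually apply to the data set being documented.
    """
    missing = []
    for name, level in OBLIGATIONS.items():
        required = level == MANDATORY or (
            level == MANDATORY_IF_APPLICABLE and name in applicable
        )
        if required and not record.get(name):
            missing.append(name)
    return missing


# A minimal record that fills in only the mandatory elements.
discovery_record = {"title": "Example data set", "abstract": "Short summary."}

complete_for_discovery = missing_elements(discovery_record)
gaps_for_projected_data = missing_elements(
    discovery_record, applicable={"map_projection"}
)
```

In this sketch the minimal record passes when no conditional elements apply, but reports a gap as soon as a “mandatory if applicable” element becomes relevant, which mirrors why a discovery-level record may fall short for data transfer and processing.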

Despite its size and complexity, the standard is fairly flexible. It can be modified for specific data types by employing an endorsed “profile” or “extension,” which is simply an addition to, or a simplification of, the standard. Two profiles, the Biological Data Profile and the Metadata Profile for Shoreline Data, and one extension, the Extensions for Remote Sensing, have been endorsed by the FGDC. Although the FGDC standard was developed for geospatial data, the Biological Data Profile, developed for the National Biological Information Infrastructure (NBII) of the U.S. Geological Survey (USGS), is a prime example of how the CSDGM can be used for data that are not geospatial in nature (such as results from laboratory-based research). The CSDGM is also used in the NBII Clearinghouse initiative to provide metadata-based descriptions of non-data resources, such as software tools, data applications, reports, Web sites and other information products.

The Directory Interchange Format (DIF) is a de facto standard established by NASA’s Global Change Master Directory (GCMD) in collaboration with the Committee on Earth Observation Satellites (CEOS) International Directory Network (IDN). It was developed to describe and catalog Earth science and global change data sets from a wide variety of disciplines, and serves the user community in the discovery of Earth science data. It is used in a number of national and international catalogs, most notably the GCMD. The DIF consists of the set of attributes considered essential for a user to determine whether a data set meets their needs. Unlike the FGDC standard, it does not contain information concerning the transfer and processing of data. The DIF is compliant with ISO 19115.

The Service Entry Resource Format (SERF) is another de facto standard established by NASA’s Global Change Master Directory (GCMD). It is used for directory entries describing a service, rather than data. The SERF was established for describing services directly related to the “processing, viewing, analysis, archival, retrieval, production, interpretation, acquisition, formatting, or indexing of Earth science data,” that is, the tools a GCMD user may need for manipulating data. Like the DIF, it is intended to serve the user community in discovery. It is a very simple standard, requiring only seven elements.

The International Organization for Standardization (ISO) is a non-governmental organization that establishes standards to facilitate the international exchange of goods and services. ISO/TC 211 is the technical committee that works on standards for digital geographic information, including the ISO 19115 standard for geospatial metadata. In February 2001, the standard was approved for publication as a “Draft International Standard (DIS).” Its development was influenced by several standards, including the FGDC and ANZLIC standards, but it is more comprehensive than any of them. The standard was finalized in 2003, and FGDC and ANZLIC are establishing “profiles” consisting of metadata elements unique to the ISO 19115 standard. All the major standards will be interoperable with this standard.

The Australia New Zealand Land Information Council (ANZLIC) develops policies, standards, and procedures to manage the capture, storage, maintenance and transfer of land and geographic information in Australia and New Zealand. Its metadata standard contains fewer elements than the FGDC standard, but is consistent with FGDC’s guidelines on geospatial metadata.

The Dublin Core Metadata Initiative (DCMI) is “an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems.” Dublin Core was developed specifically as a simple and flexible metadata format for cataloging and facilitating the discovery of electronic resources and document-like objects on the Internet. It is much simpler than any of the previously discussed standards, as it was specifically intended for resource discovery. Although we do not use the Dublin Core standard, it is worth mentioning for two reasons: it is widely used in the library community, and there are several initiatives for mapping the Dublin Core standard to FGDC and DIF. Successful mapping between these standards allows for interoperability with catalogs that use the Dublin Core standard.
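Mapping between standards is often described as a “crosswalk”: a table that pairs each element in one standard with its counterpart in another. The Python sketch below shows the general shape of such a mapping. The keys are real Dublin Core element names, but the target field names and the apply_crosswalk function are hypothetical and do not reproduce any published Dublin Core to FGDC or DIF mapping.

```python
# Dublin Core element -> field name in a target schema.
# The target names are invented placeholders for illustration.
CROSSWALK = {
    "title": "target_title",
    "creator": "target_originator",
    "description": "target_summary",
    "date": "target_publication_date",
}


def apply_crosswalk(dc_record, crosswalk=CROSSWALK):
    """Translate a Dublin Core record into the target schema.

    Returns two dictionaries: the translated fields, and the source
    elements that have no counterpart in the crosswalk.
    """
    mapped, unmapped = {}, {}
    for element, value in dc_record.items():
        if element in crosswalk:
            mapped[crosswalk[element]] = value
        else:
            unmapped[element] = value
    return mapped, unmapped


# Invented sample record using Dublin Core element names.
dc_record = {
    "title": "Example report",
    "creator": "Example Organization",
    "rights": "Public domain",
}

mapped, unmapped = apply_crosswalk(dc_record)
```

Returning the unmapped elements separately makes it visible where the two standards do not line up, which is usually where a mapping needs human review.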

The Data Documentation Initiative (DDI) is an effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for data sets in the social and behavioral sciences. The DDI metadata specification originated in the Inter-University Consortium for Political and Social Research (ICPSR) and is now the project of an Alliance of about 25 institutions in North America and Europe. Together, the member institutions include many of the largest data producers and data archives in the world. Virtually every kind of data can be found in one or more of the archives.


How do I make my metadata accessible?

A catalog or clearinghouse is generally the destination of the metadata record, providing access to one’s metadata records, and therefore to one’s data or resource. At CIESIN, we have several different catalogs as targets for our records; see our catalogs page.


