What is Clearinghouse?

The NSDI Clearinghouse Network, sponsored by the FGDC, is a
distributed system of agency servers located on the Internet that
contain field-level descriptions of available and planned digital
spatial data, applications, and services. This descriptive information,
known as metadata, is collected in a standard format to facilitate
query and consistent presentation across multiple participating sites.
Clearinghouse uses standards-based Web technology for the publication
and discovery of available geospatial resources through the Geospatial Platform portal.

The fundamental goal of Clearinghouse is to provide access to
digital spatial data and related online services for data access,
visualization, or order. The Clearinghouse Network functions as a
detailed catalog service with support for links to spatial data and
browse graphics. Clearinghouse metadata entries are expected to include
hyperlinks to online resources (e.g. map services, data download
locations, data access services, applications) to enable access to all
facets of the described resource. Where
digital data are too large to be made available through the Internet or
the data products are made available for sale, linkage to an order form
can be provided in lieu of a data set. Through this model,
Clearinghouse metadata provide low-cost advertising for providers of
spatial data, both non-commercial and commercial, to potential
customers via the Internet.

Clearinghouse allows individual agencies, consortia, or
geographically-defined communities to band together and promote their
available digital spatial data through a federated metadata service.
These servers may be installed at local, regional, or central offices,
as dictated by the organizational and logistical efficiencies of each
organization. All Clearinghouse servers are considered "peers" within
the Clearinghouse activity -- there is no hierarchy among the servers
-- permitting query by any user on the Internet with minimum
transactional processing. When these Clearinghouse services are
registered with the Platform portal, the system will harvest and cache
a copy of the metadata for rapid retrieval, enabling search through a
single interface to all registered assets in the U.S.

Why promote the Clearinghouse Activity?

The development of the Clearinghouse among U.S. Federal agencies was
motivated by a desire to minimize duplication of effort in the
collection of expensive digital spatial data and foster cooperative
digital data collection activities. By promoting the availability,
quality, and requirements for digital data through a searchable on-line
system, a Clearinghouse facility would greatly assist in coordination of
data collection and research activities. Clearinghouse also provides a
primary data dissemination mechanism to traditional and non-traditional
spatial data users.

Federal participation in the Clearinghouse is directed by Executive Order 12906 through its official creation
of the National Spatial Data Infrastructure. The FGDC is co-chaired by
senior officials in the Department of Interior and the Office of
Management and Budget.

Why not just use Internet search engines?

Digital spatial data and metadata are stored in many forms and
systems, which makes their discovery on the Internet difficult.
Structured metadata is typically exchanged in XML format, with
significant meaning stored in 'fields' or XML elements rather than in
the HTML documents typically indexed by search engines. Current web
indexing technology offers literal text search and matching for
metadata that happen to be stored in HTML, but does not generally
provide the indexing required for search of coordinates, dates and
times, and other numeric values. In addition, some entire collections
of metadata are being managed within dynamic databases whose content is
not accessible to search engines. The Clearinghouse functionality as
implemented in the geodata.gov portal goes beyond existing search
engine technology to include spatial query and permit simple search of
metadata based on location and full-text search. Field-level search is
also available to refine searches based on topical classification,
geography, time, and other key fields in ways not possible with
off-the-shelf search engine technology.
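
As an illustration of why coordinates and dates need their own indexing,
the following is a minimal sketch of a combined spatial, temporal, and
text filter. It is not the portal's implementation: the Record class, its
field names, and the sample records are hypothetical, and the bounding-box
test assumes boxes that do not cross the 180-degree meridian.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """Hypothetical, simplified metadata record (not a real FGDC/ISO schema)."""
    title: str
    west: float
    south: float
    east: float
    north: float
    begin: date
    end: date


def bbox_intersects(rec, west, south, east, north):
    """Return True when the record's bounding box overlaps the query box."""
    return not (rec.east < west or rec.west > east or
                rec.north < south or rec.south > north)


def search(records, bbox, start, end, text=None):
    """Filter records by bounding box, time period, and optional title text."""
    hits = []
    for rec in records:
        if not bbox_intersects(rec, *bbox):
            continue  # no spatial overlap
        if rec.end < start or rec.begin > end:
            continue  # no temporal overlap
        if text and text.lower() not in rec.title.lower():
            continue  # literal text did not match
        hits.append(rec)
    return hits


# Illustrative records only; extents and dates are made up.
records = [
    Record("Colorado hydrography", -109.1, 36.9, -102.0, 41.0,
           date(2005, 1, 1), date(2006, 12, 31)),
    Record("Florida land cover", -87.7, 24.4, -80.0, 31.0,
           date(2010, 1, 1), date(2010, 12, 31)),
]
query_box = (-105.5, 39.3, -104.5, 40.1)  # west, south, east, north
hits = search(records, query_box, date(2000, 1, 1), date(2008, 1, 1))
print([r.title for r in hits])
```

A plain text index could match the word "Colorado" but could not answer
the question "which data sets overlap this box during this period", which
is the kind of query the numeric comparisons above stand in for.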

The general trend toward connectivity of spatial data producers,
vendors, and users on the Internet, coupled with the provision of online
data via web services, indicates a long-term public commitment not only
to on-line data discovery but also to direct data access by client
processes across internal and public networks. Clearinghouse provides a
standards-based solution to catalog interoperability on the Internet
today.

Who should participate in Clearinghouse?

Although initially targeted at federal agencies, the NSDI
Clearinghouse Network includes numerous federal, state, university, and
tribal metadata collections. Hundreds of metadata servers are also in
operation outside the United States supporting the same
interoperability standards. In short, any group, regardless of size, may
publish its metadata to the Clearinghouse and make it visible in
geodata.gov. Similar publishing portals exist in other countries for
the coordination and publication of geographic resources outside the
U.S. The federated catalog behind the NSDI Clearinghouse Network is
also registered with the Group on Earth Observation (GEO) and its
Global Earth Observation System of Systems (GEOSS). Thus U.S. content
is now also visible via the GEO Web
Portal.

The role of the FGDC in Clearinghouse is to collect stakeholder
requirements and to design and deploy federated search, discovery, and
access solutions for the U.S. geospatial community. The Geospatial
Platform, in concert with the data.gov initiative, provides community
coordination of
the Clearinghouse, catalog, and its contributions to visualization,
analysis, and application development in the emerging Platform
environment. It is not the intent of the FGDC to create a centralized
data system but to facilitate access to agency-operated distributed
stores of spatial metadata, data, and services on the Internet.

What are the requirements for being a Clearinghouse provider and
user?

A prospective spatial data publisher must have a public-facing web
server with online access to metadata, catalogs, and spatial data. It
is recommended that metadata services be co-located on hosts with
spatial data collections to encourage synchronization between the
spatial data, services, and the metadata being served. A publisher can
share metadata through either 1) a Z39.50 server, 2) an OGC Catalog
Server (CSW), or 3) a Web Accessible Folder (WAF) -- a browse-enabled
directory on a host organization's web server that holds the XML
metadata for direct harvest by the portal. An online registry is operated by the
FGDC to track the operating details of existing Clearinghouse metadata
services. Prospective users of Clearinghouse must have access to a
current Web browser with a broadband connection to the Internet. Search
and visualization interfaces exist at geo.data.gov and GeoPlatform.gov
to provide custom levels of search access.
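
To make the Web Accessible Folder option more concrete, here is a rough
sketch of how a harvester might discover XML metadata files in a
browse-enabled directory listing. It is an illustration under assumptions,
not the portal's harvester: the example.org URL, the sample listing, and
the function and class names are hypothetical, and real listings vary by
web server. The listing is parsed from a string here; a real harvester
would first fetch it over HTTP.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkParser(HTMLParser):
    """Collect absolute URLs of links that end in .xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            # Resolve relative links against the folder URL.
            self.links.append(urljoin(self.base_url, href))


def list_waf_records(listing_html, base_url):
    """Return the XML metadata URLs found in a directory listing page."""
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    return parser.links


# Hypothetical directory listing, similar to what a web server might emit.
sample_listing = """
<html><body><h1>Index of /metadata</h1>
<a href="../">Parent Directory</a>
<a href="roads_2010.xml">roads_2010.xml</a>
<a href="hydro.XML">hydro.XML</a>
<a href="readme.txt">readme.txt</a>
</body></html>
"""
waf_url = "https://example.org/metadata/"  # placeholder host
links = list_waf_records(sample_listing, waf_url)
for link in links:
    print(link)
```

The appeal of this option for a publisher is that nothing beyond an
ordinary web server is needed: adding or replacing an XML file in the
folder is enough for the next harvest to pick it up.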

What information is accessible through Clearinghouse?

A "digital geospatial data set" is the primary item being described
with metadata in the Clearinghouse activity. The definition of a data
set can be adjusted to meet a given agency's requirements but it
generally corresponds to individual identifiable data products (e.g.
file, layer, service) for which metadata are customarily collected.
This may equate to a specific satellite image, a shapefile, or a
national vector data set, as managed by a data producer or distributor.
Collections of data sets (e.g. flight lines, satellite "paths", map or
data series) may also have generalized metadata that could be inherited
by individual data sets.

Other geospatial resources may be described in the FGDC or ISO
metadata, including online services (Web Map Service, Web Feature
Service), data download locations, interactive web applications,
documents, and other web-accessible resources. The Geospatial Data
Presentation Form field in the metadata record can store this
information, though other context can be inferred from the style of the
URL. Also, FGDC metadata allows for multiple online linkages to be
maintained in a metadata record, so multiple facets of the geospatial
resource may be described.
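
As a sketch of how a client might read these fields, the example below
pulls the title, the Geospatial Data Presentation Form, and all online
linkages from a small FGDC-style record. The short element names (idinfo,
citation, citeinfo, geoform, onlink) are given as understood from the FGDC
Content Standard for Digital Geospatial Metadata; the sample record, its
URLs, and the describe_resource function are hypothetical and should be
checked against real records before reuse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed FGDC-style record for illustration only.
sample = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <title>Example County Roads</title>
        <geoform>vector digital data</geoform>
        <onlink>https://example.org/data/roads.zip</onlink>
        <onlink>https://example.org/services/roads/wms</onlink>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""


def describe_resource(xml_text):
    """Return title, presentation form, and online linkages, or None."""
    root = ET.fromstring(xml_text)
    cite = root.find("idinfo/citation/citeinfo")
    if cite is None:
        return None  # record has no citation block
    return {
        "title": cite.findtext("title", default=""),
        "geoform": cite.findtext("geoform", default=""),
        # A record may carry several linkages, one per facet of the resource.
        "links": [e.text.strip() for e in cite.findall("onlink") if e.text],
    }


info = describe_resource(sample)
print(info["geoform"])
for url in info["links"]:
    print(url)
```

Here one record points to both a download location and a map service,
which is the multiple-facet case described above.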

How does Clearinghouse work?

To provide search interoperability among different servers of
geospatial metadata, the search and retrieve protocol known as ANSI
Z39.50-1995 (ISO 23950) was initially selected by the FGDC
Clearinghouse activity. Although in use by a few organizations today,
it has been effectively replaced by the Open Geospatial Consortium
(OGC) Catalog Services specification, more specifically the HTTP
version known as Catalog Service for the Web (CSW). Multiple catalog
services and metadata collections (WAF) are registered with the
GeoPlatform.gov site. A periodic harvest of all metadata is performed,
and all metadata are indexed for search, as if all the metadata and
data resources were consolidated in one location, though they are
actually distributed among the agencies. This federated model preserves
the notion of 'data closest to source', allowing agencies full control
of the content, metadata, and update frequency.
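
For readers unfamiliar with CSW, the following sketch shows what a simple
key-value GetRecords request over HTTP might look like. The parameter
names reflect a reading of the CSW 2.0.2 key-value encoding and should be
verified against the specification and a given server's capabilities
document; the endpoint URL, the build_getrecords_url helper, and the
AnyText filter are assumptions made for illustration, not details of any
registered Clearinghouse service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getrecords_url(endpoint, cql, start=1, max_records=10):
    """Build a CSW GetRecords URL using key-value parameters (assumed names)."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql,
        "startPosition": start,   # paging: first record to return
        "maxRecords": max_records,
    }
    return endpoint + "?" + urlencode(params)


endpoint = "https://example.org/csw"  # placeholder, not a real service
url = build_getrecords_url(endpoint, "AnyText like '%roads%'", max_records=5)
print(url)

# Round-trip check: decode the query string back into parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["request"][0], decoded["constraint"][0])
```

A harvester would typically issue such requests in pages, advancing
startPosition until all records have been retrieved, and then index the
results locally so that searches do not depend on every remote server
being available.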