AQ Use Case 3: Publish, Harvest, and Query Metadata via Clearinghouse

Transverse Use Cases 1-10 are
activity / capability elements common to most SBA scenarios. This is
a workspace for the implementation of Use Case 3 specific to the Air
Quality/Health SBA. The implementation of the UC components is accessible through the links.

Use Case 3 intends to verify the publishing, harvesting and
discovering of metadata through the GEOSS Common Infrastructure (GCI)
components GEOSS Registry and GEOSS Clearinghouse, and through the Air Quality Community Catalog.

Actors and Interfaces

Actors:

AQ Community Catalog (Catalog Service Provider, CSR);

AQ Community Portal; General AQ Client

Interfaces:

AQ Community Catalog <- GEOSS Clearinghouse(s);

GEOSS Clearinghouse(s) <- AQ Portal & AQ Client

Initial Status and Preconditions

AQ Community Catalog (AQComCat), populated with Service Records (as needed)

Use discovery metadata to filter clearinghouse contents - (see Use Case 4 for using discovery metadata to filter)

For each 'Discovered' record, returns discovery metadata + link to full submitted metadata ??? (ESRI and USGS return a link to the full submitted metadata record)

Post Condition

The GEOSS Clearinghouse is prepared to accept and process resource discovery queries from Clearinghouse clients and return desired metadata records. Clearinghouse has a service interface (API) for machine communication ????

Fields that must be entered by hand, not found in the OGC doc (Orange fields)

Fields that could be captured in OGC GetCap, but need to establish a convention (blue)

INSPIRE's ISO 19115 metadata was also compared to the core metadata and found to have a few more fields, mainly about data usage/access constraints and provider information. Whether to include these fields is another convention decision that needs to be made.

aqcommcat_discovery


Goal for AQ Implementation of Use Case 3: To allow data providers to register their data in the AQ Community Catalog, which has a catalog service (CSW 2.0.2) registered in the GEOSS Service Registry. The registered community catalog will then be harvested by the GEOSS Clearinghouse and the AQ Community Portal, making the data discoverable through the Clearinghouse.

- Used Dataspaces for Abstract and cut part of abstract into lineage statement if it told where dataset came from.

- Fixed metadata contact

- Fixed keywords and topic categories for AQ.

2. Make several ISO 19115 records by hand using the valid templates. (Done)

3. Put in the WAF (Done)

4. Register the WAF as a component in the GEOSS Registry and register the WAF as a service underneath the component (Done)

* In the updated CSR, include how often harvesting should be done so that harvesters can harvest on a schedule.

5a. Alert GEO Portals that the WAF is registered (Done)

6a. Test GEO Portal Harvest (Done)

7a. Access data access services through GEO Portals.

8a. Access raw metadata through GEO Portal RSS Feed (Done)
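As a rough illustration of what harvesting the WAF in steps 3 to 6a involves, the sketch below assumes the WAF is exposed as a plain HTML directory listing of ISO 19115 XML files. The listing content, the file names, and the `http://example.org/waf/` base URL are hypothetical and only stand in for the real folder; an actual harvester would fetch the listing over HTTP and then download each record.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WafLinkParser(HTMLParser):
    """Collect href values of anchors that point at .xml files."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            self.hrefs.append(href)


def list_waf_records(listing_html, base_url):
    """Return absolute URLs of the metadata records linked from a listing page."""
    parser = WafLinkParser()
    parser.feed(listing_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Hypothetical directory listing; a real one would be fetched from the WAF.
SAMPLE_LISTING = """
<html><body>
<a href="../">Parent Directory</a>
<a href="dataset_a.xml">dataset_a.xml</a>
<a href="dataset_b.xml">dataset_b.xml</a>
<a href="readme.txt">readme.txt</a>
</body></html>
"""

records = list_waf_records(SAMPLE_LISTING, "http://example.org/waf/")
for url in records:
    print(url)
```

Only links ending in `.xml` are kept, so the parent-directory link and other files in the folder are ignored.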

Tasks for ISO Metadata:

1. Restructure catalog records in WAF, so that one record describes dataset with many parameters encoded.

- Currently we point to the getcapabilities, but the user doesn't know which

2. Restructure OGC GetCapabilities docs so that each GetCapabilities is for one dataset and can be used to automatically create the ISO 19115.

3. Find additional fields not in GetCapabilities that need to be entered by the provider to register in the community catalog - if any.

CSW Interface: CSW 2.0.1: http://gesg.gmu.edu:8083/aq/srv/en/csw?request=GetCapabilities&service=CSW

All metadata created by Erin through the WAF are ingested and hosted by the CSW instance. A simple CSW 2.0.1 client has been developed; it will be deployed to the AQ portal and will be ready for testing by the next telecon. An initial upgrade from 2.0.1 to 2.0.2 is done and under test; we will update when done. The finished 2.0.2 CSW will be registered in the GEOSS Clearinghouse.
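For anyone who wants to check a CSW endpoint by hand, here is a minimal sketch of building the GetCapabilities request URL from a base endpoint using the standard key-value parameters `SERVICE=CSW` and `REQUEST=GetCapabilities`. The helper name `get_capabilities_url` is made up for this example; the endpoint is the one listed above. No request is sent here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def get_capabilities_url(endpoint):
    """Return the CSW GetCapabilities URL for a base endpoint.

    Existing SERVICE/REQUEST parameters are dropped (case-insensitively)
    so the result carries exactly one of each; other parameters are kept.
    """
    parts = urlsplit(endpoint)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in ("service", "request")
    ]
    query = urlencode(kept + [("SERVICE", "CSW"), ("REQUEST", "GetCapabilities")])
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


url = get_capabilities_url("http://gesg.gmu.edu:8083/aq/srv/en/csw")
print(url)
```

The parameter names are written in upper case here; some servers have been reported to be case-sensitive about them, so a different casing may be needed for a particular endpoint.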

Tasks for CSW Interface:

- CSW 2.0.2 will be integrated with the semantic search.

- All catalogs connected with the ESIP portal will be added to the semantic search within the AQ portal.

- Air Quality related semantic search will be added.

2. Register CSW service in GEOSS Service Registry and remove the WAF component

3. Test harvest and access of data access services through the GEO Portals

parallel tasks:

- Automatically generate DataFed records once we have a valid ISO 19115/19119 schema for discovery

- With ISO 19115/19119 template for discovery, we can work with GIOVANNI/others? to create additional metadata records for WAF

- Continue to add more metadata to the ISO records about usage, lineage etc.

- Need to figure out how Dataspaces can feed user-added content into the ISO records and how the ISO records can feed content to the wiki

CSW 2.0.2: The GEOSS Architecture Implementation Pilot CFP describes the community catalog records and the catalog interface to the GEOSS Clearinghouse in Annex B, Architecture. It says that ISO 19115 is a recommended geospatial standard in the GEOSS 10-Yr Plan and that it will be listed in the GEOSS Standards Registry as a comprehensive geospatial metadata standard (Pg. 67). The CFP also recommends that the common metadata (core queryables and common responses) for the second phase of AIP be the OGC CSW:Record, which in turn is based on Dublin Core metadata with a handful of extension elements that support greater search/return by GEOSS communities. One of those 'profiles' is the CSW ISO 19115 Profile. The CSW core queryable fields for a dataset plus those needed for the ISO 19115 Profile are the white column in the table (ISO 19115/19119-Appl Profile CSW 2.0 pg. 30, tables 6, 10, 11).
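To make the CSW:Record discussion more concrete, the following is a sketch of a CSW 2.0.2 GetRecords request body that searches the core queryable `csw:AnyText` and asks for summary `csw:Record` results. The function name `build_get_records`, the search term, and the choice of a `PropertyIsLike` filter are assumptions made for illustration; this is not the request any particular Clearinghouse or portal is known to send.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OGC_NS = "http://www.opengis.net/ogc"


def build_get_records(search_text, max_records=10):
    """Build a GetRecords XML body with a wildcard full-text filter."""
    ET.register_namespace("csw", CSW_NS)
    ET.register_namespace("ogc", OGC_NS)

    root = ET.Element(
        f"{{{CSW_NS}}}GetRecords",
        {
            "service": "CSW",
            "version": "2.0.2",
            "resultType": "results",
            "startPosition": "1",
            "maxRecords": str(max_records),
        },
    )
    # Ask for the common csw:Record type with the "summary" element set.
    query = ET.SubElement(root, f"{{{CSW_NS}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW_NS}}}ElementSetName").text = "summary"

    # OGC Filter 1.1.0 constraint: AnyText LIKE %search_text%
    constraint = ET.SubElement(query, f"{{{CSW_NS}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC_NS}}}Filter")
    like = ET.SubElement(
        flt,
        f"{{{OGC_NS}}}PropertyIsLike",
        {"wildCard": "%", "singleChar": "_", "escapeChar": "\\"},
    )
    ET.SubElement(like, f"{{{OGC_NS}}}PropertyName").text = "csw:AnyText"
    ET.SubElement(like, f"{{{OGC_NS}}}Literal").text = f"%{search_text}%"

    return ET.tostring(root, encoding="unicode")


request_xml = build_get_records("ozone")
print(request_xml)
```

The resulting XML would be sent as the body of an HTTP POST to the catalog's CSW endpoint with an XML content type. Swapping `csw:AnyText` for another queryable such as `dc:title` narrows the search to that field, provided the `dc` prefix is also declared in the request.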

The AQ Community uses standard OGC data access services, WMS and WCS, so we plan to build our catalog by harvesting the metadata already included in the WMS/WCS GetCapabilities documents and then including the additional queryable fields needed. KML is included in the green columns because the KML could be extended to hold more metadata and those files could also be harvested by the community catalog.
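The harvesting idea can be sketched as follows, assuming a WMS 1.1.1 capabilities document (which uses no XML namespace). The sample document, the layer name, and the Dublin Core style keys in the output dictionary are hypothetical; the actual mapping from GetCapabilities elements to catalog fields is the convention that still has to be agreed on.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WMS 1.1.1 capabilities document.
SAMPLE_CAPABILITIES = """<WMT_MS_Capabilities version="1.1.1">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>Sample AQ map service</Title>
  </Service>
  <Capability>
    <Layer>
      <Title>Sample layers</Title>
      <Layer queryable="1">
        <Name>sample_pm25</Name>
        <Title>Sample PM2.5 surface concentration</Title>
        <Abstract>Hypothetical layer used only for this example.</Abstract>
        <KeywordList>
          <Keyword>air quality</Keyword>
          <Keyword>PM2.5</Keyword>
        </KeywordList>
        <LatLonBoundingBox minx="-125.0" miny="24.0" maxx="-66.0" maxy="50.0"/>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""


def harvest_layers(capabilities_xml):
    """Turn each named WMS layer into a dict of candidate catalog fields."""
    root = ET.fromstring(capabilities_xml)
    records = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")
        if name is None:
            continue  # container layer without its own Name; nothing to request
        bbox = layer.find("LatLonBoundingBox")
        records.append(
            {
                "identifier": name,
                "title": layer.findtext("Title"),
                "abstract": layer.findtext("Abstract"),
                "subject": [k.text for k in layer.findall("KeywordList/Keyword")],
                "bbox": (
                    tuple(float(bbox.get(k)) for k in ("minx", "miny", "maxx", "maxy"))
                    if bbox is not None
                    else None
                ),
            }
        )
    return records


records = harvest_layers(SAMPLE_CAPABILITIES)
for record in records:
    print(record)
```

Any field that comes back as `None` or empty is a candidate for the set of fields the provider would have to enter by hand.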

Add tutorial to guide through process.

Discussion: Use space below for discussion on this topic. New threads are at the top.

From Erin (1/19/09): Hi All, The AQ WG is creating a Community Catalog in order to register the metadata for AQ datasets and make the datasets discoverable and accessible through the GEOSS clearinghouse. The Community Catalog will have a CSW 2.0.2 interface. We have started a workspace page on the google sites to outline the process of: creating the community catalog to ultimately discovering and accessing datasets through clearinghouse. We are going to start our catalog using the CSW record (from Record.xsd ) that is common to all CSW 2.0.2 implementations (neon green cells in image). I saw that this is what ESRI is using now. Then we'll test the access of our datasets through the clearinghouse/GEO Portals. After that we will expand our catalog to include the other fields in the attached image for the ISO 19115 Profile in CSW 2.0.2. From the CCRM perspective do you all see any issues with this method? With the catalog fields the next step for us is to implement the service - I think we can use the CSW2.0.2 GetCapabilities and DescribeRecords from Marten as templates, since our catalog will have the same fields. Marten, could you or someone else point me to what the GetRecords and return for that would look like? Is there anything that we are missing?

From Marten: hi, are you going to build a catalog service from scratch? or implementing available software? there are many aspects of CSW to account for that a single sample won't cover. I suggest to setup a conference call this week. would that work? I can host a webcast where I show some sample queries/responses

From Erin: Marten, other than ESRI GIS Portal Toolkit is there free software that we could implement? I looked at Geonetwork and it looked like they only support CSW 2.0.1. Our programmer is looking at the CSW today - he may be interested in a webcast/telecon later this week. I'll check. Thanks for the offer. I'm asking about the open source so that this can be more reusable. Thanks again!

From Erin Robinson: Hi Archie - Thanks so much for exposing what you are harvesting and the harvest is ok. This will be really helpful for AQ b/c we can look at these sites that are ok as examples of catalog services. We are going to implement a CSW 2.0.2 for our AQ Community Catalog as part of Use Case 3 of the pilot. For the listed CSW 2.0.X could you link to the get capabilities and describe record xmls?

From Archie: I wanted to shorten the links. The GetCapabilities link should always be the same - just add "?SERVICE=CSW&REQUEST=GetCapabilities" Sometimes there seem to be problems with case-sensitivity, but I claim those are bugs. The spec clearly gives the above. I thought about making the URLs links - I suppose I could go ahead and do that. Think it would be useful from the context of the spreadsheet? The DescribeRecord link ought to be derivable from the Capabilities response. I'm not tracking those anyway since we don't really use the DescribeRecord response contents - what we're extracting to our internal database is pretty modest.

From Erin Robinson: Maybe just a note on the page that if you want to see get capabilities add "?SERVICE=CSW&REQUEST=GetCapabilities" to the links below. The few links I tried didn't return anything or returned an error (JAXA) and I think they'd return capabilities if they had the extension.

From Archie: OK. If adding the string to the URL doesn't work in a particular case, I should probably add the URL that does work anyhow.

From Erin Robinson: For the AQ WG implementation of Use Case 3, I have been working on a comparison of queries by each of the GEO Portals and the CSW 2.0.2 required fields. Would you mind looking at this comparison and see if FGDC portal query is right - to create the table, I went to the FGDC portal and marked what I could search on as the queryable fields? Are the fields you harvest for the FGDC portal from the CSW2.0.x catalogs extracted from the describe record.xml?

From Archie: Probably. I haven't written the complete ISO record parser yet except for a pretty simplistic handling of some WMS records I get from GOS. We're really trying to do the bare minimum handling we can get away with. But no less. Most of the records harvested in right now are from the CSR, or are FGDC records obtained via Z39.50.

From Erin Robinson(1/9/09): Hello AQ WG and CCRM WG, The AQ WG has been working on development of a community catalog that will allow AQ datasets provided by the community to be accessed through the GEOSS Clearinghouse. Use Case 3 for AIP-2 is designed to publish, harvest and query metadata systems so that component and services are discoverable through GEOSS Clearinghouse. This task seems to match the AQ Community Catalog goal. I have posted an initial method and suggested set of minimum fields for the AQ Community Catalog to the AIP-2 Google Site as a way to work on this AQ implementation of use case 3. It needs community participation - both from the technical perspective of the CCRM WG as well as the AQ WG. From the CCRM, I think it would be helpful to have instructions/examples to implement the CSW 2.0.2 service. From the AQ WG perspective we need to agree on a common set of catalog fields that we will use to search and harvest data. I'd like to suggest that we use the comment part of this page to respond in order to capture the process that we will be going through to implement use case 3. I've posted this as a comment to get things started.

This CSW service is part of the GIS Portal Toolkit. I’m not sure what your approach is to provide CSW, but you may want to consider using GIS Portal Toolkit.

From Erin Robinson: Marten, We aren't exactly sure what our approach to provide CSW will be. We had thought we'd create a sql database and then provide the service interface on top of that. Would this work? Would the GIS Portal Toolkit provide an easier method? Also, I noticed the record response didn't include time of the measurement. Could we extend the record return to include temporal extent?

From Marten: If you choose to use GIS Portal Toolkit (GPT) then you won’t have to do any coding. It comes with its own CSW interface. I checked and saw that WSU already has a license for GPT (since 2006) and that WSU has an Educational site license to ESRI software. I checked my records and found Scott Horn (shorn@wustl.edu) took a training class in November 2007. If Scott is involved he may already have GPT 9.3 setup, otherwise we can arrange for a license. GPT 9.3 has a no-cost license. GPT can run with Oracle, SQLServer, and PostgreSQL databases. More info on http://www.esri.com/gisportal If you register with the ESRI GEOSS portal and send me your username, I can give you publishing permissions. That will give you a good idea of what you can do with GPT.

From Zhenlong Li: Some changes are done to AQ Portal in Tools tab. 1. Re-built the Time-enabled WMS Viewer portlet to fit the portal. 2. Trim the boundary of the semantic search tool: GEOSS site link: http://geoaip-aq.wustl.edu/web/guest/4