Getting Started with LR Data Services

Some of the "Try It" features are only supported in Firefox, because they require E4X, a feature present only in Mozilla's JavaScript engine, which CouchDB uses. Please use Firefox if you wish to use "Try It" to its fullest.

What Was Data Services?

Make it easier to get relevant data.

Get clean data.

In-depth scrutiny of the LR data.

That is, schemas and signatures validated, envelopes filtered.

Mostly external, complementary to an LR node.

What Is LR Data Services Now?

A Way to Get the Data You Want

Only interested in one specific kind of data? Data Services aims to provide a way to extract the data that's relevant to you through simple customization.

It's A Design Pattern, With Some "Batteries Included".

Follow simple conventions to identify discriminators within the LR documents

Reuse or modify community-sourced libraries to aid in extracting data for your use case

A Path Towards Making LR Use Fewer Resources

We know LR can potentially hold mountains of data, and a few of its data extraction services use large amounts of storage space. Data Services offers a way to extract data in a very focused manner, which should result in substantial storage savings.

What Is a Discriminator?

Discriminators in Data Services are the characteristics of the data that you have identified as the ones you want to capture.

This allows you to exclude data that you do not know how to process or do not recognize as quality information.

For example, if you were only interested in ASN URLs contained within the <conformsTo> element of XML metadata that uses a Dublin Core Terms schema, the identified ASN URL becomes your discriminator.
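As a minimal sketch, a map function for this discriminator might look like the following. It makes four assumptions:

- The envelope's `resource_data` field holds the Dublin Core XML as a string.
- ASN URLs begin with `http://purl.org/ASN/`, the prefix seen in the example requests at the end of this guide.
- The function name `mapAsnDiscriminator` and the choice of key and value are illustrative.
- A regular expression is acceptable as a shortcut in place of real XML parsing.

The `emit` stub only exists so the sketch runs under Node.js.

```javascript
// Stand-in for CouchDB's built-in emit() so the sketch runs under Node.js.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Hypothetical map function (ES5 syntax): emits every ASN URL found in a
// <conformsTo> element as the discriminator, with the resource locator as value.
const mapAsnDiscriminator = function (doc) {
  // Skip documents without an inline XML string.
  if (typeof doc.resource_data !== "string") return;
  // Matches <conformsTo> with an optional namespace prefix such as dct:.
  var re = /<(?:\w+:)?conformsTo(?:\s[^>]*)?>\s*([^<]+?)\s*<\/(?:\w+:)?conformsTo>/g;
  var match;
  while ((match = re.exec(doc.resource_data)) !== null) {
    if (match[1].indexOf("http://purl.org/ASN/") === 0) {
      emit(match[1], doc.resource_locator);
    }
  }
};

// Sample envelope with one ASN alignment and one non-ASN value.
mapAsnDiscriminator({
  resource_locator: "http://example.org/activity",
  resource_data:
    '<dc xmlns:dct="http://purl.org/dc/terms/">' +
    "<dct:conformsTo>http://purl.org/ASN/resources/S1000132</dct:conformsTo>" +
    "<dct:conformsTo>http://example.org/other-standard</dct:conformsTo>" +
    "</dc>",
});
console.log(emitted);
```

In a real design document, only the function itself would be stored, as a string, and CouchDB supplies `emit`. The body sticks to ES5 syntax (`var`, `indexOf`) in case the node runs an older JavaScript engine.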

What Can Data Services Do?

By following the conventions outlined below, you will be able to extract data using:

Resource Locator by Discriminator

Resource Locator by start of Discriminator

Resource Locator by Timestamp

Discriminator by Resource Locator

Discriminator by start of Resource Locator

Discriminator by Timestamp

Discriminated Resource Locator by Timestamp

All Discriminated Resources

The Alignment to Standards Prototype Example

The prototype implementation provides a data service that lets the extract service request resource data and aggregations in which ASNs are Discriminators, `resource_locator`s are Resource Locators, and `node_timestamp`s are Timestamps. This enables us to get:

`resource_locator` by ASN

`resource_locator` by start of ASN

`resource_locator` by `node_timestamp`

ASN by `resource_locator`

ASN by start of `resource_locator`

ASN by `node_timestamp`

ASN `resource_locator`s by `node_timestamp`

All ASN `resource_locator`s

Conventions

To follow the K.I.S.S. principle, we're adopting a set of conventions that make adding new data services simple. These conventions are part of a CouchDB design document.

A field named `dataservice`, so that the extract service can locate the implementation. The field will contain a map that gives the name and description of the data service. More details on this are TBD.

Views must follow a specific naming convention.

View functions must emit specific keys according to the view they implement.

Timestamps must be represented as seconds from epoch.

Lists should be named after their output format. The prototype includes a reference sample, to-json.
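The conventions above can be pictured as a design document skeleton. This is a hypothetical sketch. The `_id`, the `name` and `description` keys inside `dataservice` (whose format is still TBD), and the placeholder function bodies are all assumptions. The `toEpochSeconds` helper applies the seconds-from-epoch convention to an ISO 8601 timestamp like the ones used in the example requests.

```javascript
// Hypothetical design document following the conventions above. The _id,
// the keys inside `dataservice`, and the placeholder bodies are assumptions.
const designDoc = {
  _id: "_design/standards-alignment-example",
  dataservice: {
    name: "standards-alignment-example",
    description: "Resource locators aligned to ASN standards",
  },
  views: {
    // View names follow the extract service's naming convention.
    "resource-by-discriminator": {
      map: "function (doc) { /* emit(discriminator, resource_locator) */ }",
    },
    "resource-by-ts": {
      map: "function (doc) { /* emit(seconds_from_epoch, resource_locator) */ }",
    },
  },
  lists: {
    // Lists are named after their output format.
    "to-json": "function (head, req) { /* stream a JSON documents list */ }",
  },
};

// Timestamps must be seconds from epoch: convert an ISO 8601 string
// (e.g. a node_timestamp) before emitting it as a key.
function toEpochSeconds(isoTimestamp) {
  const ms = Date.parse(isoTimestamp);
  if (Number.isNaN(ms)) {
    throw new Error("Unparseable timestamp: " + isoTimestamp);
  }
  return Math.floor(ms / 1000);
}

console.log(Object.keys(designDoc.views));
console.log(toEpochSeconds("2012-02-28T16:59:31Z")); // 1330448371
```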

The Extract Service Output Specification

The Extract Service Example Output

This is an example of the data service output with doc_IDs being returned.

List Functions in Detail

List functions are responsible for formatting the response for the service output.

As the output specification and example above show, the response has two basic parts: the response wrapper and the results.

Data Service Response Wrapper

The list function needs to place each result in the "documents" property of the response wrapper.

Data Service Result

Review of Prototype List Implementation

List functions group results and embed them in the extract service's record format within the documents list. Because the amount of data to process can be large, list functions should always be designed to buffer as little as possible and stream as much as possible.
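The following is a hypothetical streaming list function in that spirit; it is not the prototype's to-json. It relies on CouchDB's list-function globals `start`, `getRow` and `send`, which are stubbed here so the sketch runs under Node.js. It assumes the view's rows arrive sorted by key and groups consecutive rows with the same key into one record. The record fields `result_data` and `resource_data` are assumptions, modeled on the `documents` wrapper and the `resource_data` values described in this guide.

```javascript
// Stand-ins for CouchDB's list-function globals (start, getRow, send),
// fed from an in-memory array of view rows so the sketch runs under Node.js.
const fakeRows = [
  { id: "doc1", key: ["matched", "http://purl.org/ASN/resources/S1"], value: null },
  { id: "doc2", key: ["matched", "http://purl.org/ASN/resources/S1"], value: null },
  { id: "doc3", key: ["matched", "http://purl.org/ASN/resources/S2"], value: null },
];
let rowIndex = 0;
const chunks = [];
function start(response) {} // CouchDB would apply response headers here
function getRow() {
  return rowIndex < fakeRows.length ? fakeRows[rowIndex++] : null;
}
function send(chunk) {
  chunks.push(chunk);
}

// Hypothetical list function (ES5 syntax). Rows arrive sorted by key;
// consecutive rows sharing a key become one record, and each record is sent
// as soon as it is complete, so only one group is ever buffered.
const toJsonList = function (head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  send('{"documents":[');
  var row, keyJson, currentKey = null, currentKeyJson = null, ids = [], first = true;
  function flush() {
    if (ids.length === 0) return;
    send((first ? "" : ",") +
      JSON.stringify({ result_data: currentKey, resource_data: ids }));
    first = false;
    ids = [];
  }
  while ((row = getRow())) {
    keyJson = JSON.stringify(row.key);
    if (keyJson !== currentKeyJson) {
      flush(); // the previous group is complete
      currentKey = row.key;
      currentKeyJson = keyJson;
    }
    ids.push(row.id);
  }
  flush();
  return "]}"; // CouchDB sends the return value as the final chunk
};

const tail = toJsonList({}, {});
console.log(chunks.join("") + tail);
```

Collecting only doc IDs corresponds to the `ids_only` case. Returning full documents would mean reading `row.doc` instead of `row.id`; that field is available when the view is queried with `include_docs=true`.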

Customize or Make your Own List Function

The prototype implementation should serve as an adequate skeleton for enhancing the prototype or building your own from scratch.

Start out by copying one of the samples.

Don't forget any required files within the lib folder. These helper functions have been abstracted to accommodate varied discriminator formats. In most cases, unless you need supplemental_data, you can use the default to-json as is, without modification.

Remember to stream as much as you can; this helps keep your system resources under control.

The Extract Service

Provides a common HTTP interface to access data services with simplified parameters.

It creates a bridge between simple query parameters and the raw CouchDB view and list functions, so you don't have to understand the full complexity of CouchDB's query API. The extract service transforms your parameters into the equivalent CouchDB query.

If you run your own node and understand CouchDB's API, nothing prevents you from querying the views directly for queries the extract service does not support.
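To illustrate the kind of translation involved, here is a hypothetical sketch; it is not the extract service's actual code. It assumes:

- Each view emits a single key: a discriminator, a resource locator, or an epoch-seconds timestamp.
- `ids_only` maps to CouchDB's `include_docs` option.
- The database is named `resource_data`, with one design document per data service.

`extractToCouchQuery` and `couchListUrl` are illustrative names. The URL shape `/{db}/_design/{ddoc}/_list/{list}/{view}` is CouchDB's standard way to run a list function over a view. It is also what you would use to query the views directly.

```javascript
// Hypothetical translation of simplified extract parameters into CouchDB
// view query options. Key layouts are assumptions; real views may emit
// compound keys (e.g. discriminator plus timestamp).
function extractToCouchQuery(params) {
  const query = {};
  // Exact matches: discriminators arrive as JSON text (as in the example
  // requests), resource locators as plain strings.
  if ("discriminator" in params) query.key = JSON.parse(params.discriminator);
  if ("resource" in params) query.key = params.resource;
  // Time ranges: ISO 8601 converted to seconds from epoch, per the conventions.
  if ("from" in params) query.startkey = Math.floor(Date.parse(params.from) / 1000);
  if ("until" in params) query.endkey = Math.floor(Date.parse(params.until) / 1000);
  // Assumption: ids_only corresponds to not asking CouchDB for full documents.
  query.include_docs = !("ids_only" in params);
  return query;
}

// Builds /{db}/_design/{ddoc}/_list/{list}/{view}?..., JSON-encoding each
// value as CouchDB expects for key, startkey and endkey.
function couchListUrl(db, ddoc, list, view, query) {
  const qs = Object.entries(query)
    .map(([name, value]) => name + "=" + encodeURIComponent(JSON.stringify(value)))
    .join("&");
  return "/" + db + "/_design/" + ddoc + "/_list/" + list + "/" + view + (qs ? "?" + qs : "");
}

const query = extractToCouchQuery({ discriminator: '["matched"]', ids_only: "" });
console.log(couchListUrl("resource_data", "standards-alignment-lr-paradata", "to-json",
  "resource-by-discriminator", query));
```

For the sample call, this prints `/resource_data/_design/standards-alignment-lr-paradata/_list/to-json/resource-by-discriminator?key=%5B%22matched%22%5D&include_docs=false`.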

resource-starts-with

A partial resource locator; data is harvested for every resource whose locator uses the specified value as a prefix (e.g., resource-starts-with=http://shodor.org returns all resources from http://shodor.org).

discriminator-starts-with

A partial discriminator; data is harvested for every discriminator that uses the specified value as a prefix.

ids_only

If present, this parameter causes the resource_data values to be a list of doc_IDs instead of full resource_data documents (the default behavior).
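Both starts-with parameters amount to a key-range query. A common CouchDB idiom is to use the prefix as `startkey`, and the prefix followed by the high-sorting character `\ufff0` as `endkey`. The sketch below assumes that idiom. `prefixRange` is a hypothetical helper. It also assumes that the last element of an array discriminator is the partial string, as in the `discriminator-starts-with` example request.

```javascript
// Hypothetical helper: the CouchDB key range covering every key that starts
// with `prefix`, using the "\ufff0" high-sorting sentinel idiom.
function prefixRange(prefix) {
  if (Array.isArray(prefix)) {
    // Assumption: the last array element is the partial string, as in
    // discriminator-starts-with=["matched","http://purl.org/ASN/resources/S"].
    const head = prefix.slice(0, -1);
    const last = prefix[prefix.length - 1];
    return { startkey: prefix, endkey: head.concat([last + "\ufff0"]) };
  }
  return { startkey: prefix, endkey: prefix + "\ufff0" };
}

console.log(prefixRange("http://shodor.org"));
console.log(prefixRange(["matched", "http://purl.org/ASN/resources/S"]));
```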

Extract Service Parameter Matrix

Data Services aims to let you narrow in on your data, so not every parameter works with every data service view. Below is a breakdown of which parameters work with each view. You don't need to worry about the characteristics of the data returned, because the map functions have already taken care of that for you. If a data service's map functions only emit keys for documents that contain the word "exciting", then anything the service returns will be, at the very least, "exciting". The parameters only control how much is returned.

| View | Parameter Set | Description |
|---|---|---|
| discriminator-by-resource | resource | Get a list of discriminators for a specific resource locator. |
| discriminator-by-resource | resource-starts-with | Get a list of discriminators where the resource locator starts with a specified prefix. |
| discriminator-by-resource-ts | resource, from, until | Get a list of discriminators for a specific resource locator within a specified period of time. |
| discriminator-by-resource-ts | resource-starts-with | Get a list of discriminators where the resource locator starts with a specified prefix, including the timestamp in the result. |
| discriminator-by-ts | from, until | Get a list of discriminators for a specified period of time. |
| resource-by-discriminator | discriminator | Get a list of resource locators for a specified discriminator. |
| resource-by-discriminator | discriminator-starts-with | Get a list of resource locators whose discriminator starts with the specified prefix. |
| resource-by-discriminator-ts | discriminator, from, until | Get a list of resource locators for a specified discriminator within a specified period of time. |
| resource-by-discriminator-ts | discriminator-starts-with | Get a list of resource locators whose discriminator starts with the specified prefix. Timestamps are included in the output. |
| resource-by-ts | from, until | Get a list of resource locators for a specified period of time. |
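A client could encode the matrix above as data and check a request before sending it. In this hypothetical sketch, `PARAMETER_MATRIX` and `isSupported` are illustrative names. It requires an exact match with one of a view's parameter sets and treats `ids_only` as an optional modifier allowed everywhere. The table states neither rule, so both are assumptions.

```javascript
// The parameter matrix above as data: each view maps to the parameter sets
// it accepts.
const PARAMETER_MATRIX = {
  "discriminator-by-resource": [["resource"], ["resource-starts-with"]],
  "discriminator-by-resource-ts": [["resource", "from", "until"], ["resource-starts-with"]],
  "discriminator-by-ts": [["from", "until"]],
  "resource-by-discriminator": [["discriminator"], ["discriminator-starts-with"]],
  "resource-by-discriminator-ts": [["discriminator", "from", "until"], ["discriminator-starts-with"]],
  "resource-by-ts": [["from", "until"]],
};

// True when the given parameter names (ignoring ids_only, an assumption)
// exactly match one of the view's parameter sets.
function isSupported(view, paramNames) {
  const sets = PARAMETER_MATRIX[view];
  if (!sets) return false;
  const given = paramNames.filter((p) => p !== "ids_only").sort();
  return sets.some((set) => {
    const expected = [...set].sort(); // copy so the matrix is not mutated
    return expected.length === given.length && expected.every((p, i) => p === given[i]);
  });
}

console.log(isSupported("resource-by-discriminator", ["ids_only", "discriminator"]));
console.log(isSupported("discriminator-by-ts", ["resource"]));
```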

Extract Service API - Try It

Here are some example requests that you can try against a real data service installation. Requests can take a while to complete, and your browser may complain in the meantime; it's okay to click "Wait" or "Continue" if prompted until the request completes.

GET /extract/standards-alignment-lr-paradata/resource-by-discriminator?ids_only&discriminator=["matched"]

GET /extract/standards-alignment-lr-paradata/resource-by-discriminator?ids_only&discriminator-starts-with=["matched","http://purl.org/ASN/resources/S"]

GET /extract/standards-alignment-dc-conformsTo/discriminator-by-resource?resource-starts-with=http://www.shodor.org/interactivate/activities/Advanced

GET /extract/standards-alignment-lr-paradata/resource-by-discriminator-ts?ids_only=true&discriminator=["matched","http://purl.org/ASN/resources/S1000132"]&from=2012-02-28T16:59:31Z&until=2012-02-28T16:59:31Z
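The example requests above show raw, unencoded values; when building them programmatically, percent-encoding each value is the safer choice. The hypothetical `extractPath` helper below does this. It sends flags like `ids_only` without a value, as in the first examples, and JSON-encodes non-string values such as array discriminators.

```javascript
// Hypothetical helper that builds extract-service request paths like the
// examples above. Flags passed as `true` are emitted without a value;
// other values are percent-encoded, with arrays JSON-encoded first.
function extractPath(dataService, view, params) {
  const parts = Object.entries(params).map(([name, value]) => {
    if (value === true) return name;
    const text = typeof value === "string" ? value : JSON.stringify(value);
    return name + "=" + encodeURIComponent(text);
  });
  return "/extract/" + dataService + "/" + view + (parts.length ? "?" + parts.join("&") : "");
}

const path = extractPath("standards-alignment-lr-paradata", "resource-by-discriminator", {
  ids_only: true,
  discriminator: ["matched"],
});
console.log(path);
```

A Node.js 18 or later client could then call `fetch()` with a node's base URL joined to this path and read the JSON response.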