This version is closed except for technical editing by Jose Cuadra and Bob Morris in furtherance of TDWG submission. Nobody else should edit it.

The Schema Draft version has been advanced to 0.9 to change the architecture of the ServiceAccessPoint class, making what were previously access point properties into values of a single property of the access point. Track the history for details. BobMorris

If you are unfamiliar with MRTG, please read the MRTG Non normative document before editing this page. It lays out why a biodiversity media resource metadata schema is perceived to be needed, and how we attempt to use existing metadata standards where possible. This page should be largely confined to discussion of what's missing, and what's good or bad about the specifications that are here. Critique of the architecture as a whole should be at MRTG Architecture.

To reduce edit conflicts, please read and apply the suggestion in Template:WIP. Remember that the WIP warning will not be visible until you save.
Sign and date your comment entries. This is best done by using the signature icon in the edit button collection at the top of the edit window.

The Comments field in the item description should be used for comments proposed to be part of the standard. Comments on the standard should be in the Discussion section. Sometimes I have refactored the spreadsheet items to reflect that distinction.

Welcome User:JoseCuadra of GBIF who will be working on the XML Representation of MRTG v0.8

Should there be an optional(?) boolean property IsPrimaryRecord that advises that the record has defaults or other information that applies even in the face of other MRTG records, e.g. in other languages? --BobMorris 21:08, 28 March 2009 (CET)

Suppose I have records in two languages. Only the six mandatory items are required to be in each record. It might be a good strategy to make a third record that has whatever language neutral optional stuff there might be. This could be quite a lot, whereas the language specific stuff might be small. That strategy would be enhanced if a client that asks for the language specific record can, from it, determine that it is advisable to seek one further record---let us call it the "primary record"---with language-neutral data in it. Absent some mechanism supporting that, a provider's only strategy is to supply all the language-neutral data in every language specific record. --BobMorris 14:59, 29 March 2009 (CEST)

I believe this may be a best practice recommendation, but I miss the need for a separate data element. It certainly would be good to clarify handling of metadata in multiple languages. It is standard practice to supply missing information from other languages even in non-professional software, where non-translated parts of the interface may appear in English. We could add a recommendation that in the presence of multiple languages, it may be advisable to provide one record with metadata language = language-neutral (ISO code “zxx”), and that user clients are expected to add this to the language-specific metadata. Can you rewrite that and add to metadata language? --GregorHagedorn 18:47, 29 March 2009 (CEST)
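A minimal sketch of the client-side merge suggested above, in which a language-specific record is completed from a language-neutral ("zxx") record. The field names `identifier`, `metadataLanguage`, `title` and `format`, and the function name, are illustrative assumptions and not MRTG terms; MRTG does not prescribe this procedure.

```python
# Hypothetical field names; MRTG does not define this merge procedure.
LANGUAGE_NEUTRAL = "zxx"  # ISO 639 code for "no linguistic content"


def merge_with_neutral(specific, neutral):
    """Return the language-specific record with gaps filled from the neutral one."""
    if neutral.get("metadataLanguage") != LANGUAGE_NEUTRAL:
        raise ValueError("second record is not marked language-neutral")
    merged = dict(neutral)    # start from the language-neutral values
    merged.update(specific)   # language-specific values take precedence
    return merged


german = {"identifier": "urn:example:img-1", "metadataLanguage": "de", "title": "Eichenblatt"}
neutral = {"identifier": "urn:example:img-1", "metadataLanguage": "zxx", "format": "image/jpeg"}
merged = merge_with_neutral(german, neutral)
```

Letting the language-specific values win keeps the merged record's language tag correct, so the neutral record only ever fills gaps.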

I'd like to get rid of the two orphaned namespaces, "photo:" and "eml:". Anything better? I have omitted mention of them in the non-normative document v 0.6.5 which names only the "principal" namespaces. --BobMorris 03:15, 30 March 2009 (CEST)

General discussions

Do we need a separator convention for multiple values? --BobMorris 18:34, 8 February 2009 (CET)

We could give a recommendation to use the semicolon rather than the comma in cases where multiple values are expressed using a separator. In the case of xml/RDF repeated element occurrence should be the canonical method. Do we agree? --GregorHagedorn 14:08, 22 February 2009 (CET)
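As an illustration only, a consumer of a flat serialization might split a multi-valued field on the semicolon as sketched here. The function name and sample values are assumptions; the semicolon convention itself is only a proposal at this point.

```python
def split_values(raw, separator=";"):
    """Split a multi-valued field, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


# A comma inside a value survives because only the semicolon separates values.
values = split_values("Quercus alba; Quercus rubra ;Acer, sp.")
```

This shows the practical argument for the semicolon: commas occur naturally inside values, so they make a poor separator.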

Per dinner conversation today, we should clarify who the audience is for this schema. I said it is not image end user human consumers, but rather image producers, their programmers and DBAs. These actors have the responsibility to make that useful to clientele. This means close examination of which fields should have machine readable focus, perhaps having both machine readable, strongly typed representation, and plain text.--BobMorris 22:21, 22 February 2009 (CET)

We've discussed this before, and I agree with the above. I just want to add that although the audience is primarily programmers and the schema must be machine readable, of course the goal is that the end user finds the metadata products informative and beneficial for discovery. Also, most initial mapping is done by humans (at least for my project), so the vocabulary must be human-readable. AnnetteOlson 00:11, 4 February 2010 (CET)

Open from Recommendations to GBIF: support metadata provider GUID schemes for identifying resource. Are we adequately documented? --BobMorris 16:35, 27 February 2009 (CET)

This is under active discussion in the TDWG-TAG mailing list and we should not discuss it here. Where we require a GUID, we don't require a specific scheme.

We have no(?) general extension mechanism or advice for adding new Items. --BobMorris 17:38, 14 March 2009 (CET)

Missing elements

Terminology of this specification

There are many ways to organize metadata specifications, particularly as to the nomenclature of the constituents of the metadata. In this document and the associated non-normative documentation, we will follow closely (sometimes verbatim) a portion of the Dublin Core Metadata Initiative (DCMI) metadata nomenclature as described in Section 2.3 of the DCMI Abstract Model (http://www.dublincore.org/documents/abstract-model/).

A term is a metadata item that forms part of the description of a multimedia resource.

A term has a type which is one of "Property" or "Class". We refer to a term of type Property as simply a Property, and similarly for Class.

A value is a resource - the physical, digital or conceptual entity or literal that is associated with a property when a property-value pair is used to describe a resource. Therefore, each value is either a literal value or a non-literal value.

A literal value is a value which is a literal.

A non-literal value is a value which is a physical, digital or conceptual entity.

A literal is an entity which uses a Unicode string as a lexical form, together with an optional language tag or datatype, to denote a resource. In MRTG, the language tag appears as a value assigned to the metadata record.

A Property is a term that has a value. The datatypes of values are specified in this document. Typically the values are either a member of a fixed set of literals, a URI, a numerical type, free text, or the datatype and values from an external controlled vocabulary referenced in the standard.

A Class is a term that has a set of Properties. Thus, the values of the properties in this set define what it means for a resource (whether multimedia or not) to be a member of the class. Typically if M is a resource and C is a class, we say "M is a C". We attempt to minimize the number of classes, because we want to support simple serializations, notably text files such as "Comma Separated Values" (CSV), in which structured representation is cumbersome or impossible.

A Vocabulary is a set of terms.

Multimedia Resource is anything that a provider identifies as belonging to one of the possible values of the MRTG Type term and one of the Subtype term values. A mechanism is provided by which providers can supply a privately defined subtype that will not collide with the MRTG defined Subtype values.

A MRTG record is a set of terms, with property values conforming to this document, that contains at least the six mandatory terms described below and describes a single multimedia resource (possibly including a Collection). One of these, the value of Identifier, is a Globally Unique IDentifier (GUID), which may have been assigned to the resource by an external authority or by the provider of the metadata record.

Every MRTG term has a plain text Name, a URI and a plain text normative Definition. URIs for terms conform to the http URI scheme (see http://en.wikipedia.org/wiki/URI_scheme, http://www.w3.org/TR/uri-clarification/, or http://www.ietf.org/rfc/rfc2396.txt). Informally, one may understand this as follows: an http URI has the syntax of an http URL, but there is no expectation that putting it in a web browser will result in any information being returned to the browser, and if any is returned, it may have no relevance. This conformance requirement applies only to the URIs that identify MRTG terms. It does not apply to any others, such as might arise if the values of MRTG properties are taken from another controlled vocabulary chosen by the user, as a few MRTG properties permit. In that case, those values may involve URIs conforming to a scheme given by that external vocabulary.

Because http URIs are rather lengthy, MRTG documents follow a standard practice of introducing a short abbreviation comprising a "namespace qualifier" and a mnemonic name closely related to the term's Name. The result is known in XML parlance as a qualified name. For example, the documentation below for the Identifier term renders its URI as "dcterms:identifier", but hovering over it will reveal that its actual URI is http://dublincore.org/documents/dcmi-terms/#identifier. Most of the URIs for terms borrowed from external vocabularies (about half of them) do in fact resolve to something in the relevant documentation for that external standard. Sometimes the resolution is not precise because the documentation is a PDF document and several (different!) URIs might apparently resolve to the same place. Keep in mind that any fortuitous resolution of an http URI is not related to its use as an identifier, no matter how informative that resolution may be. That said, MRTG solicits discussion on the wiki at points where contributors find our association of a MRTG term with one from another standard misleading or otherwise inappropriate.
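The abbreviation described above can be sketched as a simple prefix lookup. The one-entry table is inferred from the dcterms:identifier example in the text and is not a normative list of MRTG namespaces; the function name is an assumption.

```python
# Prefix-to-namespace table. The single entry is inferred from the
# dcterms:identifier example in the text, not taken from a normative list.
NAMESPACES = {"dcterms": "http://dublincore.org/documents/dcmi-terms/#"}


def expand_qname(qname, namespaces=NAMESPACES):
    """Expand a qualified name such as 'dcterms:identifier' to its full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError("unknown namespace qualifier in %r" % qname)
    return namespaces[prefix] + local


uri = expand_qname("dcterms:identifier")
```

The expansion is purely textual. Consistent with the paragraph above, nothing here dereferences the resulting URI.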

Comments

Hmm. A printed copy of this cannot reveal the full URIs...--BobMorris 03:56, 27 March 2009 (CET)

The stuff about value above is verbatim from DCMI Abstract model. Should each point be quoted and have citation or is the stuff at the top enough? Also, it is fairly arcane, but Gregor complained about lack of definition of value (which is not a term in the DCMI Abstract Model).--BobMorris 03:15, 30 March 2009 (CEST)

Management Vocabulary

An arbitrary code that is unique for the resource, with the resource being either a provider, collection, or media item.

Comments:

Recommend to follow dwc best practices. Using multiple identifiers implies that they have a same-as relationship, i.e. they all identify the same object (e.g. an object may have an http-URL, an lsid-URI, and a GUID-number).

We need clarification about "unique". I favor GUID so that the Identifier unambiguously identifies the resource, no matter how one has acquired the metadata record. If we had a GUID for the metadata originator (provider?), then the pair <originatorGUID, Identifier> would suffice, but we don't have that. Even in that case we will need to say something like "the Identifier uniquely specifies the resource within all MRTG resources offered by the organization originating the metadata record"--BobMorris 01:12, 30 March 2009 (CEST)

I do not understand how an Identifier is repeatable. Can someone please explain what this means? Particularly if we seek a GUID, there should be a one-to-one correspondence between an identifier and a resource. --Steve Baskauf 20:29, 7 May 2009 (CEST)

Multiple identifiers must have a same-as relationship, i.e. all identifiers point to the same object. Requiring that the object has only a single identifier I believe is asking too much. Books have multiple ISBNs, a specimen will often already have a museum-barcode identifier, etc. - still providing a URI for those in addition to existing identifiers would be desirable. I believe fitness for purpose of identifiers differs, depending on purpose... --GregorHagedorn 00:17, 11 May 2009 (CEST)

I understand the need for more than a single <dcterms:identifier> in the same context, but I'm afraid that it may create implementation problems: we may need to distinguish different <identifier>s because they should be rendered or processed in different ways. In our case they could only be distinguished by their contents, which may not always be possible. To make the schema more flexible and extensible, I would allow attributes from the user's namespace (unspecified by the schema, like my:process_url="http://example.com/getMyImage/"). Such 'foreign' attributes are not allowed by DwC schemas, but why not have a different <mrtg:identifier>? If the idea of any attribute is not acceptable, I would suggest, at the very least, an optional id attribute for every element in MRTG namespaces, particularly in wrappers like <MRTGCore id="value">. Another problem with <identifier> elements is that they may not be suitable as a primary key in a relational database, and in any case, ID attributes are so handy that I would not ignore such an opportunity. --AlexeyZinovjev 13:37, 27 November 2009 (CET)
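To make the concern above concrete, here is a rough sketch of what distinguishing identifiers "by their contents" could look like. The category names and sample identifiers are assumptions made for illustration. The "unknown" fallback is exactly the case in which content-based detection fails.

```python
from urllib.parse import urlparse


def identifier_kind(identifier):
    """Guess how an identifier should be processed from its content alone."""
    if identifier.lower().startswith("urn:lsid:"):
        return "lsid"
    if urlparse(identifier).scheme in ("http", "https"):
        return "http"
    # A barcode, a local number or a bare GUID all end up here, which is
    # the ambiguity the comment warns about.
    return "unknown"


kinds = [
    identifier_kind("http://example.com/getMyImage/"),
    identifier_kind("urn:lsid:example.org:images:42"),
    identifier_kind("MUS-000123"),
]
```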

Do we mean to require the dcmi URL, or do we accept these dcmi Labels? --BobMorris 17:38, 14 March 2009 (CET)

Now that DwC has been accepted as a TDWG standard, I have returned to a previous task, which was to try to hammer out a schema for the SERNEC plant image collection. This collection will integrate live plant images with images from specimens. Thus the schema will mostly be imported from the DwC schema (for specimen metadata) and the MRTG schema (for images) when it is done. Both schemas include the "dcterms:" namespace and accept dcterms:type as the element to identify the class into which the resource falls. However, the problem is that the recommended terms given under the DwC dcterms:type and the terms given here for dcterms:type are in conflict. Discussion continues on MRTGv08 Type term inconsistent with DwC. Steve Baskauf 05:34, 14 October 2009 (CEST)

This does not apply to Collection objects. The vocabulary may be extended by users provided they identify the term by a URI which is not in the mrtg namespace (for example, using "http://my.inst.org/namespace/metadata/subtype/repair-manual"). Conforming applications may choose to ignore these.
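One possible reading of the extension rule above, sketched as a check an application might run before deciding whether to ignore a value. `MRTG_NAMESPACE` is a placeholder and not the real mrtg namespace URI, which this page does not state. The value "Photograph" is likewise a hypothetical built-in subtype.

```python
from urllib.parse import urlparse

# Placeholder only: NOT the real mrtg namespace, which is not given here.
MRTG_NAMESPACE = "http://example.org/mrtg/"


def is_user_extension(subtype):
    """True if the value is an http(s) URI outside the (assumed) mrtg namespace."""
    parsed = urlparse(subtype)
    is_uri = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    return is_uri and not subtype.startswith(MRTG_NAMESPACE)


extension = is_user_extension("http://my.inst.org/namespace/metadata/subtype/repair-manual")
builtin = is_user_extension("Photograph")
```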

I don't follow the extension suggestion. Values of mrtg:Subtype are not given a URI. Is the suggestion that people can define, e.g. myURI:Subtype? Is this kind of extension permitted anywhere? Do we mean to specify a recommended semantics for it? --BobMorris 04:17, 22 March 2009 (CET)

I think values of mrtg:Subtype are in the mrtg namespace. I added an example, does this make it clear?

--BobMorris 23:28, 15 August 2009 (CEST) says: This seems difficult to model in OWL 1.0 and maybe in XML Schema. OWL supports enumerated data types via the use of owl:DataRange, albeit clumsily. However, reasoning on enumerations is limited and may not support the ability to reason on extensions, possibly unless we simply make the Subtype values be of type xsd:string:

"Tools may vary in terms of support for datatype reasoning. As a minimum, tools must support datatype reasoning for the XML Schema datatypes xsd:string and xsd:integer. OWL Full tools must also support rdf:XMLLiteral. For unsupported datatypes, lexically identical literals should be considered equal, whereas lexically different literals would not be known to be either equal or unequal. Unrecognized datatypes should be treated in the same way as unsupported datatypes." (OWL 1.0 DatatypeSupport)

Oh, wait. In order to make MRTG Type be identical to dc:type, the type names have to be the names of subclasses of MediaResource. Thus so should be these. Then the extension issue goes away in RDFS at least.

Concise title, name, or label of institution, resource collection, or individual resource. This field should include the complete title with all the subtitles, if any.

Comments:

The title facilitates interactions with humans: e.g. the title would be used as display text of hyperlinks or to provide a choice of images through pick list. The title is therefore highly desirable and an effort should be made to provide it where not already available. The taxon name(s) will form a good substitute title.

Point in time recording when the last change to metadata (not necessarily the media object itself) occurred. The date and time must comply with the World Wide Web Consortium (W3C) datetime practice, which requires that date and time representation correspond to ISO 8601:1988, but with year fields always comprising 4 digits. This makes datetime records compliant with 8601:2004. AC datetime values may also follow 8601:2004 for ranges by separating two ISO 8601 datetime fields by a solidus ("forward slash", '/'). See also the Wikipedia ISO 8601 entry for further explanation and examples.

Comments:

Use case: a) for incremental harvesting: holder of metadata who also holds resource may be receiving metadata more frequently than underlying resources and can figure out whether updating the resource is necessary. This is not dcterms:modified, which is referring to the resource itself, but not its metadata.

W3C datetime types were based on ISO 8601 but these standards are not the same. For example, xs:date or dateTime used by XML Schemas do not allow date ranges like 2009-01-01/2009-12-01; see also notes to other terms of datetime type below. --AlexeyZinovjev 21:12, 11 December 2009 (CET)
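To illustrate the range question raised above, a hedged sketch of a reader that accepts either a single calendar date or a solidus-separated range. It handles only the YYYY-MM-DD form via Python's `date.fromisoformat`, not the full ISO 8601 syntax, and the function name is an assumption.

```python
from datetime import date


def parse_date_or_range(value):
    """Return (start, end); end equals start when no solidus is present."""
    start_text, sep, end_text = value.partition("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if sep else start
    if end < start:
        raise ValueError("range end precedes range start")
    return start, end


single = parse_date_or_range("2009-03-29")
span = parse_date_or_range("2009-01-01/2009-12-01")
```

A consumer that validates values as plain xs:date would reject the second input, which is the incompatibility the comment points out.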

Language of description and other metadata (but not necessarily of the image itself) represented in ISO639-1 or -3.

Comments:

This is NOT dcterms:language, which is about the resource, not the metadata. This is deliberately single-valued, imposing a requirement that multi-lingual metadata be represented as separate, complete metadata records in which the language-neutral items also appear. Consumers can re-combine records by identity of Resource IDs (which providers are strongly encouraged to supply).
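The re-combination by Resource ID mentioned above might look roughly like this on the consumer side. The keys `identifier` and `metadataLanguage` are illustrative stand-ins for whatever names a serialization actually uses.

```python
from collections import defaultdict


def group_by_resource(records):
    """Index single-language records as {resource id: {language: record}}."""
    groups = defaultdict(dict)
    for record in records:
        groups[record["identifier"]][record["metadataLanguage"]] = record
    return dict(groups)


records = [
    {"identifier": "urn:example:img-1", "metadataLanguage": "en", "title": "Oak leaf"},
    {"identifier": "urn:example:img-1", "metadataLanguage": "de", "title": "Eichenblatt"},
    {"identifier": "urn:example:img-2", "metadataLanguage": "en", "title": "Maple seed"},
]
grouped = group_by_resource(records)
```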

A free-form identifier (a simple number, an alphanumeric code, a URL, etc.) that is unique and meaningful primarily for the data provider.

Comments:

Ideally, this would be a globally unique identifier (GUID), but the provider is encouraged to supply any form of identifier that simplifies communications on resources within the project and helps to locate individual data items in the provider's data repositories. It is the provider's decision whether to expose this value or not.

Should this be part of the extended set, not core? (From spreadsheet) Yes. Now it is. --BobMorris 20:54, 2 February 2010 (CET)

There is a problem more general than associating a Commenter with their Comments if we allow repeatable Commenters. See Linking MRTG elements. --BobMorris 20:54, 2 February 2010 (CET)

We have the Reviewer Comments below, and the definition states for Commenter that the provider is NOT asserting Commenter has expertise, so I changed the Comments section for this item to read "makes no claim as to competency," and I took out the word "review," so as not to confuse it with the Reviewer Comments below. AnnetteOlson 00:11, 4 February 2010 (CET)

If present, then resource is peer-reviewed. The notation of whether an expert in the subject featured in the media has reviewed the media item (or collection?) and approved its metadata description. Must display a name or the literal "anonymous" (= anonymously reviewed).

Date that the media resource was altered. The date and time must comply with the World Wide Web Consortium (W3C) datetime practice, which requires that date and time representation correspond to ISO 8601:1988, but with year fields always comprising 4 digits. This makes datetime records compliant with 8601:2004. AC datetime values may also follow 8601:2004 for ranges by separating two ISO 8601 datetime fields by a solidus ("forward slash", '/'). See also the Wikipedia ISO 8601 entry for further explanation and examples.

I doubt that this can always comply with both standards at the same time -- W3C datetime (e.g., xs:date, xs:dateTime) and ISO 8601. For example, xs:date or dateTime used by XML Schemas do not allow date ranges like 2009-01-01/2009-12-01. At the same time, when looking at the definition of dcterms:modified, I did not notice any restrictions on format in their schemas. This raises the question of whether using dcterms:modified with a claim that it should conform to ISO is allowed, unless it's only a recommendation.--AlexeyZinovjev 21:12, 11 December 2009 (CET)

The date (often a range) that the resource became or will become available. The date and time must comply with the World Wide Web Consortium (W3C) datetime practice, which requires that date and time representation correspond to ISO 8601:1988, but with year fields always comprising 4 digits. This makes datetime records compliant with 8601:2004. AC datetime values may also follow 8601:2004 for ranges by separating two ISO 8601 datetime fields by a solidus ("forward slash", '/'). See also the Wikipedia ISO 8601 entry for further explanation and examples.

I find it ambiguous whether it includes ranges---probably it does. Anyway, I have added a template invoked by {{MRTGdatetime}} intended to be in every term requiring a datetime. It attempts to disambiguate the range issue by referring to ISO 8601:2004, which explicitly includes ranges. Discussion and improvement of the issue should take place in the Template:MRTGdatetime --BobMorris 17:35, 29 March 2009 (CEST)

There is a semantic issue here: a resource may be available and later be made unavailable, but the metadata could usefully remain available, e.g. for occurrence evidence. As defined, that case is not covered. Is it important? --BobMorris 23:35, 13 March 2009 (CET)

I agree with this field referring to the resource, not the metadata, so for this purpose it should be good. I do not think it is important to note when metadata is available unless the metadata is copyrighted, which is very rare in our community. If someone seeks info on date available for metadata, we can add it to the next version? AnnetteOlson 00:11, 4 February 2010 (CET).

A policy governing addition of items to a collection. Examples are planned deliverables and estimate for future changes.

Comments:

Although an important management item, the relevance of this to consumers of metadata is limited to specific cases; e.g. where the Accrual Policy specifies that data are available only for a limited period.

Dublin core "rights" is potentially more general, but we follow the more specific use of IPTC CORE 1.1, i.e. focussing on copyright here. --GregorHagedorn 07:48, 4 May 2009 (CEST)

"One of the problems is multiple languages. Plain text, but make recommended guidelines for originating nation. Responsibility of metadata provider to get that right. If this information is not available, a mechanism should be provided to state this (not that it is empty for other reasons)." (From spreadsheet) -- I can't really parse this comment, so I don't know how to opine on what is needed in the proposed item. --BobMorris 18:58, 7 February 2009 (CET)

For resources in the public domain, would a statement such as "public domain" be used to populate this element? There would be some value to having at least one element in the schema having a controlled vocabulary that would include "public domain" as a value so that users could search for copyright-free resources. Maybe a controlled vocabulary is excessive; rather one could simply state that this element should be given the value "public domain" if the resource is not subject to copyright.

I would agree with Steve - it is common practice to put public domain in this field if a resource is not copyrighted. NBII definitely puts in public domain here, and with us moving a large number of federal, public domain resources to GBIF and others, I think it is important to allow this. I would recommend revising the Definition above to indicate that option. I did go ahead and add it as an example. AnnetteOlson 00:32, 4 February 2010 (CET)

The license statement defining how resources may be used. Information on a collection applies to all contained objects unless the object has a different statement.

Comments:

Example: "Available under Creative Commons by-nc-sa 2.5 license". This also informs on the commercial availability of items. Buying an identification tool or media resource is essentially the purchase of an individual license. Examples for such License statements: “Available through bookstores” for a commercially published CD, in License; “Individual licenses available for purchase” for a high-resolution image (note that the medium or low resolution levels of the same image may be available under Creative Commons!)
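The inheritance rule in the definition (a collection's statement applies to contained objects unless an object carries its own) can be sketched as a simple fallback. The `license` key and the function name are illustrative assumptions; the sample statements are the ones quoted in the Comments above.

```python
def effective_license(item, collection):
    """Return the item's own license statement, else the collection's, else None."""
    return item.get("license") or collection.get("license")


collection = {"license": "Available under Creative Commons by-nc-sa 2.5 license"}
plain_item = {"title": "Low-resolution image"}
special_item = {"title": "High-resolution image",
                "license": "Individual licenses available for purchase"}

inherited = effective_license(plain_item, collection)
overridden = effective_license(special_item, collection)
```

An empty string on the item is treated here the same as a missing statement. Whether that is the right behaviour is a design choice the definition leaves open.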

Gregor wants this mandatory with default value something like 'Consult copyright owner'. I find that superfluous and favor at most a best practice statement, or just remain silent on the point. Gregor argues that license is key to determining fitness for use, but I think that is true only when the use involves copy. Simply examining the object requires no license, nor, in some jurisdictions, does copying it for internal use without republishing. In addition, fitness for use of the underlying object is irrelevant to some uses of the metadata itself, e.g. as evidence of species occurrence, biological relationships, etc.--BobMorris 01:12, 30 March 2009 (CEST)

For many purposes such as creating species pages or identification keys, a permission is required. Embedding an image or deep linking to a video or sound stored on a different server is still considered a copyright violation in many countries. Linking to a portal where the user can find the image or sound (Bob's scenario) is not a violation, but also not very acceptable to the user. — Key to Nature would welcome incentive or additional motivation to point publishers to consider their position on a license but this could also be in instructions. It is also unfortunate, but probably on purpose, that the xmp term (UsageTerms) will point most publishers into another direction than "License Statement" --GregorHagedorn 07:48, 4 May 2009 (CEST)

This should not be mandatory, as it would not apply to, and is unnecessary for, any public domain resources, which are the majority of our media resources. We (NBII) do require it for copyrighted images, but that is only because our policy is to serve only images under a Creative Commons license or similar defined usage. Other image galleries also have a history of recording Copyright info, but not License Statement, so requiring this field would need the creation of additional metadata. Finally, I would argue that linking to a portal where the user can find the image or sound is a very acceptable practice to many users, as it is the basis for users discovering our gallery, and thus permission to practice that does not need to be defined using License Statement. Strongly recommend this field, yes; require it, no. AnnetteOlson 00:32, 4 February 2010 (CET)

The legal responsibility for choosing a correct graphical representation must lie with the provider of metadata and cannot be assumed by a service that offers a search or reporting user interface. Example:

I find the citation to photo: no more compelling than the IPTC "Generic Specification" 'Credit Line', which could be given the IPTC namespace without having to introduce the "Photoshop Implementation" citation. --BobMorris 03:15, 30 March 2009 (CEST). I have changed the URI by fiat --BobMorris 18:55, 29 January 2010 (CET)

The item name "Object Logo URL" and definition seem to be in contradiction. Either it is a logo of the resource (e.g. a logo for an institution or movie), or an attribution logo (e.g., for an image, the owner's or provider's logo). I therefore propose to change the item name to "Attribution Logo URL". --GregorHagedorn 11:30, 29 March 2009 (CEST)

The URL where information about ownership, attribution, etc. of the resource may be found.

Comments:

This URL may be used in creating a clickable logo. Providers should consider making this link as specific and useful to consumers as possible, e.g., linking to a metadata page of the specific image resource rather than to a generic page describing the owner or provider of a resource.

The term "Object Link URL" should be renamed - the definition and comment imply that it is tied to attribution. The desire to link this to specific metadata may be expressed in comments. --GregorHagedorn 11:30, 29 March 2009 (CEST)

An identifiable source from which the described resource was derived. It may be digital, but in any case should be a source for which the originator intended long-term availability.

Comments:

If image, key, etc. was taken from (i.e. digitized) or was also published in a digital or printed publication. Do not put generally "related" publications in here. This field normally contains a free-form text description; it may be a URI (“digitally-published://ISBN=961-90008-7-0”) if this resource is also described separately in the data exchange. Can be repeatable if a montage of images.

think important for copyrighted works, but question of definition of "published." includes when Flickr and YouTube. But needs to be a stable, long-term preserved resource. Will not change and disappear. Libraries best practice document says "Use [Source] only when the described resource is the result of digitization of non-digital originals. Otherwise, use Relation." Possible use of Relation field is "IsAVersionOf"

The IPTC Core Source field differs from what is considered DC source, which is "A related resource from which the described resource is derived." We use a similar definition - The related, non-digital resource from which the described resource is derived, but I think we all agreed that non-digital is not critical here, and should include both digital and printed publications. But I agree with the point above that, if digital, it needs to be a long-term stable resource that is an official "publication"; otherwise use the Relation field. --AnnetteOlson 22:53, 6 March 2009 (CET)

I think that it is a mistake to give the name "Published Source" to the Dublin Core element "Source". As noted above, DC element "Source" references the resource from which the focal resource (e.g. an image resource) is derived. In many (perhaps the majority) of cases of images in a biodiversity context, the image will be of a physical resource such as a museum specimen, an herbarium specimen, or an individual live organism in its environment. All of these resources can and should be assigned LSIDs which would be the appropriate resource to reference in this element. If the focal resource (e.g. an image) were at some point published, the image as seen in the publication would be derived from the focal resource, not the other way round as it seems to be intended here. I think that this somewhat "backward" way of conceptualizing Source results from the assumption that many images included in databases that may use the MRTG schema will be gleaned from resources on the web. That may be true initially, but eventually many (if not most) images that will be valuable in a biodiversity context will be created by people who are imaging collections or photographing live organisms. The Source element needs to be understood in its correct DC context and instances where an image is found elsewhere as a part of another media resource should be noted using one of the other terms mentioned in this discussion. I understand that an image digitized from something like a journal article or key would in a philosophical sense be derived from the article or key itself, but if one contacted the author to obtain the image directly (a desirable outcome because the quality would be better than a scan) would you say that the image came from the article? I think most people would say that it was the other way round. --Steve Baskauf 05:57, 28 April 2009 (CEST)

I think I am on Steve's side on some of this. I also have some concern that the commentary in the spec for dcterms:source[1] suggests that it will ultimately change to take a class range and become more complex than we desire here. There is a further problem, in that the DC spec recommends the use of a URI as best practice. We seem to discourage that in our Comments, and also seem to impose the further requirement that if a URI is provided, it must identify something that is also described in the same metadata document. --BobMorris 06:55, 1 May 2009 (CEST)

I don't think I agree with all of Steve's arguments, only his conclusion. In particular, whatever the outcome, it must not depend on there being only one object depicted in the medium, nor must it assume that it is an organism, nor that if part of an organism that the taxon is relevant. For example a single picture might have some twigs from several species in it to represent an illustration of "stipule". --BobMorris 06:55, 1 May 2009 (CEST)

TO DO: The dcterms:source definition is at variance with our Comments here in several ways:

It recommends that a formal identifier be used where possible. It is silent about free-form text.

It does not require that the source have been "previously described in the data exchange".

Steve's arguments are certainly worth considering. I wonder however, whether some sources, like publication and specimen, don't merit special, inheritable information. Essentially, if source IS a publication, resulting in image A and this has been modified by image processing to create image B, source of image B would only inform about image A, but no longer about the publication. Even worse, if the same thing happens with a specimen, tracing the fact that image B is an image of a specimen would require access to the metadata on the "source", i.e. image A. Essentially, splitting "published source" and "Associated Specimen Reference" from "Derived From" is meant to handle this: direct source in a chain of digital modification in "Derived From", ultimate sources in "Published Source" and "Associated Specimen Reference". However, Steve is certainly right in asking to consider the future, where little is "scanned" from publications. So if generalizing: Which source information should be inherited down a derivation chain and which not? --GregorHagedorn 00:17, 11 May 2009 (CEST)
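The inheritance question above can be made concrete with a small sketch. It assumes a hypothetical in-memory set of records; the identifiers and the field names `derived_from` and `published_source` are illustrative only and are not MRTG terms. The sketch shows the alternative to splitting the fields: a consumer walks the derivation chain and collects any ultimate source it meets on the way.

```python
# Hypothetical records keyed by identifier; field names are illustrative only.
RECORDS = {
    "pub:1": {},
    "img:A": {"derived_from": None, "published_source": "pub:1"},
    "img:B": {"derived_from": "img:A"},  # image B was made by processing image A
}

def ultimate_sources(record_id, records, field="published_source"):
    """Walk the derived_from chain and collect values of `field` found on the way."""
    found = []
    seen = set()  # guards against accidental cycles in the chain
    current = record_id
    while current is not None and current not in seen:
        seen.add(current)
        record = records.get(current, {})
        if record.get(field) is not None:
            found.append(record[field])
        current = record.get("derived_from")
    return found

print(ultimate_sources("img:B", RECORDS))
```

The walk only works if every intermediate record is retrievable, which is exactly the cost the separate "Published Source" and "Associated Specimen Reference" items are meant to avoid.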

I think this is generically a problem of record provenance, but it is clouded by the case of images of specimens that are vouchers for some kind of scientific inferences which are meant to be supported by the specimen. Part of the problem may also arise from the controversial nature of "electronic vouchers", such as pictures. Some use cases of an image in a chain of modifications might require easy access to the ultimate (i.e. "original") image of the specimen. Others might require knowledge of the actual manipulations (e.g. by retrieving the elements of the derivation chain). To the extent that one may consider an original publication as a voucher for a scientific inference ("This paper is evidence that species A is different from species B"), I think that the problem is more general than Steve's remarks might lead one to conclude. Images are often the only voucher for observations. Even for traditional voucher specimens, I have heard it proposed that imaging live, or otherwise unprocessed specimens before preserving them should be adopted. Hence, it seems to me that this is not a problem that will go away in the future. Quite the contrary, it will become more important as the preponderance increases of scientific inferences being drawn from images of biota. OK, OK, I don't know where I stand on the issue for the schema. I just think we have to get it right. Does this need to be on a new page with summary and link here? --BobMorris 07:21, 11 May 2009 (CEST)

There is an rdfs semantic issue having nothing to do with the above. It is that dcterms:source is defined to have "non-literal value" as defined by the DCMI Abstract Model. These are defined to be "physical, digital or conceptual" objects, whereas "literal" values are

"A literal is an entity which uses a Unicode string as a lexical form, together with an optional language tag or datatype, to denote a resource (i.e. "literal" as defined by [RDF])."

I cannot see what a non-literal value should be in RDF for dcterms:source, and if in RDFS this is a Class property, what class should be its range.

I have returned to this issue again because of my need to consider the simultaneous databasing of media resources with physical resources. At SERNEC we want to have a database that contains records for physical individual organisms in the wild, physical preservedSpecimens, direct digital images of the individual organisms, and digital images of the specimens. Metadata for the latter three categories of resources can be placed as records in a single database using Darwin Core terms to describe the biodiversity aspects of all the resources and MRTG terms to describe the media aspects of the two categories of images. However, to keep this straight, we need to be able to have a field that indicates an identifier for the resource from which the subject resource was derived. For example, we need to say that a specimen was derived from a certain individual, a live plant image was derived from a certain individual, and that a specimen image was derived from a certain specimen. Under Dublin Core, the appropriate element to describe this relationship would seem to be dcterms:source "A related resource from which the described resource is derived." Arguments given above have supported and opposed the use of dcterms:source in this way for images. However, what occurs to me now is that if dcterms:source is specifically adopted by MRTG to mean a "Published Source" rather than source in a "derivation chain" sense as I would like to use it in our circumstances, it is setting up a conflict for users like me similar to what we just finished going through with dcterms:type. I really can't use dcterms:source to indicate that an individual in the wild was the source of a physical specimen if that term is then going to have a different interpretation for the images that also inhabit the same database (i.e. I can't say that the specimen was the dcterms:source of the digital image of the specimen because MRTG says that dcterms:source means a published source for the digital image).

After looking at the DC terms again, it seems like there may be other DCMI terms that have meanings closer to "Published Source" than dcterms:source. For example, dcterms:isPartOf is defined as "A related resource in which the described resource is physically or logically included." That seems to me to be closer to the situation where an image was originally part of a digital or print publication than "from which the described resource is derived" (i.e. dcterms:source).

dcterms:isPartOf suffers from the same problem as dcterms:source in that its DCMI definition also states that it should "be used with non-literal values". I may be misunderstanding this since I'm a novice, but it seems like the intention here is that both of these terms should be used with a unique identifier, ISBN, etc. rather than a literal. That could certainly be the case in the SERNEC situation where the value for dcterms:source would be the globally unique identifier for the specimen or individual, and in the use for a Published Source where the value of dcterms:isPartOf might be something like an ISBN.

Anyway, my point is that the use of dcterms:source as it is described here could prevent someone from using dcterms:source in other circumstances where it is probably the most appropriate term to apply.

First, how does the definition of Metadata Provider differ from the item above, Provider? Currently they are the same. Provider is more the metadata plus the resource. Metadata Provider is just the metadata provider, but I think it isn't that useful here. Provider covers it. I agree in part with the action note below about creator, though NBII only has Metadata Contributor, not Metadata Provider or Creator. Contributor is vague and can cover both a provider who compiles metadata and a creator; multiple people can be Metadata Contributors. However, we are going to set up a system where the metadata can be copyrighted also. We just haven't worked out the details. --AnnetteOlson 23:04, 6 March 2009 (CET)

I agree, this is not useful. In K2N we have dropped the idea of "Provider" and use "Service Provider" and "Metadata Creator" (which usually will be an institution). I have annotated further up that Metadata Creator no longer has a place. Do we consider this not useful? --GregorHagedorn 11:45, 29 March 2009 (CEST)

Action Note: when restructuring into an "Agent" schema in Copenhagen 2009, the metadata creator seems to have been dropped. This role is different from the Metadata Provider, which provides a service (e.g. database access) but does not necessarily claim a copyright on descriptions or abstracts. For our legal requirements, the role of a metadata creator is important. IIM, Photoshop and XMP recognize an equivalent agent in the metadata item in IPTC CORE 1.1: Description Writer, using the term photoshop:CaptionWriter[10] (modified --GregorHagedorn 09:42, 9 September 2009 (CEST))

Content Coverage Vocabulary

Description of collection or individual resource, containing the Who, What, When, Where and Why as free-form text.

Comments:

It optionally allows detailed information to be presented and will in most cases be shown together with the resource title. If both description and caption (see below) are present, a description is typically displayed instead of the resource.
Should be a good proxy for the underlying media resource. Interpretation depends on type.

An image may contain language such as superimposed labels. If an image is of a natural scene or organism, without any language included, the resource is language-neutral (ISO code “zxx”). Resources with present but unknown language are to be coded as undetermined (ISO code “und”). Resources only containing scientific organism names should be coded as "zxx-x-taxon" (do not use the incorrect “la” for Latin). If there is no language code available, you must use the ISO extension mechanisms (x-XXX or XXXXXXX, CITE).
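The coding rules above can be summarised in a short sketch. The function name and its parameters are hypothetical; only the codes "zxx", "und" and "zxx-x-taxon" come from the text. The case where no code exists for a known language (the ISO extension mechanism) is deliberately left out.

```python
def resource_language_code(has_text, language=None, only_scientific_names=False):
    """Pick a language code for a media resource (hypothetical helper).

    has_text: True if the resource contains any language at all.
    language: a known language code, or None if it could not be determined.
    only_scientific_names: True if the only text is scientific organism names.
    """
    if not has_text:
        return "zxx"            # language-neutral resource
    if only_scientific_names:
        return "zxx-x-taxon"    # never "la" for scientific names
    if language is None:
        return "und"            # language present but undetermined
    return language

print(resource_language_code(has_text=False))
print(resource_language_code(has_text=True, only_scientific_names=True))
```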

Geography Vocabulary

Introduction

Location created and Location shown are separated in the current version of IPTC, and the metadata working group (MWG 2008) also recommends this. We will follow this, to support the expected future increase of automatic GPS-based coordinate recording in recording devices. As a special case, the MRTG group recommends changing the semantics of location shown in the case of biodiversity specimens, where the original location differs from the current location at which the specimen is collected. In this case, LocationShown should exclusively refer to the location where a specimen was originally collected (gathering or sampling location). Use LocationCreation to express the location where the media was created (a specimen was digitized).

The primary location of interest is the Location shown; where existing data do not provide a differentiation, best practice is to assume that it is Location shown. However, for future device-recorded GPS data, it is highly desirable to distinguish them.

TODO in this document: Decision: Core is higher elements of LocationShown down to Sublocation

I implore you not to relegate decimal latitude and longitude to an extension. There is no more fundamental way to express a location on the surface of the earth, and it is at least theoretically (if not already actually) possible to generate many of the other elements listed here from the latitude and longitude. Given that it is likely that in the near future most images will have this information automatically embedded, it will become very easy information to obtain and provide. Therefore MRTG should encourage users of the schema from the start to include latitude and longitude with all of their media resources as what is effectively a GUID for location. I think that having only verbatim latitude and longitude and putting them in an extension will prove to be a mistake in the long run.

I completely agree with Steve and unless there is strong objection I intend to do the following:

add decimalCoordinates, decimalLatitude, decimalLongitude items

Write some guidance following the current DwC on the subject, especially calling attention to this sentence in the DwC verbatim georeferencing: "If possible, these coordinates should also be translated into the combination of decimalLatitude, decimalLongitude, geodeticDatum, and coordinateUncertaintyInMeters, but only if you really know what you are doing - coordinate transformations can be challenging"

possibly add the additional stuff DwC adds that lets you know whether the coordinates you list are known to be dependable.

Suggests that MRTG's recommendation is that applications may be skeptical about the authority and accuracy of coordinates if they do not follow the recommendations of DwC. I'm pretty sure that the importance of this is highly dependent on applications. For example, niche modelers or other users of GIS layers (should) already understand that even decimal coordinates without a known datum can have seriously misplaced position on the globe.

Possibly add the entirety of

The above tells me that we will have a serious documentation problem to serve both the needs of lightweight users like people using images from their cell phones and other consumer-grade location aware capture devices, and users whose need is for highly accurate occurrence location that has to be coordinated correctly with other GIS layers.

Here's an idea: We recommend that, and provide support for, a user who wants the full expression of a DwC Location (or other DwC data) also serve a real DwC record, and we give a service address for it. This sounds like a big deal which might deserve a special page for discussion....

LocationShown only: MRTG desires to support Bounding Box. dwc used the concept of a bounding box in earlier versions [1], but as of 2009-02 it seems to have been replaced by the item "dwc:FootprintWKT". MRTG supports dwc:FootprintWKT and FootprintSpatialFit.

A bounding box describes an approximation of the horizontal extent of the subject coverage represented by a rectangle-like shape. A reference for structuring the polygon data must be added.

Iptc does not seem to have elevation, depth, or altitude. Any other geo-reference elements present in dwc may be used.

Location created is especially relevant where automatic GPS data are recorded in the media recording device.

All geoelements are repeatable, because individual resources (like a movie or a combination of several images into a plate) may relate to subjects in different geolocations.

NOT SURE whether this should apply to properties, or rather to the structures (Location Shown and Location Created) --GregorHagedorn 00:30, 2 March 2009 (CET)

After reading this section I was left wondering to what kind of media this group thinks this schema will be applied. Since this group is affiliated with GBIF and TDWG, I was assuming the media would be primarily associated with documenting occurrences, and after a web search I have concluded that my assumption was correct. Then why are the locality-related elements in this section based primarily on IPTC (a press telecommunications organization) standards and not Darwin Core??? I will be the first to admit that there are problems with DwC, but the solution is to fix DwC, not to use a different system. There are a number of us at SERNEC who are at this moment working to create a community image collection that integrates live-plant images and images of plant specimens, and we plan to use this schema together with DwC to allow the live plant images to serve as occurrence records. If this schema adopts the IPTC locality elements, then people like us who are considering live plant images to document occurrences will be put in a position of choosing between: 1. including duplicate locality metadata under two systems (MRTG/IPTC and DwC), which would be confusing and silly, 2. ignoring the MRTG/IPTC locality elements entirely and just using DwC, which would put us at odds with other biodiversity image providers that are using the MRTG schema, or 3. ignoring DwC and using the MRTG/IPTC elements, which would make it impossible for us to have a unified system and difficult for our live plant images to feed into the stream of biodiversity metadata that feeds GBIF. Who are we trying to facilitate here, news photographers or the biodiversity community? I really think that the purpose of this schema should be to add metadata elements that are currently missing and which are needed to service media resources, but not to invent what is essentially a system competing with what is currently being used by the biodiversity community.

I don't follow Steve's point above. This is a superset of DwC geography, and all of it is optional. Anybody who has data with DwC geography tags can just hang them on a mrtg:MediaResource and can also ignore those outside DwC --BobMorris 20:33, 7 September 2009 (CEST)

Also, the MRTG committee rebelled against its original charge of only providing for MRTG to document species occurrence and went way further, including to cover identification tools, ecological images, and perhaps things at least as broad as media illustrating the SPMInfoItems types--BobMorris 20:33, 7 September 2009 (CEST)

I feel like this has coalesced into something usable and without significant conflict with DwC (for those in the media community who want to use DwC). The basic universal identifier is: DecimalLatitude, DecimalLongitude (the coordinates), CoordinateUncertaintyInMeters (the uncertainty), and GeodeticDatum (the reference system). The basic hierarchic descriptors are: WorldRegion, CountryCode, ProvinceState, and Sublocation. I have one remaining question. I cannot find the controlled values for WorldRegion in the IPTC documentation. I'm hoping that this will be the two-letter continent codes listed at http://en.wikipedia.org/wiki/List_of_countries_by_continent_(data_file) but I'm having trouble finding out what they are called officially. Darwin Core says to use the "ISO 3166 Continent code" in the "Continent" term, but I'm having trouble finding that on the web - mostly I find ISO 3166 COUNTRY codes. Anyway, assuming that these two-letter codes are synonymous between IPTC WorldRegion and DwC Continent (for non-marine users), then at least for most users the first three levels are likely to be synonymous between the IPTC and DwC systems: Iptc4xmpExt:WorldRegion=dwc:Continent, Iptc4xmpExt:CountryCode=dwc:CountryCode, and Iptc4xmpExt:ProvinceState=dwc:StateProvince. To provide the remaining free-form description at the lowest level: Iptc4xmpExt:Sublocation=dwc:Locality. Any other MRTG and DwC elements can be used at the provider's discretion. Am I getting this right?
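If the equivalences proposed in the comment above hold (which is the open question being asked, not a settled decision), a provider or aggregator could translate between the two systems with a simple lookup. This is a minimal sketch; the function name and the sample record values are hypothetical, and only the four term pairs come from the comment.

```python
# Term pairs as proposed in the comment above; treat them as an assumption.
IPTC_TO_DWC = {
    "Iptc4xmpExt:WorldRegion": "dwc:Continent",
    "Iptc4xmpExt:CountryCode": "dwc:CountryCode",
    "Iptc4xmpExt:ProvinceState": "dwc:StateProvince",
    "Iptc4xmpExt:Sublocation": "dwc:Locality",
}

def to_dwc(record):
    """Rename IPTC location keys to their proposed DwC counterparts.

    Keys without a proposed counterpart are passed through unchanged.
    """
    return {IPTC_TO_DWC.get(key, key): value for key, value in record.items()}

# Hypothetical sample record
sample = {
    "Iptc4xmpExt:CountryCode": "US",
    "Iptc4xmpExt:ProvinceState": "Tennessee",
    "Iptc4xmpExt:Sublocation": "north slope above the creek",
}
print(to_dwc(sample))
```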

I don't really understand with what this element and the following one will be populated. Is it some kind of identifier? A value made by concatenating other terms?

I have a somewhat philosophical problem with these two terms because I believe that in the circumstance where a physical object is digitized, there really should be two separate records with their own identifiers for the physical object and its digital representation. For example, if there is a 35 mm slide of a habitat which is then digitized, the location given in the 35 mm slide record should be the "location shown", and the location given in the digital image record should be the "location created". In this circumstance, there isn't really a need to have two different terms; rather, one term "location" (i.e. the location where the thing was created) works for both. This is assuming that there is a machine-readable mechanism to indicate that the digital image was derived from the 35 mm slide, so that the digital image consumer can discover the location shown by resolving the metadata of the 35 mm slide from which the digital image was derived. Likewise, if an herbarium is imaging its specimens, the location in the specimen record would be the "location shown" and the location in the digital image record would be the "location created". Again, a means would be required for connecting the record for the digital image to the record for the physical specimen in order for the location shown to be discovered.

There is a similar problem with "Date and Time Digitized" and "Original Time and Date" about which I've already commented. These two issues are really the same and whatever the solution is to one of them is the solution to the other.

I realize that many users will not care about this distinction and will just want to have a record for the digital object and use the two fields that you define. But what I'm still trying to figure out is how users such as the herbaria that are digitizing specimens will use these elements if they have two separate records (one for the image and one for the specimen).
Steve Baskauf 13:46, 31 October 2009 (CET)

There are scenarios where both LocationShown and LocationCreated are important, e.g. a remote sensing image. For most biodiversity media, the LocationShown is likely the more important if the metadata author feels that the LocationCreated is irrelevant and should not be provided. However, in no case should consumers of MRTG metadata assume any particular value for missing metadata.

World region classification, such as continent, waterbody, island group, or island names, preferred from a controlled vocabulary (to be defined).

Comments:

We believe it is important to follow the XMP and IPTC standard set for media metadata and implemented in media management software. DarwinCore here forces primary metadata providers to classify world region terms into. This can, however, relatively easily be achieved by metadata aggregators (e.g. using biogeomancer-like services).

Optionally, the geographic unit immediately below the country level (individual states in federal countries, provinces, or other administrative units) in which the subjects (e. g., species, habitats, or events) were recorded by the media (if such information is available in separate fields).

Free-form text location details down to the village, forest, or geographic feature etc., especially information that could not be found in a gazetteer.

Comments:

We distinguish Locality in the sense of dwc (= a complete description of a locality, with the possible exception of country names etc. separated in dwc:HigherGeography) and Sublocation in the sense of IPTC/XMP, i.e. the remaining free-form text location within a fully hierarchically arranged grouping (earlier IPTC versions used “Location”, but this has been renamed as of 2008).

I don't think that the description in the comments of the sense of Locality in dwc (at least as it stands now) is accurate. DwC:locality is defined as "Less specific geographic information can be provided in other geographic terms (higherGeography, continent, country, stateProvince, county, municipality, waterBody, island, islandGroup)", not a complete description of all levels in the hierarchy. Maybe I'm just not understanding the comments, but I think that the current dwc:locality is the same as what we are defining here as sublocation.

Coordinates, elevation, etc. TODO: should this become extensions to the IPTC structure?

Latitude and longitude of geographic coordinates. Either decimal representation (use "." as decimal point) or degree-minute-second (use ' for minutes and " for seconds) may be used. End the latitude with N or S, or prefix the value with "+" for northward and "-" for southward. End the longitude with the letters E or W, or prefix the value with "+" for eastward and "-" for westward. Use the comma (",") to separate latitude from longitude. If positive/negative values are being used instead of letters, it is essential to place the latitude first; otherwise it is recommended. A geodetic datum (such as WGS84 used for GPS measurements) may optionally be added in parentheses at the end. Examples: 27°59'16"N, 86°56'40"E (WGS84) or +49.5000°,-123.5000° (for decimal degrees and using positive/negative values).

Comments:

This may be derived from the GPS of the camera, not the location shown. Where the provider has the data separated, recommended best practice is to use the separately provided Latitude and Longitude metadata items; this item is in support of metadata where the coordinates are not separated and the provider is unable to provide reliable separation.
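To illustrate why a combined coordinate string requires parsing before machine use, here is a minimal sketch of a parser for the notation used in the two examples of this item: degrees with optional minutes (') and seconds ("), an optional hemisphere letter or leading sign, latitude first, and an optional datum in parentheses. The function names are hypothetical and the parser makes no claim to cover every variant a provider might supply.

```python
import re

_DATUM = re.compile(r"\(([^)]+)\)\s*$")
_PART = re.compile(
    r"""^\s*(?P<sign>[+-])?\s*
        (?P<deg>\d+(?:\.\d+)?)\s*°?\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*'\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*"\s*)?
        (?P<hemi>[NSEW])?\s*$""",
    re.VERBOSE,
)

def _to_decimal(text):
    """Convert one latitude or longitude expression to signed decimal degrees."""
    m = _PART.match(text)
    if m is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    value = float(m["deg"]) + float(m["min"] or 0) / 60 + float(m["sec"] or 0) / 3600
    # A leading minus sign or a southern/western hemisphere letter makes it negative.
    if m["sign"] == "-" or m["hemi"] in ("S", "W"):
        value = -value
    return value

def parse_coordinates(text):
    """Return (latitude, longitude, datum or None); latitude is assumed to come first."""
    datum = None
    m = _DATUM.search(text)
    if m:
        datum = m.group(1).strip()
        text = text[: m.start()]
    lat_text, lon_text = text.split(",")
    return _to_decimal(lat_text), _to_decimal(lon_text), datum

print(parse_coordinates("+49.5000°,-123.5000°"))
print(parse_coordinates("27°59'16\"N, 86°56'40\"E (WGS84)"))
```

Note that the sketch trusts the latitude-first convention and cannot detect swapped values, which is one reason separately provided latitude and longitude items are the recommended practice.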

It would seem to me that it would make more sense to have the term named "Latitude" refer to the DwC element "DecimalLatitude" rather than VerbatimLatitude. Both the verbatim latitude and longitude (perhaps pulled directly from EXIF data) can be examined by a human user if desired through examination of the Geo-coordinates (= DwC VerbatimCoordinates) item. If a provider is going to go to the trouble to parse out the individual latitudes and longitudes (which many will!), they will undoubtedly convert them to the most machine-readable format (DecimalLatitude and DecimalLongitude), which can then be used without further conversion in GIS or Web (e.g. Google Maps) applications. Those two elements (DecimalLatitude and DecimalLongitude) are not at present found in this schema.

Not sure, perhaps over-atomized? Description needs explanation how circular versus rectangular precision (the latter occurs if longitude/latitude have separate precision estimates) is to be expressed, and how to express the measurement unit. --GregorHagedorn 07:59, 23 February 2009 (CET)

I do not think that this is a correct representation of CoordinatePrecision, at least in the most recent DwC schema. What is described in the definition here is actually CoordinateUncertaintyInMeters. CoordinatePrecision is "A decimal representation of the precision of the coordinates given in the DecimalLatitude and DecimalLongitude" which I think is actually a more useful element than CoordinateUncertaintyInMeters, at least if we are planning for a future when most data will be collected by GPS enabled cameras. For most current GPS receivers, the value for CoordinatePrecision would be 0.00001 (decimal degrees) which could be automatically assigned by the provider given the source of the data (i.e. GPS). In cases where data providers are limiting public access to precise locality data by reducing the precision of the coordinates that are provided (see DwC Generalizations), they are most likely to do so by lopping off digits from their raw coordinates. Under that circumstance, providing a value for CoordinatePrecision would be much more straightforward than providing a value for CoordinateUncertaintyInMeters. Providers who are crunching "old" data (e.g. from museum specimen data) are going to have to do some kind of calculation or conversion anyway and they can just as easily give their precision in decimal degrees as meters.
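The "lopping off digits" case described above can be sketched as follows. The function name is hypothetical, and the sample coordinates are invented for illustration. The point is that the CoordinatePrecision value falls out of the truncation itself, with no further calculation.

```python
from decimal import Decimal, ROUND_DOWN

def generalize(lat, lon, places):
    """Truncate coordinates to `places` decimals and report the resulting precision.

    Returns (latitude, longitude, precision) as Decimal values, where precision
    is expressed in decimal degrees, e.g. 0.01 for two decimal places.
    """
    step = Decimal(1).scaleb(-places)  # places=2 gives Decimal("0.01")
    lat_g = Decimal(str(lat)).quantize(step, rounding=ROUND_DOWN)
    lon_g = Decimal(str(lon)).quantize(step, rounding=ROUND_DOWN)
    return lat_g, lon_g, step

# Hypothetical raw GPS reading, generalized to two decimal places
print(generalize("36.14567", "-86.80321", 2))
```

Decimal arithmetic is used here so that the truncated values keep exactly the stated number of digits rather than picking up binary floating-point noise.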

This may be correct for altitude/height, but not elevation. Elevation is correct for geolocation, altitude for observer or subject position. However, it seems at the moment there is no altitude or height above local surface (e.g. for shots from a tower) present in MRTG --GregorHagedorn 15:20, 18 June 2009 (CEST)

The specific elevation or range of elevation at which the media was recorded, including units (elevation is defined as zero being mean sea level). This is the position of camera - any additional elevation of the subject itself should be put in description.

Semantics of Elevation versus Altitude: According to definitions of elevation versus altitude in Wikipedia, elevation is correct for geolocation, altitude for observer or subject position. --GregorHagedorn 07:59, 23 February 2009 (CET)

I believe some field should record the elevation of the geolocation, another the camera altitude. It may be that this is not currently captured, since a geolocation elevation may be missing. -- Proposed description of geolocation-elevation: "Elevation (height of ground level above mean sea level) of observation position. For human-held digital cameras (recording GPS-based height) it is permissible to use the position of the camera instead. A geodetic reference datum may be added in parentheses." --GregorHagedorn 07:59, 23 February 2009 (CET)

I'm unhappy about including units. Would require parsing for machine use. Better to have everything that needs units have a units property.--BobMorris 20:30, 8 February 2009 (CET)

I think computers should work where they easily can, so I prefer to keep it simple and allow various expressions of elevation, including textual ones. --GregorHagedorn 14:08, 22 February 2009 (CET)

Any reason not to just adopt NCD URI here? Do we risk raising the question of whether this applies only to collections? (It doesn't). Alternative is mrt namespace. --BobMorris 06:40, 4 April 2009 (CEST)

The date of creation of the original resource from which the digital object was derived or created. The date and time must comply with the World Wide Web Consortium (W3C) datetime practice, which requires that date and time representation correspond to ISO 8601:1998, but with year fields always comprising 4 digits. This makes datetime records compliant with 8601:2004. AC datetime values may also follow 8601:2004 for ranges by separating two ISO 8601 datetime fields by a solidus ("forward slash", '/'). See also the Wikipedia ISO 8601 entry for further explanation and examples.

Comments:

What constitutes "original" is determined by the metadata author. Example: Digitization of a photographic slide of a map would normally give the date at which the map was created; however, a photographic work of art including the same map as its content may give the date of the original photographic exposure. Imprecise or unknown dates can be represented as ISO dates or ranges. Compare also Date and Time Digitized.
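As an illustration of the range convention in this item, the following sketch accepts either a single ISO 8601 value or two values separated by a solidus. The function name is hypothetical. It relies on Python's `datetime.fromisoformat`, which handles full dates and date-times but not reduced-precision values such as a bare year, so it covers only part of what ISO 8601 allows.

```python
from datetime import datetime

def parse_datetime_or_range(value):
    """Parse 'start' or 'start/end' into a (start, end) pair of datetimes.

    A single value is returned as a range whose start and end are equal.
    """
    parts = value.split("/")
    if len(parts) > 2:
        raise ValueError(f"more than one solidus in {value!r}")
    parsed = [datetime.fromisoformat(part) for part in parts]
    start, end = parsed[0], parsed[-1]
    if end < start:
        raise ValueError(f"range ends before it starts: {value!r}")
    return start, end

print(parse_datetime_or_range("2009-03-28/2009-03-29"))
```

Both ends of a range should either carry a time zone offset or omit it; mixing the two makes the comparison fail in this sketch.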

Controlled vocabulary of subjects that help provide search capabilities. Terms from various controlled vocabularies may be used. MRTG-recommended vocabularies are preferred and may be unqualified literals (without a URI). For terms from other vocabularies, either a precise URI should be used or, when providing unqualified terms, the source vocabulary should be provided in Subject Category Vocabulary.

Comments:

Recommended sets include: (PROVIDE REFS) NASA GCMD, K2N BioComplexityThesaurus, GEMET. Can include major groups such as vertebrates, fungi; ecosystem terms?? apparatus terms?? such as…, aquatic vertebrates, forest fires. In the case where the unqualified terms from different vocabularies are homographs, the MRTG recommendation provides an order of preference for assigning terms to specific vocabularies. This includes other formal classifications (published in print or online) such as habitat, fuel, invasive species, agroproductivity, fisheries, migratory species etc.

"Maybe. But we tend to do the same thing in the places where we need to, so I am not sure why we would do this. Again, for it to have any utility, a query agent would need to always search it, no matter what its specific desires." Bob Morris, 2009-03-02

Any vocabulary or formal classification from which additional terms in Subject Category have been drawn.

Comments:

The MRTG recommended vocabularies do not need to be cited here. There is no linkage between individual Subject Category terms and the vocabulary; the mechanism is intended to support discovery of the normative URI for a term, but not guarantee it.

Tags may be multi-worded phrases. Where scientific names, common names, geographic locations, etc. are separable, these should go into the more specific metadata items provided further below. Examples: "flower diagram". Character or part keywords like "leaf", "flower color" are especially desirable.

Taxonomic Coverage Vocabulary

A higher taxon (e.g. a genus, family, or order) at the level of the family or higher, that covers all taxa that are subject of the resource.

Comments:

Example: “Aves” for a bird key or a bird image collection. Do not add a rank (“Class Aves”) in this field. If the resource contains a single taxon, this should be placed only in Scientific Name, leaving Lowest Common Taxon empty. Where the subject of an image is several species of ducks with trees visible in the background, Taxonomic Coverage would still be Anatidae (and not Biota).

For an RDF representation of MRTG, there is a problem here and in a few other places where the Item URI comes from an RDF ontology. It is this: ncd:taxonCoverage[2] is an object property, but the commentary seems to suggest here that we mean a datatype property, e.g. a string. --BobMorris 00:02, 11 October 2009 (CEST)

Scientific taxon names of organisms represented in the media resource (with date and authorship information if available) of the lowest level taxonomic rank that can be applied.

Comments:

The Scientific Name may possibly be a Genus or Family name, if this is the most specific identification available. Where multiple taxa are the subject, multiple names may be given. If possible, add this information here even if the title or caption of the resource already contains scientific names. Where the list of scientific names is impractically large (e. g., media collections or identification tools), the number of taxa should be given in Taxon Count (see below). If possible, please do not repeat the Taxonomic Coverage here. Do not use abbreviated Genus names ("P. vulgaris"). It is recommended to provide author citation to scientific names, to avoid ambiguities in the presence of homonyms (the same name created by different authors for different taxa). Identifier qualifications should be supplied in the Identification Qualifier (DO WE HAVE THIS???) term rather than here.

Common (= vernacular) names of the subject in one or several languages. The ISO language name should be given in parentheses after the name if not all names are in Metadata Language.

Comments:

Applicable only if the resource relates to a single taxon. The ISO language codes after the name should be formatted as in the following example: 'abete bianco (it); Tanne (de); White Fir (en)'. If names are known to be male- or female-specific, this may be specified as in: 'ewe (en-female); ram (en-male);'.
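The formatting convention in the comment above can be read mechanically. The following is a rough sketch, assuming entries are separated by semicolons and that anything after a hyphen inside the parentheses is a sex qualifier. The function name `parse_common_names` is hypothetical. Note that under this assumption a region subtag such as "en-US" would be misread as a qualifier.

```python
import re

# name, then "(lang)" or "(lang-qualifier)" at the end of the entry
_ENTRY = re.compile(
    r"^(?P<name>.+?)\s*\((?P<lang>[A-Za-z]{2,3})(?:-(?P<qualifier>[^)]+))?\)$"
)


def parse_common_names(value):
    """Split a Common Name value into (name, language, qualifier) tuples.

    Entries without a parenthesized code are returned with language None,
    meaning "assume the Metadata Language".
    """
    entries = []
    for chunk in value.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing semicolon
        match = _ENTRY.match(chunk)
        if match:
            entries.append(
                (match.group("name"), match.group("lang"), match.group("qualifier"))
            )
        else:
            entries.append((chunk, None, None))
    return entries
```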

"This field needs to be defined more, and intent decided on, but could be important." (From spreadsheet)

"this is a Darwin Core question; can refers to one of the authoritiative GSD, such as IT IS. Question comes up here of matching names and sources, and the capabilities of coding that." (From spreadsheet)

One or several scientific names that are synonyms to the Scientific Name may be provided here.

Comments:

The primary purpose of this is in support of resource discovery, not developing a taxonomic synonymy. Misidentification or misspellings may thus be of interest.
Where multiple taxa are present in a resource and multiple Scientific Names are given, the association between synonym and name is not discoverable.

An exact or estimated number of taxa represented by the media resource.

Comments:

It is recommended to give an exact or estimated number of specific taxa in any case, even where a complete list of taxa is not available or practical. Please try to give this information even where not required. The count should best contain only the taxa covered fully or primarily by the resource. For a taxon page and most images this will be “1”, i. e. other taxa mentioned or in the background should not be counted. However, sometimes a resource may illustrate an ecological or behavioral entity with multiple species, e. g. a host-pathogen interaction. This should be a single integer number. Leave the field empty if you cannot estimate the information (do not enter 0). A counted taxon has to be featured in the media.

A single number in this item is assumed to be a count of taxa at the lowest applicable taxon rank. Where it is desired to specify counts of genera, families, etc., additional taxon counts may be added which in parentheses provide the rank (in the metadata language) at which the count was taken. Example: "12 (family)".
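A small sketch of how a consumer might read a Taxon Count value, including the optional rank in parentheses. The function name `parse_taxon_count` is hypothetical, and rejecting 0 is this sketch's reading of the "do not enter 0" advice, not a stated validation rule.

```python
import re

_COUNT = re.compile(r"^\s*(\d+)\s*(?:\(([^)]+)\))?\s*$")


def parse_taxon_count(value):
    """Return (count, rank); rank is None for the lowest applicable rank."""
    match = _COUNT.match(value)
    if not match:
        raise ValueError("expected an integer, optionally followed by (rank)")
    count = int(match.group(1))
    if count == 0:
        # The comments ask providers to leave the field empty instead of 0.
        raise ValueError("0 is not a meaningful taxon count")
    rank = match.group(2).strip() if match.group(2) else None
    return count, rank
```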

Although of special interest for collections, this is also highly relevant to singular resources addressing many taxa (such as identification tools).

I would suggest the use of standardized views (Vulpia 7:16-30) as a way to deal with this element and the following three elements. A standardized view represents an orientation of a particular organism part that has been found to present useful taxonomic characters or recognizable features for identification. I have found through experience that there are a limited number of such views for a particular group of organisms and have defined sets of standardized views for woody angiosperms, herbaceous angiosperms, and gymnosperms as collections in Morphbank. If standardized views became a part of this schema, one element would probably be required to specify the view set and another element would be required to specify the view within the set. It would also be necessary to figure out how to formally define the views, establish a controlled vocabulary, and define view sets for other groups of organisms based on the knowledge of photographers with experience in that group. The advantage of this system is that the view sets would contain views that are relevant to that group. It is difficult to create generic views applicable to all organisms because of morphological differences among groups and because on a large taxonomic scale organisms don't even have the same features. Having a way to specify a particular useful view (such as a frontal view of a flower) makes it much easier for users to search for the specific type of image they want. Although I don't have experience with defining view sets beyond plants, this concept could theoretically be extended to other non-organismal subjects of biological interest, such as ecoregions, habitats, etc.

Although this element is generally useful for animals, it is problematic or even irrelevant for most plants. In most cases, sex is a property of a plant part, not the organism itself. For many plant features (e.g. bark, leaves, buds) sex is irrelevant or undeterminable. In a plant context, including sex in the definition of a view (see above) of floral parts or cones makes more sense. A given individual plant may have an image of a male cone, a female cone, and bark where sex is irrelevant.

In a similar vein to my comment on sex, this element has very limited use for many plants. For example, most long-lived trees cycle annually through an "immature" (from a meristematic point of view) stage - budburst, to sexual maturity (anthesis), to fruit development. Thus "life stage" is really more relevant to a plant part than the plant as a whole, except in the case of a seedling where unquestionably the entire plant is immature. Again, the concept of a view can handle this by referencing life stage for individual parts or the whole organism (e.g. floral development) when relevant and leaving it out when it is not (e.g. for leaf images).

Free form text describing the techniques used to prepare the subject prior to or while creating the media resource.

Comments:

Examples for such techniques are: Insect under CO2, cooled to immobility, preservation with ethanol or formaldehyde. See also Resource Creation Technique for technical aspects of digital media object creation.

Technical Metadata Vocabulary

The location at which the media recording instrument was placed when the media was created.

Comments:

The distinction between location shown and created is often irrelevant, and metadata may be assumed to be referring to location shown. However, in the case of position data automatically recorded by the instrument (e.g. EXIF GPS data) LocationCreated should be used to maintain information accuracy.

Date the first digital version was created, where different from Date and Time Original (e.g. where photographic prints have been scanned). The date and time must comply with the World Wide Web Consortium (W3C) datetime practice, which requires that date and time representation correspond to ISO 8601:1998, but with year fields always comprising 4 digits. This makes datetime records compliant with 8601:2004. AC datetime values may also follow 8601:2004 for ranges by separating two ISO 8601 datetime fields by a solidus ("forward slash", '/'). See also the Wikipedia ISO 8601 entry for further explanation and examples.

Comments:

This is often not the file creation or modification date. Use the international (ISO/xml) format yyyy-mm-ddThh:mm (e. g. "2007-12-31" or "2007-12-31T14:59"). Where available, timezone information should be added. In the case of digital images containing EXIF, the EXIF capture date does not contain time zone information, but EXIF GPSDateStamp and GPSTimeStamp may be relevant as these include time-zone information. Compare also MWG (2008), which has best practice on handling time-zone-less EXIF date/time data.
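To make the EXIF remark concrete, here is a sketch that turns EXIF-style values into the ISO form asked for above. It assumes the values have already been extracted from the file by some EXIF reader (not shown), that the capture date uses the EXIF text layout "YYYY:MM:DD HH:MM:SS", and that the GPS stamps are in UTC. Both function names are hypothetical, and fractional GPS seconds are truncated.

```python
from datetime import datetime, timezone


def exif_capture_to_iso(date_time_original):
    """Convert an EXIF capture date into ISO form without a time zone."""
    parsed = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    return parsed.isoformat()


def exif_gps_to_iso(gps_date_stamp, gps_time_stamp):
    """Combine GPSDateStamp 'YYYY:MM:DD' and GPSTimeStamp (h, m, s) as UTC."""
    day = datetime.strptime(gps_date_stamp, "%Y:%m:%d")
    hours, minutes, seconds = gps_time_stamp
    stamped = day.replace(
        hour=int(hours),
        minute=int(minutes),
        second=int(seconds),  # truncates any fractional part
        tzinfo=timezone.utc,
    )
    return stamped.isoformat()
```

The first function yields a value with no zone, which is all the capture date can support. The second yields a value with an explicit UTC offset.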

As of the date in which I'm making this comment, this term doesn't yet have a normative URI. I don't know if that means that the normative URI hasn't yet been decided, if this was overlooked, or if there was some disagreement of what it should be. It seems clear to me that it should be dcterms:created which is defined in a straightforward manner as "Date of creation of the resource".

I am, however, left scratching my head as to why this element (which seems to me to be a very fundamental property of a digital media resource) has been relegated to a "Technical extension" rather than as one of the core elements. I suspect that this is another instance where my experience has caused me to have a different outlook than others. 99.9% of my images are taken of existing physical objects with a digital camera rather than scanned from existing artwork, film slides, or taken from a publication. In my circumstance, the "Original Time and Date" either doesn't mean anything or is the same as "Date and Time Digitized". If I take a picture of a tree, what does the core term Original Time and Date ("The date of the creation for the original resource from which the digital object was derived or created.") mean? The date the tree was planted? The date I discovered it? If the tree isn't the original resource and the digital image is the original resource, then the definition of "Original Time and Date" doesn't make any sense.

I'm not sure what the answer is to these questions, I just know that I don't understand how I would use Original Time and Date. With my orientation, if I were writing the schema, I would probably have made Date and Time Digitized a core element (since all digital media resources have it) and relegated Original Time and Date to an extension (since it only applies to a subset of digital media resources which are digital representations of physical media representations of physical objects). If it were ten years ago, I would consider my outlook as the unusual one because at that time probably most digital media resources were being created by digitizing existing physical media items. However, I would venture to say that at the present, the fraction of new digital media items that are being generated directly (either through digital photography or generated directly from software such as GIS or animation software) is much higher than the fraction being created from digitizing physical media items. The fraction will probably be even greater in the future. So why is the digital creation date part of an extension (i.e. I'm assuming considered less important than Original Time and Date)?
Steve Baskauf 05:22, 28 October 2009 (CET)

There is a huge amount of legacy digital media, e.g. people's slides. If you took a picture on such media and subsequently digitized it, the Original Time and Date is the date at which the photograph was taken, and the Date and Time Digitized is the date and time at which the digital record was made. If you are extracting a time and date recorded by the device itself, and you intend that the subject was that originally depicted, e.g. the digital record is a picture of a tree, not a picture of a picture of a tree because you are imaging a photograph, then these two notions of date are generally different. The Original Time and Date is usually the more biologically interesting and that is why it is in the core. The Date and Time Digitized may be more of curatorial interest than scientific interest. Indeed, there are ongoing arguments about whether an image of intellectual property is new or derivative property, and in such arguments, the distinction may be especially important. Because we felt that the date at which the scene was originally captured has greater scientific import than the date at which the digital record is made (in case they are different), we put the former in the core but not the latter. People putting metadata on images made by contemporary cameras are likely to find that their best practice is to make Original Time and Date be that recorded by the device, assuming it is properly set. --BobMorris 20:19, 23 January 2010 (CET)

Free form text describing the device or devices used to create the resource.

Comments:

It is best practice to record the device; this may include a combination such as camera plus lens, or camera plus microscope. Examples: "Canon Supershot 2000", "Makroscan Scanner 2000", "Zeiss Axioscope with Camera IIIu", "SEM (Scanning Electron Microscope)".

Annotating whether and how a resource has been modified or edited significantly in ways that are not immediately obvious or expected to consumers is of special significance. Examples for images are: Removing a distracting twig from a picture, moving an object to a different surrounding, changing the color in parts of the image, or blurring the background of an image. Modifications that are standard practice and expected or obvious to users are not necessary to document; examples of such expected modifications include changing resolution, cropping, minor sharpening or overall color correction, and clearly perceptible modifications (adding arrows or labels, combination of multiple pictures into a table). If it is only known that significant modifications were made, but no details are known, a general statement like “Media may have been manipulated to improve appearance” may be appropriate.

Reference to an instance of a class describing network access to the media resource, or related resources, that the metadata describes. What constitutes a class is dependent on the representation (i.e. XML Schema, RDF, etc.).

Comments:

Use with the properties below. In particular, there is little point to having an instance of this class without a value for the Access URL and perhaps the Format. Implementers in specific constraint languages such as XML Schema or OWL may wish to make those two properties mandatory on instances.
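As one possible reading of the comment above, an implementer might model the access point so that an instance cannot exist without an Access URL, while Format stays optional. This is only a sketch in Python rather than in XML Schema or OWL; the class and attribute names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceAccessPoint:
    """Hypothetical in-memory form of a service access point."""

    access_url: str                # treated as mandatory in this sketch
    format: Optional[str] = None   # e.g. an Internet Media Type, if known

    def __post_init__(self):
        # An access point without an Access URL has little use.
        if not self.access_url or not self.access_url.strip():
            raise ValueError("Access URL is required")
```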

The technical format of the resource (file format or physical medium).

Comments:

Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]. This item is recommended for offline digital content. In cases where the provided URL includes a standard file extension from which the format can be inferred it is permissible to not provide this item.
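The last sentence above allows a consumer to infer the format from a standard file extension. A minimal sketch of that inference using Python's standard `mimetypes` module follows. The function name `infer_format` is hypothetical, and the result depends on the media type tables available on the machine running it.

```python
import mimetypes
from urllib.parse import urlparse


def infer_format(access_url, declared_format=None):
    """Prefer the declared Format; otherwise guess from the URL's extension.

    Returns None when nothing was declared and no extension is recognized.
    """
    if declared_format:
        return declared_format
    path = urlparse(access_url).path  # ignore query string and fragment
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed
```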

Thumbnail: ServiceAccessPoint provides a thumbnail image, short sound clip, or short movie clip that can be used to represent the media object, typically at lower quality and higher compression than the preview object. A typical size for a tiny thumbnail image may be 50-100 pixels in the longer dimension.

Trailer: ServiceAccessPoint provides a video clip preview, in the form of a specifically authored "Trailer", which may provide somewhat different content than the original resource.

Good Quality: ServiceAccessPoint provides a good quality version of the media resource intended for resources displayed as primary information; e.g. an image between 800 and 1600 px in width or height.

Best Quality: ServiceAccessPoint provides the highest available quality of the media resource, whatever its resolution or quality level.

Since some of these are important and some not, it is unclear to me what advice to offer as to whether application builders are in trouble if they ignore some of them. --BobMorris 19:14, 28 January 2010 (CET)

Since the permitted values are identifiers, not labels, I really think it better to have them in CamelCase, with no whitespace. If there are no persuasive arguments otherwise, I will change them to that.--BobMorris 19:14, 28 January 2010 (CET)
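For illustration only, the proposed renaming could be derived from the current labels as follows. The helper name `to_camel_case` is hypothetical, and the resulting identifiers are what the proposal would produce, not values that have been adopted.

```python
def to_camel_case(label):
    """Join the words of a label, capitalizing the first letter of each."""
    return "".join(word[:1].upper() + word[1:] for word in label.split())


# Labels taken from the permitted values listed above.
labels = ["Thumbnail", "Trailer", "Good Quality", "Best Quality"]
identifiers = [to_camel_case(label) for label in labels]
```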

Best practices are: Extent as length/running time should use standard abbreviations of the metadata language (for English "20 s", "54 min"). Extent of images or video may be given as pixel size ("2000 x 1500 px"), or as file size (using kB, kByte, MB, MByte).

I'm worried that this term is too comprehensive and in particular not amenable to machine processing. Note that ISO 8601 provides standards for duration. Maybe we should refactor this term into separate, machine-parseable fields. Surely IPTC has size fields. --BobMorris 17:35, 29 March 2009 (CEST)

IPTC has neither size nor extent. XMP has Rendition Class with attributes specifying some aspects of size, plus xmp:Thumbnails, an array of thumbnail objects. It further has xmpDM:videoPixelDepth and color depth properties. I could not find an image pixel extent property, although I believe it should exist there. --GregorHagedorn 08:53, 3 April 2009 (CEST)

However, I am not too worried about machine processing here. If we want to help users estimate fitness for purpose prior to accessing the media object itself, as is stated in our MRTG reports, we need this. Nothing wrong with making it better machine processable, but in the absence of a good solution for that, keep it human accessible. And I believe good programmers can make such information machine accessible... --GregorHagedorn 08:53, 3 April 2009 (CEST)
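As a rough indication of how far a program can get with the human-oriented Extent values, here is a sketch that recognizes the three English patterns from the best practices above and falls back to plain text otherwise. The function name `parse_extent`, the "h" unit, and the decimal reading of kB and MB are assumptions of this sketch, not part of the item definition.

```python
import re

_DURATION = re.compile(r"^(\d+(?:\.\d+)?)\s*(s|min|h)$")
_PIXELS = re.compile(r"^(\d+)\s*x\s*(\d+)\s*px$")
_FILESIZE = re.compile(r"^(\d+(?:\.\d+)?)\s*(kB|kByte|MB|MByte)$")

_SECONDS = {"s": 1, "min": 60, "h": 3600}  # "h" is an assumption
_BYTES = {"kB": 1000, "kByte": 1000, "MB": 1000000, "MByte": 1000000}


def parse_extent(value):
    """Return a dict describing the extent; unknown forms stay as text."""
    text = value.strip()
    match = _DURATION.match(text)
    if match:
        seconds = float(match.group(1)) * _SECONDS[match.group(2)]
        return {"kind": "duration", "seconds": seconds}
    match = _PIXELS.match(text)
    if match:
        return {
            "kind": "pixels",
            "width": int(match.group(1)),
            "height": int(match.group(2)),
        }
    match = _FILESIZE.match(text)
    if match:
        size = int(float(match.group(1)) * _BYTES[match.group(2)])
        return {"kind": "filesize", "bytes": size}
    return {"kind": "text", "value": text}
```

Values in other metadata languages, or free-form values, fall through to the text case, which is where separate machine-parseable fields would help.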

The definition of version should be limited to content that is identical except for quality, availability, license, etc. When is a description necessary? Such cases are probably better served if handled as separate resources, and given a resource-to-resource relation like derived from. The version-specific description should thus be dropped. --GregorHagedorn 12:11, 29 March 2009 (CEST)

Definition is too vague about "and by which the media resources are linked to their collection." --BobMorris 20:53, 13 February 2009 (CET)

We had issues with people misunderstanding CollectionID as ID of, rather than ID to. I propose to rename this to MemberOfCollectionByID? --GregorHagedorn 12:23, 23 February 2009 (CET)

Why not just MemberOfCollection with spec being that the value is an ID? We don't have any other MemberOfCollection? --BobMorris 23:54, 13 March 2009 (CET)

The natural response by non-programmers to "member of which collection?" would, in my opinion, be to give the collection by its title. We know that is not unique enough. I like names to be clear, but if MemberOfCollectionByID sounds too complicated, we just have to make sure that the definition is clear. The problem when not specifying which ID or what form is that we cannot validate that MemberOfCollection = "XYZ Butterfly images" is invalid. --GregorHagedorn 18:34, 14 March 2009 (CET)

WE NEED TO SETTLE THIS NAME.

How about "ID of Containing Collection" with another item "Name of Containing Collection", perhaps with the first being preferred. --BobMorris 18:52, 14 March 2009 (CET)

?? We need a comment that this need not be the ID specified in the Identifier metadata item, although that is a recommended practice. --BobMorris 17:38, 14 March 2009 (CET)

Do we need to specify semantics of inheritance? If R is a MemberOfCollection A, which is a MemberOfCollection B, are applications allowed to conclude that R is a MemberOfCollection B? --BobMorris 17:38, 14 March 2009 (CET)
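To make the question concrete: if membership were declared transitive, an application could compute every containing collection as sketched below. This only illustrates what such semantics would entail and is not a statement that MRTG adopts them. The function name and the dictionary layout are assumptions.

```python
def collections_containing(resource_id, member_of):
    """Return all collections reachable from resource_id via member_of.

    member_of maps an ID to the IDs of the collections it directly
    belongs to. Cycles are tolerated.
    """
    seen = set()
    stack = list(member_of.get(resource_id, ()))
    while stack:
        collection = stack.pop()
        if collection in seen:
            continue
        seen.add(collection)
        stack.extend(member_of.get(collection, ()))
    return seen


# R is a member of A, and A is a member of B.
member_of = {"R": ["A"], "A": ["B"]}
inferred = collections_containing("R", member_of)
```

Under transitive semantics `inferred` contains both A and B. Without them, an application may only rely on A.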

Usage remains unclear to me. Duplicate of Resource ID? Why restricted to media resources, but not Collections? Should this be something like "relation of unknown kind"? --GregorHagedorn 12:23, 23 February 2009 (CET)

Definition is too vague about "and by which the media resources are linked to their collection." --BobMorris 20:53, 13 February 2009 (CET)

Is there no requirement that the ID be the MRTG ID? --BobMorris 17:38, 14 March 2009 (CET)

A globally unique ID of the provider of the MRTG record that is being provided.

Comments:

If the resource is not a provider, this item is for relating the resource to a provider, using an arbitrary code that is unique for a provider, contributing partner, aggregator, or other role (potentially defined by MARC, OAI) and by which the media resources are linked to the provider.

"need to figure out how to handle aggregation depending on whether cataloging the collection, or an individual media (or a provider), this would be the dc identifier or relation field" (from the spreadsheet). - Definition is too vague about "and by which the media resources are linked to their collection." --BobMorris 20:53, 13 February 2009 (CET)

We discussed possible misunderstandings of provider in Woods Hole. Should this perhaps be: ServiceAttributionURI: "A URI that identifies the primary provider of either the data or metadata, whichever is desired and agreed upon by media resource and metadata provider. Client software displaying the results of metadata searches are being requested to display for each resource the following attributions (if available): resource creator, copyright owner, collection context, and the ServiceAttributionURI. If ServiceAttributionURI matches a homepage of a data or service provider record, in addition to the URI title, description, or logo are requested to display." --GregorHagedorn 12:23, 23 February 2009 (CET)

For NCD, Institution Identifier comes closest: "The URI (LSID or URL) of the institution. In RDF this will be used as URI of the institution resource."

A reference to an original resource from which the current one is derived.

Comments:

Derivation of one resource from another is of special interest for identification tools (e. g. a key from an unpublished data set, as in FRIDA, or a PDA key from a PC or web key) or web services (e. g. a name synonymization service being derived from a specific data set). It may very rarely also be known where one image or sound recording is derived from another (but compare the separate mechanism to be used for quality/resolution levels). The value may be human readable, a DOI, or a URL; for the human-readable form, use the simple name of the parent. Can be repeated if the resource is a montage of images.

This is not intended for different display resolutions, for which different quality level URLs are available.

What is the difference between Derived From and Published Source?

This is more general and published source carries a special semantics of published item. I am not sure both are needed though! --GregorHagedorn 01:21, 3 March 2009 (CET)

This element suffers from the same problem of confusion about "what is the source from which a resource is derived" as I discussed in the Published Source item. --Steve Baskauf 17:25, 30 April 2009 (CEST)

For NCD Derived Collection ncd:[2] comes closest "A "derived" collection record. The record has been derived from a query on an item-level database e.g. all items from Australia.".

Supports finding a specimen resource where additional information is available. If several resources relate to the same specimen, these are implicitly related. Examples: for NHM “BM 23974324” for a barcoded or “BM Smith 32” for a non-barcoded specimen; for UNITS: “TSB 28637”; for PMSL: “PMSL-Lepidoptera-2534781”. Ideally this could be a URI identifying a specimen record that is available online.

"question of whether we have both an encoded URI/vocab, but also get a free text." from spreadsheet

Is this the same as DwC:RelatedResourceID - "A global unique identifier to a related resource."

I won't repeat what I already wrote in the discussion of Published Source, but again it seems to me that what we need here is simply Dublin Core "Source", which is clearly defined as "A related resource from which the described resource is derived". It seems to me that we are creating unnecessary complexity by defining separate terms depending on if an image came from a specimen, a live plant, or a digitized slide. All of these physical resources can be referenced by the URL of their online record (and they should have one) or eventually by their LSID (assuming LSIDs ever get off the ground). If there aren't online references, then create a VerbatimSource field to be populated with whatever text description there is for locating the resource. But don't create several different elements when one will do. --Steve Baskauf 17:25, 30 April 2009 (CEST)

At the moment, I somewhat agree with Steve, but (a) MRTG metadata need not refer to a digital object, so there is not necessarily an existing URL to give context and (b) dc:Source seems too vague to me. If something is marked as dc:Source, how do we know it refers to a specimen? --BobMorris 22:08, 23 January 2010 (CET)

This will need coordination with the Observation Working Group --BobMorris 20:53, 13 February 2009 (CET)

Same thing here as above. Why are we creating another specific field when one generic field will do? If it is necessary to know what kind of resource the source is, then create an element called SourceType and populate it with valid Type terms from a controlled vocabulary, i.e. Darwin Core Type vocabulary. Thus a more generalized system could be: Source (hopefully an LSID or URL to an online record), VerbatimSource (a human-readable text description of where to find the source), and SourceType (a controlled DwC vocabulary, including physical specimen, observation, still image, etc.)
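The three-element proposal in the comment above might look like this as a data structure. It is a sketch of the suggestion, not of anything in the schema. The class name is hypothetical, and the allowed type strings are placeholders copied from the wording of the comment, not actual terms of the Darwin Core Type vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder values taken from the comment's wording, for illustration only.
ALLOWED_SOURCE_TYPES = {"physical specimen", "observation", "still image"}


@dataclass(frozen=True)
class SourceReference:
    source: Optional[str] = None           # LSID or URL of an online record
    verbatim_source: Optional[str] = None  # human-readable locator
    source_type: Optional[str] = None      # term from a controlled vocabulary

    def __post_init__(self):
        if self.source is None and self.verbatim_source is None:
            raise ValueError("give Source, VerbatimSource, or both")
        if self.source_type is not None and self.source_type not in ALLOWED_SOURCE_TYPES:
            raise ValueError("unknown SourceType: %r" % self.source_type)
```

One generic element plus a type term answers the "how do we know it refers to a specimen?" objection only if providers actually fill in the type.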

↑plus = ? The correct resolution for the namespace mentioned in the IPTC PDF is not given there, and the namespace http://ns.useplus.org/ldf/vocab/ associated with the plus ns prefix does not have a term like CopyrightOwner in it. This may be a version issue and needs further research.