Software which works with discussion threads may use this vocabulary
as a standard way to exchange threading information. In addition, the
vocabulary can be used to store any number of posts from any discussion
forum in a standard way. All discussion venues are treated equally,
so data from multiple media may be combined into a single data set.
This allows one to treat all weblogs and message boards as though
they were a single forum.

In addition, this vocabulary includes certain properties
specific to weblogs which can be used to
describe some common relations, such as the recommendations
many weblogs make (the “blogroll”).

Background

The goal of the thread description language is to describe all forms
of on-line discussion, including weblogs, message boards, Usenet, e-mail,
instant messages, and anything else which can be described in terms of
Posts.

The basic unit of an on-line conversation is the Post. A discussion
comprises a set of posts by various authors which are related to each
other. This set of related posts is a thread. Structurally, threads
can be either implicit or explicit, and they may be linear or forked.

In an explicit thread, the posts which compose the thread are
marked as being part of the same thread. Implicit threads, however,
must be derived from the pattern of references found among a set of
posts.

In a linear thread, each post follows another, as one
finds in an instant message conversation or “unthreaded” message
board. In a forked thread, any given Post may be followed by
multiple posts, forming a tree of responses. For maximum
flexibility, posts in a forked thread may also follow
multiple posts, in addition to being followed by multiple
posts.

The Thread Description Language is a set of
RDF classes and properties
which are used to describe discussion threads and forums. By
implementing it in RDF,
we gain interoperability with other vocabularies, extensibility,
and a well-defined serialization format, RDF-XML.

Each post is identified by a URI,
and relations between them are implemented by properties. To
represent the connections between posts in a linear thread, we
use the sequence properties, which are
“next” and its variants. For a forked thread, we use the
reference properties, which are “refersTo”
and its variants.

[@@ notes
- goal is to describe all forms of discussion threads:
weblogs, msg boards, usenet, mailing lists, &c
- use RDF as a standard way to describe things, not necessarily
for internal representation
- useful to have a standard way to express "X is last post of Y", even if
it's not useful to store it
- RDF-XML is a handy way to store these facts in a file
- this means that TDV can represent any discussion forum - ThreadML
Uses and applications:
- A model for those developing discussion-thread-based software
- Exchange format for threading information extracted from blogthreads,
message boards, etc.
- File format for storing threads from message boards, usenet, IM, etc
]

Of these, only the Thread Description Language, Dublin Core, and
RDF vocabularies
are required. The others are used in examples.

[@@ The W3C seems to recommend that RDF namespaces end in “#”,
as in “http://www.eyrie.org/~zednenem/2002/web-threads/#” ]

Methods

The following examples give an idea of how existing discussion forums may be represented.

Weblogs

While individual weblogs might not be considered a discussion forum,
one can look at the weblog community as a whole as a giant, distributed
discussion. This is enabled by the use of URI references to identify
posts and hyperlinks to make references. Naturally, the permanent address
of each post (often called the “permalink”, although that conflates the
act of pointing with the object being pointed at) would be used to identify
each post. The hyperlinks contained within the post serve as references.
(This raises the question of how one determines which hyperlinks a given
post contains. This is beyond the scope of this document, although the
associated coding
convention describes one such method.)

The universe of weblogs (or “blogosphere”) is implicitly threaded,
and several proposals exist for standard ways of indicating explicit
threads. Standard hyperlinks imply the “refersTo” property. (Again,
the coding convention describes how other references may be specified.)

Any given weblog post belongs to a Weblog, although that membership
may not be derivable from the post’s encoding. Most weblog posts are
encoded in some variant of HTML,
but for minimal confusion it is recommended that the value of the
“content” parameter be given as an XHTML 1.1 fragment.
This avoids incompatibility with RDF-XML, and should not result in any
loss of information.

Linear message board

In a linear, or “unthreaded”, message board, each post follows another in
chronological sequence. If individual posts can be assigned URI references, then
they can be represented as explicit, linear threads.

Each thread is represented by a Topic, and the first post is
identified by the Topic’s “first” property. The posts use
“next” to indicate the following post. For an example of a Topic
encoded this way, see this
RDF-XML transcription of a Quick Topic thread.
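
As a minimal sketch, a three-post linear thread might be described as follows. The board addresses and the title are invented for illustration, and the “tdl” prefix and its namespace URI are assumptions (the URI is the one quoted in the namespace note above, without the trailing “#”).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and every example.org address are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <!-- The Topic stands for the thread and names its first post. -->
  <tdl:Topic rdf:about="http://example.org/board/topic/7">
    <dc:title>Lunch plans</dc:title>
    <tdl:first rdf:resource="http://example.org/board/topic/7#p1"/>
  </tdl:Topic>

  <!-- Each post names the post that follows it; the last has no "next". -->
  <tdl:Post rdf:about="http://example.org/board/topic/7#p1">
    <tdl:next rdf:resource="http://example.org/board/topic/7#p2"/>
  </tdl:Post>
  <tdl:Post rdf:about="http://example.org/board/topic/7#p2">
    <tdl:next rdf:resource="http://example.org/board/topic/7#p3"/>
  </tdl:Post>
  <tdl:Post rdf:about="http://example.org/board/topic/7#p3"/>
</rdf:RDF>
```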

Forked message board

In a forked, or “threaded”, message board, each post either begins
a thread (represented here by a Topic) or replies to an existing post.
They are explicitly threaded, because each post is associated with a
specific thread. The threads themselves are forked, because posts may
have multiple responses. (Some forked message boards also provide
a chronological ordering for posts, allowing the thread to be viewed
as a tree or a sequence.)

As with linear boards, each thread is
represented by a Topic, and the first post in the thread is identified
by the Topic’s “first” property. The posts relate to each other through
“refersTo” (at a minimum; a message board can probably assume “commentsOn”
or better for direct replies) and possibly also through “next”. The
Topic will usually be part of a Forum. In large message boards, the
Forums may themselves be organized into larger Forums.
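
A forked thread could be sketched along the following lines. All addresses and the title are hypothetical, the “tdl” prefix and namespace URI are assumptions, and “commentsOn” is used for direct replies as suggested above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and every example.org address are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Forum rdf:about="http://example.org/forums/cooking"/>

  <tdl:Topic rdf:about="http://example.org/forums/cooking/t42">
    <dc:title>Best way to proof dough?</dc:title>
    <tdl:inForum rdf:resource="http://example.org/forums/cooking"/>
    <tdl:first rdf:resource="http://example.org/forums/cooking/t42/1"/>
  </tdl:Topic>

  <tdl:Post rdf:about="http://example.org/forums/cooking/t42/1">
    <tdl:inTopic rdf:resource="http://example.org/forums/cooking/t42"/>
  </tdl:Post>

  <!-- Posts 2 and 3 both reply to post 1, so the thread forks here. -->
  <tdl:Post rdf:about="http://example.org/forums/cooking/t42/2">
    <tdl:inTopic rdf:resource="http://example.org/forums/cooking/t42"/>
    <tdl:commentsOn rdf:resource="http://example.org/forums/cooking/t42/1"/>
  </tdl:Post>
  <tdl:Post rdf:about="http://example.org/forums/cooking/t42/3">
    <tdl:inTopic rdf:resource="http://example.org/forums/cooking/t42"/>
    <tdl:commentsOn rdf:resource="http://example.org/forums/cooking/t42/1"/>
  </tdl:Post>

  <!-- Post 4 replies to post 2, extending one branch of the tree. -->
  <tdl:Post rdf:about="http://example.org/forums/cooking/t42/4">
    <tdl:inTopic rdf:resource="http://example.org/forums/cooking/t42"/>
    <tdl:commentsOn rdf:resource="http://example.org/forums/cooking/t42/2"/>
  </tdl:Post>
</rdf:RDF>
```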

Weblog with comments

Many weblogs have a comments feature which allows readers to respond
to posts within the weblog itself. These comments add to the pre-existing
inter-weblog discussion, and can potentially be referenced themselves
by other weblog or message board posts. Interestingly, one could
consider news sites which feature “talkback” to be examples of this
pattern.

A set of comments to a single weblog post constitutes an explicit
thread which may be forked, linear, or both depending on the message
board setup. In this case, the original post does double duty as a
Post and a Topic. If the comments are linear, the Topic’s “first”
property identifies the first response. Otherwise, the Posts in the
Topic use “refersTo” to indicate whether they are responding to the
weblog post or to another post in the Topic.
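
A weblog post with two comments might be sketched as follows. The fragment identifiers “c1” and “c2” are invented, and the “tdl” prefix and namespace URI are assumptions; the sketch combines “first” and “next” with “refersTo” to show both styles at once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and the comment fragment identifiers are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <!-- The weblog post is typed only as a Post, yet it carries "first". -->
  <tdl:Post rdf:about="http://example.org/blog/455">
    <tdl:first rdf:resource="http://example.org/blog/455#c1"/>
  </tdl:Post>

  <!-- The first comment responds to the weblog post itself. -->
  <tdl:Post rdf:about="http://example.org/blog/455#c1">
    <tdl:inTopic rdf:resource="http://example.org/blog/455"/>
    <tdl:refersTo rdf:resource="http://example.org/blog/455"/>
    <tdl:next rdf:resource="http://example.org/blog/455#c2"/>
  </tdl:Post>

  <!-- The second comment responds to the first comment. -->
  <tdl:Post rdf:about="http://example.org/blog/455#c2">
    <tdl:inTopic rdf:resource="http://example.org/blog/455"/>
    <tdl:refersTo rdf:resource="http://example.org/blog/455#c1"/>
  </tdl:Post>
</rdf:RDF>
```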

Note that “http://example.org/blog/455” is not explicitly identified as a
Topic; this is all right, as most software would be able to figure it out
as it has the “first” property and is the value of several posts’ “inTopic”
property. However, different implementations may present different
information or the same information in different ways.

Note also that all three posts in that example are part of the same
document. This is also not required. All that is important is that the
three URI references
are different.

Usenet

Each Usenet message is required to have a unique message ID. This forms the basis
of part of the news: URI scheme.
A message with the ID “1998090902325900.WAA04282@example.org” has the address
“news:1998090902325900.WAA04282@example.org”. Specific Usenet newsgroups are also
given unique names such as “alt.example” or “rec.arts.tv.mst3k.misc”. These are
similarly represented by the news: URI
scheme as “news:alt.example” and “news:rec.arts.tv.mst3k.misc”. Message IDs
always contain a commercial at-sign (“@”), and newsgroup names never contain
one, so there is no possibility of confusing a message ID and a newsgroup name.

Newsgroups are implicitly threaded. Each message contains a header specifying
the posts it refers to (ie, posts earlier in its thread). This header corresponds
to the “refersTo” property, and the messages listed are its values.

The newsgroups themselves are represented as Topics, and Usenet as a whole
can be thought of as a Forum. Crossposted messages have multiple values for
“inTopic”.
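
A crossposted Usenet reply might be sketched like this. The second message ID is invented for illustration, and the “tdl” prefix and namespace URI are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and the message ID of the earlier post are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <!-- Newsgroups are represented as Topics. -->
  <tdl:Topic rdf:about="news:alt.example"/>
  <tdl:Topic rdf:about="news:rec.arts.tv.mst3k.misc"/>

  <!-- A crossposted message has one "inTopic" value per newsgroup;
       each earlier post listed in its header becomes a "refersTo" value. -->
  <tdl:Post rdf:about="news:1998090902325900.WAA04282@example.org">
    <tdl:inTopic rdf:resource="news:alt.example"/>
    <tdl:inTopic rdf:resource="news:rec.arts.tv.mst3k.misc"/>
    <tdl:refersTo rdf:resource="news:1998090812000000.AAA00001@example.org"/>
  </tdl:Post>
</rdf:RDF>
```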

Although the Usenet headers only provide for references made between
Usenet messages, applications are free to infer additional references
from URIs contained
in the message text.

[@@ there should be a note somewhere about MIME and content]

E-mail messages

Like Usenet messages, e-mails contain message identifiers
which are required to be unique. These form the basis of the mid:
URI scheme, as with
“mid:528FA5637F7A16419AD8FC006128E6DCBC6836@example.org”. Thus, they too
can be represented as posts. Additionally, some mail clients include a
references header when making a reply (although this is not common
practice) which allows for some “refersTo” properties to be inferred.

E-mail as a whole is implicitly threaded (to the extent that
references can be inferred from context). Mailing lists can be represented
as Topics, particularly if they include the List-URL header or otherwise
have a unique address. Because mailing lists are centrally managed, they
can have sequential and forked threading.

Instant Messages

While there do not appear to be any standards for identifying instant
messages, there are some defined URI
schemes which are sufficiently decentralized to be useful. One could, for example,
assign each instant message a UUID.
UUIDs are defined in such a way that it is highly improbable for two
items to be given the same identifier. (This is problematic, as the
uuid: URI scheme
is unregistered.)

Instant messages are implicitly threaded and sequential, but they can be
organized into Topics. One (non-optimal) way to do it is to explicitly
identify the topic when saving an instant messaging conversation: all the
messages being saved are considered part of the Topic. The Topic is given
a UUID, and each message
is declared a Post. The Posts can be given
fragment identifiers based on the Topic’s address.
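
A saved two-message conversation could be sketched as follows. The UUID, the fragment identifiers, and the author names are purely illustrative, and the “tdl” prefix and namespace URI are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     the UUID, the fragment identifiers, and the names are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <!-- The saved conversation as a whole is the Topic. -->
  <tdl:Topic rdf:about="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6">
    <tdl:first rdf:resource="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6#m1"/>
  </tdl:Topic>

  <!-- Each message is a Post addressed by a fragment of the Topic URI. -->
  <tdl:Post rdf:about="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6#m1">
    <dc:creator>alice</dc:creator>
    <tdl:inTopic rdf:resource="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"/>
    <tdl:next rdf:resource="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6#m2"/>
  </tdl:Post>
  <tdl:Post rdf:about="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6#m2">
    <dc:creator>bob</dc:creator>
    <tdl:inTopic rdf:resource="uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"/>
  </tdl:Post>
</rdf:RDF>
```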

This method for representing instant messaging conversations is only
one possible way to apply the thread description language, and it does
have the disadvantage that, if both parties export the conversation,
they will assign different URIs
to the Topic and Posts. While a superior method will undoubtedly present itself
in the future, this one is good enough to put IM
on the same footing as weblogs, message boards, Usenet, and e-mail.

Web-based archives

While weblog and message-board authors are free to link directly to
Usenet and e-mail messages, they generally will not because browsers cannot
dereference news: and mid: URIs.
Thus, a method is needed to identify some
URIs as aliases
for other URIs.
The appropriate choice here is probably daml:sameIndividualAs [@@sp?].
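
If that property is used, an alias might be declared as in the following sketch. The archive address is hypothetical, and the “daml” namespace URI shown is assumed to be that of the March 2001 DAML+OIL release; it should be checked before use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the archive address is hypothetical and
     the DAML+OIL namespace URI should be verified against its specification. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#">

  <!-- The web archive page and the Usenet message denote the same post. -->
  <rdf:Description rdf:about="http://archive.example.org/alt.example/msg00042.html">
    <daml:sameIndividualAs
        rdf:resource="news:1998090902325900.WAA04282@example.org"/>
  </rdf:Description>
</rdf:RDF>
```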

Classes

We define five types of resources for dealing with threading. One,
Post, is essential. The other four are collections of posts with various
uses and properties.

Post

The fundamental atom of discussion. A post may be a single posting to
a weblog or message board, a Usenet message, an e-mail, an individual
statement in an IM conversation,
or anything else that can be part of a thread and can be assigned a
URI reference. Posts
typically have a single author and do not change over time. Posts may appear
in multiple locations (such as the archives and front page of a weblog).
The location specified by their address is their permanent location; others
are possibly-temporary mirrors.

Archive

A resource where one or more posts are located. Multiple posts appearing
as part of the same archive are distinguished by fragment identifiers.
Aside from being repositories for posts, archives have no major significance.
A resource may be a post and an archive at the same time.

Topic

An explicitly-declared thread. Topics group posts not by their
location (as archives do) but by some common relation, such as being
a direct or indirect response to a resource. Topics correspond to
message board threads and to the commenting features supported by some
weblogs. In the latter case, the post to which the comments are made
can also be the topic.

Forum

A collection of topics or sub-forums, such as a message board
which supports multiple threads. Some message boards separate
topics in broad categories; they can be viewed as a forum containing
multiple sub-forums, each of which contains several topics. The
URI used to identify the forum
should resolve to an introductory page which might list sub-forums
or a selection of topics. [@@ needs tweaking; newsgroups should
qualify as forums, what about mailing lists?]

Weblog

A set of posts controlled by a single authority. Weblogs are a
popular form of personal web site used for publishing essays,
pointing to interesting web resources, self-promotion, or many
other uses. Weblogs have a number of common features which are
described later. The URI
used to identify the weblog is also the address of the weblog’s front page,
which frequently mirrors the most recent posts.

Properties

Cataloging

Rather than create a new vocabulary to describe common properties
such as titles, authors’ names, and so forth, we specify use of the
Dublin Core Metadata Set.

Some of the Dublin Core elements likely to be applied to posts,
topics, forums, and weblogs are:

dc:title

The title, name, or subject line of a post, topic, forum, or
weblog. As a rule, this should contain only information unique to
the resource, so “Re: Nixon’s dog” is fine but “SoAndSo Discussion
Forum—Re: Nixon’s dog” is probably not.

dc:creator

A string identifying the author or authors of a post, or the
creator of a topic, forum, or weblog. This could be a name, a
nickname, or some other identifying string. (If the intent is for
others to know what it means, don’t be too clever.)

dc:date

A string specifying the publication time of a post. It should
be formatted according to the ISO 8601 profile specified in
the W3C date/time note, to the precision of a day, minute, or second. Times are
interpreted to mean “sometime in that period”, not “the start of
that period”. Thus, “2002-07-20” means any time during July 20,
2002. (Note that timezone information is required for minutes and
seconds.)

dc:description

A string describing a post, topic, forum, or weblog. Note that this
should not be used to present the content of the resource or an excerpt;
use the properties in the content section
for that.

dc:contributor

A string identifying someone who contributed to a thread, forum,
or weblog. Similar to dc:creator in syntax.

dc:rights

A string describing the copyright and other rights associated
with the subject, or a resource containing just a description.

Note that the “dc:” prefix is not part of the names themselves.
It merely indicates that these names are taken from the Dublin
Core metadata set, as was declared earlier.
Although “dc:” is commonly used, it is the namespace
URI which
identifies the namespace, not the prefix.
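
The cataloging properties above might be combined as in the following sketch. The post address and all values are invented, and the “tdl” prefix and namespace URI are assumptions; the two dates show day and minute precision respectively.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     the addresses, and all property values are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Topic rdf:about="http://example.org/board/topic/9">
    <dc:title>Nixon's dog</dc:title>
    <!-- Day precision: some time during July 20, 2002. -->
    <dc:date>2002-07-20</dc:date>
  </tdl:Topic>

  <tdl:Post rdf:about="http://example.org/board/topic/9#p2">
    <!-- Only information unique to the post, with no forum name prepended. -->
    <dc:title>Re: Nixon's dog</dc:title>
    <dc:creator>Joe Example</dc:creator>
    <!-- Minute precision requires a timezone designator. -->
    <dc:date>2002-07-20T14:30-04:00</dc:date>
    <dc:description>A reply questioning the date given in the first post.</dc:description>
    <dc:rights>Copyright 2002 Joe Example</dc:rights>
  </tdl:Post>
</rdf:RDF>
```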

Although simple Dublin Core systems will expect plain-text
values for these properties, it is possible to provide more
structured values by using the rdf:value property. For instance,
if we wish to refer to the author of a Usenet post, we can do so
with a plain literal:
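
A minimal sketch of such a description, assuming the “tdl” prefix and namespace URI shown and reusing the Usenet message address from earlier:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumption: the "tdl" prefix and its namespace URI
     are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Post rdf:about="news:1998090902325900.WAA04282@example.org">
    <!-- Angle brackets in the literal must be escaped in XML. -->
    <dc:creator>Joe Example &lt;joe@example.org&gt;</dc:creator>
  </tdl:Post>
</rdf:RDF>
```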

However, we may wish to describe Mr Example’s name
and e-mail address separately. Using a vocabulary such as
FOAF, we can
indicate that Mr Example is a person (and not, say, a
corporation), give his name and e-mail separately, and still
be understood by generic Dublin Core processors.
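
One way this could look is sketched below. The “tdl” prefix and namespace URI are assumptions; the FOAF terms used (Person, name, mbox) are taken from the FOAF vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumption: the "tdl" prefix and its namespace URI
     are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Post rdf:about="news:1998090902325900.WAA04282@example.org">
    <dc:creator>
      <foaf:Person>
        <!-- Plain-text fallback for processors that know only Dublin Core. -->
        <rdf:value>Joe Example &lt;joe@example.org&gt;</rdf:value>
        <!-- Structured details for processors that know FOAF. -->
        <foaf:name>Joe Example</foaf:name>
        <foaf:mbox rdf:resource="mailto:joe@example.org"/>
      </foaf:Person>
    </dc:creator>
  </tdl:Post>
</rdf:RDF>
```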

Processors which understand Dublin Core but not
FOAF will
use the rdf:value property and reason that the creator of that
post is “Joe Example <joe@example.org>”. Processors which
do understand FOAF will
reason that the creator of the post is a person named “Joe Example”
who has the e-mail address “joe@example.org”.

Membership and containment

Four relations indicate that a resource is part of or belongs to a larger
resource.

inArchive

The archive where a post is permanently located.

inTopic

A topic to which a post belongs.

inForum

A forum to which a post, archive, topic, or smaller forum belongs.

inWeblog

A weblog to which a post, archive, topic, or forum belongs.

Five relations indicate smaller resources contained within a larger one.

hasPost

A post located in or part of an archive, topic, forum, or weblog.

hasArchive

An archive belonging to a forum or weblog.

hasTopic

A topic that is part of a forum or weblog.

hasForum

A forum that is part of a larger forum or weblog.

hasWeblog

A weblog that is part of some larger resource.

References

These properties apply to a post and describe the references it
makes.

refersTo

A resource to which this post refers.

followsUp

A post which this post corrects or updates.

commentsOn

A resource which this post discusses or responds to.

agreesWith

A resource which this post agrees with or amplifies.

disagreesWith

A resource which this post rebuts or presents evidence
contrary to.

pointsTo

A resource which this post refers to but does not discuss.

quotes

A resource which this post quotes.

These properties apply to any resource and identify a post
which refers to it in some manner.

referredToBy

A post which refers to this resource.

followedUpBy

A post which updates or corrects this post.

commentedOnBy

A post which discusses or responds to this resource.

agreedWithBy

A post which agrees with or amplifies this resource.

disagreedWithBy

A post which rebuts or presents evidence contrary to this resource.

pointedToBy

A post which refers to this resource but does not discuss it.

quotedBy

A post which quotes this resource.

Sequence

In addition to the graph formed by inter-post references, posts
can also be organized in an order, as occurs in a linear thread.

first

The first post or archive of a topic, forum, or weblog.

last

The last post or archive of a topic, forum, or weblog.

next

A post or archive which follows this post or archive in
a topic, forum, or weblog.

prev

A post or archive which precedes this post or archive in
a topic, forum, or weblog.
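
The sequence properties apply to archives as well as posts. As a sketch, a weblog with three monthly archives might be ordered as follows; the archive addresses are invented, and the “tdl” prefix and namespace URI are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and the archive addresses are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <!-- The weblog names the two ends of the sequence. -->
  <tdl:Weblog rdf:about="http://example.org/blog">
    <tdl:first rdf:resource="http://example.org/blog/2002/06"/>
    <tdl:last rdf:resource="http://example.org/blog/2002/08"/>
  </tdl:Weblog>

  <!-- Each archive links to its neighbors in both directions. -->
  <tdl:Archive rdf:about="http://example.org/blog/2002/06">
    <tdl:next rdf:resource="http://example.org/blog/2002/07"/>
  </tdl:Archive>
  <tdl:Archive rdf:about="http://example.org/blog/2002/07">
    <tdl:prev rdf:resource="http://example.org/blog/2002/06"/>
    <tdl:next rdf:resource="http://example.org/blog/2002/08"/>
  </tdl:Archive>
  <tdl:Archive rdf:about="http://example.org/blog/2002/08">
    <tdl:prev rdf:resource="http://example.org/blog/2002/07"/>
  </tdl:Archive>
</rdf:RDF>
```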

Content

There are several useful applications which require representing
the actual content of a post, such as storing a thread in a
self-contained file. Rather than define a new file format, we
stretch the meaning of “metadata” slightly and declare the
“content” property.

content

An XML
fragment representing the content of this post.

excerpt

An XML
fragment representing part of the content of this post.

Note that the value is described as an
XML fragment,
not a text string. This is because the content of many posts will
be best described in XML
(or languages such as HTML
which have XML equivalents).

Some guidelines are in order to avoid a situation like
RSS, where
HTML is
escaped and reencoded in XML.
To represent arbitrary XML
content in RDF-XML, RDF
defines the rdf:parseType="Literal" attribute,
which indicates to the RDF
parser that the contents of an element should not be parsed for further
RDF statements.
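
A post carrying its content this way might be sketched as follows. The post text is invented, and the “tdl” prefix and namespace URI are assumptions; the “html” prefix is bound to the XHTML namespace on the root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and the post text are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:html="http://www.w3.org/1999/xhtml"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Post rdf:about="http://example.org/blog/455">
    <!-- parseType="Literal" keeps the markup below as the property value
         rather than having it parsed for RDF statements. -->
    <tdl:content rdf:parseType="Literal"><html:p>I have
      <html:em>finally</html:em> fixed the
      <html:a href="http://example.org/blog/archive">archive links</html:a>.</html:p></tdl:content>
  </tdl:Post>
</rdf:RDF>
```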

In this particular example, the post’s content is an
XHTML fragment
(assuming that the “html” namespace prefix is defined appropriately
elsewhere in the document). Implementers should be aware of two points:

The meaning of an XML
fragment is dependent on what namespace prefixes are declared. Thus,
regular expressions and other text-based, non-parsing approaches to working with
XML will not
always work as expected. Similarly, HTML
content must be expressed in well-formed XML
(this can be done with no loss of information, because
XHTML includes all
HTML elements).

The content of the post must survive
XML processing,
so any elements containing semantic whitespace (ie, where spacing
is important) must warn the parser that the spacing is significant
by using xml:space="preserve". This includes the
HTML pre element,
as one can’t expect general XML
tools to have special knowledge of the
XHTML namespace.

Character strings containing no
XML markup
can still be considered
XML fragments,
which is useful for describing posts such as e-mail and Usenet
messages. Because the post content will undergo
XML parsing,
any reserved characters (“<”, “>”, and “&”) must be
escaped, and the xml:space="preserve" attribute
should be used to preserve whitespace-based formatting. To produce
readable markup, applications may insert newlines before and after
the post content. (If a post begins or ends with a newline and
that newline is considered important, then an additional newline
must be inserted so that parsers will not strip it out.)
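
A plain-text Usenet message might be stored as in the following sketch. The message text is invented, and the “tdl” prefix and namespace URI are assumptions; a newline is inserted after the start tag and before the end tag for readability.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and the message text are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Post rdf:about="news:1998090902325900.WAA04282@example.org">
    <tdl:content rdf:parseType="Literal" xml:space="preserve">
Does anyone know why this fails when a &lt; b &amp;&amp; b &gt; c?

    if (a &lt; b &amp;&amp; b &gt; c) swap(a, c);
</tdl:content>
  </tdl:Post>
</rdf:RDF>
```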

Note that the final </content> is not indented.
This is because the last character in the element is a newline. If
it had been indented, then the two newlines and the whitespace used to
indent the tag would have been included in the post content.

The purpose of the content property is to represent the content of
a post, so Usenet and e-mail headers should not be included. If the information
in the headers is deemed important and not covered by an existing
RDF property,
then a new property should be created.

Weblog-specific

These properties identify elements found in many weblogs.

currentPosts

A sequence (rdf:Seq) of posts which are considered to be
“current”. For example, the posts currently present in the weblog’s
front page can be considered current.

recommends

A resource, such as another weblog, which is linked to
in a prominent place in a weblog (often called the “blogroll”).

hasLinksAt

A page which lists recommended sites, often but not always
the same as the front page of a weblog.

hasRSSFeedAt

An RSS feed which
may be associated with the weblog, usually to list or syndicate
current posts.

hasTDLFeedAt

A resource containing a description of the weblog in Thread
Description Language; at a minimum, it must provide dc:title and
currentPosts properties for the Weblog and a dc:title property
for each Post listed in currentPosts.

hasTDLContentFeedAt

Similar to hasTDLFeedAt, except that each post must also
include the content property.
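
As a sketch, a resource meeting the minimum requirements of hasTDLFeedAt might look like this. The post addresses and titles are invented, and the “tdl” prefix and namespace URI are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     and the post addresses and titles are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Weblog rdf:about="http://example.org/blog">
    <dc:title>Example Weblog</dc:title>
    <!-- currentPosts takes an ordered sequence of posts. -->
    <tdl:currentPosts>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/blog/455"/>
        <rdf:li rdf:resource="http://example.org/blog/454"/>
      </rdf:Seq>
    </tdl:currentPosts>
  </tdl:Weblog>

  <!-- Every post listed in currentPosts carries at least a title. -->
  <tdl:Post rdf:about="http://example.org/blog/455">
    <dc:title>Archive links fixed</dc:title>
  </tdl:Post>
  <tdl:Post rdf:about="http://example.org/blog/454">
    <dc:title>Weekend reading</dc:title>
  </tdl:Post>
</rdf:RDF>
```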

The hasRSSFeedAt, hasTDLFeedAt, and hasTDLContentFeedAt properties provide
ways of differentiating between the
RDF applications
which can be used to describe weblogs. They derive from rdfs:seeAlso,
which indicates a resource which might provide additional information
about the subject (in this case, a Weblog).

A highly-flexible weblog might include multiple metadata resources for
various purposes, and allow users to select among them by providing
a brief listing of its metadata. For example, the weblog
“http://example.org/blog” might provide a metadata listing at
“http://example.org/blog.meta” which tools could use to locate appropriate
information. (The method the weblog uses to indicate the existence of
this listing is not defined, although there is an informal tradition of
using link elements with the relation “meta” for indicating metadata.)
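
Such a listing might be sketched as follows. The feed and blogroll addresses are hypothetical, the “tdl” prefix and namespace URI are assumptions, and the use of dc:format with the media type shown is one possible way to describe the format of the final resource.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the "tdl" prefix, its namespace URI,
     the feed addresses, and the media type value are illustrative. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:tdl="http://www.eyrie.org/~zednenem/2002/web-threads/">

  <tdl:Weblog rdf:about="http://example.org/blog">
    <dc:title>Example Weblog</dc:title>
    <tdl:hasRSSFeedAt rdf:resource="http://example.org/blog.rss"/>
    <tdl:hasTDLFeedAt rdf:resource="http://example.org/blog.tdl"/>
    <tdl:hasTDLContentFeedAt rdf:resource="http://example.org/blog-full.tdl"/>
    <!-- The blogroll is described together with its format, so that tools
         do not mistake it for an HTML links page. -->
    <tdl:hasLinksAt>
      <rdf:Description rdf:about="http://example.org/blogroll.rdf">
        <dc:format>application/rdf+xml</dc:format>
      </rdf:Description>
    </tdl:hasLinksAt>
  </tdl:Weblog>
</rdf:RDF>
```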

The last item, which contains a TDL description of Example Weblog’s
blogroll, includes information about its format to indicate that it is
not an HTML links page.

This is not a suggestion that weblogs abandon
RSS in favor of this standard
for providing feeds. RSS has
many applications beyond describing weblogs, and the case for using
a special language for weblogs is not currently compelling. Furthermore,
RSS 1.0 is based on
RDF, so the
two vocabularies can easily be used simultaneously.