Content Syndication

Since SPIP 1.9, the syndication system (see the article «Les fichiers backend», translation pending) has been extended: you can now share the URLs of attached documents (podcasting) and transfer keywords (tags) from one site to another, along with the section (or category) of the articles. This way, you can syndicate the full content of your articles.

In this article, we assume that the syndication feed provided by the source site offers content as rich as that produced by SPIP’s template dist/backend.html.

Quick site referencing

To register (reference) a site with SPIP, you first need to locate its RSS (or Atom) syndication feed. Websites indicate these feeds in different ways. A feed can sometimes be discovered automatically by the browser, which will then show a small characteristic icon. In that case SPIP might also be able to detect it automatically, and you will only have to specify the URL of the site to start the syndication. Otherwise, you’ll have to specify the full URL of the RSS file.

When the syndication is activated, the articles provided by the source feed will be recorded on your site.

Deciding what your feed emits

The default templates provided by SPIP include an RSS feed template dist/backend.html, from which you can get inspiration to create your own.

However, if you do not want to edit the template code, you can still configure the default backend from the private area. In the configuration panel, you can decide whether your site’s feed provides:

the full content of your articles (in HTML format),

a short summary of each article (in a textual format).

In the first case (which is the default configuration), the entire content of the recent articles published on your site will be readable with an RSS reader. It is therefore possible to copy them entirely. This can be used to create site mirrors, or portal sites built from other websites’ content.

Deciding what you want to receive

If the source site you are syndicating does not distribute its full content, there is nothing you can do about it. However, when its feed distributes the full content, SPIP can record this HTML content and the images it contains in order to display them.

You can select, site by site, what type of syndication you need: either the full HTML content (if available) or only a simple summary in textual format.

You can also choose what to do with the articles that are no longer in the feed (which is generally limited to the 15 most recent articles of the site). SPIP offers the possibility to either:

remove them from the database (after a two-month period),

directly mark them as “refused”.

These options enable, for example, the management of a news portal site where the feeds change quickly (press agency, popular tags on a photo sharing site, etc.); or an exact mirror of a site by keeping track of the articles that are removed from the site.

Deciding what to display

This section is relevant if you want to edit your templates.

- the source:

Usually, the tag #NOM_SITE, which returns the name of the syndicated site, would be sufficient to display the “source”. However, with the growing number of content aggregators (like web portals for example), the real source is not always the syndicated site. Fortunately, the RSS format provides information about the original article source (with the <source> tag). The syndication system provided by SPIP can extract this information and, if it is present, you can use the tags #SOURCE and #URL_SOURCE to display the name and the URL of the site.
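
To make the <source> element concrete, here is a minimal Python sketch, written outside SPIP and not taken from its code, that reads the name and URL of the original source from one RSS item. The sample item, its URLs and the function name read_source are all invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS item relayed by a portal; every value is invented.
ITEM = """<item>
  <title>Relayed article</title>
  <link>http://portal.example.org/article/42</link>
  <source url="http://origin.example.org/rss.xml">Origin Site</source>
</item>"""


def read_source(item_xml):
    """Return (name, url) of the original source, or (None, None) if absent."""
    item = ET.fromstring(item_xml)
    source = item.find("source")
    if source is None:
        return None, None
    # In RSS 2.0 the element text is the source name and the
    # "url" attribute points to the source feed.
    return (source.text or "").strip(), source.get("url")


name, url = read_source(ITEM)
print(name, url)
```

These two values are the kind of information that #SOURCE and #URL_SOURCE expose in a template.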

- the tags:

If keywords are correctly attached to the syndicated articles in the RSS feed, SPIP will record them. Because of the particular RSS format, they are not split into separate keywords but stored in a single field, tags, of the table spip_syndic_articles.

The tags can come in two formats in the source site feed:

if the feed uses the notation <dc:subject>Tag</dc:subject>, SPIP records the tag as is,

if the feed uses microformat notation (as does the default feed provided with SPIP) <a rel="tag" href="url to the keyword page">Tag</a>, then SPIP will record the tag with its URL.
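
The difference between the two notations can be sketched in a few lines of Python. This is an illustration only, not SPIP's own parser: the function name read_tags, the regular expressions and the sample fragments are assumptions made for the example.

```python
import re

# Patterns for the two notations described above (illustrative only).
DC_SUBJECT = re.compile(r"<dc:subject>(.*?)</dc:subject>", re.S)
REL_TAG = re.compile(r'<a\s[^>]*rel="tag"[^>]*>(.*?)</a>', re.S)
HREF = re.compile(r'href="([^"]*)"')


def read_tags(fragment):
    """Return (label, url) pairs; url is None for the dc:subject notation."""
    tags = [(label.strip(), None) for label in DC_SUBJECT.findall(fragment)]
    for match in REL_TAG.finditer(fragment):
        href = HREF.search(match.group(0))
        tags.append((match.group(1).strip(), href.group(1) if href else None))
    return tags


# Hypothetical fragments, one per notation.
print(read_tags("<dc:subject>photography</dc:subject>"))
print(read_tags('<a rel="tag" href="http://example.org/tag/spip">spip</a>'))
```

Only the microformat notation carries a URL, which is why the second notation lets the tag be recorded together with its link.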

You can use the #TAGS tag in your template to display all the tags attached to a syndicated article (we will see later the filters available to process the tags individually).

Note: SPIP’s syndication system can process tags coming from the sites del.icio.us (collective bookmarking), flickr (photography) and connotea (annotation of scientific articles) and automatically assigns the right URL to them (as it is not provided by these sites’ RSS feeds).

- the section:

In many web applications (blog system, link directory), the category (or sometimes directory) can be compared to the section concept in SPIP. It is therefore natural to use the RSS notion of <category>...</category> to record the section in which the article is published.

Like the tags, this information can be displayed in your templates with the tag #TAGS (see the next section for more information).

- the attached documents:

In SPIP, you can attach documents to your articles. In that case, these documents will be referenced (by their URL) in your RSS feed by the tag <enclosure ... />. This is called podcasting.

If the source site you are syndicating also provides this enclosure information, then SPIP will be able to record the documents [1].
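
As an illustration of what an enclosure carries, the following Python sketch reads the url, length and type attributes defined by RSS 2.0 for <enclosure>. It is not SPIP code; the sample item (which deliberately holds two enclosures) and the function name read_enclosures are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed item with two attached documents; all values invented.
ITEM = """<item>
  <title>Episode 12</title>
  <enclosure url="http://example.org/audio/ep12.mp3" length="1048576" type="audio/mpeg" />
  <enclosure url="http://example.org/docs/ep12.pdf" length="20480" type="application/pdf" />
</item>"""


def read_enclosures(item_xml):
    """Return the URL, size in bytes and MIME type of each <enclosure>."""
    item = ET.fromstring(item_xml)
    return [
        {
            "url": enc.get("url"),
            "length": int(enc.get("length", "0")),
            "type": enc.get("type"),
        }
        for enc in item.findall("enclosure")
    ]


for enclosure in read_enclosures(ITEM):
    print(enclosure["url"], enclosure["length"], enclosure["type"])
```

Note that the sketch only reads metadata from the feed; it never downloads the documents themselves.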

The enclosure information is also stored in the #TAGS template tag (and can be extracted with a special filter, as we will see in the next section). In the private area, if an article contains an enclosure, it will be identified by a small paperclip icon.

Using the tag #TAGS

As we have seen in the previous sections, the template tag #TAGS displays a lot of information at once about keywords, sections and documents. However, SPIP marks the link to each individual item using microformats:
- <a rel="tag" ...> for the tags/keywords
- <a rel="directory" ...> for the section/category
- <a rel="enclosure" ...> for the documents/podcast

If you want to display only one type of item contained in this tag, you can use the afficher_tags filter to extract it:

[(#TAGS|afficher_tags{directory})]

[(#TAGS|afficher_tags{tag})]

[(#TAGS|afficher_tags{enclosure})]

(By default, [(#TAGS|afficher_tags)] without parameters will be the same as [(#TAGS|afficher_tags{'tag,directory'})].)
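
The selection by rel value described above can be mimicked in Python. This is only a sketch of the idea, not the implementation of afficher_tags: the function name filter_links, the regular expression and the sample #TAGS content are assumptions. The default argument mirrors the 'tag,directory' default mentioned above.

```python
import re

# Matches a whole <a ...>...</a> link and captures its rel value.
LINK = re.compile(r'<a\s[^>]*rel="([^"]*)"[^>]*>.*?</a>', re.S)


def filter_links(tags_html, rels="tag,directory"):
    """Keep only the links whose rel value is listed in rels."""
    wanted = {rel.strip() for rel in rels.split(",")}
    return [m.group(0) for m in LINK.finditer(tags_html) if m.group(1) in wanted]


# Hypothetical content of #TAGS for one syndicated article.
TAGS = (
    '<a rel="tag" href="http://example.org/tag/spip">spip</a>, '
    '<a rel="directory" href="http://example.org/section/news">News</a>, '
    '<a rel="enclosure" href="http://example.org/audio/ep12.mp3">ep12.mp3</a>'
)

print(filter_links(TAGS, "enclosure"))
print(filter_links(TAGS))
```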

For the attached documents, a specific filter exists. The afficher_enclosures filter displays the paperclip icon instead of the normal links:

[(#TAGS|afficher_enclosures)]

* * *

Working on the HTML content of the syndicated articles

Example case: we syndicate a photoblog site. This site systematically provides, as articles, a short comment and a photo. The photo will probably be inserted directly in the article HTML with an <img .../> tag. However, we might want to display only the photo on our site; therefore, we need to extract this HTML <img .../> tag.

The extraire_balise{xxx} filter has been designed for this purpose. It extracts the first HTML tag <xxx /> from the text to which it is applied.

From this, you can perform many operations:
- [(#DESCRIPTIF|extraire_balise{img})] will display the photo;
- [(#DESCRIPTIF|extraire_balise{img}|extraire_attribut{src})] will display its url;
- [(#DESCRIPTIF|extraire_balise{img}|extraire_attribut{width})] will display its width;
- you can even change the style of the image:

Note: the HTML content provided by external sites is, by definition, considered “untrusted” and potentially dangerous. SPIP will therefore systematically apply the safehtml filter to remove any JavaScript and other sensitive elements.
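
For readers who want to see the extraction logic spelled out, here is a rough Python analogue of chaining extraire_balise and extraire_attribut. It is a simplified sketch, not SPIP's code: it only handles tags without content, such as <img />, and the function names first_tag and attribute as well as the sample HTML are invented.

```python
import re


def first_tag(html, name):
    """Return the first <name ...> tag found in html, or '' if there is none."""
    match = re.search(r"<%s\b[^>]*>" % re.escape(name), html, re.I)
    return match.group(0) if match else ""


def attribute(tag, attr):
    """Return the value of a double-quoted attribute, or None if missing."""
    pattern = r'(?<![\w-])%s\s*=\s*"([^"]*)"' % re.escape(attr)
    match = re.search(pattern, tag, re.I)
    return match.group(1) if match else None


# Hypothetical syndicated description: a short comment and a photo.
DESCRIPTIF = (
    "<p>Sunset over the bay.</p>"
    '<img src="http://example.org/photos/sunset.jpg" width="640" height="480" alt="Sunset" />'
)

img = first_tag(DESCRIPTIF, "img")
print(img)
print(attribute(img, "src"))
print(attribute(img, "width"))
```

Regular expressions are enough for a sketch like this one; a production tool would more likely rely on a real HTML parser.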

Other filters related to the syndication

You might want to transform one syndication format to another (to create your own feed template). SPIP provides a few filters that can be applied to the #TAGS tag, to transform from and to the microformats notation:

tags2dcsubject,

enclosure2microformat,

microformat2enclosure.

References

[1] Note that the document is not copied to your site; only its URL and some information (title, size, format) are recorded. In addition, SPIP adds some flexibility to the original RSS format, as it allows multiple enclosures per article.