Building an Extensible RSS Parser in C#

Daniel DeLay

January 28, 2011

The Need for an Extensible RSS Parser

This came up while attempting to parse an RSS feed with an existing XML parsing library. The project required additional fields that were available in the feed, such as "dc:publisher". However, the XML parsing library wasn't easily extensible to allow for consumption of these fields. Editing the existing library code directly would force everyone who uses the library to have their implementations look for these custom fields. And copying the parser out of the library and embedding it directly into the project would not be a best practice for code reuse. What would have been useful is a parser in the library that could easily be extended to include any additional fields requested.

In my quest to find the most awesomely extensible parser ever, my solution touched on a number of programming topics.

Laying Out the Requirements

The base RSS parser should be able to live in a library file and not require editing for new projects.

The base RSS parser should be extensible so that I can specify the fields I want to consume, which vary from project to project.

The model should implement an interface so that usage of the RSS parser remains consistent.

I would like to use the .NET XmlSerializer class along with XmlElement attributes because I like the readability and intuitiveness it provides when defining new fields to consume.

Mocking up the API

From a design perspective, coding to an API is a great way to start. Here is a good post on API-First Design. The first line should let me specify my feed source, deserialize it, and return to me a feed object:

TFeed feed = Deserializer.GetFeed<TFeed>(source);

This line implies that we’ll have a static deserialization class with a GetFeed() method. As a generic type argument, I’ll specify the type of my feed, which will be an extension of the RSS base class, and as a method argument it will take a feed source. In this case the source will be a URL, but we’ll overload it so that it can take a FileStream as well. Once I have my feed object, I’m going to want to be able to get a list of channels. We’ll look at using an interface here to ensure we always have a GetChannels() method.

List<TChannel> channels = feed.GetChannels();

Finally, once I have a channel, I’m going to want a method to get my RSS items. We’ll also use an Interface here to make sure we always have a GetItems() method.

List<TItem> items = channel.GetItems();
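One way the two interfaces implied by these calls could look is sketched below. The names IRSSFeed and IRSSChannel are my own placeholders (the article only fixes the method names GetChannels() and GetItems()), and making them generic is an assumption that keeps the return types strongly typed:

```csharp
using System.Collections.Generic;

// Hypothetical interface name: implemented by every feed class so that
// callers can always ask a feed for its channels.
public interface IRSSFeed<TChannel>
{
    List<TChannel> GetChannels();
}

// Hypothetical interface name: implemented by every channel class so that
// callers can always ask a channel for its items.
public interface IRSSChannel<TItem>
{
    List<TItem> GetItems();
}
```

With this shape, project code only depends on GetChannels() and GetItems(), no matter which extra fields a particular feed class adds.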

Building the Library Code

Now that we have requirements and specifications, let’s put together some code. Here’s a Deserializer that takes in a generic feed type and feed source, and returns a feed object:
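A minimal sketch of such a Deserializer follows. It assumes the feed type passed in carries its own XmlRoot attribute; the private helper Read and the choice of Stream for the second overload (which also accepts a FileStream) are my assumptions, not necessarily the exact library code:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public static class Deserializer
{
    // Loads and deserializes a feed from a URL (or any URI XmlReader can resolve).
    public static TFeed GetFeed<TFeed>(string url) where TFeed : class
    {
        using (XmlReader reader = XmlReader.Create(url))
        {
            return Read<TFeed>(reader);
        }
    }

    // Overload for a stream, such as a FileStream opened on a saved feed.
    // The caller keeps ownership of the stream and is responsible for closing it.
    public static TFeed GetFeed<TFeed>(Stream stream) where TFeed : class
    {
        using (XmlReader reader = XmlReader.Create(stream))
        {
            return Read<TFeed>(reader);
        }
    }

    // Hypothetical helper shared by both overloads.
    private static TFeed Read<TFeed>(XmlReader reader) where TFeed : class
    {
        XmlSerializer serializer = new XmlSerializer(typeof(TFeed));
        return (TFeed)serializer.Deserialize(reader);
    }
}
```

Because the feed type is a generic argument, the same class serves every project; only the type passed to GetFeed changes.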

Next, we’ll set up our abstract base classes. We’ll have one for a feed object, then a channel object, and lastly an item object. These constitute the standard structure of any RSS feed (feed/channel/item) and distinguish it from a more generic XML deserializer model.

Each class is abstract so we’ll need to extend these in our project’s implementation.

Each class has a Serializable attribute.

Each class has an XmlRoot attribute that follows the standard RSS structure of feed/channel/item.

Within each class we identify our required RSS fields with XmlElement attributes. Here is a link to a resource that shows which fields RSS requires as well as the standard optional fields: http://www.w3schools.com/xml/xml_rss.asp

The Date field isn’t a required RSS field, so it could technically go in our extended implementation. Note the usage of the XmlIgnore attribute in the BaseRSSItem class: it marks the property that takes the pubDate string returned from the deserialization process and converts it to a date object. I would have liked to place this post-deserialization step in either an OnDeserialization callback or an OnDeserialized event, but it appears these are not supported when using the XmlSerializer object.
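The points above could be sketched roughly as follows. Only BaseRSSItem is named in the text; BaseRSSFeed, BaseRSSChannel, and all property names are my assumptions, and title, link, and description are used as the required fields:

```csharp
using System;
using System.Globalization;
using System.Xml.Serialization;

[Serializable]
[XmlRoot("rss")]
public abstract class BaseRSSFeed
{
    // The channel list is left to the concrete class so it can use its own channel type.
}

[Serializable]
[XmlRoot("channel")]
public abstract class BaseRSSChannel
{
    [XmlElement("title")] public string Title { get; set; }
    [XmlElement("link")] public string Link { get; set; }
    [XmlElement("description")] public string Description { get; set; }
}

[Serializable]
[XmlRoot("item")]
public abstract class BaseRSSItem
{
    [XmlElement("title")] public string Title { get; set; }
    [XmlElement("link")] public string Link { get; set; }
    [XmlElement("description")] public string Description { get; set; }

    // Raw pubDate text exactly as it appears in the feed.
    [XmlElement("pubDate")] public string PubDate { get; set; }

    // Not serialized: computed from PubDate on demand, since XmlSerializer
    // offers no post-deserialization hook. Returns null when the text is
    // missing or cannot be parsed.
    [XmlIgnore]
    public DateTime? Date
    {
        get
        {
            DateTime parsed;
            bool ok = DateTime.TryParse(PubDate, CultureInfo.InvariantCulture,
                                        DateTimeStyles.AdjustToUniversal, out parsed);
            return ok ? parsed : (DateTime?)null;
        }
    }
}
```

Computing the date lazily in a getter is one way to work around the missing callback support; a nullable return type avoids throwing on feeds that omit pubDate.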

Common RSS Namespaces

Let’s assume that we have this code compiled into our utility or library DLL. We can then set up our use case scenario and extend our base classes. Suppose we want to consume a feed from a source produced by BlogEngine.NET. This platform loads in many of the optional RSS elements as well as a plethora of namespaces such as our good friends Dublin Core, BlogChannel, PingBack, Slash, and Geo. Here is an example: Here is a resource that shows many of the most commonly used namespaces found within RSS feeds.

Extending Our Library Code

Our scenario will be simple: we want all the required fields that the base classes provide, plus the Publisher field from Dublin Core. We'll make a class called CustomRSSFeed and extend the base classes like this:
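A self-contained sketch of that extension is shown below. The base classes here are trimmed stand-ins so the sample compiles on its own; CustomRSSChannel, CustomRSSItem, CustomFeedDemo, and the sample XML are hypothetical. XmlRoot("rss") is repeated on the concrete feed class as a precaution, since I am not relying on it being picked up from the base type. In the full design the custom classes would also implement the feed and channel interfaces.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Trimmed stand-ins for the library base classes.
public abstract class BaseRSSFeed { }

public abstract class BaseRSSChannel
{
    [XmlElement("title")] public string Title { get; set; }
}

public abstract class BaseRSSItem
{
    [XmlElement("title")] public string Title { get; set; }
}

// Project-specific classes: only the additions live here.
[Serializable]
[XmlRoot("rss")]
public class CustomRSSFeed : BaseRSSFeed
{
    [XmlElement("channel")]
    public List<CustomRSSChannel> Channels { get; set; }

    public List<CustomRSSChannel> GetChannels() { return Channels; }
}

[Serializable]
public class CustomRSSChannel : BaseRSSChannel
{
    [XmlElement("item")]
    public List<CustomRSSItem> Items { get; set; }

    public List<CustomRSSItem> GetItems() { return Items; }
}

[Serializable]
public class CustomRSSItem : BaseRSSItem
{
    // Dublin Core publisher, matched by namespace URI rather than by the "dc" prefix.
    [XmlElement("publisher", Namespace = "http://purl.org/dc/elements/1.1/")]
    public string Publisher { get; set; }
}

public static class CustomFeedDemo
{
    // Made-up feed content, for illustration only.
    public const string SampleXml =
        "<rss version=\"2.0\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">" +
        "<channel><title>Sample</title>" +
        "<item><title>First post</title><dc:publisher>Example Press</dc:publisher></item>" +
        "</channel></rss>";

    // Deserializes a feed from an XML string.
    public static CustomRSSFeed Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(CustomRSSFeed));
        using (StringReader reader = new StringReader(xml))
        {
            return (CustomRSSFeed)serializer.Deserialize(reader);
        }
    }
}
```

Loading CustomFeedDemo.SampleXml this way should yield one channel containing one item whose Publisher property is filled from dc:publisher. Adding another namespaced field to a project would mean adding one more attributed property to the custom item class, with no change to the library.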

And that’s the can of corn. Attached here is a VS 2008 solution which has these samples rolled into something you can open and test straight away. There are even some unit tests in there for good measure. Remember that all library code for your company should contain unit tests.