Tuesday April 28, 2009

Silverlight can be used to enhance the user experience for SharePoint users. Most data stored in SharePoint can be displayed directly by existing Silverlight controls and/or the controls in the Silverlight Toolkit, available on CodePlex. The exception is rich text (RTF) field data. SharePoint stores rich text in a simplified HTML format, and there is currently no easy way to display this type of text data correctly in Silverlight… until now.

The primary mechanism for displaying text in Silverlight is the TextBlock control. A TextBlock can display runs of text with changing fonts and decorations, such as bold and underline. This means we can use the TextBlock control to represent “most” of the text in a SharePoint RTF field. However, the simplified HTML that SharePoint uses for RTF text can also contain hyperlinks and images, so to represent every aspect of SharePoint RTF data we actually need to create several different Silverlight controls.

To represent RTF text data correctly, we need to parse the HTML markup in each RTF text field and translate each HTML tag into an appropriate Silverlight control. HTML written to the XHTML standard is also well-formed XML, so you might assume that RTF text field data could be parsed with the built-in XML parser. Unfortunately, the HTML fragments produced by SharePoint are not XHTML, and they contain many tags that are never closed. For example, an XHTML line break is written “<br/>”; the closing slash, as per the XML standard, indicates that the br element is closed and has no text or child elements. The SharePoint RTF editor uses the older HTML form “<br>”, with no closing slash. In fact, if you enter the XHTML form “<br/>”, the SharePoint rich text editor will helpfully convert it back to “<br>”.

Browsers handle this without trouble, since they read both forms. The XML parser, however, requires every element to be well-formed: each opening tag must have a matching closing tag, or must be self-closing like “<br/>”. This means the built-in XML parser cannot be used to parse SharePoint RTF text data.
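To see the problem concretely, here is a small Java sketch (not Silverlight code) that feeds both forms of the line break to the JDK's standard DOM parser. The class and method names (XmlCheck, isWellFormed) are hypothetical; the point is only that a conforming XML parser enforces the well-formedness rule described above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Checks whether a fragment would be accepted by a standard (strict) XML parser. */
final class XmlCheck {
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Report problems by throwing instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (Exception e) {
            // Not well-formed (or the parser could not be created).
            return false;
        }
    }
}
```

With SharePoint's “<br>”, the parser fails at the closing “</div>” because it expects “</br>” first, whereas the same fragment written with “<br/>” parses without complaint.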

Previous attempts at creating Silverlight controls that can handle HTML data have relied on the browser's ability to parse HTML. This approach does work; in practice, however, it appears to be very slow.

The approach taken here is to create an HTML parser to parse the simplified HTML used by the RTF text fields. The HTML parser will be constructed in three layers as follows:

HtmlTextBlock Parser Layer #1

The first step in parsing text is to be able to read the text one character at a time, to “look ahead” one or more characters, and to keep track of where the parser is in the text stream (line number and column number) in case an error or warning needs to be reported to the user. The StreamReader class is a convenient standard class for accessing text data one character at a time.

In Silverlight, the following code appears to be the best way to initialize the stream reader:

Once the stream reader is initialized, I like to wrap this layer of the text parser in a simple set of methods:

char nextChar()

a method to return one character at a time and to track the character position in terms of line number and column number. This method also indicates when the end of the stream has been reached by returning the “character” EOF_CHAR (‘\0’).

void pushChar(char c)

a method to push one character back into the character stream to provide the ability for the next level of the parser to "look ahead" one or more characters.

Note that one of the tricks used in the pushChar() method is the private variable m_previousLineLength. It allows pushChar() to restore the correct column when a single carriage return is pushed back. The method could be made more generic to allow more than one carriage return to be pushed back into the character stream, but this is not really necessary for the type of parsing we are going to do here.
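A minimal Java sketch of this first layer follows. It is not the article's Silverlight code: it assumes a string input wrapped in a java.io.Reader (standing in for StreamReader), treats '\n' as the line terminator, and uses the illustrative names CharReader, getLine() and getColumn(). The field previousLineLength plays the role of m_previousLineLength.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Layer 1 sketch: one character at a time, with line/column tracking and push-back. */
class CharReader {
    static final char EOF_CHAR = '\0';

    private final Reader reader;
    private final Deque<Character> pushedBack = new ArrayDeque<>();
    private int line = 1;
    private int column = 0;             // column of the last character returned
    private int previousLineLength = 0; // plays the role of m_previousLineLength

    CharReader(String text) {
        this.reader = new StringReader(text);
    }

    int getLine() { return line; }
    int getColumn() { return column; }

    /** Returns the next character, or EOF_CHAR once the stream is exhausted. */
    char nextChar() {
        char c;
        if (!pushedBack.isEmpty()) {
            c = pushedBack.pop();
        } else {
            int read;
            try {
                read = reader.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (read == -1) {
                return EOF_CHAR;
            }
            c = (char) read;
        }
        if (c == '\n') {
            previousLineLength = column; // remember where the old line ended
            line++;
            column = 0;
        } else {
            column++;
        }
        return c;
    }

    /** Pushes c back so that the next call to nextChar() returns it again. */
    void pushChar(char c) {
        if (c == EOF_CHAR) {
            return; // nothing to undo at end of stream
        }
        pushedBack.push(c);
        if (c == '\n') {
            // Only one line break can be undone correctly, as described above.
            line--;
            column = previousLineLength;
        } else {
            column--;
        }
    }
}
```

Keeping pushed-back characters on a stack lets the next layer un-read several characters (push them in reverse order), while the single saved line length gives the same one-line-break limitation the article accepts.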

HtmlTextBlock Parser Layer #2

The next layer of our parser is responsible for parsing HTML tags and HTML entities such as “&amp;”. I have wrapped this layer in three methods:

public HtmlNode NextNode() - read and remove the next node from the node stream.

public HtmlNode PeekNode() - read the next node in the node stream, but do not remove it.
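As an illustration of this second layer, here is a hedged Java sketch (Java 9 or later) of a tolerant tokenizer. To stay self-contained it works directly on a string rather than on top of a character-reader layer. HtmlNode, NodeKind and HtmlTokenizer are hypothetical names, nextNode()/peekNode() mirror NextNode()/PeekNode() in Java casing, and the entity table covers only numeric entities and a few common named ones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

enum NodeKind { START_TAG, END_TAG, TEXT, EOF }

/** A tag (lower-case name plus attributes) or a run of decoded text. */
final class HtmlNode {
    final NodeKind kind;
    final String value;
    final Map<String, String> attributes = new HashMap<>();
    HtmlNode(NodeKind kind, String value) { this.kind = kind; this.value = value; }
}

/** Layer 2 sketch: splits simplified HTML into nodes; unclosed tags such as br are accepted. */
final class HtmlTokenizer {
    private static final Pattern ATTR =
        Pattern.compile("([\\w-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|(\\S+))");
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");
    private static final Map<String, String> NAMED =
        Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", "\u00A0");

    private final String text;
    private int pos = 0;
    private HtmlNode peeked; // one-node look-ahead buffer

    HtmlTokenizer(String text) { this.text = text; }

    /** Returns the next node without removing it from the stream. */
    HtmlNode peekNode() {
        if (peeked == null) peeked = readNode();
        return peeked;
    }

    /** Returns the next node and removes it from the stream. */
    HtmlNode nextNode() {
        HtmlNode node = peekNode();
        peeked = null;
        return node;
    }

    private HtmlNode readNode() {
        if (pos >= text.length()) return new HtmlNode(NodeKind.EOF, "");
        if (text.charAt(pos) != '<') {                 // text run up to the next tag
            int next = text.indexOf('<', pos);
            int end = next < 0 ? text.length() : next;
            String raw = text.substring(pos, end);
            pos = end;
            return new HtmlNode(NodeKind.TEXT, decodeEntities(raw));
        }
        int close = text.indexOf('>', pos);
        int end = close < 0 ? text.length() : close;   // tolerate a missing '>'
        String body = text.substring(pos + 1, end).trim();
        pos = Math.min(end + 1, text.length());
        boolean isEndTag = body.startsWith("/");
        if (isEndTag) body = body.substring(1).trim();
        if (body.endsWith("/")) body = body.substring(0, body.length() - 1).trim(); // <br/> == <br>
        String[] parts = body.split("\\s+", 2);
        HtmlNode node = new HtmlNode(isEndTag ? NodeKind.END_TAG : NodeKind.START_TAG,
                                     parts[0].toLowerCase());
        if (parts.length > 1) {
            Matcher m = ATTR.matcher(parts[1]);
            while (m.find()) {
                String v = m.group(2) != null ? m.group(2)
                         : m.group(3) != null ? m.group(3) : m.group(4);
                node.attributes.put(m.group(1).toLowerCase(), decodeEntities(v));
            }
        }
        return node;
    }

    /** Decodes numeric entities and a few named ones; anything else is left as-is. */
    static String decodeEntities(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String e = m.group(1);
            String rep = m.group(0);
            try {
                if (e.startsWith("#x") || e.startsWith("#X")) {
                    rep = new String(Character.toChars(Integer.parseInt(e.substring(2), 16)));
                } else if (e.startsWith("#")) {
                    rep = new String(Character.toChars(Integer.parseInt(e.substring(1))));
                } else {
                    rep = NAMED.getOrDefault(e.toLowerCase(), rep);
                }
            } catch (IllegalArgumentException ex) {
                // malformed or out-of-range numeric entity: keep the original text
            }
            m.appendReplacement(out, Matcher.quoteReplacement(rep));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the tokenizer never waits for a closing tag, “<br>” and “<br/>” both produce the same START_TAG node, and matching open and close tags is left to the third layer.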

HtmlTextBlock Parser Layer #3

The third layer of the parser reads the HTML node stream and converts it into a set of Silverlight controls. Parsing HTML nodes into corresponding Silverlight controls is relatively straightforward if you don't have to worry about HTML styles such as padding, margins, font size and font decorations. To simplify parsing and to display the correct styles in Silverlight, I decided to create a specialized HtmlStyle class to manage style, and a set of custom Silverlight controls that mirror the functionality of each HTML tag. The following UML class diagram represents the HtmlStyle class:

The main purpose of the HtmlStyle class is to replicate the cascading effect of HTML CSS style definitions. HtmlStyle instances are therefore created in a parent-child hierarchy, and as HTML tags are encountered, the current HtmlStyle is managed by pushing and popping HtmlStyle objects on and off a stack. Each property of the HtmlStyle class either has an assigned value or retrieves its value from its parent HtmlStyle.
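The cascading lookup and the style stack can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the article's HtmlStyle class: only three properties are modelled, their string values stand in for the Silverlight FontWeight, FontStyle and TextDecorations types, and the names root(), StyleStack, push() and pop() are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of a cascading style: unset properties are looked up on the parent. */
final class HtmlStyle {
    private final HtmlStyle parent;
    private String fontWeight;  // e.g. "Normal", "Bold", "ExtraBold"; null = inherit
    private String fontStyle;   // e.g. "Normal", "Italic"; null = inherit
    private Boolean underline;  // null = inherit

    HtmlStyle(HtmlStyle parent) { this.parent = parent; }

    /** Root of the hierarchy; every property is set here so lookups terminate. */
    static HtmlStyle root() {
        HtmlStyle s = new HtmlStyle(null);
        s.fontWeight = "Normal";
        s.fontStyle = "Normal";
        s.underline = false;
        return s;
    }

    HtmlStyle setFontWeight(String v) { fontWeight = v; return this; }
    HtmlStyle setFontStyle(String v) { fontStyle = v; return this; }
    HtmlStyle setUnderline(boolean v) { underline = v; return this; }

    String getFontWeight() { return fontWeight != null ? fontWeight : parent.getFontWeight(); }
    String getFontStyle() { return fontStyle != null ? fontStyle : parent.getFontStyle(); }
    boolean isUnderline() { return underline != null ? underline : parent.isUnderline(); }
}

/** Keeps the "current" style as tags are opened and closed. */
final class StyleStack {
    private final Deque<HtmlStyle> stack = new ArrayDeque<>();

    StyleStack(HtmlStyle root) { stack.push(root); }

    HtmlStyle current() { return stack.peek(); }

    /** On an opening tag: the new style inherits everything from the current one. */
    HtmlStyle push() {
        HtmlStyle child = new HtmlStyle(current());
        stack.push(child);
        return child;
    }

    /** On the matching closing tag; the root style is never popped. */
    void pop() {
        if (stack.size() > 1) stack.pop();
    }
}
```

Because a child stores only the properties its tag actually changes, nested tags such as <b><i>…</i></b> combine naturally, and popping a style restores the previous one without copying anything. Refusing to pop the root also makes a stray closing tag harmless.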

The following table shows the HTML tag and the corresponding Silverlight control used to render it, as well as the base Silverlight class that it is derived from:

| HTML Tag | Rendering Control | Base Class | Notes |
|---|---|---|---|
| <a> | HtmlAnchor | HyperlinkButton | |
| <b> | HtmlStyle | n/a | An HtmlStyle object is created with FontWeight set to Bold. |
| <blockquote> | HtmlBlockQuote | HtmlDiv | |
| <br> | HtmlLineBreak | Control | |
| <div> | HtmlDiv | Panel | |
| <em> | HtmlStyle | n/a | An HtmlStyle object is created with FontStyle set to Italic. |
| <font> | HtmlStyle | n/a | An HtmlStyle object is created with all appropriate style properties set. If the style defines a background, an additional HtmlDiv element is created so that the background is displayed correctly. |
| <i> | HtmlStyle | n/a | An HtmlStyle object is created with FontStyle set to Italic. |
| <img> | HtmlImg | Canvas | |
| <p> | HtmlAnchor | HyperlinkButton | |
| <strong> | HtmlStyle | n/a | An HtmlStyle object is created with FontWeight set to ExtraBold. |
| <u> | HtmlStyle | n/a | An HtmlStyle object is created with TextDecorations set to Underline. |
| <ol> | HtmlOrderedList | HtmlList | |
| <ul> | HtmlUnorderedList | HtmlList | |
| <li> | HtmlListItem | StackPanel | |
| (none) | HtmlList | StackPanel | The generic base class for HtmlOrderedList and HtmlUnorderedList. |
| whitespace and text | TextBlock / Run | n/a | All whitespace and text is represented using the standard Silverlight TextBlock and Run classes. |
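One way the third layer might use a table like this is a simple lookup that separates tags which only change the current style from tags that need a control of their own. This Java sketch covers a subset of the rows (<font>, <p> and text are left out, since they need more than a name lookup), and TagMap and describe() are hypothetical names.

```java
import java.util.Map;

/** Sketch of the dispatch implied by the tag table (a subset of its rows). */
final class TagMap {
    // Tags that only push a new HtmlStyle and create no control of their own.
    private static final Map<String, String> STYLE_ONLY = Map.of(
        "b", "FontWeight=Bold",
        "strong", "FontWeight=ExtraBold",
        "em", "FontStyle=Italic",
        "i", "FontStyle=Italic",
        "u", "TextDecorations=Underline");

    // Tags rendered by a dedicated control.
    private static final Map<String, String> CONTROLS = Map.of(
        "a", "HtmlAnchor",
        "blockquote", "HtmlBlockQuote",
        "br", "HtmlLineBreak",
        "div", "HtmlDiv",
        "img", "HtmlImg",
        "ol", "HtmlOrderedList",
        "ul", "HtmlUnorderedList",
        "li", "HtmlListItem");

    /** Describes what the third parser layer would do with an opening tag. */
    static String describe(String tag) {
        String t = tag.toLowerCase();
        if (STYLE_ONLY.containsKey(t)) return "style: " + STYLE_ONLY.get(t);
        if (CONTROLS.containsKey(t)) return "control: " + CONTROLS.get(t);
        return "unsupported";
    }
}
```

Keeping the style-only tags separate makes it clear that <b>, <i>, <u> and friends never appear in the visual tree themselves; they only affect the TextBlock runs created while their style is on the stack.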