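Background: The characters of ISO-8859-1 are a subset of what UTF-8 can represent (they are the first 256 Unicode code points), although above 0x7F the two encodings store them as different bytes. UTF-8 also covers many more languages.
windows-1252 is like ISO-8859-1, except that it uses positions 0x80 through 0x9F (control codes in ISO-8859-1) for "special" Windows characters; see http://en.wikipedia.org/wiki/Windows_1252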
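To see the difference between the three encodings, here is a small sketch using only Python's built-in codecs ("latin-1", "cp1252", "utf-8"). It is an illustration, not part of the subs described below, and the helper name special_1252_chars is made up for this example.

```python
# Sketch: how ISO-8859-1 ("latin-1"), windows-1252 ("cp1252") and UTF-8
# treat the same bytes, using only Python's built-in codecs.
# special_1252_chars is a hypothetical helper name.

def special_1252_chars():
    """Return {byte: character} for the printable windows-1252
    characters in positions 0x80-0x9F."""
    table = {}
    for byte in range(0x80, 0xA0):
        try:
            table[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252
            pass
    return table


if __name__ == "__main__":
    # Same character, different bytes: latin-1 is not a byte subset of UTF-8.
    print("é".encode("latin-1"))  # b'\xe9'
    print("é".encode("utf-8"))    # b'\xc3\xa9'
    # Byte 0x93 is the C1 control U+0093 in ISO-8859-1 ...
    print(hex(ord(b"\x93".decode("latin-1"))))  # 0x93
    # ... but a left curly double quote in windows-1252.
    print(b"\x93".decode("cp1252"))  # “
    # All 27 printable specials, e.g. for pasting into test content.
    print(" ".join(special_1252_chars().values()))
```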

These special characters, most often "smart" (curly) quotes and the em dash (double dash), usually find their way into HTML via Microsoft Word content pasted into email, web pages, CMSs, feeds, and blogs.

Almost all modern browsers use a "loose" interpretation of ISO-8859-1, so the Windows characters appear as intended (one that does not always display or print correctly is the euro sign, €).

The problem: Pages interpreted as strict ISO-8859-1 or UTF-8, including all properly served XML content (such as RSS feeds), will not reproduce the windows-1252 characters, will not replace them automatically, and will not validate properly.
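Here is a quick Python sketch of what a strict decoder does with such bytes. The sample string is hypothetical (a Word-style sentence saved as windows-1252), and decode_strict is a made-up helper; Python's default "strict" error handling stands in for a strict parser.

```python
# Sketch: strict decoding of windows-1252 bytes as UTF-8 and as ISO-8859-1.
# The sample text and the decode_strict helper are hypothetical.

def decode_strict(raw: bytes, encoding: str):
    """Decode strictly; return the text, or None if the bytes are invalid."""
    try:
        return raw.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return None


# "It’s a “test” – ok" as Word would save it in windows-1252
sample = "It\u2019s a \u201ctest\u201d \u2013 ok".encode("cp1252")

if __name__ == "__main__":
    # Strict UTF-8: lone bytes 0x92, 0x93, 0x94, 0x96 are invalid.
    print(decode_strict(sample, "utf-8"))  # None
    # Strict ISO-8859-1: every byte decodes, but the specials become
    # invisible C1 control characters instead of quotes and dashes.
    text = decode_strict(sample, "latin-1")
    print([hex(ord(c)) for c in text if 0x80 <= ord(c) <= 0x9F])
    # ['0x92', '0x93', '0x94', '0x96']
```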

To make matters worse, XML <title> and most channel fields disallow HTML, so substituting the characters with their decimal, hex, Unicode, or HTML equivalents (like &#8364; or &euro;) can invalidate the feed; named HTML entities such as &euro; are not defined in XML at all. Furthermore, direct replacements with the actual characters only work if the server encoding is UTF-8 (irrespective of your feed's declaration line). So the only place where direct replacements with their UTF-8 equivalents work is in the <description> fields.
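The point about named entities is easy to check with any XML parser. This sketch uses Python's standard xml.etree.ElementTree; is_well_formed is a hypothetical helper, and it only tests XML well-formedness, not full RSS validation.

```python
# Sketch: XML predefines only five named entities (&lt; &gt; &amp; &quot; &apos;),
# so an HTML entity such as &euro; makes the document not well-formed.
# is_well_formed is a hypothetical helper; it checks XML syntax only.
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<title>Price: &euro;5</title>"))  # False (undefined entity)
    print(is_well_formed("<title>Price: &amp;5</title>"))   # True
    print(is_well_formed("<title>Price: \u20ac5</title>"))  # True (the character itself)
```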

Towards a solution: I have written two subs, which I will incorporate into my RSS Feed Style after a little more testing.

Sub replace1252 is for ISO-8859-1 content, and also for fields where HTML substitutions are not allowed, such as XML <title> fields. It replaces windows-1252 characters with reasonable generic equivalents, based on their most common usage. It is also the safest route for compatibility and printability.
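The subs themselves are not posted here, so the following is only a hypothetical Python sketch of a replace1252-style filter. The ASCII stand-ins in GENERIC are my own assumptions about "common usage", and the function names are made up. It first turns stray C1 controls (what you get when windows-1252 bytes are read as ISO-8859-1) back into the intended characters, then maps each to plain ASCII.

```python
# Hypothetical Python sketch of a replace1252-style filter. The author's
# actual subs are not shown; these ASCII stand-ins are assumptions.

GENERIC = {
    "\u20ac": "EUR", "\u201a": ",", "\u0192": "f", "\u201e": '"',
    "\u2026": "...", "\u2020": "+", "\u2021": "++", "\u02c6": "^",
    "\u2030": "%o", "\u0160": "S", "\u2039": "<", "\u0152": "OE",
    "\u017d": "Z", "\u2018": "'", "\u2019": "'", "\u201c": '"',
    "\u201d": '"', "\u2022": "*", "\u2013": "-", "\u2014": "--",
    "\u02dc": "~", "\u2122": "(TM)", "\u0161": "s", "\u203a": ">",
    "\u0153": "oe", "\u017e": "z", "\u0178": "Y",
}
_TABLE = str.maketrans(GENERIC)


def c1_to_1252(text: str) -> str:
    """Reinterpret C1 controls (U+0080-U+009F), as produced by decoding
    windows-1252 bytes as ISO-8859-1, as the windows-1252 characters."""
    out = []
    for ch in text:
        if "\x80" <= ch <= "\x9f":
            try:
                ch = ch.encode("latin-1").decode("cp1252")
            except UnicodeDecodeError:
                ch = ""  # unassigned positions are dropped
        out.append(ch)
    return "".join(out)


def replace_1252(text: str) -> str:
    """Replace windows-1252 specials with generic ASCII equivalents."""
    return c1_to_1252(text).translate(_TABLE)
```

For example, replace_1252("It\u2019s \u201ccurly\u201d \u2014 \u20ac5") returns It's "curly" -- EUR5. A translation table keeps the mapping in one place, so changing a replacement is a one-line edit.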

Sub preserve1252 is a little riskier, because it must be used only with content the server actually sends as UTF-8 (irrespective of your XML encoding declaration), and only where HTML content is allowed, such as XML <description> fields and web pages.
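Again as a hypothetical sketch only (the real sub is not shown), a preserve1252-style step in Python could keep the characters by transcoding windows-1252 bytes to UTF-8. It assumes the output really is served as UTF-8; the function name is made up.

```python
# Hypothetical sketch of a preserve1252-style step: keep the characters
# by transcoding windows-1252 bytes to UTF-8. Assumes the page or feed
# is actually served as UTF-8.

def preserve_1252(raw: bytes) -> bytes:
    """Re-encode windows-1252 bytes as UTF-8, dropping unassigned bytes."""
    # errors="ignore" silently drops 0x81, 0x8D, 0x8F, 0x90 and 0x9D
    return raw.decode("cp1252", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    word_paste = b"\x93Smart\x94 quotes \x96 and \x80 signs"
    print(preserve_1252(word_paste).decode("utf-8"))  # “Smart” quotes – and € signs
```

Because windows-1252 agrees with ISO-8859-1 outside 0x80 to 0x9F, ordinary accented Latin-1 text passes through this step unchanged apart from its new UTF-8 byte form.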

Since those of you who have encountered these issues with 0x80 through 0x9F characters in your own feeds will want to test the replacements, here is a complete set of those "problem" characters to paste into your test content (don't worry about control characters; they get ignored anyway):[quote][size=150]€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ”

Last edited by Musicvid on Fri Jun 06, 2008 12:11 am, edited 2 times in total.

Other methods are to escape them as HTML entities in RSS <description> fields, and to replace or eliminate them in <title> fields, where HTML is not allowed. You can use Perl regular expressions for any of those tasks.
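Both approaches can be sketched with regular expressions. This version uses Python's re module in place of Perl regexes, plus the standard html.entities.codepoint2name table; the function names are hypothetical. Characters without an HTML 4 named entity (Ž and ž) fall back to numeric references.

```python
# Sketch in Python (the post suggests Perl regexes; Python's re module
# plays the same role here). Function names are hypothetical.
import re
from html.entities import codepoint2name

# The 27 printable windows-1252 specials, as Unicode characters
# (unassigned positions 0x81, 0x8D, 0x8F, 0x90, 0x9D are skipped).
SPECIALS = bytes(range(0x80, 0xA0)).decode("cp1252", errors="ignore")
SPECIALS_RE = re.compile("[" + re.escape(SPECIALS) + "]")


def entities_for_description(text: str) -> str:
    """Replace each special with its named HTML entity, or a numeric one."""
    def to_entity(m):
        cp = ord(m.group())
        name = codepoint2name.get(cp)
        return f"&{name};" if name else f"&#{cp};"
    return SPECIALS_RE.sub(to_entity, text)


def strip_for_title(text: str) -> str:
    """Remove the specials outright, for fields where HTML is not allowed."""
    return SPECIALS_RE.sub("", text)
```

For example, entities_for_description("\u201cHi\u201d \u20ac5") gives &ldquo;Hi&rdquo; &euro;5. That result is HTML, so when it goes into an RSS <description> the ampersands still need XML escaping (or a CDATA section); otherwise &euro; hits the undefined-entity problem described above.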