Are you aware that the quotation marks in the @siwisdom <http://twitter.com/siwisdom>
tweets display as &ldquo; and
&rdquo; in clients like TweetDeck? Perhaps you
should switch to using regular ASCII double quotes.

Regards and Happy New Year.

Yes, I'm aware. They show up on the main Twitter page as well, and there
isn't much I can do about it, other than sticking exclusively with ASCII
and forgoing the nice typographic characters. It appears to be related to
this rabbit hole, only in a way
that's completely different.

What's going on is explained here:

We have to encode HTML entities to prevent XSS attacks. Sorry about the lost
characters.

And XSS has nothing to do with attacking one website from
another, but everything to do with the proliferation of character encoding
schemes and the desire to fling bits of executable code (aka ``JavaScript'')
along with our bits of non-executable data (aka ``HTML''). The problem is
keeping the bits of executable code (aka ``JavaScript'') from showing up
where they aren't expected.

But in the case of Twitter, I don't think they actually understand how
their own stack works. Or they just took the easy way out, and any of the
``special'' characters in HTML, like ``&'', ``<'' and ``>'', are
automatically converted to their HTML entity equivalents ``&amp;'', ``&lt;'' and
``&gt;''. Otherwise, to sanitize the input, they would need to do the
following:

get the raw input from the HTML form

convert the input from the transport encoding (usually URL
encoding but it could be something else, depending upon the form)

possibly convert the string into a workable character set the program
understands (say the browser sent the character data in WINDOWS-1251,
because Microsoft is like that; it would need converting to something a
bit easier to work with, say, UTF-8)

if HTML is allowed, sanitize the HTML by

    removing unsupported or dangerous tags, like <SCRIPT>, <EMBED> and
    <OBJECT>

Fail to do any of those steps, and well … “1 h@v3 h@cxx0r3d
ur c0mput3r!!!!!!!11111” And besides, I'm probably missing some sanitizing
step somewhere.
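
As a rough illustration of the steps listed above, here is a minimal Python sketch. The function names (blanket_escape, sanitize), the DANGEROUS_TAGS tuple and the sample form data are all invented for this example; none of it is Twitter's actual code, and a regex-based tag filter is nowhere near a complete HTML sanitizer. It only shows why escaping every ``&'' turns an already-encoded &ldquo; into visible text, while the longer route leaves it alone.

```python
import html
import re
from urllib.parse import unquote_to_bytes

# Hypothetical list of tags to strip; a real sanitizer needs far more than this.
DANGEROUS_TAGS = ("script", "embed", "object")

def blanket_escape(text):
    """The 'easy way out': escape every &, < and >, entities included."""
    return html.escape(text, quote=False)

def sanitize(raw, charset="windows-1251"):
    """Sketch of the longer route: decode, convert, then strip bad tags."""
    # Undo the transport (URL) encoding to get the bytes the browser sent.
    data = unquote_to_bytes(raw.replace("+", " "))
    # Convert from the browser's character set into a Python (Unicode) string.
    text = data.decode(charset)
    # Remove dangerous elements along with their content, then any stray tags.
    names = "|".join(DANGEROUS_TAGS)
    text = re.sub(rf"(?is)<({names})\b.*?</\1\s*>", "", text)
    text = re.sub(rf"(?is)</?({names})\b[^>]*>", "", text)
    return text.strip()

# Made-up form submission: a quoted word followed by a script element.
tweet = "%26ldquo%3BHello%26rdquo%3B+%3Cscript%3Ealert(1)%3C%2Fscript%3E"

print(blanket_escape("&ldquo;Hello&rdquo;"))  # &amp;ldquo;Hello&amp;rdquo;
print(sanitize(tweet))                        # &ldquo;Hello&rdquo;
```

The first line of output is what a browser would render as the literal text &ldquo;Hello&rdquo;, which matches what shows up in TweetDeck.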

Now, I could convert the input I give to Twitter to UTF-8 and
avoid HTML entities
entirely, but then I would have to convert my blog engine to UTF-8 (because
I display my Twitter feed in the sidebar) and while it may work
just fine with UTF-8, I haven't tested it with UTF-8 data. And I
would prefer to keep it in US-ASCII to avoid any nasty surprises.

Besides, I shouldn't have to do this, because that's
why HTML entities were
designed in the first place—as a way of presenting characters
when a character set doesn't support those characters!
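
To make that concrete, here is a small Python sketch of entities doing exactly that job: carrying curly quotes through pure US-ASCII text. The helper name to_ascii_entities is made up for this example; it leans on the entity table in Python's standard html.entities module and falls back to numeric character references for anything the table doesn't name.

```python
import html
from html.entities import codepoint2name

def to_ascii_entities(text):
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                        # plain US-ASCII passes through
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named, e.g. &ldquo;
        else:
            out.append(f"&#{code};")              # numeric fallback
    return "".join(out)

quote = "\u201cSimplicity\u201d"        # curly quotes, written as escapes
encoded = to_ascii_entities(quote)
print(encoded)                          # &ldquo;Simplicity&rdquo;
assert encoded.isascii()                # safe to store as US-ASCII
assert html.unescape(encoded) == quote  # decoding the entities restores it
```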
