
Weird Characters <?> and Â

I am working on re-launching a website, and I am having some data display issues. I'm not sure whether they are CSS problems, general UTF character issues, or something else.
Here is what the live site looks like (good):

Here is what my new site looks like, with identical data (bad):

Did you copy and paste the text from somewhere? If so, try deleting the space between those outline numbers and the text and adding new spaces. I don't know if that makes sense, but it works for me when I see those funny marks.

@mixmaster -- yes, the text was copied from a client. I would edit everything, but there are literally thousands of these instances. I'm a little puzzled because this works on my current site with no weird characters, but on my new demo site I guess my character encoding is wrong.

The important thing is that the encoding you use matches the encoding you declare. Otherwise browsers will misinterpret the ones and zeroes and show the wrong characters.
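To see how a mismatch produces a stray Â, here is a small Python sketch. It assumes (this is a guess, not something confirmed in the thread) that the copied text contains non-breaking spaces after the outline numbers, that the file is saved as UTF-8, and that the browser is told it is Windows-1252. The function name `misread` is made up for the illustration.

```python
def misread(text: str, actual: str, declared: str) -> str:
    """Encode text with the encoding really used for the file, then decode
    it with the encoding the browser was told to use."""
    raw = text.encode(actual)
    # errors="replace" mimics a browser showing a replacement character
    # instead of refusing to render the page.
    return raw.decode(declared, errors="replace")


# U+00A0 is a non-breaking space; in UTF-8 it is the two bytes C2 A0.
heading = "1.\u00a0Introduction"

# Read as Windows-1252, byte C2 becomes "Â" and byte A0 stays a non-breaking space.
garbled = misread(heading, actual="utf-8", declared="cp1252")
print(repr(garbled))
```

If the two encodings match, `misread` returns the text unchanged, which is the whole point of the advice above.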

I generally recommend UTF-8 if at all possible, since that allows you to literally represent any ISO/IEC 10646 character. If you use a more restricted encoding, like ISO 8859-1, you'll have to use entity references or numeric character references to represent characters that can't be represented literally (like dashes, curly quotes, ellipses, non-Latin characters, etc.)
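If you do have to stay with ISO 8859-1, the numeric character references can be generated rather than typed by hand. A minimal Python sketch, where `to_latin1_html` is just a name chosen for this example:

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO 8859-1, replacing anything that encoding cannot
    represent with a decimal numeric character reference (&#NNNN;)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


sample = "caf\u00e9 \u2013 \u201cok\u201d\u2026"  # café – “ok”…
print(to_latin1_html(sample))
# é exists in ISO 8859-1 and stays a single byte; the dash, curly quotes
# and ellipsis come out as &#8211; &#8220; &#8221; &#8230;
```

With UTF-8 none of this is needed, since every one of those characters can be written literally.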

Originally Posted by ripcurlksm

Here is my current header:

The header in your document is overridden if your web server declares the encoding in the Content-Type HTTP header (which is very common), because the HTTP declaration takes precedence.
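As a rough illustration of that precedence, here is a Python sketch. It is a simplification: it assumes the in-document declaration is something like a meta charset value, it ignores byte order marks and browser sniffing, and `effective_charset` is a hypothetical helper, not part of any library.

```python
from email.message import Message
from typing import Optional


def effective_charset(http_content_type: Optional[str],
                      document_charset: Optional[str]) -> Optional[str]:
    """Return the charset a browser would most likely use: the one in the
    HTTP Content-Type header if present, else the one declared in the document."""
    if http_content_type:
        msg = Message()
        msg["Content-Type"] = http_content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    return document_charset.lower() if document_charset else None


# The server says ISO-8859-1, the document says UTF-8: the server wins.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
# The server sends no charset, so the document's declaration is used.
print(effective_charset("text/html", "UTF-8"))
```

To find out what your server actually sends, look at the response headers in your browser's developer tools.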

Assuming your server also declares the encoding as UTF-8, you must make sure that every character in your document is encoded as UTF-8.
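With thousands of instances, re-encoding the data in one pass is more practical than editing by hand. A minimal Python sketch, assuming the source bytes really are Windows-1252 (that is a guess based on the symptoms); `reencode_to_utf8` is a name invented for this example.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in the source
    encoding (a few byte values, such as 0x81, are undefined in Python's
    cp1252 codec), which is a hint that the guess about the source is wrong.
    """
    return raw.decode(source_encoding).encode("utf-8")


# 0x93/0x94 are curly quotes and 0x96 is an en dash in Windows-1252.
word_bytes = b"\x93Hi\x94 \x96 caf\xe9"
print(reencode_to_utf8(word_bytes).decode("utf-8"))
```

Run something like this on a copy of the data first, and only once: converting text that is already UTF-8 a second time produces a fresh round of garbage.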

The code fragments you posted indicate that you've written the document in something like MS Word and saved as HTML. As far as I know, Word doesn't use UTF-8 by default, but a Microsoft-specific encoding (Windows-1252).

That would explain the question marks in the black diamonds: Windows-1252 bytes with a value greater than 127 are generally invalid when interpreted as UTF-8, which uses two or more octets to represent each character above code position 127.
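The octet counts are easy to check yourself. A short Python sketch (the helper names are made up for the example):

```python
def utf8_length(ch: str) -> int:
    """Number of octets UTF-8 needs for the given character."""
    return len(ch.encode("utf-8"))


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for ch in ("A", "\u00e9", "\u2019"):  # A, é, right curly quote
    print(repr(ch), utf8_length(ch))

# A curly quote saved by Word as Windows-1252 is the single byte 0x92,
# which is not a valid UTF-8 sequence on its own.
curly_cp1252 = "\u2019".encode("cp1252")
print(curly_cp1252, is_valid_utf8(curly_cp1252))

# Decoding it leniently yields U+FFFD, the question mark in a black diamond.
print(repr(curly_cp1252.decode("utf-8", errors="replace")))
```

So a plain ASCII letter takes one octet, é takes two, and the curly quote takes three, while the lone Windows-1252 byte is rejected.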