Character Encoding and Funny Characters

Hi,

Can someone please give me some advice? We're having problems with character encoding, and funny characters are being displayed on our sites. We used to develop all our sites with charset=ISO-8859-1.

We've since started using UTF-8, probably around the time we started using WordPress.

However, we've found that when people copy and paste text from MS Word, for example, it often includes the ` character and others which don't seem to be in either character set. These get replaced with either a weird question mark or some other characters.

We've tried str_replace to remove them, and also other functions that are supposed to make the content 'compatible' with other encodings, but nothing seems to work properly.
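
To make the symptom concrete, here is a minimal sketch in Python (not the PHP our sites use), assuming the pasted Word text arrives as Windows-1252 bytes, which is a guess on my part. The helper name `to_utf8` is made up for illustration. It leaves valid UTF-8 alone and re-encodes anything else as if it were Windows-1252.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as UTF-8 bytes.

    Assumption: input that is not valid UTF-8 is Windows-1252,
    which is where Word's curly quotes usually come from.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8, keep it untouched
        return raw
    except UnicodeDecodeError:
        # Bytes that are undefined in Windows-1252 become U+FFFD
        # instead of raising an error.
        return raw.decode("cp1252", errors="replace").encode("utf-8")


# Byte 0x92 is the curly apostrophe in Windows-1252.
pasted = b"It\x92s"
fixed = to_utf8(pasted)

# Reading UTF-8 bytes as Windows-1252 gives the familiar garbage.
garbled = "\u2019".encode("utf-8").decode("cp1252")
```

If that assumption holds, the "funny characters" would be a mismatch between the bytes stored and the charset declared, not characters missing from UTF-8, which might explain why str_replace on the visible characters doesn't help.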

Any ideas?

What is the best character set to develop sites in? We've also noticed problems when including an XML feed written in UTF-8 on a site that is otherwise encoded in ISO-8859-1.
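
For the feed case, this is roughly the conversion I think is needed, again sketched in Python rather than PHP. It assumes the feed really is UTF-8 and the output is going into an HTML page served as ISO-8859-1. The function name `utf8_to_latin1_html` is hypothetical. Characters that ISO-8859-1 cannot represent are written as numeric character references instead of being dropped.

```python
def utf8_to_latin1_html(feed_bytes: bytes) -> bytes:
    """Transcode UTF-8 feed text for an ISO-8859-1 HTML page.

    Assumption: feed_bytes is valid UTF-8 and the result is
    inserted into HTML, where &#NNNN; references are understood.
    """
    text = feed_bytes.decode("utf-8")
    # xmlcharrefreplace turns unencodable characters into &#NNNN;
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


feed = "caf\u00e9 \u201chi\u201d".encode("utf-8")
page_bytes = utf8_to_latin1_html(feed)
```

Here the accented e fits in ISO-8859-1 as a single byte, while the curly quotes do not and come out as `&#8220;` and `&#8221;`.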