What Are Those Unknown Characters in My Blog Comments?

Little squares, boxes, lines, and funky symbols are often found in blog comments. A friend new to blogging called me about these, wanting to know whether her blog had been hacked or her computer had a virus. “It looks like alien writing from Star Trek!”

Those strange symbols and characters in your blog comments are letters from other languages that your browser cannot display. It doesn’t have what it needs to show that language’s characters. It is not a problem with WordPress or whatever blogging program you are using. Your browser isn’t turning the characters into their appropriate letters.

Think of it this way. Your web browser comes with a default language. In my case, Firefox comes with English built in. It recognizes English letters and displays them on the screen for me to read. Because English uses the Latin alphabet, as do French, Italian, Spanish, German, and other similar languages, those languages are readable from within my browser.
To add more languages, which would convert those funky characters and symbols into something readable, you must install support for those languages: the fonts and character sets, along with the matching character encoding.

Should You Add More Foreign Languages?

Do you really want to add foreign languages you cannot read to your computer or web browser? While most modern computers can handle extra loads on the memory, older computers can’t. Why add language character sets if you can’t read them?

You can run them through Google Translate or another online machine translation program to figure out what they say, but these services cover only a limited set of languages and may not include Japanese, Chinese, Russian, Hindi, Arabic, and other languages. Free online machine translation for many of these languages is still in development.

You can delete the comments, though those who have those languages enabled in their browsers will be able to read the comments, so I recommend you leave them – unless you are sure they are comment spam. Comment spammers use foreign languages to fool you into thinking these are legitimate comments. The URL that accompanies the blog comment, if one is present, usually gives you a clue as to whether or not the comment is spam. If it is, mark it as spam. If in doubt, delete it. Otherwise, leave it.

No, your blog comments have not been hacked and your computer probably does not have a virus. And no, these aren’t languages from Star Trek. They are just a machine interpretation limitation. Maybe someday we won’t have to add languages to our computer or browser in order to see their character sets. Maybe someday, instantaneous translations will be built into our browsers so we will be able to read any blog in any language in our own language, opening the world up to new life, new civilizations, to boldly go where no one has gone before.

Lorelle, I think you’re a little confused on this issue. You don’t really need to “install” other languages and they don’t use extra memory. It’s just encoding and can be changed from View>Encoding. Unicode, for instance, should do fairly well at displaying other languages.
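
To illustrate the point about encoding made in the comment above, here is a minimal Python sketch. The sample text and the choice of Shift_JIS and Latin-1 are assumptions made for illustration, not details from the post. It only shows that the same bytes can read as gibberish or as real text, depending on which encoding is used to interpret them.

```python
# Hypothetical sample: a Japanese greeting, as a commenter might type it.
text = "こんにちは"

# The bytes as they might be sent to the browser (Shift_JIS is an assumed choice).
raw = text.encode("shift_jis")

# Interpreted with the wrong encoding, the bytes become unrelated symbols.
wrong = raw.decode("latin-1")

# Interpreted with the matching encoding, the original text comes back.
right = raw.decode("shift_jis")

print(repr(wrong))
print(right)
```

Switching the encoding in the browser's View menu is roughly the second decode step: the bytes stay the same and only their interpretation changes.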

In the past when I’ve added/installed (depends upon the process and operating system) languages, some do consume a bit of RAM, though negligible on modern computers. Yes, it is just “encoding” but if the person wants to write in that language as well as read, it is a little more complex.

As a side note: In some cases the weird symbols ARE on the server and are not a browser language issue. This is mostly true in cases where you have a very old site and used a non-UTF-8 encoding for your database (such as the latin1 character set with Swedish collation, latin1_swedish_ci, that older versions of MySQL use by default if you don’t assign another type). In those cases you can end up with some funny symbols because the database’s character set does not correctly serve up characters from a different language. This typically happens when you reload a database from a backup and the character set of the backup file conflicts with the server’s setting.
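
The database mismatch described above can be imitated in a few lines of Python. This is only a sketch under assumptions: the sample word is hypothetical, and Python's "latin-1" codec stands in for a database or connection configured as latin1, which is a simplification of what MySQL actually does.

```python
# Hypothetical comment text containing one accented letter.
original = "café"

# The text is saved as UTF-8 bytes...
stored = original.encode("utf-8")

# ...but read back as if it were latin1, so each byte of the
# two-byte "é" is shown as its own character.
garbled = stored.decode("latin-1")
print(garbled)  # cafÃ©

# As long as no bytes were lost, reversing the wrong step recovers the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

The repair step only works when the garbled text was never truncated or re-saved with replacement characters, which is why a mismatched backup restore is best caught early.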

You can also induce some weird characters on the server if you cut and paste from a Word document that has “Smart Quotes” turned on. Those curly quotes fall outside plain ASCII, so if the page or database encoding does not match the one they were saved in, they end up looking like little black boxes when displayed on your site.
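
A short Python sketch of the smart-quote case, assuming the quotes were saved in the Windows-1252 encoding and the site then reads the bytes as UTF-8. Both encodings are assumptions chosen for illustration, and the sample string is hypothetical.

```python
# Hypothetical pasted text wrapped in curly "smart" quotes (U+201C and U+201D).
pasted = "\u201cHello\u201d"

# Saved as Windows-1252, each curly quote becomes a single byte (0x93 and 0x94).
saved = pasted.encode("cp1252")

# Read as UTF-8, those lone bytes are invalid, so each one is replaced by
# U+FFFD, the replacement character that browsers often draw as a box or
# a question mark.
shown = saved.decode("utf-8", errors="replace")
print(repr(shown))  # '\ufffdHello\ufffd'

# One common workaround: swap curly quotes for plain ASCII quotes before saving.
plain = pasted.replace("\u201c", '"').replace("\u201d", '"')
print(plain)  # "Hello"
```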

Yes, those are more reasons. The strange characters you are seeing in the image I’ve included in the article are from this WordPress.com blog, on the hottest and most up-to-date servers in the world by Layered Technologies. 😀 It’s my browser not converting the character set. Once I have those languages in my browser, they will convert.