Language settings for Russian website

Hello there, it's the first time that I'm going to do a small site in Russian (a translation of an existing English site). I've trawled the Internet for a definitive answer on what the language settings have to be in the HTML code, but I'm not so sure now...
I think that I need:
<META http-equiv="content-type" content="text/html; charset=utf-8">

Do I also need:
<html lang="ru">
If so, what exactly does this do?
I've never put <html lang="en"> in any of my English sites...

Originally Posted by spirelli

I think that I need:
<META http-equiv="content-type" content="text/html; charset=utf-8">

This is only correct if you save your HTML files with a UTF-8 encoding. It will be ignored if your web server sends the encoding information in the real Content-Type HTTP header.
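As a minimal sketch of that point (the title and page text are made up for illustration), a page saved as UTF-8 could declare its encoding like this. The server header mentioned in the comment is hypothetical; it only shows what would take precedence over the META element if the server sent it:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- The file itself must be saved as UTF-8 for this declaration to be true. -->
<!-- If the server sent, for example, "Content-Type: text/html; charset=windows-1251"
     (a hypothetical header), browsers would follow the header, not this META. -->
<META http-equiv="content-type" content="text/html; charset=utf-8">
<title>Пример</title>
</head>
<body>
<p>Привет, мир!</p>
</body>
</html>
```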

Originally Posted by spirelli

Do I also need:
<html lang="ru">
If so, what exactly does this do?

It declares that the natural language of the document is Russian. Yes, you should definitely use this attribute.

Originally Posted by spirelli

I've never put <html lang="en"> in any of my English sites...

You should! It can be used by screen readers to select the correct synthesizer library, and it is also used by search engines to classify your content so that users can search for pages in a particular language.
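A small sketch of how that can look in practice, assuming a Russian page that contains one English phrase (the text is invented for the example). The lang attribute on the html element sets the default language, and it can be overridden on any nested element:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="ru">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Добро пожаловать</title>
</head>
<body>
<!-- Inherits lang="ru" from the html element -->
<p>Добро пожаловать на наш сайт.</p>
<!-- The span switches to English for just this phrase -->
<p>По-английски: <span lang="en">Welcome to our site.</span></p>
</body>
</html>
```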

With the Windows charset, any character outside the windows-1251 range that you put on your web site will look like a big ugly box with four squirlies in it (that's what FF shows anyway), while Safari has a little boxie with a ?, other browsers just have the ?, and someone has a black diamond with a ? in it. And we don't all have a Windows machine, so we don't all have whatever Windows has in its charset.
Though as I understand it, the Windows charset is mostly ASCII, which is a subset of UTF-8. UTF-8 will just recognise more characters: it covers Cyrillic (what you're using), Greek, and the Slavic letters with the thingies on the s's : )

Although, Tommy, is "ru" the right one? I might be confusing it with something else, but I thought the Russian one would be "py"? Or "рф"?

*edit n/m, it's "ru"

*edit2 you may also want to have a meta tag with the language as well. I've heard filthy rumours that some user agents look at the lang attribute on the html tag (like you have) but that others only look for the meta tag. To cover my butt on my sites I've had this (excuse the XHTML, legacy):
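The snippet itself seems to be missing from the post. As an illustrative guess only, and not the poster's actual markup, an XHTML page for a Dutch site using the content-language meta convention might look something like this (the title and text are placeholders):

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="nl" lang="nl">
<head>
  <!-- Hypothetical reconstruction: charset and language metas placed before the title -->
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <meta http-equiv="content-language" content="nl" />
  <title>Voorbeeld</title>
</head>
<body>
  <p>Voorbeeldtekst</p>
</body>
</html>
```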

I put it and the charset meta before the title because my titles are (usually) also Dutch.
As my copy of JAWS has Finnish and two sorts of English, but no Dutch, I can attest to the difficulty of listening to a language pronounced really wrong. Even when you can see and read where you are, you have trouble following : ( I actually have to translate my forms into English to check functionality : (

UTF-8 is an international standard encoding that lets you write any Unicode character literally. Apart from markup characters such as &lt; and &amp;, you won't need any entity references (like &hellip;) or numeric character references (like &#8230;).
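To make the difference concrete, here is a short sketch (the Russian sentence is just sample text). Assuming the page is saved and served as UTF-8, all three paragraphs should render the same ellipsis, but only the first one is readable in the source:

```html
<!-- Literal character, typed straight into a UTF-8 file -->
<p>Продолжение следует…</p>
<!-- Named entity reference for the same character (U+2026) -->
<p>Продолжение следует&hellip;</p>
<!-- Numeric character reference for the same character -->
<p>Продолжение следует&#8230;</p>
```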

Windows-1251 is a Microsoft-specific 8-bit encoding. It covers much the same Cyrillic repertoire as the standard ISO 8859-5, but the two are not byte-compatible. It only allows a small number of characters (about 220) to be represented literally. For anything else you need to use entity references or NCRs.
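A sketch of what that means in a page, assuming the file really is saved as windows-1251 (the text is invented). The Cyrillic letters can be typed literally, but the Latin letter é is not part of windows-1251, so it has to be written as a reference:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="ru">
<head>
<meta http-equiv="content-type" content="text/html; charset=windows-1251">
<title>Кафе</title>
</head>
<body>
<!-- Cyrillic is literal; é is outside windows-1251, so use an entity or an NCR -->
<p>Кафе: Caf&eacute;</p>
<p>Кафе: Caf&#233;</p>
</body>
</html>
```

In a UTF-8 file the same word could simply be typed as Café.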

Originally Posted by spirelli

Which one should I use?

If at all possible, use UTF-8. If not, use ISO 8859-5 rather than the proprietary Microsoft encoding.