Moreover, it also validates the value on the server side, with a call to [string bytelength], which correctly handles multibyte characters.

The error message to the user will be "Term name is 3 characters too long". We don't explicitly say what the maxlength is because the effective limit depends on the presence of multibyte characters. Telling the user to remove 3 will always work: if some of the removed characters are multibyte, removing fewer might have sufficed, but removing 3 is guaranteed to be safe.

I've also added it to ad_form:

{term_name:text {label "Term name"} {maxlength 20}}

Please use liberally on all your forms, so we can avoid those nasty DB errors causing 500 internal server errors just because the user typed a few characters too many.

I am not sure bytelength is the right thing to use. I guess in general it will be conservative, but if your db is UTF-8 and you have a varchar(20), isn't it 20 characters even if they are multibyte? Conversely, if your db is ISO-8859-1 and you enter high-bit characters, bytelength will say 2 bytes but the representation in the DB will be one byte.
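To make the mismatch concrete, here is the same distinction illustrated in Python rather than Tcl (the string is just an example; Python's len() on a str corresponds to Tcl's [string length], and len() of the encoded bytes corresponds to [string bytelength] for UTF-8):

```python
# Character count vs. encoded byte count for a string with
# two non-ASCII characters.
s = "Grüße"  # 5 characters; ü and ß are non-ASCII

chars = len(s)                              # character count: 5
utf8_bytes = len(s.encode("utf-8"))         # ü and ß take 2 bytes each: 7
latin1_bytes = len(s.encode("iso-8859-1"))  # one byte per character: 5

print(chars, utf8_bytes, latin1_bytes)  # 5 7 5
```

So a byte-based check over-counts in both of the cases above: against a UTF-8 database whose varchar(20) limit is 20 characters, and against an ISO-8859-1 database where high-bit characters are stored as a single byte.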

I'm not sure why you don't want to use string length. Here's what the manual says about bytelength

string bytelength string
    Returns a decimal string giving the number of bytes used to represent string in memory. Because UTF-8 uses one to three bytes to represent Unicode characters, the byte length will not be the same as the character length in general. The cases where a script cares about the byte length are rare. In almost all cases, you should use the string length operation. Refer to the Tcl_NumUtfChars manual entry for more details on the UTF-8 representation.

So it seems to me string length works fine. Have you seen evidence otherwise?

If you have something else in there, for example SQL_ASCII, then those are single-byte encoded databases. As far as I understand, in almost every case the right thing is to create your database as UNICODE when you want to be able to store data in different encodings.

The error that your maxlength procedure catches indicates that something else is going wrong earlier, because in that case you would end up storing a single international character (e.g. a German umlaut) as two characters in the db, which leads to lots of other problems. For example, a query that selects a substring could split the 2-byte character into two pieces. You should have created your database UNICODE-encoded, or in an encoding that understands the characters you need (e.g. LATIN1).

As far as I know there is no way to change the encoding of an existing database in PostgreSQL apart from pg_dump'ing the contents, recreating the database in the desired new encoding, and importing the data. I've never done that myself, so I don't know whether it's necessary to specify encodings for pg_dump or psql when importing. You probably need to find a trick to tell pg to export the chars that were wrongly saved as two characters as one, or run a regexp over the dump file.
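For reference, the dump-and-recreate route would look roughly like the following. This is only a sketch: the database names are placeholders, and you should check the encoding names and flags against your PostgreSQL version before relying on it.

```shell
# Sketch only -- "mydb" and "mydb_unicode" are placeholder names.
pg_dump mydb > mydb.dump           # export the existing database
createdb -E UNICODE mydb_unicode   # recreate with the desired encoding
psql -f mydb.dump mydb_unicode     # reload the data into the new database
```

As noted above, this does nothing by itself to repair characters that were already stored wrongly as two single-byte characters; those would still need to be fixed in the dump file.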

Regarding the missing documentation, I added a comment to the installation page.