I am planning to develop a website that requires users to register a username and a password. When I let users choose a password, which characters should I allow? Are there any that I shouldn't allow because of security issues with the HTTP protocol or the implementation language?

I haven't decided on an implementation language yet, but I will use Linux.

8 Answers

From a security/implementation perspective, there shouldn't be any need to disallow characters apart from '\0' (which is hard to type anyway). The more characters you bar, the smaller the total phase space of possible passwords and therefore the quicker it is to brute-force passwords. Of course, most password-guessing actually uses dictionary words rather than systematic searches of the input domain...
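To put rough numbers on the "smaller space" argument, here is a minimal sketch. The alphabet sizes (52 letters, 62 letters and digits, 95 printable ASCII characters) and the length of 8 are assumptions chosen for illustration, and the figures only apply to passwords picked uniformly at random, which real users rarely do.

```python
import math

def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length

def bits(alphabet_size: int, length: int) -> float:
    """Size of the same space expressed in bits."""
    return length * math.log2(alphabet_size)

# Illustrative alphabets; the names and sizes are assumptions, not a recommendation.
for name, size in [("letters only", 52),
                   ("letters and digits", 62),
                   ("printable ASCII", 95)]:
    print(f"{name:20s} {keyspace(size, 8):>24,d} candidates, {bits(size, 8):.1f} bits")
```

Each character removed from the alphabet shrinks the space a little; adding one more character of length grows it far more.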

From a usability perspective, however, some characters are not typed the same way on different machines. As an example, I have two different computers here where shift-3 produces # on one and £ on the other. When I type a password in, both appear as '*' so I don't know whether I got it right or not. Some people think that could confuse people enough to start disallowing those characters. I don't think it's worth doing. Most real people access real services from one or maybe two computers, and don't tend to put many extended characters in their passwords.

Maybe it would be useful to show a warning like "You have characters in your password which are not in the same place on every keyboard. Are you sure you want to proceed?" But strictly speaking, this leaves only about 20 keys that are the same on most keyboards (not counting Dvorak and the like).
–
Paŭlo Ebermann, Feb 12 '11 at 2:10


@Earlz This question is about passwords; if you're storing them in plain text, you're doing it wrong.
–
HoLyVieR, Feb 13 '11 at 15:36


Because it's more likely to be handled incorrectly by the underlying implementation. You don't want to start truncating passwords just because somewhere in the bowels of your runtime they get converted to C strings.
–
user185, Feb 3 '13 at 8:28
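The truncation risk with '\0' mentioned in the comment above can be seen in a small sketch. It uses Python's ctypes only as a stand-in for "somewhere in the bowels of your runtime"; the password value is made up for illustration.

```python
import ctypes

# Hypothetical password containing an embedded NUL byte.
password = b"correct\0horse"

# A C-style buffer holds every byte, but anything that reads it as a
# NUL-terminated string stops at the first '\0'.
buf = ctypes.create_string_buffer(password)
seen_by_c = buf.value   # what C string functions would see
all_bytes = buf.raw     # the full buffer, including the trailing terminator

print(seen_by_c)
print(all_bytes)
```

If hashing happened on the C-string view, "correct" alone would be enough to log in.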

There can be issues with non-ASCII characters. A password is a sequence of glyphs, but the password processing (hashing) will need a sequence of bits, so there must be a deterministic way to transform glyphs into bits. This is the whole murky swamp of code pages. Even if you stick to Unicode, there is trouble afoot:

A single character can have several decompositions as code points. For instance, the "é" character (which is very frequent in French) can be encoded as either a single code point U+00E9, or as the sequence U+0065 U+0301; both sequences are meant to be equivalent. Whether you get one or the other depends on the conventions used by the input device.

A Unicode string is a sequence of code points (which are integers in the 0 to 1114111 range, i.e. up to U+10FFFF). There are several standard encodings for converting such a sequence into bytes; the most common are UTF-8, UTF-16 (big-endian), UTF-16 (little-endian), UTF-32 (big-endian) and UTF-32 (little-endian). Any of these may or may not start with a BOM.

Therefore a single "é" can be meaningfully encoded into bytes in at least twenty distinct ways (two decompositions, five encodings, with or without a BOM), and that's when sticking to "mainstream Unicode". Latin-1 encoding, or its Microsoft counterpart, is also widespread, so make that 21. Which encoding a given piece of software will use may depend upon a lot of factors, including the locale. It is bothersome when the user cannot log on to his computer anymore because he switched the configuration from "Canadian - English" to "Canadian - French".
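The count can be checked with a short sketch that enumerates the byte sequences for "é". The `canonical_bytes` helper at the end is hypothetical and not part of the answer: it shows one possible convention (normalize to NFC, then encode as BOM-less UTF-8) that would make the hash input deterministic.

```python
import codecs
import unicodedata

composed = unicodedata.normalize("NFC", "\u00e9")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # U+0065 followed by U+0301

# The five encodings named above, each paired with its byte order mark.
ENCODINGS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}

variants = set()
for text in (composed, decomposed):
    for name, bom in ENCODINGS.items():
        data = text.encode(name)   # these codec names never emit a BOM themselves
        variants.add(data)         # without BOM
        variants.add(bom + data)   # with BOM

print(len(variants))  # 20
variants.add(composed.encode("latin-1"))
print(len(variants))  # 21

def canonical_bytes(password: str) -> bytes:
    """Hypothetical convention: NFC-normalize, then encode as UTF-8 without BOM."""
    return unicodedata.normalize("NFC", password).encode("utf-8")
```

Any hash computed over these byte sequences differs between variants, which is why the same typed password can stop matching after a locale or input-method change.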

Experimentally, most problems of that kind are avoided by restricting passwords to the range of printable ASCII characters (those with codes ranging from 32 to 126 -- personally I would avoid space, so make that 33 to 126) and enforcing mono-byte encoding (no BOM, one character becomes one byte). Since passwords are meant to be typed on various keyboards with no visual feedback, the list of characters should be even more restricted for optimal usability (I daily battle with Canadian layouts where what is written on the keyboard does not necessarily match what the machine thinks it is, especially when going through one or two nested RDP connections; the '<', '>' and '\' characters are most often moving around). With just letters (uppercase and lowercase) and digits, you will be fine.
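A minimal sketch of such a restriction follows. The function names and the `allow_space` switch are assumptions for illustration; whether to accept the space (code 32) is the judgment call described above.

```python
def is_printable_ascii(password: str, allow_space: bool = False) -> bool:
    """True if every character has a code in 33..126 (or 32..126 with spaces)."""
    low = 32 if allow_space else 33
    return len(password) > 0 and all(low <= ord(ch) <= 126 for ch in password)

def to_hash_input(password: str, allow_space: bool = False) -> bytes:
    """One character becomes one byte, with no BOM; reject anything else."""
    if not is_printable_ascii(password, allow_space):
        raise ValueError("password contains characters outside printable ASCII")
    return password.encode("ascii")

print(is_printable_ascii("Tr0ub4dor&3"))
print(is_printable_ascii("caf\u00e9"))
print(is_printable_ascii("two words", allow_space=True))
```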

You could say that the user is responsible; he is free to use any characters he wishes as long as he deals with the problem of typing them. But that's not ultimately tenable: when users have trouble, they call your helpdesk, and you have to assume part of their mistakes.

Why avoid spaces? Perhaps str.replace them out if you think it may cause compatibility problems anywhere, but being unable to make decent passphrases really bothers me. Nobody has spaces in passwords, which is exactly why I use them very frequently.
–
Luc, Feb 8 '14 at 0:23

On a typical keyboard, the space bar makes a distinctive sound, making it easy prey for shoulder surfers.
–
Thomas Pornin, Feb 8 '14 at 12:16

One should probably say "if using Unicode, then as always, there is trouble afoot" instead of "even if you use Unicode". It is Unicode that creates trouble (as always!), not code pages. Entering the same password with the same codepage results in the exact same bits, reliably. It is only Unicode that screws things up. Entering the same PW using a different code page will produce different bits. This is actually a tiny bit of extra security (2-3 bits to guess the codepage) for a dictionary attack. It also makes using a snooped PW harder (possibly running into login fail limit trying CPs).
–
Damon, Feb 9 '14 at 11:18

I think that unless a 'virtual keyboard' or a similar tool is available that would produce characters in a uniform way, we are left with alphanumeric characters only. The location of all the rest can differ between keyboards. If a user accesses the service from another location, that could effectively lock them out of the service.

I would suggest using a virtual keyboard as a way to send exactly the same character representations (Unicode was already discussed above) in the same manner, no matter what system or keyboard is used. Then there will be no need to exclude any character that could be typed on any keyboard.

*, ? and %: As these are often used as wildcards they may confuse the underlying programming language.

Tab, Return, NewLine, Vertical Tab, Escape: Such special characters can provoke weird behavior from your programming language or from the browser used by the customer. (If the customer uses several different browsers, it is quite possible that one will allow these to be entered and another will not, effectively locking the customer out of his account on that browser.)

\ is often treated as an escape character that gives the character that follows special meaning.
E.g. "\n" is newline in many cases. "\t" is tab.
If your programming language (or the customer's browser) does this, you are back to the possibility of receiving the characters I mentioned above.
So it is probably best to disallow \ altogether just to be safe.

I'm sorry for the downvote, but there is a reason for it: issues in the underlying programming language are never a reason to prohibit characters from being used. This is what mysql_real_escape_string, or better, parameterized queries are for. User data should never ever ever be interpreted as being executable code regardless, if this happens you'll have much bigger problems than just password storage. Asterisks, question marks, percentage signs and backslashes are perfectly fine characters that I use and want to continue using in my passwords. Besides, didn't we hash them before storage?
–
Luc, Feb 8 '14 at 0:43
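The point of the comment above can be sketched as follows: with parameterized queries and hashing before storage, wildcards, quotes and backslashes in a password are just data. The table layout, function names, PBKDF2 and the iteration count are assumptions chosen for illustration, not something prescribed in this thread.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 200_000  # assumed work factor, tune for your hardware

def hash_password(password: str, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 over the UTF-8 bytes of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")

def register(name: str, password: str) -> None:
    salt = os.urandom(16)
    # '?' placeholders keep user input out of the SQL text entirely.
    db.execute("INSERT INTO users VALUES (?, ?, ?)",
               (name, salt, hash_password(password, salt)))

def verify(name: str, password: str) -> bool:
    row = db.execute("SELECT salt, pw_hash FROM users WHERE name = ?",
                     (name,)).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, hash_password(password, salt))

# A password full of "dangerous" characters is stored and checked like any other.
tricky = "50%*?\\n'; DROP TABLE users;--"
register("alice", tricky)
print(verify("alice", tricky))
print(verify("alice", "50%"))
```

Only the salt and the hash reach the database, so the password's characters never meet the SQL parser or a LIKE pattern.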

If someone implemented this I would not use their site as none of my passwords would work.
–
Chris Andrè Dale, Feb 14 '11 at 15:46

This is highly insecure and easily crackable. If you limit yourself to uppercase and lowercase characters, your minimum length should be 17.
–
this.josh, Jul 29 '11 at 20:02

@this.josh Minimum length of 17? Alright here's an md5sum of an 8-character letters-only password: 124c6ffa6d57c5909e7a403293aed173. Generated using echo -n secret | md5sum. Since this is less than the square root of the strength you said is the "minimum" more than 2 years ago, I expect it must be no problem to crack on a commodity gpu (using hashcat or barswf or something). Good luck. (Honestly I think it's doable, but md5 shouldn't be used for password storage anyway. Still, I wonder if anyone can figure it out.)
–
Luc, Feb 8 '14 at 0:37