If you store the ♥ character in a UTF-8 encoded string, you effectively need 3 bytes to store it. So you could actually type the 3 corresponding "normal" characters to get the same result; it's just a longer password.
– martinstoeckli, Mar 30 '12 at 7:10
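
The byte count mentioned in this comment is easy to check. A minimal Python sketch (the variable names are just for illustration):

```python
# Encode the heart character (U+2665) as UTF-8 and count the bytes.
heart = "\u2665"  # the character ♥
utf8_bytes = heart.encode("utf-8")

print(utf8_bytes)       # the raw UTF-8 byte sequence
print(len(utf8_bytes))  # how many bytes are needed to store the character
```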

5 Answers

Selecting characters from a larger character set should increase security; if you use a rare character like ♥ or ಠ in your password that a brute-force attack isn't trying, then you may have dramatically increased security.

However, depending on the implementation of the application receiving your password, this can also vastly lower security. Passwords typed by users are untrusted user input and are a potential vector for attack (e.g., SQL injection, and Unicode encodings can be exploited for these types of attacks). Also, some hashing functions may expect simple ASCII input.

Thus an application may put user input through a strict sanitization procedure before it is compared/stored in the database, which may right off the bat remove non-ASCII characters (or do a simple translation like ȃ to a). This could make a rare password like Pȃssw0rd become Passw0rd, which would be trivial to brute force. Last year, a friend told me he received his password back from one of the big four cell phone companies (which apparently stored it in plaintext) and found out that all the special characters in his password had been silently removed.
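
To make that concrete, here is a rough Python sketch of the two kinds of sanitization described above. Both functions are hypothetical; I'm not claiming any particular application does exactly this, only that this is the sort of thing that silently weakens a password.

```python
import unicodedata

def strip_non_ascii(password: str) -> str:
    """Hypothetical sanitizer: silently drop every non-ASCII character."""
    return password.encode("ascii", "ignore").decode("ascii")

def fold_to_ascii(password: str) -> str:
    """Hypothetical sanitizer: decompose accented letters, then drop the accents."""
    decomposed = unicodedata.normalize("NFKD", password)
    return decomposed.encode("ascii", "ignore").decode("ascii")

original = "P\u0203ssw0rd"  # "Pȃssw0rd" (U+0203 is a with inverted breve)
stripped = strip_non_ascii(original)
folded = fold_to_ascii(original)

print(stripped)  # the rare character is gone entirely
print(folded)    # the rare character has become a plain "a"
```

Either way, what ends up stored is an ordinary ASCII password that sits in any attacker's wordlist.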

It may also be significantly more difficult to use that password across different computing environments: mobile devices, other OSes, or different encodings (UTF-7, UTF-8, UTF-16, ISO-8859-1, etc).
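
As an illustration of the encoding problem, the sketch below hashes the same password under three encodings. The password and the use of bare SHA-256 are assumptions chosen for brevity (a real system should use a salted, slow password hash); the point is only that the bytes, and therefore the hash, depend on the encoding.

```python
import hashlib

password = "P\u00e4ssword"  # "Pässword", an example password with one non-ASCII letter

digests = {}
for encoding in ("utf-8", "utf-16-le", "iso-8859-1"):
    raw = password.encode(encoding)  # same text, different bytes
    digests[encoding] = hashlib.sha256(raw).hexdigest()
    print(encoding, len(raw), digests[encoding][:16])
```

If the signup page and the login page (or two different clients) disagree about the encoding, the hashes won't match and the password simply stops working.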

In general, it's better to increase the length of your password than to increase the size of your character set. Let's say you choose 8 characters at random from a 100-character set. The password space is 100^8 = 10^16. Let's say instead you frequently use two charsets (and the attacker knows this -- e.g., is aware you speak/type Arabic frequently and will likely mix Arabic Unicode symbols into your password) and choose characters at random from the combined 200-character set. A random 8-character password from a 200-character set has a password space of 200^8 = 2.56 x 10^18, which is 2^8 = 256 times harder to brute force than the 8-character password from a 100-character set. By comparison, a 10-character password from the 100-character set has a space of 100^10 = 10^20, so it is stronger than an 8-character password from a 200-character set; and since Unicode characters are often more difficult to type and remember, it may be easier to just increase password length. (Granted, if an attacker never tries non-ASCII characters they'll never be able to crack a password of just ♥, but I wouldn't stake anything valuable on the attacker not thinking of my password generation scheme -- I want randomness to provide my advantage. Especially since some services may not hash your passwords, and if your scheme gained any popularity (or someone saw you using it once), attackers will start exploring this space.)
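
The arithmetic can be checked in a few lines of Python. The helper name `password_space` is my own; it just assumes every character is picked uniformly at random.

```python
def password_space(charset_size: int, length: int) -> int:
    """Number of possible passwords when each character is chosen uniformly at random."""
    return charset_size ** length

space_100_8 = password_space(100, 8)    # 8 characters from a 100-character set
space_200_8 = password_space(200, 8)    # 8 characters from a 200-character set
space_100_10 = password_space(100, 10)  # 10 characters from a 100-character set

print(space_200_8 // space_100_8)   # gain from doubling the character set
print(space_100_10 // space_100_8)  # gain from adding two characters
print(space_100_10 > space_200_8)   # is the longer password stronger?
```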

If the length of the password is not reduced, then I'd say it will be harder to crack, though I wouldn't call it "super safe". Take for instance a 16-character ASCII password: there are (2^7)^16 = 2^112 possibilities for such a password (also called 112 bits of entropy, assuming all 128 chars are equally probable in a "random" password). A 7-character Unicode password encoded in UTF-16, for example (covering the whole set), would have about (2^16)^7 = 2^112 possibilities - the same strength as before.

For that reason, just increasing the length of the password will offer the same protection against brute-force attacks, with the advantage that the password will be easier to type and won't cause problems if entered in systems with different default encodings and input methods.
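
The same comparison expressed as bits of entropy, as a small Python sketch. `entropy_bits` is a name I made up, and it relies on the same simplification as above: every symbol is equally likely.

```python
import math

def entropy_bits(charset_size: int, length: int) -> float:
    """Entropy in bits of a password drawn uniformly at random: length * log2(charset_size)."""
    return length * math.log2(charset_size)

ascii_bits = entropy_bits(2**7, 16)  # 16 characters from the 128 ASCII code points
utf16_bits = entropy_bits(2**16, 7)  # 7 characters from 65536 16-bit code units

print(ascii_bits, utf16_bits)
```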

I'd like to add, though, that for that math to hold the passwords must be truly random. There are other types of attacks (like dictionary attacks, rainbow tables etc) that take advantage of different probabilities in the distribution of characters in a password. Since, by your question, you seem to be generating those passwords, I think you can do fine without using different charsets.

Some of the ASCII character set is not able to be typed on a keyboard. This doesn't significantly affect your answer. The upper and lower case letters, numbers and common symbols come to roughly 100 possibilities.
– Ladadadada, Mar 30 '12 at 8:18

That's right, and a similar observation can be made about Unicode: not every block/plane is assigned, surrogate pairs take 32 bits instead of 16 (but they must come in the right order), etc. In this case, the discrepancy is even bigger (and I can't even tell, off the top of my head, if for better or worse). But I hope these simplified numbers are good enough to illustrate my point.
– mgibsonbr, Mar 30 '12 at 9:24

While creating a longer password is generally the best solution, mixing characters from different languages is quite effective. A brute-force attack using just upper- and lower-case English letters, numbers, and punctuation symbols already has to cover a large number of possibilities (assuming you have a password of a decent length), and introducing an entire new language will make the possible keyspace grow astronomically.

Having said that, if you are entering these characters using an English keyboard, you are probably going to have to hold down ALT and type a 3-4 number sequence. Mathematically, for the number of keystrokes you are entering you might as well just add 4-5 characters to the length of your password.

Furthermore, very few systems actually allow you to enter anything beyond standard letters and numbers. And if there are encoding mismatches in the process, your character may end up being normalized to a standard ASCII character anyway. This was the case for a long time with the Windows GINA login (although this created an interesting situation where you could programmatically create a password for an account that could never be used to log in via the console).

So the answer is yes, it is a great technique, but given the number of keystrokes and the compatibility issues, you might as well just make your password longer.

As a rule, brute-force attacks follow some sort of dictionary. If the password isn't in the dictionary they use, then it won't be guessed. If someone creates a dictionary that does include your password, then it will be guessed. That's pretty much all there is to it.

Distributed GPU clusters can crack trillions of passwords a second (and that's just what you can casually buy from Amazon or build yourself).

Once you get into supercomputers, super-FPGAs, and quantum (qubit) computing, it just gets scarier and more bleak.

2-factor seems to be the only way to go if you're really serious about your data assurance.
(Even then it's still security theater.) Who knows what is "logically" broken, and what was broken from the start on purpose.