More than vanity

Searching for a vanity hash is really just an attempt to find hashes with certain content. Other efforts seek hashes with specific content for other reasons: hash collisions, partial collisions, and so on. More generally, these are all searches for hash substrings.

Why hash substring search is interesting

Finding substrings in hashes isn't just about vanity. Bringing hash substring search into the cracking platforms would enable other interesting work:

Experimenting with collisions and partial collisions. This is a potential attack surface that has not been adequately explored in public. Nation-states are almost certainly capable of finding near-collisions for some hashes. When you verify a download's hash, how often is that only a cursory visual check? True collisions for slow hashes are nigh-infeasible, of course -- but near-collisions may be interesting.

Filtering for hashes with broad properties (instead of just substrings). For example, hashes that contain only digits, only letters, alternating letters and digits, or only lowercase letters (for hash encodings that distinguish case). These may be only a curiosity today, but they might become interesting in the future.

Working with non-contiguous substrings. Most vanity-hashing tools won't let you search for a hash that begins with "[aA][bB][lL][eE]" and ends with "[eE][lL][bB][aA]". But password crackers' mask syntax is perfect for this.

Unforeseeable innovation. Once this capability is available across all hash types, people will tinker with it, revealing new use cases. I've seen some CTFs that involve searching for specific substrings in hashes. I think they're on to something.
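The broad properties described above are easy to express as predicates over the encoded hash string. Here is a minimal Python sketch. It assumes hex-encoded SHA-1 digests from `hashlib`. The filter names and the `properties_of` and `scan` helpers are hypothetical and are not part of any existing cracker.

```python
import hashlib
import re
from typing import Iterable, Iterator

# Hypothetical "broad property" filters, applied to the encoded hash string.
PROPERTY_FILTERS = {
    "only_digits": re.compile(r"[0-9]+"),
    "only_letters": re.compile(r"[A-Za-z]+"),
    # Letters and digits strictly alternating, starting with either kind.
    "alternating": re.compile(r"[A-Za-z]?(?:[0-9][A-Za-z])*[0-9]?"),
    # Only meaningful for encodings that distinguish case (e.g. base64-style).
    "only_lowercase": re.compile(r"[a-z]+"),
}


def properties_of(encoded_hash: str) -> list[str]:
    """Return the names of every property filter the whole hash satisfies."""
    if not encoded_hash:
        return []
    return [name for name, pattern in PROPERTY_FILTERS.items()
            if pattern.fullmatch(encoded_hash)]


def scan(words: Iterable[str], wanted: str) -> Iterator[tuple[str, str]]:
    """Yield (word, hex SHA-1) pairs whose digest has the wanted property."""
    for word in words:
        digest = hashlib.sha1(word.encode()).hexdigest()
        if wanted in properties_of(digest):
            yield word, digest
```

Hex digests only ever contain lowercase `a`-`f`, so on hex hashes "only letters" really means "only `a`-`f`". The case-based filters become more interesting for base64-style encodings.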

Calling them "vanity hashes" doesn't do justice to the potential. I think that the activity of hash substring search -- partial collisions, vanity hashes, etc. -- needs a better name.

Let's call it hash filtering.

Why the password crackers are the best place to do hash filtering

It's efficient. No one is better positioned to implement hash filtering than the password crackers -- and doing it anywhere else is a waste of resources. Each standalone vanity-search implementation reinvents the hash-bruteforcing wheel, usually poorly: few approach the speeds that JtR and hashcat are capable of (though there are exceptions). The Bitcoin folks have an edge today because of FPGAs, but I expect this gap to close.

It's high leverage. If implemented as a general framework within password crackers, all current and future hash types automatically inherit substring search capability, with little additional effort.

How the password crackers might respond

Sounds great! That would be awesome, but it's not likely to be the first reaction. So ...

OK -- but it should be done as a separate executable. This is a waste of time. The best features of professional-grade password crackers -- core brute-force power, Markov, forking/parallelism, session management, masks, device selection, wordlist management, etc. -- would either be unnecessarily duplicated, left out, or suffer from bit rot.

No. It will slow down cracking too much. Not necessarily. Hash filtering should be implemented as an early pre-optimization step, so that you only have to take the inevitable speed hit if you want hash filtering. Regular cracking would be unaffected.

No. We're not interested in vanity hashing as a concept. That would be a mistake. Vanity hashing is already driving interest, contribution -- and even innovation (like entirely new S-boxes).

No. Hash filtering itself will be too slow. By its nature, it will indeed be much slower than password cracking. But if it is implemented as a pre-optimization step (as described above), it can take advantage of the rest of the cracking support structure. That way hash filtering will be faster, more portable, and easier to use than it would be anywhere else.

Implementation suggestions

Allow the user to specify a "hash filter" using existing mask syntax. This would automatically make it possible to specify one or more desired substrings anywhere in the hash -- beginning, end, or middle. It would also allow non-contiguous substrings and loose matching by character set.
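To make the mask idea concrete, here is a minimal Python sketch of a mask-based hash filter. It is not JtR's or hashcat's code, and `parse_mask` and `hash_matches` are hypothetical names. The `?l ?u ?d ?h ?H` placeholders follow a subset of hashcat's built-in charsets. Bracketed sets, with ranges such as `[a-f]`, follow the style of John the Ripper's mask mode. Treating a mask shorter than the hash as a prefix filter is an assumption.

```python
# Subset of hashcat's built-in mask charsets.
BUILTIN_CHARSETS = {
    "l": "abcdefghijklmnopqrstuvwxyz",
    "u": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "d": "0123456789",
    "h": "0123456789abcdef",
    "H": "0123456789ABCDEF",
}


def parse_mask(mask: str) -> list[str]:
    """Expand a mask into one string of allowed characters per hash position."""
    positions: list[str] = []
    i = 0
    while i < len(mask):
        c = mask[i]
        if c == "?":
            if i + 1 >= len(mask):
                raise ValueError("mask ends with a bare '?'")
            key = mask[i + 1]
            # "??" is a literal question mark; otherwise look up the charset.
            positions.append("?" if key == "?" else BUILTIN_CHARSETS[key])
            i += 2
        elif c == "[":
            end = mask.index("]", i + 1)  # raises ValueError if unterminated
            body = mask[i + 1:end]
            chars: list[str] = []
            j = 0
            while j < len(body):
                if j + 2 < len(body) and body[j + 1] == "-":
                    # Range such as a-f: expand it character by character.
                    chars.extend(chr(k) for k in range(ord(body[j]), ord(body[j + 2]) + 1))
                    j += 3
                else:
                    chars.append(body[j])
                    j += 1
            positions.append("".join(chars))
            i = end + 1
        else:
            positions.append(c)  # literal character
            i += 1
    return positions


def hash_matches(mask: str, encoded_hash: str) -> bool:
    """True if every mask position accepts the hash character at that position.

    A mask shorter than the hash only constrains the leading characters.
    """
    positions = parse_mask(mask)
    if len(positions) > len(encoded_hash):
        return False  # a real tool should warn here instead of silently failing
    return all(ch in allowed for ch, allowed in zip(encoded_hash, positions))
```

Hex digests cannot contain `l`, so "[aA][bB][lL][eE]" only makes sense for other encodings. For a 40-character hex digest such as SHA-1, `"[cC][aA][fF][eE]" + "?h" * 32 + "[bB][eE][eE][fF]"` pins the first and last four characters and leaves the middle free.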

On the command line, consider long options with names like --hash-filter [mask] and --edit-distance [integer] (to accept near-matches that miss the mask in a few positions). Or name them something else -- but keep the names consistent across projects.
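For fixed-length hashes, one plausible reading of `--edit-distance` is "allow up to N positions that miss the mask." That is a Hamming distance, the substitution-only special case of edit distance. The sketch below works under that assumption. Each mask position is assumed to have already been expanded into a string of allowed characters, and `mismatches` and `near_match` are hypothetical helpers.

```python
def mismatches(positions: list[str], encoded_hash: str) -> int:
    """Count hash positions whose character is not allowed by the mask.

    This is a Hamming distance: it only counts substitutions, which is
    the natural notion of distance when mask and hash have the same length.
    """
    if len(positions) != len(encoded_hash):
        raise ValueError("mask and hash must have the same length")
    return sum(ch not in allowed for ch, allowed in zip(encoded_hash, positions))


def near_match(positions: list[str], encoded_hash: str, max_distance: int) -> bool:
    """Hypothetical --edit-distance behaviour: accept up to max_distance misses."""
    return mismatches(positions, encoded_hash) <= max_distance
```

You can also compare two digests character by character, for example `mismatches(list(d1), d2)`. That gives a rough near-collision measure of the kind a cursory visual check would miss.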

Implement in the pre-optimization pass, so that the slowdown introduced by examining each hash will only kick in when hash filtering is explicitly requested.
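The point of doing this in the pre-optimization pass is that the choice between regular cracking and filtering happens once, outside the hot loop. Below is a deliberately simplified Python sketch of that control flow. It assumes unsalted MD5 and uses a regular expression as a stand-in for a compiled mask. `build_checker` and `run` are hypothetical names, and real crackers are structured very differently.

```python
import hashlib
import re
from typing import Callable, Iterable, Iterator, Optional


def build_checker(targets: set[str], hash_filter: Optional[str]) -> Callable[[str], bool]:
    """Decide once, before the candidate loop, which per-hash test to run."""
    if hash_filter is None:
        # Regular cracking: the hot loop does a plain set lookup, nothing extra.
        return targets.__contains__
    pattern = re.compile(hash_filter)
    # Filtering: every computed hash is examined, which is the slow path.
    return lambda digest: pattern.fullmatch(digest) is not None


def run(candidates: Iterable[str], targets: set[str],
        hash_filter: Optional[str] = None) -> Iterator[tuple[str, str]]:
    check = build_checker(targets, hash_filter)
    for word in candidates:
        digest = hashlib.md5(word.encode()).hexdigest()
        if check(digest):
            yield word, digest
```

When no filter is requested, the loop body is identical to plain cracking, so only users who ask for hash filtering pay for it.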

Add simple sanity checking. If the user supplies a mask that is longer than the target hash, warn rather than silently truncating. Also, warn if the user wants to filter for a mask that isn't possible (for example, if their descrypt filter's last character is not within [.26AEIMQUYcgkosw]).
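The descrypt character set in the example can be derived rather than hard-coded. A descrypt hash is 13 characters: 2 salt characters plus 11 characters encoding the 64-bit DES output. Eleven six-bit characters hold 66 bits, so the final character's two low-order bits are always zero. As a result, only every fourth character of the crypt(3) alphabet can appear in that position. Here is a sketch of both sanity checks, with masks given as per-position allowed-character strings. `check_descrypt_filter` is a hypothetical helper.

```python
import string

# Traditional crypt(3) base-64 alphabet: index 0 is '.', index 63 is 'z'.
CRYPT_ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase

DESCRYPT_LEN = 13  # 2 salt chars + 11 chars encoding the 64-bit DES output


def descrypt_last_chars() -> str:
    """Characters possible in a descrypt hash's final position.

    The last character carries 4 data bits followed by 2 zero bits,
    so its alphabet index is always a multiple of 4.
    """
    return CRYPT_ALPHABET[::4]


def check_descrypt_filter(positions: list[str]) -> list[str]:
    """Return human-readable problems with a descrypt mask (empty if none)."""
    problems = []
    if len(positions) > DESCRYPT_LEN:
        problems.append(
            f"mask has {len(positions)} positions; descrypt hashes have {DESCRYPT_LEN}")
    if len(positions) >= DESCRYPT_LEN:
        last = positions[DESCRYPT_LEN - 1]
        if not set(last) & set(descrypt_last_chars()):
            problems.append(
                f"final character can never match; it must be one of {descrypt_last_chars()}")
    return problems
```

A caller would pass each returned message to `warnings.warn`, or print it. That is better than truncating the mask or silently starting a search that cannot succeed.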

Potential bounty?

If this proposal is not obviously compelling, I will consider setting up a bounty (or a charitable donation of the winner's choosing). The bounty would go to each major natively Linux-based project (John the Ripper and hashcat) that incorporates hash filtering. Edit distance would be optional.
