Description:
The calculation of MD5 and SHA1 hash values using the built-in MySQL functions does not seem to be as efficient as it could be.
There seem to be two factors that determine the performance of the hash generation:
- computation of the actual hash value (binary value)
- conversion of the binary value into a string field
The run time of the hash computation depends on the length of the input string whereas the overhead of the binary-to-string conversion can be considered as a fixed constant as it will always operate on hash values of 16 (MD5) or 20 (SHA1) bytes length.
The impact of the binary-to-string conversion will become more visible with shorter input strings than with long input strings. For short input strings it seems that more time is spent in the binary-to-string conversion than in the actual hash computation part.
The reason seems to be that the code that transforms the calculated binary hash value back into a string field result is very generic and therefore slower than necessary.
To convert the result into the string field, the code in sql/item_strfunc.cc currently does the following (for MD5, the implementation for SHA1 is similar):
sprintf((char *) str->ptr(),
"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
digest[0], digest[1], digest[2], digest[3],
digest[4], digest[5], digest[6], digest[7],
digest[8], digest[9], digest[10], digest[11],
digest[12], digest[13], digest[14], digest[15]);
str->length((uint) 32);
return str;
The conversion can probably be sped up by not using the very generic sprintf but a more specialized algorithm.
I have done some tests on Linux 2.6.26-1-amd64 with the following replacement:
static char hexmap[]= { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
int i;
char *p;
p=(char *) str->ptr();
for (i=0;i<=15;i++)
{
(*p++)=hexmap[digest[i] >> 4];
(*p++)=hexmap[digest[i] & 15];
}
str->length((uint) 32);
return str;
I am attached this patch to this bug report.
The savings that can be achieved by moving from sprintf to something more specialized may vary from compiler to compiler and from compile option to option. I can't really tell what works best in all environments.
However, the fixed overhead of the binary-to-string conversion should be minimized as much as possible as at least on some systems massive speedups can be achieved by this.
Example:
Original MySQL 5.1.38:
mysql> select count(*) from (select md5(firstname) from users) sub limit 1\G
*************************** 1. row ***************************
count(*): 840245
1 row in set (3.13 sec)
Patched MySQL 5.1.38:
mysql> select count(*) from (select md5(firstname) from users) sub limit 1\G
*************************** 1. row ***************************
count(*): 840245
1 row in set (1.36 sec)
Original MySQL 5.1.38:
mysql> select count(*) from (select sha1(firstname) from users) sub limit 1\G
*************************** 1. row ***************************
count(*): 840245
1 row in set (3.74 sec)
Patched MySQL 5.1.38:
mysql> select count(*) from (select sha1(firstname) from users) sub limit 1\G
*************************** 1. row ***************************
count(*): 840245
1 row in set (1.70 sec)
How to repeat:
Just generate a lot of MD5 or SHA1 hashes from some of the system table values and note the query times:
MD5:
mysql> select * from (select md5(t.name),md5(t.description),md5(t.example),md5(t.url) from mysql.help_topic t,mysql.help_keyword k) sub limit 1\G
*************************** 1. row ***************************
md5(t.name): ce31e2a082d17e038fcc6e3006166653
md5(t.description): bdcda0595dc1c938038b13d51df0a38d
md5(t.example): 3279bf9c13baf8b749e29aa71cb2ccd2
md5(t.url): ccb705dbce9c9827d93fb0d4b6c0b378
1 row in set (2.99 sec)
SHA1:
mysql> select * from (select sha1(t.name),sha1(t.description),sha1(t.example),sha1(t.url) from mysql.help_topic t,mysql.help_keyword k) sub limit 1\G
*************************** 1. row ***************************
sha1(t.name): 04e66352aa8f9c4c5f26b71bf380973ada994760
sha1(t.description): c040e24b56fe6023617ab884d8715da7efb24de3
sha1(t.example): a52b8b0b11e1d1d82839d0de1805192279ca0b1b
sha1(t.url): ef7507b561798b9b877232fcb64719cee9eef057
1 row in set (4.84 sec)
Then apply the attached patch and run the same queries again:
mysql> select * from (select md5(t.name),md5(t.description),md5(t.example),md5(t.url) from mysql.help_topic t,mysql.help_keyword k) sub limit 1\G
*************************** 1. row ***************************
md5(t.name): ce31e2a082d17e038fcc6e3006166653
md5(t.description): bdcda0595dc1c938038b13d51df0a38d
md5(t.example): 3279bf9c13baf8b749e29aa71cb2ccd2
md5(t.url): ccb705dbce9c9827d93fb0d4b6c0b378
1 row in set (1.27 sec)
mysql> select * from (select sha1(t.name),sha1(t.description),sha1(t.example),sha1(t.url) from mysql.help_topic t,mysql.help_keyword k) sub limit 1\G
*************************** 1. row ***************************
sha1(t.name): 04e66352aa8f9c4c5f26b71bf380973ada994760
sha1(t.description): c040e24b56fe6023617ab884d8715da7efb24de3
sha1(t.example): a52b8b0b11e1d1d82839d0de1805192279ca0b1b
sha1(t.url): ef7507b561798b9b877232fcb64719cee9eef057
1 row in set (2.77 sec)
The query times for the patched version are a lot better than for the original version.
Suggested fix:
Will attach a patch for MySQL 5.1.38 to this bug report with a suggestion how to modify the binary-to-string conversion in 5.1.38.

Hi Jan, thank you very much for your contribution! We appreciate it.
In order for us to be able to accept and incorporate your contribution, I have to ask you for one additional favor, which you only have to perform once for all future contributions (including any other Sun-governed Open Source project).
Can you please fill out and send us a copy of the Sun Contributor Agreement (SCA), as outlined on the following Wiki page?
http://forge.mysql.com/wiki/Contributing_Code
Thanks in advance! Please let us know, if you have any questions or concerns about this.

[10 Dec 2009 18:09]
Jan Steemann

I have filled out the SCA and sent it to Sun via fax on Dec 8, 2009.
Now waiting to be listed in the SCA signatories list.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
http://lists.mysql.com/commits/98208
3334 Davi Arnaut 2010-01-26
Bug#49491: Much overhead for MD5() and SHA1() on short strings
MySQL's hash functions MD5 and SHA relied on the somewhat slow
sprintf function to convert the digests to hex representations.
This patch replaces the sprintf with a specific and inline hex
conversion function.
Patch contributed by Jan Steemann.
@ sql/item_strfunc.cc
Add a hex conversion function.

[26 Jan 2010 17:05]
Bugs System

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:
http://lists.mysql.com/commits/98225
3334 Davi Arnaut 2010-01-26
Bug#49491: Much overhead for MD5() and SHA1() on short strings
MySQL's hash functions MD5 and SHA relied on the somewhat slow
sprintf function to convert the digests to hex representations.
This patch replaces the sprintf with a specific and inline hex
conversion function.
Patch contributed by Jan Steemann.
@ sql/item_strfunc.cc
Add a hex conversion function.