Suppose that you have a table with a BLOB column, and you want to know if there are duplicate BLOBS in that table. A GROUP BY query with COUNT doesn’t work on BLOB columns… One way would be to write a PL/SQL or Perl program that extracts all BLOBS and compares the values between all rows. So, is there no simple way to do this in SQL? Yes, there is! You can use the DBMS_CRYPTO package to hash the values of the BLOB column and then group the hashed values using an analytic function.

Now we need to grant our user execute rights on the DBMS_CRYPTO package.
Execute the following SQL with user SYS as SYSDBA:

grant execute on dbms_crypto to matthiash;

Now comes the difficult part: create a query so that the contents of the BLOBS are hashed using DBMS_CRYPTO, hashes that have the same value are grouped together, and the record ID’s are shown so that it’s clear which records have identical BLOB values. I managed to do this using window partitioning. Basically, we group the results into different windows, where each window contains the records which have the same hashed BLOB contents.

So, BLOBS 1, 4 and 5 are grouped, next we have BLOBS 2 and 6, and finally BLOB 3. Mission accomplished! :-) Mind you that this is a very small table, so the query runs very fast. Using DBMS_CRYPTO.HASH on tables with millions of rows or very large BLOBS might need some CPU power!