5 Answers

We used InnoDB storage to store one million images and ran a stress test, with good results: either a file was retrieved correctly or it was not retrieved at all (less than 0.01% of cases), since InnoDB is ACID compliant.

When we shifted to MyISAM, the failure rate increased to 20%, with corrupted data just like in your case. The reason is that MyISAM uses table-level locking: while a write is in progress the entire table is locked, and in the event of a timeout something can get overwritten, leading to data loss.

We have now shifted everything to MS SQL. InnoDB performs well, but it never reuses the space of deleted files, so its data files keep growing endlessly. MS SQL Express has a 10 GB limit, so we created databases of 4-8 GB and store the blobs there, and we have our own custom replication to copy the files across three identically configured servers on the network.

Storing images as files on disk is bad for many reasons. Everyone keeps saying that file systems are designed for high performance and can store millions of files, but that is not true: drives stop performing well once you have more than a hundred thousand files, and they handle one big file better than a thousand smaller ones. We currently store 10 million files, and keeping them in the database makes more sense because the database optimizes queries and does good caching. You can read more at http://akashkava.com/blog/127/huge-file-storage-in-database-instead-of-file-system/

I got the point. Previously we used InnoDB for this system and did not face this problem (except for one picture, which we treated as an exception), but now with MyISAM it's really a big troublemaker. About storing as files on disk, you are totally right.
– Nikson Kanti Paul Jun 24 '12 at 4:42

First of all, as Sarke mentioned, storing file contents in the DB isn't the best idea (file metadata is a whole different story).

Why?

Performance: in the majority of cases, the OS file cache will outperform anything built into a DBMS.

Disaster recovery: the odds of losing all or most files on a failure are far higher than with a file system, and recovery is far more difficult.

Scaling: if you outgrow the capacity of a single server, adding application-level sharding is trivial and carries no performance penalty. Multi-server DB setups are more "painful".

Multiple solutions available / ease of migration: there are plenty of hardware and software solutions for storing large file collections, and migrating between them is far simpler than migrating between DBMSs.

I store close to 2 million images in a simple folder structure: /xx/yy/filename, where filename = MD5 of the file (+ an optional number should a hash collision happen), xx = the first 2 characters of the MD5, and yy = the 3rd and 4th characters. It works great, and I shouldn't hit any FS-related slowdowns for a long while (at least 2 orders of magnitude of growth).
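As a rough illustration of how such a path can be derived (a minimal sketch in MySQL; in practice the hash is computed in the application layer, and @file_contents here is just a placeholder for the raw image bytes):

    -- Derive the storage path from the MD5 of the file contents.
    SET @hash = MD5(@file_contents);          -- e.g. 'd41d8cd98f00b204e9800998ecf8427e'
    SELECT CONCAT(
             SUBSTRING(@hash, 1, 2), '/',     -- xx = first 2 characters of the md5
             SUBSTRING(@hash, 3, 2), '/',     -- yy = 3rd and 4th characters
             @hash                            -- filename = full md5 (+ number on collision)
           ) AS storage_path;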

Getting back to your question, there are 3 possibilities:

1. The files are never saved correctly to the DB. It might be a problem in the app that uploads the photos, or the image may be too big: your max_allowed_packet restricts the image size to ~8 MB, while MEDIUMBLOB can store a maximum of 16 MB. To rule this out, increase max_allowed_packet to 32 MB and test (see the snippet after this list). You'll need to make sure no image exceeds this size at any point, and that the app does its job right when uploading photos. If you can find an image that was uploaded and displayed fine (from the DB!) and later wasn't, then this is not the cause.

2. The files get damaged during an update: if anything updates the photos in any way, then even if the original file is fine, the updated one might not be; it might, for example, exceed the size limits from point 1.

3. (the least likely one) If the file is stored and updated without being damaged, then it is getting corrupted while sitting in storage. With no reported MySQL bugs on this (and that wouldn't go unnoticed), I'd look at the server hardware.
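To check and raise the limit from point 1, a quick sketch (the 32 MB value follows the suggestion above; note that a SET GLOBAL change only lasts until the server restarts, so also make it permanent in my.cnf):

    -- See what the server currently allows per packet (and thus per BLOB write).
    SHOW VARIABLES LIKE 'max_allowed_packet';

    -- Raise it to 32 MB for testing (needs SUPER; new connections pick it up).
    SET GLOBAL max_allowed_packet = 32 * 1024 * 1024;

    -- For a permanent change, set it under [mysqld] in my.cnf:
    --   max_allowed_packet = 32M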

I think you first need to figure out whether it is your application or some external process (backup/restore?) that changes this data.
Actually, I see little reason why your application would need to update this picture (i.e. update the field with the same data), if the file is supposed to remain unchanged.

Once you find which parts of your application update this field, you may want to post some code so we can check whether any conversion, escaping, or anything else is taking place.

If, as I assume, such an update should never happen, setting a BEFORE UPDATE trigger on the table will let you know exactly when the issue occurs and may help in identifying a pattern. Compare the OLD and NEW values and record as much relevant data as you can in a log table. Beware: comparing large BLOBs can be a performance killer, so watch your performance closely.
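A minimal sketch of such a trigger, assuming a table named photos with an id column and the image in an img BLOB column, plus an audit table created for this purpose (all names here are illustrative, not your actual schema):

    -- Hypothetical audit table; adjust names and types to your schema.
    CREATE TABLE photo_audit (
        audit_id   INT AUTO_INCREMENT PRIMARY KEY,
        photo_id   INT,
        changed_at DATETIME,
        old_length INT,
        new_length INT,
        old_md5    CHAR(32),
        new_md5    CHAR(32),
        client     VARCHAR(128)
    );

    DELIMITER //
    CREATE TRIGGER photos_before_update
    BEFORE UPDATE ON photos
    FOR EACH ROW
    BEGIN
        -- Comparing/hashing large BLOBs is costly, hence the performance warning above.
        IF NOT (OLD.img <=> NEW.img) THEN
            INSERT INTO photo_audit
                (photo_id, changed_at, old_length, new_length, old_md5, new_md5, client)
            VALUES
                (OLD.id, NOW(), LENGTH(OLD.img), LENGTH(NEW.img),
                 MD5(OLD.img), MD5(NEW.img), USER());   -- USER() = connected client account
        END IF;
    END//
    DELIMITER ;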

My company elects to store images outside of the database. We noticed that BLOBs, like the one you are using, are prone to corruption and performance issues; we saw the same problems in MSSQL, Sybase, and Faircom.

Whenever the application needs access to an image, it goes to the network (or web-based) storage where that image lives; the database then stores only the path to the image.

Since the image is a flat file somewhere in the file system, if the record needs to be updated (e.g. a note added to describe the image), the image itself isn't rewritten into the blob, and there is no chance of it getting corrupted.
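As a sketch of what that looks like (table and column names are made up for illustration), the row holds only the path and some metadata, so adding a note never touches the image bytes:

    -- Illustrative schema: the table stores a path and metadata, never the image itself.
    CREATE TABLE images (
        id   INT AUTO_INCREMENT PRIMARY KEY,
        path VARCHAR(255) NOT NULL,   -- location of the flat file on network/web storage
        note VARCHAR(255)
    );

    -- Adding a note updates only this row; the image file on disk is never rewritten.
    UPDATE images SET note = 'front view of unit 7' WHERE id = 42;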