Managing Large Objects in secondary database

Carlos Conti

Ranch Hand

Posts: 131

posted 2 years ago

Hi all,

I have my database app where I store hundreds and thousands of document scans and pictures. The database grew up to 3.3Gb so I decided to move off the "image data" (which supposes 99,99% of the database size) to a secondary database, which will be available to the application for the document scans and images to be shown.

In order to do that I created aswell a secondary persistence unit where I include the entities/tables that are to hold the image data in form of byte[] (MEDIUMBLOB fields in the database).

The application works in a transactional model, which means that changes are persisted once a process is marked as finished. So whenever I am in a process with image handling, my strategy consists in storing all the attachment info (metadata) in the primary persistence unit, while storing the byte[] data (image itself) in the secondary. When a given process finishes, it merges all the data from the primary persistence unit, and at certain code points I attach callback methods (@PostPersist, @PostUpdate) to handle the storage of the image data in the secondary database.

This is not the best solution possible, since I can't control both transactions (primary and secondary) simultaneously. I must finish the first one, and once it finishes the callback methods are called and trigger the secondary.

Another pitfall I have with this approach is that I store the byte[] data in a @Transient member of the primary PU entity, for the callback method to make use of it and store it in the secondary. But the @Transient fields are not sustained at merged entities when they are new entities (yes, once they already exist, very curious behavior in Hibernate). So in practice what I am forced to do is to create the entity, store it, open it again, attach an image, and persist the entity again, being forced to do it in a two step process, which becomes very tedious.

AS a temporary solution I plan to use non transient fields as transients for the data to be available at the callback methods: make them persistent fields, for the data to survive the merge and then empty the field, for I will be abl eto take the data to the secondary PU, not being necessary anymore in the primary and releasing database memory in the primary, which is the utmost goal of this strategy.