I have some major issues with Hibernate Search 5.6.1.Final (Elasticsearch backend). I have a main entity that I index in Elasticsearch which has an @IndexedEmbedded ManyToOne association. If the embedded parent object has many (40K) child records and it gets updated, the following happens for the duration of 6 hours (!):

- CPU jumps to 80%
- the modification to the related object is blocked (it eventually succeeds after 6 hours)
- any other modifications to that object are also blocked and time out

It is worth mentioning that the Case entity has a ClassBridge that:

1. Looks for a Hibernate Session in a ThreadLocal variable; if none is there, it opens one and stores it in the ThreadLocal (to reuse a session across re-index jobs)
2. Executes a query that joins other tables to retrieve aggregated data to be included in the document
3. Enriches the Case document with the resulting additional fields

Because of the ClassBridge I believe dirty-checking is disabled for any modifications to a Case object or one of its embedded objects. Could it be that this bridge is simply too heavy, and that is what causes the slowness? I did not see excessive opening of database connections, so I believe sessions are being reused across indexing threads.

A thread dump during indexing reveals a single thread related to indexing:

Code:

"http-apr-443-exec-337" - Thread t@4804
   java.lang.Thread.State: RUNNABLE
        at org.hibernate.engine.internal.Collections.processReachableCollection(Collections.java:164)
        at org.hibernate.event.internal.FlushVisitor.processCollection(FlushVisitor.java:42)
        at org.hibernate.event.internal.AbstractVisitor.processValue(AbstractVisitor.java:104)
        at org.hibernate.event.internal.AbstractVisitor.processValue(AbstractVisitor.java:65)
        at org.hibernate.event.internal.AbstractVisitor.processEntityPropertyValues(AbstractVisitor.java:59)
        at org.hibernate.event.internal.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:155)
        at org.hibernate.event.internal.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:216)
        at org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:85)
        at org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:44)
        at org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1264)
        at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1975)
        at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:322)
        at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:125)
        at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:966)
        at x.x.x.x.x.CaseFacade.getAggregateDocumentDataOfCase(CaseFacade.java:499)
        at x.x.x.x.x.ComputedCaseFieldsClassBridge.set(ComputedCaseFieldsClassBridge.java:76)
        at org.hibernate.search.bridge.util.impl.ContextualExceptionBridgeHelper$OneWayConversionContextImpl.set(ContextualExceptionBridgeHelper.java:134)
        at org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.buildDocumentFieldForClassBridges(DocumentBuilderIndexedEntity.java:890)
        at org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.buildDocumentFields(DocumentBuilderIndexedEntity.java:454)
        at org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.getDocument(DocumentBuilderIndexedEntity.java:391)
        at org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.createUpdateWork(DocumentBuilderIndexedEntity.java:301)
        at org.hibernate.search.engine.spi.DocumentBuilderIndexedEntity.addWorkToQueue(DocumentBuilderIndexedEntity.java:243)
        at org.hibernate.search.engine.impl.WorkPlan$PerEntityWork.enqueueLuceneWork(WorkPlan.java:520)
        at org.hibernate.search.engine.impl.WorkPlan$PerClassWork.enqueueLuceneWork(WorkPlan.java:282)
        at org.hibernate.search.engine.impl.WorkPlan.getPlannedLuceneWork(WorkPlan.java:154)
        at org.hibernate.search.backend.impl.WorkQueue.prepareWorkPlan(WorkQueue.java:114)
        at org.hibernate.search.backend.impl.BatchedQueueingProcessor.prepareWorks(BatchedQueueingProcessor.java:57)
        at org.hibernate.search.backend.impl.PostTransactionWorkQueueSynchronization.beforeCompletion(PostTransactionWorkQueueSynchronization.java:66)
        at org.hibernate.search.backend.impl.EventSourceTransactionContext$DelegateToSynchronizationOnBeforeTx.doBeforeTransactionCompletion(EventSourceTransactionContext.java:169)
        at org.hibernate.engine.spi.ActionQueue$BeforeTransactionCompletionProcessQueue.beforeTransactionCompletion(ActionQueue.java:928)
        at org.hibernate.engine.spi.ActionQueue.beforeTransactionCompletion(ActionQueue.java:503)
        at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:2353)
        at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:491)
        at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.beforeCompletionCallback(JdbcResourceLocalTransactionCoordinatorImpl.java:147)
        at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.access$100(JdbcResourceLocalTransactionCoordinatorImpl.java:38)
        at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:231)
        at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:65)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:218)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:442)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1082)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:623)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2517)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2506)
        - locked <6a2b09f4> (a org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)

I want to know if it is normal for a long-lasting batch indexing to block modification to records like that. Can I make it asynchronous and if yes - what are the implications? Modification to any record is mission-critical and I do not want Hibernate Search to interfere with it.

Hibernate Search is designed to never need any lock on your database. If you're having locking issues, I would suggest making sure the Session you're using in your custom ClassBridge doesn't acquire any locks.
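For example, a minimal sketch of configuring such a session (Hibernate ORM 5.x API; `sessionFactory` stands for however your bridge obtains its factory):

```java
// Sketch: configure the helper Session used inside the ClassBridge so its
// read queries never trigger a flush and never mark entities dirty.
Session session = sessionFactory.openSession();
session.setFlushMode(FlushMode.MANUAL); // no automatic flushes
session.setDefaultReadOnly(true);       // entities loaded here are read-only
```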

Of course this is only safe if you're only reading additional data. Please make sure you're not causing the managed entities being reindexed to become dirty: the indexing process happens during the flush and commit phase of Hibernate ORM, so making changes to your entities there would cause unpredictable side effects.

Quote:

I did not find a way to increase the number of batch indexing worker threads - is there a way?

There is a way to scale up the number of workers when running the MassIndexer.

The normal indexer can not be scaled, as it lives in - and reuses - the same Hibernate ORM Session you're using, and the operations need to happen within the same transaction. Speaking of which, I would recommend against using an additional Session in your ClassBridge. Your entities are managed, so you can lazy-load any additional attributes you might need. If the queries you perform aren't easily mapped, I would suggest you prepare the data before committing the transaction - not in a ClassBridge. You can, for example, index some @Transient properties to carry additional values that you want your custom bridges to index.
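For illustration, the @Transient approach could look roughly like this mapping sketch (hypothetical names: `Case` here is simplified and `aggregatedChildData` is invented; the service layer would populate the field before commit):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Transient;

import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

// Sketch: the aggregate is computed by the service layer before the
// transaction commits and stored on the entity, so at indexing time
// Hibernate Search only reads an in-memory value.
@Entity
@Indexed
public class Case {

    @Id
    private Long id;

    // Not persisted, but visible to Hibernate Search as an indexed field.
    @Transient
    @Field(analyze = Analyze.YES)
    private String aggregatedChildData;

    public void setAggregatedChildData(String value) {
        this.aggregatedChildData = value;
    }
}
```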

I tried setting FlushMode to MANUAL and ReadOnly to true. Since I am still on Hibernate ORM 5.1.3, I am calling the now-deprecated setFlushMode() instead of setHibernateFlushMode(). I don't update any record in the ClassBridge, so that's fine. However, the locking persists: while the parent entity is being updated, other threads that attempt to update it time out.

My isolation level is REPEATABLE_READ. I set it to READ_UNCOMMITTED as a test, but the locking still occurs. I also checked when the lock is acquired: before Transaction.commit() is called nothing is locked, but once execution reaches the ClassBridge the record is locked (the thread blocks other threads). If indexing takes too long for whatever reason, this will cause timeouts... In any case, I don't want to change my isolation level.

Quote:

I would suggest you prepare the data before committing the transaction - not in a ClassBridge

I don't see how I can do that without a huge memory impact, because the data added in the ClassBridge is specific to each Case object. If I have 40K objects, the best I can do is execute some HQL / native SQL to get the extra data per Case object - and this data can be massive if I have thousands of records. If I understand correctly, I would need to set the value of the @Transient indexed field for each Case in the lazy @ContainedIn collection, but then I would load all of those objects into memory too, right? I have a custom mass indexer in which I run a single query to get ALL the data needed to build the documents in a batch of 500, but I still don't dare to go beyond 500 documents per batch, let alone 40K.

I really thought the indexing happens in a thread different from the main thread that is issuing the transaction commit... This is what "hibernate.search.default.worker.execution": "async" should make sure of, unless I understood wrong?

I suspect you're on the right path by looking at your isolation levels. Consider that many databases actually implement a stricter locking pattern than what the isolation level really requires, so it's possible that REPEATABLE_READ is causing the locking; some DBs might have some tuning options to resolve this without violating the isolation level your application requires.

Quote:

I really thought the indexing happens in a thread different from the main thread that is issuing the transaction commit... This is what "hibernate.search.default.worker.execution": "async" should make sure of, unless I understood wrong?

The work which happens in the "async" thread is the writing of the Lucene Document to the index and the running of the Analyzers. Loading the related data has to happen on the same thread as the Hibernate user, to benefit from loading it all within the same, consistent transactional read. Incidentally, in more typical use cases that also benefits from the L1 cache (the Session cache).
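For reference, the setting in question as a configuration fragment; note that it only moves the index-writing phase off the request thread, not entity loading or document building:

```properties
# Runs the Lucene/Elasticsearch write phase in a background thread.
# Document building and data loading still happen on the caller's thread.
hibernate.search.default.worker.execution = async
```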

Your case is a bit special. I wish we could do better but a Lucene index is not a good place for strongly relational data. Ideas welcome ;)

Quote:

I don't see how I can do that without a huge memory impact, because the data added in the ClassBridge is specific to each Case object. If I have 40K objects, the best I can do is execute some HQL / native SQL to get the extra data per Case object - and this data can be massive if I have thousands of records. If I understand correctly, I would need to set the value of the @Transient indexed field for each Case in the lazy @ContainedIn collection, but then I would load all of those objects into memory too, right? I have a custom mass indexer in which I run a single query to get ALL the data needed to build the documents in a batch of 500, but I still don't dare to go beyond 500 documents per batch, let alone 40K.

Is it not going to eventually write all 40K contained objects into the same Lucene Document? That implies you'll have it all on the heap.

N.B. I'm not suggesting that you fully map the additional tables; but you could load the minimal data you need (e.g. limit the columns you're targeting to the bare minimum and select the fewest rows you can with a strongly selective query), map that to a simple object or just a result set, and feed that whole thing into a single @Transient field.

Please do tell me more about this custom MassIndexer; it's interesting and I'm totally open to discussing substantial changes in our architecture. Also, be aware that you can roll your own variation of the MassIndexer: see the source code of org.hibernate.search.impl.FullTextSessionImpl.createMassIndexerFactory() for details of how the Service is loaded. You can of course extend / reuse any of our code, but I would appreciate seeing how we could make it more flexible - or feel free to send a pull request!

Quote:

I wish we could do better but a Lucene index is not a good place for strongly relational data. Ideas welcome ;)

Here's an idea: give the user the ability to build a document or a document set himself in JSON format (or anything that can eventually be converted to JSON consumable by ES). This would give the flexibility to build documents from highly relational data and could be used to completely avoid the need to walk object graphs.

- a single object can be represented by a single JSON object created by the user and (maybe) validated against the defined Hibernate Search mapping
- multiple objects (like in my case - a list of related objects in a OneToMany field) can be represented as a JSONArray of objects, which are then indexed in batches by your internal batch indexer

This way the user could also generate computed fields and put them in the document. In my ClassBridge, for example, I get a list of associated objects (in a OneToMany field), then concatenate the values of a specific field inside them using the MySQL group_concat() function. I get all the computed data this way, in a single efficient query. I hope it's possible to make such an extension point :)
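As an illustration of that idea, here is a rough, self-contained toy sketch (all names - CaseDocumentBuilder, buildCaseDocument - are made up for the example): the document for one Case is assembled as a plain map, including a computed field equivalent to what group_concat() would return, then serialized to JSON by hand:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of a user-built document for a single Case.
public class CaseDocumentBuilder {

    // Stand-in for the data loaded by one aggregate SQL query per batch.
    public static Map<String, Object> buildCaseDocument(long caseId,
                                                        String title,
                                                        List<String> childValues) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", caseId);
        doc.put("title", title);
        // Computed field, like group_concat() on the database side.
        doc.put("childValuesConcat", String.join(",", childValues));
        return doc;
    }

    // Naive serializer, good enough for flat string/number fields.
    public static String toJson(Map<String, Object> doc) {
        return doc.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":"
                        + (e.getValue() instanceof Number
                                ? e.getValue().toString()
                                : "\"" + e.getValue() + "\""))
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
                buildCaseDocument(42L, "Fraud case", List.of("open", "escalated"));
        System.out.println(toJson(doc));
    }
}
```

A real implementation would of course use a proper JSON library and feed the resulting documents to the ES bulk endpoint.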

Quote:

Is it not going to eventually write all 40K contained objects into the same Lucene Document?

I am not indexing the parent object - I am changing the parent object, which triggers a reindex of all of its 40K children, because the parent is @ContainedIn the child entity objects. I have a User (parent) who is associated with many Cases. When I change a User object, all of its Cases (potentially many thousands) are reindexed - which blocks any changes to the User until indexing finishes.

I am confused about where to put the @Transient field in this case. I guess I should put it in the parent, then access it in the ClassBridge when indexing each child. The transient field would then have to contain all the data for all children - which can be megabytes of textual data (up to 2MB per object).

About the mass indexer - it is not based on Hibernate's implementation; I made it from scratch. It traverses all records of an entity in batches with a cursor, and uses either synchronous or asynchronous indexing via the Elasticsearch bulk REST API. Data loading is always executed in the main thread, so indexing speed is currently capped by the speed of data loading. Most importantly, the data for an entire batch is loaded via a single native SQL query that fetches everything required to build the documents in that batch (including computed or aggregated data); this data is then transformed into a batch of JSON documents. Once the data is prepared, it is fed into an IndexJobManager which launches a new indexing thread to index the batch. The job manager listens for job status updates and limits the number of threads launched, throttling requests to ES if ES cannot keep up.
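The batching-plus-throttling scheme described above could be sketched roughly like this (a pure-Java toy version; ThrottledBatchIndexer and the in-memory bulkIndex() are stand-ins for the real IndexJobManager and the Elasticsearch bulk call):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch: batches are prepared on the calling thread, handed off to
// worker threads, and a Semaphore caps how many bulk requests are in
// flight at once so ES is not overwhelmed.
public class ThrottledBatchIndexer {

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Semaphore inFlight = new Semaphore(2); // max concurrent bulk requests
    public final AtomicInteger indexed = new AtomicInteger();

    public void indexAll(List<String> allDocs, int batchSize) {
        for (int i = 0; i < allDocs.size(); i += batchSize) {
            final List<String> batch =
                    allDocs.subList(i, Math.min(i + batchSize, allDocs.size()));
            inFlight.acquireUninterruptibly(); // throttle if workers fall behind
            workers.submit(() -> {
                try {
                    bulkIndex(batch);
                } finally {
                    inFlight.release();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for the Elasticsearch bulk REST call.
    private void bulkIndex(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    public static void main(String[] args) {
        ThrottledBatchIndexer indexer = new ThrottledBatchIndexer();
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1203; i++) {
            docs.add("{\"id\":" + i + "}");
        }
        indexer.indexAll(docs, 500); // three batches: 500 + 500 + 203
        System.out.println("indexed=" + indexer.indexed.get());
    }
}
```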

I will try to submit some code when I have some time, but it is not code I can openly expose without modification.

In the end I solved the issue by relying on non-blocking indexing of the child records, which I trigger manually after the parent changes. I removed the @IndexedEmbedded and @ContainedIn annotations to stop the automatic reindexing of the children. When an update to the parent happens, after the transaction commits, I manually schedule a new thread that reindexes the children in batches using the custom mass indexer. No update gets blocked now. The downside is that when a child gets indexed by Hibernate I now need to fetch the parent data in the ClassBridge (I know it is bad practice to run queries in a class bridge, but I have not found a feasible alternative yet).