Question

While performing a batch insert with automatic indexing enabled, Hibernate Search creates millions of org.apache.lucene.document.Field instances, all of which stay in memory until the transaction finishes.

Since I didn't manage to fix that with any of the HS options, and I don't want to flushToIndexes an uncommitted transaction, I would like to pause automatic indexing before the batch and then update the index manually afterwards. For that I set the following options:

 hibernateProperties.put("hibernate.search.default.indexBase", "path/to/index");
 hibernateProperties.put("hibernate.search.model_mapping", searchMappingFactory.createSearchMapping());
 hibernateProperties.put("hibernate.search.autoregister_listeners", false);

and I write a custom FullTextIndexEventListener with overrides like:

private volatile boolean isPaused;

@Override
public void onPostInsert(PostInsertEvent event) {
    if (!isPaused) {
        super.onPostInsert(event);
    }
}
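The pause-flag pattern itself can be exercised independently of Hibernate. Below is a minimal self-contained sketch of the idea; the `IndexListener` interface and all names in it are hypothetical stand-ins, not Hibernate Search API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class PausableListenerDemo {

    // Hypothetical stand-in for an indexing event listener.
    interface IndexListener {
        void onInsert(String entity);
    }

    // Wraps a delegate listener and silently drops events while paused.
    static class PausableListener implements IndexListener {
        private final AtomicBoolean paused = new AtomicBoolean(false);
        private final IndexListener delegate;

        PausableListener(IndexListener delegate) {
            this.delegate = delegate;
        }

        void pause()  { paused.set(true); }
        void resume() { paused.set(false); }

        @Override
        public void onInsert(String entity) {
            if (!paused.get()) {
                delegate.onInsert(entity);
            }
        }
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        PausableListener listener = new PausableListener(indexed::add);

        listener.onInsert("a");  // delegated, gets indexed
        listener.pause();
        listener.onInsert("b");  // dropped while paused
        listener.resume();
        listener.onInsert("c");  // delegated again

        System.out.println(indexed); // [a, c]
    }
}
```

Using an AtomicBoolean (or a volatile field, as in the snippet above) matters because the batch job and the thread toggling the pause flag are typically different threads.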

I integrate that using my custom integrator:

@Component
public class HibernateEventIntegrator {  

    @Autowired
    private SessionFactoryImpl sessionFactory;
    @Autowired
    private SearchIndexEventListener searchIndexEventListener;  

    @PostConstruct
    public void integrate() {
         EventListenerRegistry listenerRegistry = sessionFactory.getServiceRegistry().getService(EventListenerRegistry.class);
         listenerRegistry.appendListeners(EventType.POST_INSERT, searchIndexEventListener);
         //... and so for all events like in HibernateSearchIntegrator
         searchIndexEventListener.initialize(sessionFactory.getProperties());
    }
}

However, in this case the mapping is not read, as if no entity were indexed; the entities are only seen by the native HS integrator.

I also tried using an interceptor with a skip action, but that does not seem like a clean solution.

Is there any solution to pause automatic indexing programmatically without switching completely to manual indexing?


Solution 2

We solved it by customizing TransactionalWorker so that it flushes, and thus frees the memory, after a given number of operations. That is what I would expect worker.batch_size to do. Here is the code:

public class TransactionalFlushingWorker extends TransactionalWorker {

    private static final int INDEX_BATCH_SIZE = 2000;

    private final AtomicInteger indexingWithoutFlushCounter = new AtomicInteger();

    @Override
    public void performWork(Work<?> work, TransactionContext transactionContext) {
        super.performWork(work, transactionContext);
        if (indexingWithoutFlushCounter.incrementAndGet() > INDEX_BATCH_SIZE) {
            flushWorks(transactionContext);
        }
    }

    @Override
    public void flushWorks(TransactionContext transactionContext) {
        indexingWithoutFlushCounter.set(0);
        super.flushWorks(transactionContext);
    }
}

registration:

hibernateProperties.put("hibernate.search.worker.scope", TransactionalFlushingWorker.class.getName());
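The count-and-flush logic can be verified on its own. Here is a self-contained sketch of the same pattern; the class and method names are illustrative, not Hibernate Search API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BatchFlushDemo {

    // Flushes after every `batchSize` operations, mirroring the worker above.
    static class BatchFlusher {
        private final int batchSize;
        private final AtomicInteger sinceLastFlush = new AtomicInteger();
        private int flushCount;

        BatchFlusher(int batchSize) {
            this.batchSize = batchSize;
        }

        void performWork() {
            if (sinceLastFlush.incrementAndGet() >= batchSize) {
                flush();
            }
        }

        void flush() {
            sinceLastFlush.set(0);
            flushCount++; // in the real worker this is where pending Lucene work is written
        }

        int getFlushCount() {
            return flushCount;
        }
    }

    public static void main(String[] args) {
        BatchFlusher flusher = new BatchFlusher(100);
        for (int i = 0; i < 250; i++) {
            flusher.performWork();
        }
        System.out.println(flusher.getFlushCount()); // 2: after operations 100 and 200
    }
}
```

Resetting the counter inside flush() is important; otherwise every operation after the first threshold would trigger another flush.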

OTHER TIPS

Hibernate Search does not provide such functionality at the moment. See also https://hibernate.atlassian.net/browse/HSEARCH-168 and https://hibernate.atlassian.net/browse/HSEARCH-387.

One workaround is to use two separate SessionFactories, one with the event processing enabled and one with the event processing disabled. You will then open the Session from the right factory depending on the use case.
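Configuration-wise, the two factories would differ only in whether the Hibernate Search listeners are registered. A hedged sketch of the two property sets (property keys as used elsewhere in this post; the variable names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class DualFactoryConfigDemo {

    public static void main(String[] args) {
        // Factory used for normal work: automatic indexing stays on.
        Map<String, Object> indexingProps = new HashMap<>();
        indexingProps.put("hibernate.search.default.indexBase", "path/to/index");
        indexingProps.put("hibernate.search.autoregister_listeners", true);

        // Factory used for batch inserts: listeners are never registered,
        // so no Lucene work accumulates during the transaction.
        Map<String, Object> batchProps = new HashMap<>(indexingProps);
        batchProps.put("hibernate.search.autoregister_listeners", false);

        System.out.println(indexingProps.get("hibernate.search.autoregister_listeners")); // true
        System.out.println(batchProps.get("hibernate.search.autoregister_listeners"));    // false
    }
}
```

After the batch completes through the listener-free factory, the index can be brought back in sync from the indexing-enabled factory, for example with a MassIndexer run.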

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow