Question

I am trying to add about 21,000 entities that already exist in the database to an NHibernate Search (Lucene) index. When indexing finishes, the indexes are around 12 megabytes. The time varies quite a bit, but it is always very slow: in my last run (under the debugger), indexing the data took over 12 minutes.

    private void IndexProducts(ISessionFactory sessionFactory)
    {
        using (var hibernateSession = sessionFactory.GetCurrentSession())
        using (var luceneSession = Search.CreateFullTextSession(hibernateSession))
        {
            var tx = luceneSession.BeginTransaction();
            foreach (var prod in hibernateSession.Query<Product>())
            {
                luceneSession.Index(prod);
                hibernateSession.Evict(prod);
            }
            hibernateSession.Clear();
            tx.Commit();
        }
    }

The vast majority of the time is spent in tx.Commit(). From what I've read about Hibernate Search, this is expected. I've come across several techniques that should help, such as MassIndexer, flushToIndexes, batch modes, etc., but as far as I can tell these are Java-only options.

The Clear and Evict calls are just desperate moves on my part; I haven't seen them make a difference either way.

Has anyone had success quickly indexing a large amount of existing data?


Solution

I've been able to speed up indexing considerably by combining batching with one transaction per batch.

My initial code took ~30 minutes to index ~20,000 entities. With the code below I got it down to ~4 minutes.

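    private void IndexEntities<TEntity>(IFullTextSession session) where TEntity : class
    {
        var currentIndex = 0;
        const int batchSize = 500;

        while (true)
        {
            // Load the next page of entities. Without an explicit ORDER BY,
            // the database does not guarantee stable paging between queries.
            var entities = session
                .CreateCriteria<TEntity>()
                .SetFirstResult(currentIndex)
                .SetMaxResults(batchSize)
                .List();

            // One short transaction per batch instead of one huge one.
            using (var tx = session.BeginTransaction())
            {
                foreach (var entity in entities)
                {
                    session.Index(entity);
                }
                currentIndex += batchSize;

                session.Flush();
                tx.Commit();     // the queued index work for this batch is written on commit
                session.Clear(); // drop loaded entities so the session does not keep growing
            }

            // A short page means the end of the table has been reached.
            if (entities.Count < batchSize)
                break;
        }
    }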
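The paging control flow of the loop above can be separated from NHibernate so it can be checked on its own. Below is a minimal, library-free sketch under that assumption: `BatchIndexer`, `IndexInBatches`, `fetchPage` and `indexBatch` are hypothetical names, with `fetchPage` standing in for the `SetFirstResult`/`SetMaxResults` criteria query and `indexBatch` for the Index/Flush/Commit/Clear steps. Unlike the original, it skips the final empty page instead of opening an empty transaction for it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, library-free sketch of the batch-per-transaction loop.
public static class BatchIndexer
{
    // Pages through a source of unknown size in fixed-size batches and hands
    // each non-empty batch to indexBatch. Returns the number of batches processed.
    public static int IndexInBatches<T>(
        Func<int, int, IList<T>> fetchPage, // (firstResult, maxResults) -> one page
        Action<IList<T>> indexBatch,        // stand-in for index + commit + clear
        int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var firstResult = 0;
        var batches = 0;
        while (true)
        {
            var page = fetchPage(firstResult, batchSize);
            if (page.Count > 0)
            {
                indexBatch(page);
                batches++;
            }
            firstResult += batchSize;

            // A short (or empty) page means the source is exhausted.
            if (page.Count < batchSize)
                break;
        }
        return batches;
    }
}
```

For example, with 21,000 rows and a batch size of 500, `indexBatch` is called 42 times, and the loop stops after the 43rd fetch returns an empty page.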

OTHER TIPS

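Indexing speed depends on the Lucene options you set (for example, how often index segments are merged and how many documents are buffered in memory before being flushed). See this page and check whether NHibernate Search exposes these options. If it doesn't, modify its source.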

Licensed under: CC-BY-SA with attribution