Question

I am running a utility class as a Java application. The class reads a CSV file with 5 million records and tries to save about 125k of them to a database. Halfway through, I got a heap space error. The full file takes about 5-6 hours to run. Does adding a Thread.sleep() call help with cleaning up resources, considering this is run as a Java application? I am using Spring Data JPA to insert after every 1k rows.

    String strLine;
    List<Provider> providers = new ArrayList<Provider>();

    int count = 0;
    while ((strLine = br.readLine()) != null) {
        String[] providerDetails = strLine.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        if (providerDetails[31].substring(1, (providerDetails[31].length() - 1)).equals("MD")
                || providerDetails[31].substring(1, (providerDetails[31].length() - 1)).equals("DC")) {
            count++;

            // add provider to repository
            providers.add(convertToProvider(providerDetails));

            if (count % 1000 == 0) {
                providerRepository.save(providers);
                providers.clear();
                Thread.sleep(2000);
            }
        }
    }

Are there any other optimizations I can do to fix the memory issue? I am using Eclipse and have given it plenty of memory:

-Xms128m
-Xmx1536m
-XX:MaxPermSize=768m
-XX:-UseGCOverheadLimit

Solution

I suspect the biggest problem is down to the way you are inserting data into the database with Hibernate.

When you either call EntityManager.persist() or EntityManager.merge(), the entity you are working with is added to the PersistenceContext of your EntityManager instance (it is worth getting your head around entity lifecycles as described here.)

You can think of the PersistenceContext as a kind of cache that Hibernate works with to avoid unnecessary trips to the database for objects that it has already loaded within the current unit of work. In addition Hibernate uses the PersistenceContext to perform dirty checking so that it understands which objects need to be flushed when the transaction commits.

This is fine with a small number of objects. The problem comes when you're working with a very large number of objects, as Hibernate keeps a reference to each and every object in the PersistenceContext for the reasons explained above.

Therefore it is important that when you're doing large batch inserts, you carefully manage the size of the PersistenceContext, either by explicitly flushing and clearing it at certain intervals, or by using a stateless session for the bulk insertions.
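To make the flush-and-clear cadence concrete, here is a minimal, self-contained sketch. The `FakeEntityManager` is a hypothetical stand-in for a real JPA `EntityManager` (so the snippet runs without a database); the point is the pattern: persist a batch, then `flush()` pending inserts and `clear()` the persistence context so those entities become eligible for garbage collection instead of accumulating for 125k rows.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInsertSketch {
    static final int BATCH_SIZE = 1000;

    // Hypothetical stand-in for javax.persistence.EntityManager, used here
    // only so the sketch is runnable without a database.
    static class FakeEntityManager {
        final List<String> context = new ArrayList<>(); // the "managed" entities
        int flushed = 0;

        void persist(String entity) { context.add(entity); }
        void flush() { flushed += context.size(); }      // write pending inserts
        void clear() { context.clear(); }                // detach everything
    }

    public static void main(String[] args) {
        FakeEntityManager em = new FakeEntityManager();
        int count = 0;
        for (int row = 0; row < 125_000; row++) {
            em.persist("provider-" + row);
            if (++count % BATCH_SIZE == 0) {
                em.flush();  // push this batch to the database
                em.clear();  // let the batch become garbage-collectable
            }
        }
        em.flush();          // flush any trailing partial batch
        em.clear();
        System.out.println("rows written: " + em.flushed);
        System.out.println("entities still managed: " + em.context.size());
    }
}
```

With this shape the persistence context never holds more than `BATCH_SIZE` entities at once, regardless of how large the input file is.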

Hibernate has a good explanation of how to work with "a lot" of entities in one go here. I suspect that following that advice will solve most of your memory problems.
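Alongside flushing and clearing, the Hibernate guidance also recommends enabling JDBC batching so the periodic flush sends inserts to the database in batches rather than one statement at a time. A sketch of the relevant settings (the property names are Hibernate's standard ones, shown here with the Spring Boot prefix; the batch size of 50 is illustrative):

```
# Send inserts to the database in JDBC batches
spring.jpa.properties.hibernate.jdbc.batch_size=50
# Group statements by entity type so batching actually kicks in
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```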

OTHER TIPS

I will try to answer your specific question, which is about the effect of Thread.sleep() on memory issues - I am sure that others will school you on how to keep Hibernate's footprint under control.

I know of only one case where sleeping your application thread(s) can help avoid out of memory conditions, and that is when you are making heavy use of instances of classes that have the Object.finalize() method defined. Such instances live through two rounds of reachability tests and have to have the finalize method executed; the finalize methods of all instances are executed on a single thread as part of garbage collection. If you are creating finalizable garbage on multiple threads faster than the single finalizer thread can process, you will get an OutOfMemoryError even though you have lots of garbage available for collection. By slowing down your application threads by sleeping, you may give the finalizer thread a chance to catch up.
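For illustration, this is the kind of class being described (the class name is made up for the example). Merely declaring a non-trivial `finalize()` means every instance must be queued for, and processed by, the JVM's single finalizer thread before its memory can be reclaimed:

```java
// Illustrative only: a class like this forces each instance through two
// rounds of reachability analysis plus the single finalizer thread before
// its memory can be reclaimed. Allocate these faster than that one thread
// can run finalize(), and the heap fills with objects that are unreachable
// but not yet collectable.
public class FinalizableResource {
    private final byte[] payload = new byte[1024]; // some per-instance memory

    @Override
    protected void finalize() {
        // Any non-trivial work here slows the finalizer thread further.
    }

    public int payloadSize() { return payload.length; }
}
```

Note that `finalize()` has been deprecated since Java 9 precisely because of pitfalls like this; `java.lang.ref.Cleaner` or try-with-resources are the usual replacements.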

This is almost certainly NOT your problem in this case (you have other obvious reasons for running out of heap space) and the sleep gains you nothing.

Also, you have to flush() the EntityManager and clear() it periodically.

This is most probably the reason you are running out of memory.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow