Question

How do you optimize the heap usage of an application that holds a lot (millions) of long-lived objects (e.g., a big cache, or lots of records loaded from a database)?

  • Use the right data type
    • Avoid using java.lang.String to represent other data types (see the sketch after this list)
  • Avoid duplicated objects
    • Use enums if the values are known in advance
    • Use object pools
    • String.intern() (good idea?)
  • Load/keep only the objects you need
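
For example, a minimal sketch of the first points (the Record fields are hypothetical): replacing String-typed fields with an enum and primitives shrinks each instance and shares the repeated values.

    // Wasteful: every field is a String, each value a separate heap object.
    class RecordBefore {
        String status;     // "ACTIVE", "INACTIVE", ...
        String createdAt;  // "2008-10-27T10:15:30"
        String count;      // "42"
    }

    // Leaner: enum constants are shared; primitives live inline in the object.
    enum Status { ACTIVE, INACTIVE }

    class RecordAfter {
        Status status;     // a reference to one of two shared constants
        long createdAt;    // epoch millis: 8 bytes, no extra object
        int count;         // 4 bytes, no extra object
    }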

I am looking for general programming or Java-specific answers. No funky compiler switches.

Edit:

Optimize the memory representation of a POJO that can appear millions of times in the heap.

Use cases

  • Load a huge CSV file into memory (converted into POJOs)
  • Use Hibernate to retrieve millions of records from a database (see the sketch below)
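
For the Hibernate case, a minimal sketch (written against the Hibernate 3-5 style API; Record, sessionFactory, and process() are assumptions) that streams rows instead of keeping millions of managed entities in the session at once:

    StatelessSession session = sessionFactory.openStatelessSession();
    ScrollableResults results = session.createQuery("from Record")
                                       .scroll(ScrollMode.FORWARD_ONLY);
    try {
        while (results.next()) {
            Record record = (Record) results.get(0);
            process(record);  // handle one row, then let it become garbage
        }
    } finally {
        results.close();
        session.close();
    }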

Summary of answers:

  • Use the flyweight pattern
  • Copy-on-write
  • Instead of loading 10M objects with 3 properties, is it more efficient to have 3 arrays (or another data structure) of size 10M? (It could be a pain to manipulate the data, but if you are really short on memory...)

Solution

You don't say what sort of objects you're looking to store, so it's a little difficult to offer detailed advice. However some (not exclusive) approaches, in no particular order, are:

  • Use a flyweight pattern wherever possible (see the sketch just after this list).
  • Cache to disk. There are numerous cache solutions for Java.
  • There is some debate as to whether String.intern() is a good idea; see the linked question on String.intern() for the arguments around its suitability.
  • Make use of soft or weak references to store data that you can recreate/reload on demand (a soft-reference sketch appears below). See the linked discussion of using soft references with caching techniques.
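
As an illustration of the flyweight point, here is a minimal sketch (the Color value type is a made-up example): a factory hands out one shared, immutable instance per distinct value instead of allocating a fresh object per row.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Flyweight sketch: immutable values are shared through a factory cache,
    // so ten million rows that use the same color reference a single object.
    final class Color {
        private static final Map<String, Color> CACHE = new ConcurrentHashMap<>();
        private final String name;

        private Color(String name) { this.name = name; }

        static Color of(String name) {
            return CACHE.computeIfAbsent(name, Color::new);
        }

        String name() { return name; }
    }

By construction, Color.of("red") == Color.of("red"), so equality checks also become cheap reference comparisons.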

Knowing more about the internals and lifetime of the objects you're storing would result in a more detailed answer.
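
To make the soft-reference suggestion concrete, a minimal sketch (the loader function is an assumption): values are held through SoftReference, so the garbage collector may reclaim them under memory pressure and the cache reloads them on the next access.

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Values are held softly: the GC may clear them when memory runs low,
    // and the cache then transparently reloads them on the next get().
    final class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        SoftCache(Function<K, V> loader) { this.loader = loader; }

        V get(K key) {
            SoftReference<V> ref = map.get(key);
            V value = (ref == null) ? null : ref.get();
            if (value == null) {               // never cached, or cleared by the GC
                value = loader.apply(key);     // recreate/reload on demand
                map.put(key, new SoftReference<>(value));
            }
            return value;
        }
    }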

OTHER TIPS

I suggest you use a memory profiler, see where the memory is being consumed, and optimise that. Without quantitative information you could end up changing things which either have no effect or actually make things worse.

You could look at changing the representation of your data, especially if your objects are small. For example, you could represent a table of data as a series of columns with an object array for each column, rather than one object per row. This can save a significant amount of per-object overhead if you don't need to represent an individual row: a table with 12 columns and 10,000,000 rows could use 12 objects (one per column) rather than 10 million (one per row).
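
A minimal sketch of that column-oriented layout (the field names are made up):

    // Row-oriented: 10,000,000 Row objects, each paying its own object header.
    class Row {
        int id;
        double price;
        boolean active;
    }

    // Column-oriented: three array objects in total, regardless of row count.
    class Table {
        final int[] ids;
        final double[] prices;
        final boolean[] active;

        Table(int rowCount) {
            ids = new int[rowCount];
            prices = new double[rowCount];
            active = new boolean[rowCount];
        }

        double priceOf(int row) { return prices[row]; }  // a "row" is just an index
    }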

Ensure good normalization of your object model; don't duplicate values.

Ahem, and, if it's only millions of objects, I think I'd just go for a decent 64-bit VM and lots of RAM ;)

Normal "profilers" won't help you much, because you need an overview of all your "live" objects. You need heap dump analyzer. I recommend the Eclipse Memory analyzer.

Check for duplicated objects, starting with Strings. Check whether you can apply patterns like flyweight, copy-on-write, and lazy initialization (Google will be your friend).

Take a look at the presentation linked below. It lays out the memory use of common Java objects and primitives and helps you understand where all the extra memory goes.

Building Memory-efficient Java Applications: Practices and Challenges

You could just store fewer objects in memory. :) Use a cache that spills to disk, or use Terracotta to cluster your (virtual) heap, allowing unused parts to be flushed out of memory and transparently faulted back in.

I want to add something to the point Peter already made (I can't comment on his answer): it's always better to use a memory profiler than to go by intuition. Eighty percent of the time, the problem is in routine code that we ignore. Collection classes are also more prone to memory leaks.

If you have millions of Integers, Floats, etc., see whether your algorithms allow representing the data in arrays of primitives. That means fewer references and a lower CPU cost for each garbage collection.
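
A rough sketch of the difference (the per-element costs are typical figures for a 64-bit HotSpot JVM with compressed references, not guarantees):

    import java.util.ArrayList;
    import java.util.List;

    class BoxedVsPrimitive {
        // Each element is a separate Integer object (~16 bytes of header plus
        // the 4-byte value), reached through its own reference in the list.
        List<Integer> boxed = new ArrayList<>(10_000_000);

        // 4 bytes per element in one contiguous array, and nothing for the
        // garbage collector to trace per element.
        int[] primitive = new int[10_000_000];
    }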

A fancy one: keep most data compressed in RAM and expand only the current working set. If your data has good locality, that can work nicely.
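
A minimal sketch of that idea using java.util.zip (how records are encoded to and from bytes is left out; that part is an assumption):

    import java.io.ByteArrayOutputStream;
    import java.util.zip.DataFormatException;
    import java.util.zip.Deflater;
    import java.util.zip.Inflater;

    // Blocks of encoded records stay compressed in RAM and are expanded
    // only when they become part of the current working set.
    final class CompressedBlock {
        private final byte[] compressed;

        CompressedBlock(byte[] raw) {
            Deflater deflater = new Deflater();
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            compressed = out.toByteArray();
        }

        byte[] expand() throws DataFormatException {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            inflater.end();
            return out.toByteArray();
        }
    }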

Use better data structures. The standard collections in Java are rather memory-intensive.

[What is a better data structure?]

  • If you take a look at the source for the collections, you'll see that if you restrict yourself in how you access them, you can save space per element (see the sketch after this list).
  • The way the collections handle growing is no good for large collections: too much copying. For large collections, you need a block-based structure, like a B-tree.
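
As one example of trading access flexibility for space (a sketch, not a drop-in replacement): if the data is loaded once and then only read, a sorted primitive array with binary search can replace a TreeMap and its per-entry node objects.

    import java.util.Arrays;

    // A TreeMap allocates a node object (~40 bytes) per entry; here the same
    // mapping is two flat arrays. Lookup stays O(log n); the trade-off is
    // that inserting after construction would require copying.
    final class SortedIntMap {
        private final int[] keys;     // sorted ascending
        private final long[] values;  // values[i] belongs to keys[i]

        SortedIntMap(int[] sortedKeys, long[] values) {
            this.keys = sortedKeys;
            this.values = values;
        }

        Long get(int key) {
            int i = Arrays.binarySearch(keys, key);
            return (i >= 0) ? values[i] : null;
        }
    }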

Spend some time getting acquainted with and tuning the VM command line options, especially those concerning garbage collection. While this won't change the memory used by your objects, it can have a big impact on performance with memory-intensive apps on machines with a lot of RAM.
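
For example (a starting point only; the right flags and values depend on your JVM version and, above all, on measurement):

    # 64-bit JVM: a fixed heap, compressed object references, and GC logging
    # so that tuning decisions are driven by data rather than guesswork.
    java -Xms4g -Xmx4g -XX:+UseCompressedOops -verbose:gc -jar app.jar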

  1. Assign null to variables that are no longer used, making the objects they referenced available for garbage collection.
  2. De-reference collections once you are done with them; otherwise the GC won't sweep their contents (see the sketch below).
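
A small sketch of what that looks like in practice (the ReportJob class is a made-up example):

    import java.util.ArrayList;
    import java.util.List;

    class ReportJob {
        private List<String> cache;    // a long-lived field keeps its records reachable

        void run() {
            cache = new ArrayList<>(); // imagine millions of records loaded here
            // ... process the records ...
            cache.clear();             // drop the contents...
            cache = null;              // ...and the reference, so the GC can reclaim it all
        }
    }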