Question

Is there an already implemented data structure that I can use in order to assign to an object (in my case an Edge), an integer? I am reading a graph from a file , 10 mil vertices , 60 mil edges and I assign to each edge , a cost , using a map ( costs.put(e,cost) ).

I create the costs map in this way :

costs = new HashMap<Edge,Integer>();

The exception that it gives is :

java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(Unknown Source)
    at java.util.HashMap.addEntry(Unknown Source)
    at java.util.HashMap.put(Unknown Source) 
Was it helpful?

Solution

HashMap is the correct data structure for a basic Map. The problem you are having is that the JVM is not being instructed to reserve enough space to keep the file contents in memory. Start the JVM with a -Xmx flag. For instance -Xmx1G parameter will allow it to use 1 Gigabyte of memory.

OTHER TIPS

You've got 6e7 edges. A plain Object takes 24 bytes (64-bit HotSpot), so that's 1.44e9 bytes right there (1.5 GB). Now you introduce the most efficient map imaginable, adding only 6e7 references plus 6e7 Integer objects. That's another 2.4e8 bytes for the refs and 1.44e9 bytes for the Integers: another 1.5 GB, the total being now 3 GB—and this is the theoretical lower bound for your problem (modulo caching, see below).

Based on this I suggest you just extend your Edge class with another int field. This will drastically reduce your memory footprint.

If this is not an option, however, and:

  • all your integers rarely exceed two digits,
  • you are careful never to use new Integer, but Integer.valueOf, autoboxing, etc.,
  • you use Java 7,

you'll automatically benefit from the built-in small integer cache. If the integers assume values from a wider range, but still with lot of duplication, a custom cache is highly advisable.

Additionally to changing the jvms memory settings you can tweak HashMaps memory management with initial capacity and load balance.

Javadoc excerpt:

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

Instead of creating a HashMap

costs = new HashMap<Edge,Integer>();

You can store cost within Edge object.

class Edge
{
    Vertex a;
    Vertex b;
    int cost;
}

This way you can save some memory in system.

Back to the original problem: You have Edges, that have costs. As your graph is sparse, why not use a sparse matrix? Maybe a Object-to-Integer mapping isn't what you really need and want. You can look at apache.commons.math, i think they have sparse matrices. Also, you need to think about how you access the costs in your algorithms, to choose the proper sparse format (column based run-length encoding / row based rle or something else). Or you don't care, and use any, but then you should convert the thing in the beginning of your algorithms.

You do realize that this takes a whole lot of RAM, do you? Try increasing the heap size, and you'll be fine...

And to answer your original question: yes, that is what Maps are for...

You must specify per project how much heap space your project wants

I think you could follow this step:

Right mouse click - Run As - Run Configuration - Arguments - Vm Arguments, then add this -Xmx2048m

Perhaps what you are looking for is TObjectIntHashMap This is similar to HashMap<Edge, Integer> except it stores the int as a primitive potentially saving you some memory. This collection can also be marginally faster when the collection is larger (as it fits into cache better)

TObjectIntHashMap<Edge> costs = new TObjectIntHashMap<Edge>();

costs.put(e, cost); // cost is stored as a primitive not its wrapper object.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top