Question

I am working on performance tests of HashMap and LinkedHashMap insertion. The operations I am testing are insertion and memory footprint after insertion.

I am able to run the insertion test and extract the memory footprint after insertion with the logic below:

long myTotalMemoryBefore = Runtime.getRuntime().totalMemory();

/* Fill the hashmap or linkedhashmap */

long myTotalMemoryAfter = Runtime.getRuntime().totalMemory();
long myHashMapMemory = myTotalMemoryAfter - myTotalMemoryBefore;

I have a text file containing 2 million English words with their frequencies in this format:

hello 100
world 5000
good 2000
bad 9000
...

Now I am reading this file line by line and storing it in a HashMap and a LinkedHashMap, so I can measure both insertion performance and memory footprint after insertion with the code below.

I have a single class file with two methods, one for the HashMap performance test and the other for LinkedHashMap. They run sequentially: first the HashMap test, then the LinkedHashMap test.

public void hashMapTest() {

    Map<String, Integer> wordTest = new HashMap<String, Integer>();

    long myTotalMemoryBefore = Runtime.getRuntime().totalMemory();
    String line = reader.readLine();
    while (line != null && !line.isEmpty()) {
        // split the string on whitespace
        String[] splittedString = line.split("\\s+");
        String split1 = splittedString[0].toLowerCase().trim();
        Integer split2 = Integer.parseInt(splittedString[1].trim());
        // now put it in the HashMap as a key-value pair
        wordTest.put(split1, split2);
        line = reader.readLine();
    }

    long myTotalMemoryAfter = Runtime.getRuntime().totalMemory();
    long myHashMapMemory = (myTotalMemoryAfter - myTotalMemoryBefore) / 1024;       

    System.out.println(myHashMapMemory);

}

public void linkedHashMapTest() {

    Map<String, Integer> wordTest = new LinkedHashMap<String, Integer>();

    long myTotalMemoryBefore = Runtime.getRuntime().totalMemory();
    String line = reader.readLine();
    while (line != null && !line.isEmpty()) {
        // split the string on whitespace
        String[] splittedString = line.split("\\s+");
        String split1 = splittedString[0].toLowerCase().trim();
        Integer split2 = Integer.parseInt(splittedString[1].trim());
        // now put it in the LinkedHashMap as a key-value pair
        wordTest.put(split1, split2);
        line = reader.readLine();
    }

    long myTotalMemoryAfter = Runtime.getRuntime().totalMemory();
    long myLinkedHashMapMemory = (myTotalMemoryAfter - myTotalMemoryBefore) / 1024;     

    System.out.println(myLinkedHashMapMemory); // this is coming as zero always or negative value

}

I am seeing a very strange problem: for the HashMap performance test, myHashMapMemory ends up with some value in it, but myLinkedHashMapMemory is always zero or negative.

Any thoughts on why this is happening and how to avoid it? In general, why am I seeing a zero or negative value?


Solution

To measure used memory you need to switch off the thread-local allocation buffer with -XX:-UseTLAB. Then, for example, this:

    Runtime rt = Runtime.getRuntime();
    long m0 = rt.totalMemory() - rt.freeMemory();  //used memory
    Object obj = new Object();
    long m1 = rt.totalMemory() - rt.freeMemory();
    System.out.println(m1 - m0);

will show the correct size of a java.lang.Object in memory: 16 bytes in my case.
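Applied to the original question, the same used-memory idea (totalMemory minus freeMemory, with a gc() hint to make readings more comparable) might look like the sketch below. The class name, the synthesized entries, and the entry count are assumptions for illustration; even so, the result is an estimate, not an exact retained size.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapMemoryTest {

    // Used heap = committed heap minus free heap. The gc() hint before each
    // reading reduces (but does not eliminate) noise from garbage left over
    // by earlier work.
    static long usedMemory(Runtime rt) {
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Fills the given map with synthetic word/frequency entries and returns
    // the approximate growth in used heap, in KB.
    static long measure(Map<String, Integer> map, int entries) {
        Runtime rt = Runtime.getRuntime();
        long before = usedMemory(rt);
        for (int i = 0; i < entries; i++) {
            map.put("word" + i, i);
        }
        long after = usedMemory(rt);
        return (after - before) / 1024;
    }

    public static void main(String[] args) {
        long hashMapKb = measure(new HashMap<String, Integer>(), 2_000_000);
        long linkedKb = measure(new LinkedHashMap<String, Integer>(), 2_000_000);
        System.out.println("HashMap:       " + hashMapKb + " KB");
        System.out.println("LinkedHashMap: " + linkedKb + " KB");
    }
}
```

Running with -XX:-UseTLAB, as suggested above, makes the per-allocation accounting more accurate; a heap profiler or java.lang.instrument gives more reliable numbers than either approach.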

OTHER TIPS

Quick question: why have two identical methods...? Just pass the map in as a parameter.

But that aside: if you are running them sequentially, then by the time you get to the second method the gc may have kicked in and collected the first map. Any memory measurement based on such crude methods is unlikely to give you a correct estimate.

In other words: the second map may be occupying the same memory space as the first map if it has been gc-ed. Additionally, depending on the jvm and the settings, the jvm can actually give back memory to the OS if it is unused (e.g. after everything in it has been gc-ed).
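The first tip above, passing the map in as a parameter, might look like the sketch below. The class name and the main method feeding it a small in-memory sample are assumptions for illustration; the parsing logic is taken from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class WordFrequencyTest {

    // One method for both tests: the map implementation under test is
    // passed in instead of being hard-coded in two identical methods.
    static long mapTest(Map<String, Integer> wordTest, BufferedReader reader)
            throws IOException {
        long before = Runtime.getRuntime().totalMemory();
        String line = reader.readLine();
        while (line != null && !line.isEmpty()) {
            // split the line on whitespace into word and frequency
            String[] parts = line.split("\\s+");
            wordTest.put(parts[0].toLowerCase().trim(),
                         Integer.parseInt(parts[1].trim()));
            line = reader.readLine();
        }
        long after = Runtime.getRuntime().totalMemory();
        return (after - before) / 1024; // KB; subject to the gc caveats above
    }

    public static void main(String[] args) throws IOException {
        String data = "hello 100\nworld 5000\ngood 2000\nbad 9000\n";
        System.out.println(mapTest(new HashMap<String, Integer>(),
                new BufferedReader(new StringReader(data))));
        System.out.println(mapTest(new LinkedHashMap<String, Integer>(),
                new BufferedReader(new StringReader(data))));
    }
}
```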

It is probably because of the behaviour of the gc, as others mentioned. What I want to add is that for such a large amount of data, both map implementations are a poor fit. In my experience, whenever the data grows beyond a few million bytes, you are better off implementing the Map interface yourself for that kind of job.

I think Evgeniy is right. In JDK 1.7, TLAB is enabled by default; when a new thread starts, a TLAB is allocated even before any object has been created, so you can turn off TLAB and try again. Because of the gc factor, you should run the test several times, and you had better enlarge the Eden space to avoid young gc.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow