Question

I have a huge file in the format:

x a
y c
x d
z a
z s
y k

I want the output to be sorted by key and in the form

x a,d
y c,k
z a,s

For this type of task a map is the best fit.

The straightforward TreeMap solution would be this:

    // Aggregate values per key; TreeMap keeps the keys sorted.
    Map<String, StringBuilder> agg = Maps.newTreeMap();

    String line;
    while ((line = r.readLine()) != null) {
        String[] arr = line.split("\t");
        String key = arr[0];
        String value = arr[1];

        if (agg.containsKey(key)) {
            agg.get(key).append(",").append(value);
        } else {
            agg.put(key, new StringBuilder(value));
        }
    }
    r.close();

    System.out.println("Printing results");
    FileWriter f = new FileWriter("out.txt");

    for (String key : agg.keySet()) {
        f.write(key + "\t" + agg.get(key) + "\n");
    }
    f.close();
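As an aside (not part of the original question): on Java 8 and later, the aggregation loop can be written more compactly with Map.merge, using plain String values instead of StringBuilder. A minimal sketch, reusing the reader r from above:

    Map<String, String> agg = new TreeMap<>();

    String line;
    while ((line = r.readLine()) != null) {
        String[] arr = line.split("\t");
        // merge() stores the value for a new key, or combines it with
        // the existing value via the remapping function.
        agg.merge(arr[0], arr[1], (a, b) -> a + "," + b);
    }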

Another option would be to use a HashMap, take its key set, sort it, and iterate over that.

The part that differs would be:

    System.out.println("Sorting array");
    List<String> keys = Lists.newArrayList(agg.keySet());
            Collections.sort(keys);
        System.out.println("Printing results");
        FileWriter f = new FileWriter("out.txt");

        for (String key : keys) {
            f.write(key+"\t"+agg.get(key)+"\n");
        }

Since this is a batch job, the big-O running time is less important to me.

Memory usage matters much more.

Which strategy is more efficient in terms of memory?

The HashMap approach, with its memory peak during the sort phase, or the TreeMap approach?


Solution 2

If it's a batch job, a TreeMap won't use extra memory the way a HashMap does. A HashMap's default load factor is, I believe, 0.75 (i.e. the map can be 75% full before its backing array is grown).

A TreeMap would be more straightforward too, provided its O(log n) operations don't become a bottleneck. If they do, you could use a List with your own Tuple object and a custom Comparator, as sketched below, but then you don't get an O(1) get().
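A minimal sketch of that List-based variant, assuming the writer f from the question's code; the Tuple class and its field names are illustrative, not from the original answer:

    // Hypothetical Tuple holding one aggregated row.
    class Tuple {
        final String key;
        final StringBuilder values;
        Tuple(String key, String firstValue) {
            this.key = key;
            this.values = new StringBuilder(firstValue);
        }
    }

    List<Tuple> rows = new ArrayList<>();
    // ... fill rows while reading the file; finding an existing key now
    // takes a linear scan (or a separate index), hence no O(1) get() ...

    // One sort at the end, ordered by key via a custom Comparator.
    Collections.sort(rows, Comparator.comparing(t -> t.key));

    for (Tuple t : rows) {
        f.write(t.key + "\t" + t.values + "\n");
    }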

OTHER TIPS

HashMap is not optimized for memory utilization but rather for fast operations. It is backed by an array, which is allocated on initialization and resized when its size reaches certain thresholds; consequently, memory is allocated eagerly. Its capacity is always a power of 2, which is a computational optimization for faster bucket-index calculation. As a result, the (unused) allocated memory may significantly exceed the memory your program actually needs.
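If a rough estimate of the number of unique keys were available, that eager allocation could be tamed by presizing the map. A small sketch; the 16_000 estimate is made up for illustration:

    // Ask for enough slots that expectedKeys entries fit without a resize:
    // the table grows once size exceeds capacity * loadFactor.
    int expectedKeys = 16_000;                          // illustrative guess
    int initialCapacity = (int) (expectedKeys / 0.75f) + 1;
    // The capacity is rounded up to a power of 2 internally.
    Map<String, StringBuilder> agg = new HashMap<>(initialCapacity, 0.75f);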

TreeMap provides better memory utilization, at the cost of worse performance for get, put, and remove operations compared to HashMap. This is reflected in its constructors, which take no capacity or load-factor parameters that could affect its behavior. All its entries are allocated lazily and linked to the existing entries to form a tree.

Since the number of unique keys is unknown, and could be large, the hash-based approach may consume more memory: whenever the number of keys exceeds the product of the hash table's current capacity and its load factor, the capacity simply gets doubled, increasing memory usage.
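To settle the question for a concrete workload, one could compare the two maps' heap footprints directly. A rough, JVM-dependent sketch (System.gc() is only a hint, so treat the numbers as approximate):

    Runtime rt = Runtime.getRuntime();
    System.gc();
    long before = rt.totalMemory() - rt.freeMemory();

    Map<String, String> m = new TreeMap<>();    // swap in new HashMap<>() to compare
    for (int i = 0; i < 1_000_000; i++) {
        m.put("key" + i, "value");
    }

    System.gc();
    long after = rt.totalMemory() - rt.freeMemory();
    System.out.println("approx bytes used: " + (after - before));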

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow