Question

The problem I have to solve is that I have to input IP address prefixes and that data associated with them in a tree so they can be queried later. I'm reading these addresses from a file and the file may contain as many as 16 million records and the file could have duplicates and i have to store those too.

I wrote my own binary search tree, but learned that a TreeMap in Java is implemented using a Red Black tree, but a TreeMap can't contain duplicates.

I want the query to take O(logn) time.
The data structure needs to be in Ram, so I'm also not sure how I'm going to store 16 million nodes.

I wanted to ask: Would it be too much of a performance hit using a library like guava to insert the Ips in Multi-maps? Or is there a better way to do this?

Was it helpful?

Solution

Using a built in library, which is tested documented and well maintained is usually a good practice.
It will also help you learn more about guava. Once you start using it "for just one thing", you will most likely realize there is much more you can use to make your life a bit easier.

Also, an alternative is using a TreeMap<Key,List<MyClass>> rather then TreeMap<Key,MyClass> as a custom implementation of a Multimap.


Regarding memory - you should try to minimize your data as much as possible (use efficient data structures, no need for "wasty" String, for example for storing IPs, there are cheaper alternatives, exploit them.

Also note - the OS will be able to offer you more memory then the RAM you have, by using virtual memory (practically for 64bits machine - it is most likely to be more then enough). However, it will most likely be less efficient then a DS dedicated for disk (such as B+ trees, for example).


Alternatives:
As alternatives to the TreeMap - you might be interested in other data structures (each with its advantages and disadvantages):

  • hash table - implemented as HashMap in java. Your type will then beHashMap<Key,List<Value>>. It allows O(1) average case query, but might decay to O(n) worst case. It also does not allow efficient range queries.
  • trie or its more space efficient version - radix tree. Allows O(1) access to each key, but is usually less space efficient then the alternatives. With this approach, you will implement the Map interface with the DS, and your type will be Map<Key,List<Value>>
  • B+ tree, which is much more optimized for disk - if your data is too large to fit in RAM after all.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top