Suitable Java data structure for parsing large data file
26-10-2019
Question
I have a rather large text file (~4m lines) I'd like to parse and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:
Date Time Value
2011-11-30 09:00 10
2011-11-30 09:15 5
2011-12-01 12:42 14
2011-12-01 19:58 19
2011-12-01 02:03 12
I want to group the lines by date, so my initial thought was to use a TreeMap<String, List<String>> to map the date to the rest of the line. Is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to eliminate so many string comparisons), but it's the List as a value that I'm worried might be unsuitable.
I'm using a TreeMap because I want to iterate the keys in date order.
Solution
is a TreeMap of Lists a ridiculous thing to do?
Conceptually not, but it is going to be very memory-inefficient, both because of the Map and because of the List. You're looking at an overhead of 200% or more, which may or may not be acceptable depending on how much memory you have to spare.
For a more memory-efficient solution, create a class that has a field for every column (including a Date), put all the instances in a List, and sort it (ideally using quicksort) when you're done reading.
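A minimal sketch of that approach, assuming Java 8 or later and the three-column format from the question. The class name Reading, its fields, and the helper methods are hypothetical names chosen for illustration, and java.time.LocalDateTime stands in for Date. Note that List.sort on a list of objects uses a stable merge sort variant (TimSort) rather than quicksort; it is used here simply because it is the readily available option.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type: one instance per line of the file.
final class Reading {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm");

    final LocalDateTime timestamp;
    final int value;

    Reading(LocalDateTime timestamp, int value) {
        this.timestamp = timestamp;
        this.value = value;
    }

    // Parses a line such as "2011-11-30 09:00 10".
    static Reading parse(String line) {
        int lastSpace = line.lastIndexOf(' ');
        LocalDateTime timestamp =
                LocalDateTime.parse(line.substring(0, lastSpace), FORMAT);
        int value = Integer.parseInt(line.substring(lastSpace + 1));
        return new Reading(timestamp, value);
    }

    // Parses every line, then sorts the whole list once by timestamp.
    static List<Reading> parseAndSort(List<String> lines) {
        List<Reading> readings = new ArrayList<>(lines.size());
        for (String line : lines) {
            readings.add(parse(line));
        }
        readings.sort((a, b) -> a.timestamp.compareTo(b.timestamp));
        return readings;
    }
}
```

After sorting, lines that share a date sit next to each other, so grouping by date becomes a single pass over the list with no Map at all.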
OTHER TIPS
There's nothing wrong with using a List as the value for a Map. All of those <> look ugly, but it's perfectly fine to put a generic class inside of a generic class.
Instead of using a String as the key, it would probably be better to use java.util.Date, because the keys are dates. This lets the TreeMap sort the keys as real dates. If you store the dates as Strings, the TreeMap sorts them as strings, which matches date order only for formats such as the yyyy-MM-dd one in the question, where lexicographic and chronological order coincide.
Map<Date, List<String>> map = new TreeMap<Date, List<String>>();
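A minimal sketch of filling such a map, assuming the three-column format from the question. The class name DateGrouper and the method groupByDate are hypothetical. Only the date column is parsed into the java.util.Date key; the rest of the line is kept as the value.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that groups "yyyy-MM-dd HH:mm value" lines by date.
final class DateGrouper {
    static Map<Date, List<String>> groupByDate(List<String> lines) {
        // SimpleDateFormat is not thread-safe, so it is kept local to the call.
        SimpleDateFormat dayFormat = new SimpleDateFormat("yyyy-MM-dd");
        Map<Date, List<String>> map = new TreeMap<Date, List<String>>();
        for (String line : lines) {
            int firstSpace = line.indexOf(' ');
            Date day;
            try {
                day = dayFormat.parse(line.substring(0, firstSpace));
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad date in line: " + line, e);
            }
            // Create the per-day list the first time a date is seen.
            List<String> sameDay = map.get(day);
            if (sameDay == null) {
                sameDay = new ArrayList<String>();
                map.put(day, sameDay);
            }
            sameDay.add(line.substring(firstSpace + 1));
        }
        return map;
    }
}
```

Iterating map.entrySet() then visits the dates in chronological order, with each day's lines in the order they appeared in the file.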
There is no objection to using Lists, though in your case a List<Integer> as the values of the Map might be more appropriate.
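A minimal sketch of the List<Integer> variant, assuming Java 8 or later and that only the value column matters (the time column is discarded here). The class name DailyValues and the method group are hypothetical, and java.time.LocalDate is used as the key type as one possible choice of date object.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that maps each date to the values recorded on it.
final class DailyValues {
    static Map<LocalDate, List<Integer>> group(List<String> lines) {
        Map<LocalDate, List<Integer>> byDate = new TreeMap<>();
        for (String line : lines) {
            String[] columns = line.split(" ");            // date, time, value
            LocalDate date = LocalDate.parse(columns[0]);  // ISO yyyy-MM-dd
            int value = Integer.parseInt(columns[2]);
            // Create the list on first sight of a date, then append.
            byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(value);
        }
        return byDate;
    }
}
```

Storing parsed integers instead of the raw remainder of each line avoids keeping one String per row, although each Integer in the list is still a boxed object.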