Question

I am reading a text file containing dates, and I want to parse the strings representing the dates into Date objects in Java. I notice that the operation is slow. Why, and is there any way to speed it up? My file looks like:

2012-05-02 12:08:06:950, secondColumn, thirdColumn
2012-05-02 12:08:07:530, secondColumn, thirdColumn
2012-05-02 12:08:08:610, secondColumn, thirdColumn

I read the file line by line, extract the date String from each line, and parse it into a Date object using a SimpleDateFormat as follows:

DataInputStream in = new DataInputStream(myFileInputStream);
BufferedReader  br = new BufferedReader(new InputStreamReader(in));
String strLine;

SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
while ((strLine = br.readLine()) != null)
{
    ....Do things....
    Date myDateTime = (Date)formatter.parse(myDateString);
    ...Do things....
}

Solution

Converting dates and time zones is expensive. If you can assume that consecutive date/times are close to each other, you can parse only the date plus hours and minutes (or only the date if you use GMT) whenever the minute changes, and compute the seconds and milliseconds yourself.

This calls parse once per minute. Depending on your assumptions, you could make it once per hour or once per day.

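String pattern = "yyyy-MM-dd HH:mm";
SimpleDateFormat formatter = new SimpleDateFormat(pattern);
String lastTime = null; // minute prefix of the previous line; null until the first line is read
long lastDate = 0;      // epoch millis of that minute prefix
while ((strLine = br.readLine()) != null) {
    String myDateString = strLine.split(", ")[0];
    // Re-parse only when the "yyyy-MM-dd HH:mm" prefix changes.
    // (Starting from "" would never trigger a parse: every string starts with "".)
    if (lastTime == null || !myDateString.startsWith(lastTime)) {
        lastTime = myDateString.substring(0, pattern.length());
        lastDate = formatter.parse(lastTime).getTime();
    }
    // "ss:SSS" with the colon removed reads as seconds * 1000 + millis, e.g. "06:950" -> 6950.
    Date date = new Date(lastDate + Integer.parseInt(myDateString.substring(pattern.length() + 1).replace(":", "")));
}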
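
The once-per-day variant mentioned above could look roughly like the following sketch. It assumes the timestamps are in GMT (so every day has exactly 24 hours and the time of day can be added arithmetically) and that every line uses the fixed-width layout from the Question. The class name DayCachedParser and the method parseMillis are made up for this illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper: parses "yyyy-MM-dd HH:mm:ss:SSS" to epoch millis,
// calling SimpleDateFormat only when the date part changes.
class DayCachedParser {
    private static final int DAY_LEN = "yyyy-MM-dd".length(); // 10 characters
    private final SimpleDateFormat dayFormat = new SimpleDateFormat("yyyy-MM-dd");
    private String lastDay = null;  // date prefix of the previous input
    private long lastDayMillis = 0; // epoch millis of midnight GMT on that date

    DayCachedParser() {
        // Assumption: inputs are GMT, so there are no daylight-saving jumps within a day.
        dayFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
    }

    long parseMillis(String text) {
        if (lastDay == null || !text.startsWith(lastDay)) {
            lastDay = text.substring(0, DAY_LEN);
            try {
                lastDayMillis = dayFormat.parse(lastDay).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad date: " + text, e);
            }
        }
        // Fixed character offsets for HH, mm, ss and SSS.
        int hours = Integer.parseInt(text.substring(11, 13));
        int minutes = Integer.parseInt(text.substring(14, 16));
        int seconds = Integer.parseInt(text.substring(17, 19));
        int millis = Integer.parseInt(text.substring(20, 23));
        return lastDayMillis + ((hours * 60L + minutes) * 60L + seconds) * 1000L + millis;
    }

    public static void main(String[] args) {
        DayCachedParser parser = new DayCachedParser();
        System.out.println(new java.util.Date(parser.parseMillis("2012-05-02 12:08:06:950")));
    }
}
```

In a zone with daylight saving time the arithmetic would be off on transition days, which is why the per-minute version is the safer default.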

OTHER TIPS

tl;dr

  • Use java.time rather than legacy classes.
  • Each parse of a String to a LocalDateTime with DateTimeFormatter takes less than 1,500 nanoseconds (0.0000015 seconds).

java.time

You are using troublesome old date-time classes that are now legacy, supplanted by the java.time classes.

Let's do a bit of micro-benchmarking to see just how fast or slow parsing a date-time string is in java.time.

ISO 8601

The ISO 8601 standard defines sensible practical formats for textually representing date-time values. The java.time classes use these standard formats by default when parsing/generating strings.

Use these standard formats instead of inventing your own, as seen in the Question.
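
For illustration, here is a small sketch of what that default looks like: if the file used the ISO 8601 form (a "T" between date and time, and a decimal point before the fraction of a second), no formatting pattern would be needed at all. The class name IsoDefaultParse and the helper parseIso are invented for this example.

```java
import java.time.LocalDateTime;

class IsoDefaultParse {
    // Hypothetical helper: relies on LocalDateTime's built-in ISO 8601 parsing.
    static LocalDateTime parseIso(String text) {
        return LocalDateTime.parse(text); // no DateTimeFormatter argument required
    }

    public static void main(String[] args) {
        // The Question's first value, rewritten in ISO 8601 style.
        LocalDateTime ldt = parseIso("2012-05-02T12:08:06.950");
        System.out.println(ldt);
    }
}
```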

DateTimeFormatter

Define a formatting pattern to match your inputs.

DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu-MM-dd HH:mm:ss:SSS" );

We will parse each such input as a LocalDateTime because your input lacks an indicator of time zone or offset-from-UTC. Keep in mind that such values do not represent a moment and are not a point on the timeline. Becoming an actual moment requires the context of a zone or offset.

String inputInitial = "2012-05-02 12:08:06:950" ;
LocalDateTime ldtInitial = LocalDateTime.parse( inputInitial , f );
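
If you do know the intended offset or zone, you can attach it to get an actual moment. The sketch below assumes, purely for illustration, that the file's values were recorded in UTC; substitute the real zone or offset of your data. The class name ToMoment and the method toInstantAssumingUtc are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class ToMoment {
    static final DateTimeFormatter F = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss:SSS");

    // Assumption: the input was written in UTC. The file itself does not say so.
    static Instant toInstantAssumingUtc(String input) {
        LocalDateTime ldt = LocalDateTime.parse(input, F);
        return ldt.atOffset(ZoneOffset.UTC).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(toInstantAssumingUtc("2012-05-02 12:08:06:950"));
    }
}
```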

Let's make a bunch of such inputs.

int count = 1_000_000;
List < String > inputs = new ArrayList <>( count );

for ( int i = 0 ; i < count ; i++ )
{
    String s = ldtInitial.plusSeconds( i ).format( f );
    inputs.add( s );
}

Test harness.

long start = System.nanoTime();
for ( String input : inputs )
{
    LocalDateTime ldt = LocalDateTime.parse( input , f );
}
long stop = System.nanoTime();
long elapsed = ( stop - start );
long nanosPerParse = (elapsed / count ) ;
Duration d = Duration.ofNanos( elapsed );

Dump to console.

System.out.println( "Parsing " + count + " strings to LocalDateTime took: " + d  + ". About " + nanosPerParse + " nanos each.");

Parsing 1000000 strings to LocalDateTime took: PT1.320778647S. About 1320 nanos each.

Too slow?

So it takes roughly one to one and a half seconds to parse a million such inputs on a MacBook Pro laptop with a quad-core Intel i7 CPU. In my test runs, each parse takes about 1,000 to 1,500 nanoseconds.

To my mind, that is not a performance problem.


About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial. And search Stack Overflow for many examples and explanations. Specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes? As noted above, they are built into Java 8 and later.

The ThreeTen-Extra project extends java.time with additional classes. This project is a proving ground for possible future additions to java.time. You may find some useful classes here such as Interval, YearWeek, YearQuarter, and more.

I would suggest writing a custom parser, which is going to be faster. Something like:

Date parseYYYYMMDDHHMM(String strDate) {
   String yearString = strDate.substring(0, 4);
   int year = Integer.parseInt(yearString);
   ...
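
The fragment above is cut off in the source. As a rough sketch of the same idea (not the original author's code), a hand-written parser for the Question's fixed-width layout might read the digits at known offsets and build the value directly. This version returns a LocalDateTime rather than a Date to avoid any time zone handling; the class name ManualDateParser and both method names are invented for the example.

```java
import java.time.LocalDateTime;

class ManualDateParser {
    // Reads the decimal digits in text[from, to) as a non-negative int.
    private static int digits(String text, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit at index " + i + ": " + text);
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // Parses "yyyy-MM-dd HH:mm:ss:SSS" by position. The separator characters are
    // skipped without being checked; LocalDateTime.of validates the field ranges.
    static LocalDateTime parse(String text) {
        if (text.length() < 23) {
            throw new IllegalArgumentException("Too short: " + text);
        }
        return LocalDateTime.of(
                digits(text, 0, 4),    // year
                digits(text, 5, 7),    // month
                digits(text, 8, 10),   // day of month
                digits(text, 11, 13),  // hour
                digits(text, 14, 16),  // minute
                digits(text, 17, 19),  // second
                digits(text, 20, 23) * 1_000_000); // millis -> nanos
    }

    public static void main(String[] args) {
        System.out.println(parse("2012-05-02 12:08:06:950"));
    }
}
```

Whether this actually beats DateTimeFormatter on your data is something to measure rather than assume.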

Another way is to use a precomputed HashMap from date-time (without millis) to Unix timestamp. This works if there are not many distinct dates (or you can recompute it once the date flips over).
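
A minimal sketch of that cache idea, filled lazily instead of precomputed: the first 19 characters ("yyyy-MM-dd HH:mm:ss") serve as the map key, and the milliseconds are added from the tail of the string. The class name SecondCache and its methods are hypothetical, and the map is unbounded here, so for long files you would want to clear it when the date changes, as suggested above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Map;

class SecondCache {
    private static final int KEY_LEN = "yyyy-MM-dd HH:mm:ss".length(); // 19 characters
    private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private final Map<String, Long> cache = new HashMap<>();
    private int misses = 0; // number of times SimpleDateFormat was actually called

    // Parses "yyyy-MM-dd HH:mm:ss:SSS" to epoch millis in the JVM's default time zone.
    long parseMillis(String text) {
        String key = text.substring(0, KEY_LEN);
        Long base = cache.get(key);
        if (base == null) {
            try {
                base = format.parse(key).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("Bad date-time: " + text, e);
            }
            cache.put(key, base);
            misses++;
        }
        // Skip the ':' after the seconds and read the three millisecond digits.
        int millis = Integer.parseInt(text.substring(KEY_LEN + 1, KEY_LEN + 4));
        return base + millis;
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        SecondCache cache = new SecondCache();
        cache.parseMillis("2012-05-02 12:08:06:950");
        cache.parseMillis("2012-05-02 12:08:06:100");
        System.out.println("SimpleDateFormat calls: " + cache.misses());
    }
}
```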

Licensed under: CC-BY-SA with attribution