Question

So I have one big file (13 million rows) and date formatted as:
2009-04-08T01:57:47Z. Now I would like to split it into 2 columns now,
one with just date as dd-MM-yyyy and other with time only hh:MM.

How do I do it?

Was it helpful?

Solution

You can simply use tMap and parseDate/formatDate to do what you want. It is neither necessary nor recommended to implement your own date parsing logic with regexes.

First of all, parse the timestamp using the format yyyy-MM-dd'T'HH:mm:ss'Z'. Then you can use the parsed Date to output the formatted date and time information you want:

  • dd-MM-yyyy for the date
  • HH:mm for the time (Note: you mixed up the case in your question, MM stands for the month)

If you put that logic into a tMap:

Using tMap to split a timestamp into date and time columns

you will get the following:

  • Input:
    • timestamp 2009-04-08T01:57:47Z
  • Output:
    • date 08-04-2009
    • time 01:57

NOTE

Note that when you parse the timestamp with the mentioned format string (yyyy-MM-dd'T'HH:mm:ss'Z'), the time zone information is not parsed (having 'Z' as a literal). Since many applications do not properly set the time zone information anyway but always use 'Z' instead, so this can be safely ignored in most cases.

If you need proper time zone handling and by any chance are able to use Java 7, you may use yyyy-MM-dd'T'HH:mm:ssXXX instead to parse your timestamp.

OTHER TIPS

I'm guessing Talend is falling over on the T and Z part of your date time stamp but this is easily resolved.

As your date time stamp is in a regular pattern we can easily extract the date and time from it with a tExtractRegexFields component.

You'll want to use "^([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}):[0-9]{2}Z" as your regex which will capture the date in yyyy-MM-dd format and the time as mm:HH (you'll want to replace the date time field with a date field and a time field in the schema).

Then to format your date to your required format you'll want to use a tMap and use TalendDate.formatDate("dd-MM-yyyy",TalendDate.parseDate("yyyy-MM-dd",row7.date)) to return a string in the dd-MM-yyyy format.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top