Question

I'm trying to use the date functions built into Pig.
The input file contains dates in yyyy-mm-dd hh:mi:ss format.
I'm using the following code:

row = foreach log generate FLATTEN(REGEX_EXTRACT_ALL(pattern)) AS (date_time : datetime, other columns);

final_data = foreach row {
  yyyy = (chararray)GetYear(date_time);
  mm = (chararray)GetMonth(date_time);
  dd = (chararray)GetDay(date_time);
  hh = (chararray)GetHour(date_time);
  mi = (chararray)GetMinute(date_time);
  ss = (chararray)GetSecond(date_time);
  generate CONCAT(CONCAT(CONCAT(yyyy, '-'), CONCAT(mm, '-')),dd) as myDate;
  }

but I get an error:

 ERROR 1066: Unable to open iterator for alias final_data. Backend error : java.lang.String cannot be cast to org.joda.time.DateTime

I'm trying to use the workaround from "Formatting Date in Generate Statement".

What format is expected?


Solution

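REGEX_EXTRACT_ALL takes a string and returns a tuple of the extracted strings, i.e. you can't extract directly into DateTime fields. That is why the backend reports that a java.lang.String cannot be cast to org.joda.time.DateTime.
Instead, extract the value as a chararray and convert it with the ToDate UDF.
For example: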

cat data.txt
2014-03-11 13:44:11
2014-02-22 10:44:11

A = load 'data.txt' as (in:chararray);
B = foreach A generate ToDate(in,'yyyy-MM-dd HH:mm:ss') as (dt:DateTime);
C = foreach B {
      year = (chararray)GetYear(dt);
      month = (chararray)GetMonth(dt);
      day = (chararray)GetDay(dt);
      generate CONCAT(CONCAT(CONCAT(year, '-'), CONCAT(month, '-')),day) as myDate;
};
dump C;
(2014-3-11)
(2014-2-22)
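
Note that GetMonth and GetDay return integers, so the concatenated result is not zero-padded. A minimal sketch of an alternative, assuming a Pig version that ships the ToString(datetime, format) builtin, and assuming each log line starts with the timestamp followed by a space and the remaining text (the file name 'log.txt', the regex, and the field names date_str and rest are hypothetical, chosen only for illustration):

```pig
-- Assumption: every line looks like "2014-03-11 13:44:11 <rest of the line>".
log = load 'log.txt' as (line:chararray);

-- REGEX_EXTRACT_ALL returns chararray fields, so declare date_str as chararray,
-- not datetime. The pattern must match the whole line.
row = foreach log generate
        FLATTEN(REGEX_EXTRACT_ALL(line, '([0-9-]+ [0-9:]+) (.*)'))
        as (date_str:chararray, rest:chararray);

-- Convert the extracted string to a real datetime using a Joda-style pattern.
typed = foreach row generate
          ToDate(date_str, 'yyyy-MM-dd HH:mm:ss') as date_time, rest;

-- Format the datetime back to a string; 'MM' and 'dd' are two-digit fields.
final_data = foreach typed generate ToString(date_time, 'yyyy-MM-dd') as myDate;
dump final_data;
```

With this approach the formatting is done by the pattern, so the result should be zero-padded (for example 2014-03-11 rather than 2014-3-11), and the nested CONCAT calls are no longer needed.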
Licensed under: CC-BY-SA with attribution