Question

I am trying to run a Hive QUERy I have a table with lets say 3 columns. Of one is a date column with data as :

a d 2014-04-01
b e 2014-04-03
c f 2014-04-20

Now I want to pick the Maximum date from the above data and do a difference with the current date ( current date lets assume is 2014-04-24) and add the difference to the output. What I mean is ; the query should pick 2014-04-20 and subtract it with current date to give an output as 4 and then add this difference to all the dates to have an output as :

a d 2014-04-05
b e 2014-04-07
c f 2014-04-24

I tried this but it runs into a semantic issue:

select A, B, date_add( SOMEDATE, datediff(to_date( FROM_UNIXTIME(UNIX_TIMESTAMP() )), max(SOMEDATE))) As SOMEDATE

Was it helpful?

Solution

Doable using Hive date udf's (DATEDIFF, FROM_UNIXTIME, UNIX_TIMESTAMP, DATE_ADD): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

Assuming your source table definition is: DateSource(col1 string, col2 string, myDate string)

The query would be:

SELECT col1, col2, myDate, DATE_ADD(myDate,daysDiff) as adjustedDate 
FROM DateSource
JOIN
  (
    SELECT DATEDIFF(FROM_UNIXTIME(UNIX_TIMESTAMP(),"yyyy-MM-dd"),maxDate) as daysDiff 
      FROM 
       (
         SELECT max(myDate) as maxDate FROM DateSource
       ) maxDate
  ) diffDate;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top