Question

How can I schedule an Oozie coordinator so that today's instance runs only after yesterday's input dataset is available? It needs to check only one input dataset, and that must be an older one, such as the dataset from 1 or 2 days ago. It should not wait for today's input dataset.

I have tried something like the configuration below, using ${coord:current(-1)} inside the instance tag so that it checks for yesterday's data, but this doesn't seem to work. Even if the signal is not available for yesterday's date, the job is fired at the nominal time.

    <coordinator-app name="hello-coord" frequency="${coord:days(1)}"
                     start="2009-01-02T08:00Z" end="2009-01-04T08:00Z" timezone="America/Los_Angeles"
                     xmlns="uri:oozie:coordinator:0.1">
        <datasets>
            <dataset name="din" frequency="${coord:days(1)}"
                     initial-instance="2009-01-02T08:00Z" timezone="America/Los_Angeles">
                <uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}</uri-template>
                <done-flag>_SUCCESS</done-flag>
            </dataset>
        </datasets>
        <input-events>
            <data-in name="input" dataset="din">
                <instance>${coord:current(-1)}</instance>
            </data-in>
        </input-events>
        <action>
            <workflow>
                <app-path>${wf_app_path}</app-path>
            </workflow>
        </action>
    </coordinator-app>

Solution

The <dataset> tag defines the folder in which the trigger file (the done-flag) is expected to appear.

The <input-events> tag supplies the instance time from which the folder parameters ${YEAR}, ${MONTH} and ${DAY} are calculated.

<instance>${coord:current(-1)}</instance> means that this time is the action's nominal time minus one dataset period, which here is one day.

Hence for the first action, at the nominal time "2009-01-02T08:00Z", the time passed to <dataset> is "2009-01-01T08:00Z", which is earlier than initial-instance="2009-01-02T08:00Z". Dataset instances that fall before the initial-instance are not checked, so the action runs without waiting for the trigger file to appear.

The solution is to set initial-instance="2009-01-01T08:00Z", so that yesterday's instance is a valid dataset instance for the first action.
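
As a sketch of that change, the dataset definition from the question would look roughly like this, with only initial-instance moved back by one day. The paths in the comments assume that ${YEAR}, ${MONTH} and ${DAY} are resolved from the instance time in UTC; treat them as an illustration, not verified output.

    <dataset name="din" frequency="${coord:days(1)}"
             initial-instance="2009-01-01T08:00Z" timezone="America/Los_Angeles">
        <!-- First action (nominal time 2009-01-02T08:00Z) requests coord:current(-1),
             i.e. the 2009-01-01T08:00Z instance, which is no longer before initial-instance. -->
        <uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}</uri-template>
        <!-- Assumed resolution: the action waits for ${baseFsURI}/2009/01/01/_SUCCESS. -->
        <done-flag>_SUCCESS</done-flag>
    </dataset>

By the same reasoning, if the input should be two days old (coord:current(-2)), initial-instance would presumably need to be at least two days before the coordinator's start time.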

Licensed under: CC-BY-SA with attribution