Question

I have a table A which looks like:

number    value
 1           A
 1           B
 2           C

I also have a CSV file that contains number as one of its columns. When I do a Database Lookup (Pentaho) on this table using number from the CSV file, I get output like:

number     value
  1         A
  2         C

Is there another way in ETL to get output like this instead:

 number    value
  1          A
  1          B
  2          C

Solution

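The Database Value Lookup step is designed to return at most one row for any given input value, which is why the second row for number 1 is dropped. If you want all rows for a key, you can use a Database Join step, which runs a query for each incoming row and passes on every row that query returns. Alternatively, read all rows from both the table and the CSV file, sort them on number, and flow them through a Merge Join step.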
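To make the difference concrete, here is a minimal Python sketch of the two behaviours, not Pentaho code. The function names and the `csv_numbers` list are made up for illustration, and unmatched keys are simply dropped, which is a simplification of what the real steps can be configured to do.

```python
# Rows of table A from the question, as (number, value) pairs.
table_a = [(1, "A"), (1, "B"), (2, "C")]
# Hypothetical key column read from the CSV file.
csv_numbers = [1, 2]

def lookup_first(keys, table):
    """Mimic a single-row lookup: keep only the first match per key."""
    first_match = {}
    for number, value in table:
        first_match.setdefault(number, value)  # later duplicates are ignored
    return [(k, first_match[k]) for k in keys if k in first_match]

def join_all(keys, table):
    """Mimic a join: emit every table row whose number matches a key."""
    return [(k, value) for k in keys for number, value in table if number == k]

print(lookup_first(csv_numbers, table_a))  # [(1, 'A'), (2, 'C')]
print(join_all(csv_numbers, table_a))      # [(1, 'A'), (1, 'B'), (2, 'C')]
```

The first result matches what the question reports from the lookup; the second is the desired output.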

These correspond roughly to a nested loop join and a sort-merge join, respectively. You would choose between them the same way a query optimizer would. As a rule of thumb, if the table and the CSV have roughly the same number of rows, the Merge Join will be faster; otherwise, use the Database Join step. This will not suit every situation, so experiment if performance is critical.
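For readers unfamiliar with the two join strategies, the following Python sketch shows the general idea of each. It is an illustration of the textbook algorithms under the assumption that everything fits in memory, not a description of how the Pentaho steps are implemented; all names are hypothetical.

```python
def nested_loop_join(keys, table):
    """For each incoming key, search the table for matches (one probe per key)."""
    out = []
    for k in keys:
        for number, value in table:
            if number == k:
                out.append((k, value))
    return out

def sort_merge_join(keys, table):
    """Sort both inputs on the key, then walk through them in step."""
    keys = sorted(keys)
    table = sorted(table)
    out = []
    i = 0  # position of the first table row not yet known to be too small
    for k in keys:
        # Skip table rows whose key is smaller than the current key.
        while i < len(table) and table[i][0] < k:
            i += 1
        # Emit every table row that shares the current key.
        j = i
        while j < len(table) and table[j][0] == k:
            out.append((k, table[j][1]))
            j += 1
    return out

table_a = [(1, "A"), (1, "B"), (2, "C")]
print(nested_loop_join([2, 1], table_a))
print(sort_merge_join([2, 1], table_a))
```

Both functions return the same set of rows. The nested loop version probes the table once per incoming key, which is cheap when there are few keys; the sort-merge version pays for sorting up front and then makes a single pass over each input, which tends to pay off when both inputs are of similar size.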

Licensed under: CC-BY-SA with attribution