Question

I'm working on an ETL process with Pentaho Data Integration (Spoon, formerly known as Kettle).

In Pentaho's Modified Javascript step you can set a start, an end and a transform script. The code in the transform script is executed once for each row, and from there I don't know how to access the data of the previous row (if that is possible at all).

I need access to the previous row because all rows are ordered by product, store and date (in that order), and the goal is to take the quantity on hand from the previous row and add the quantity sold or received on the current row (same product, same store, different date). I also need the previous row to compare its product and store with those of the current row, because if either of them changes I must reset the quantity_on_hand field (I do that with a field named initial_stock that every row carries).

In pseudocode it would be something like this (if the code written in the step were not restricted to running once per row):

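for(current_row in all_rows){    // rows sorted by product, store and date

    if(id_product_current_row == id_product_previous_row && id_store_current_row == id_store_previous_row){

        // same product and store: carry the stock over from the previous row
        current_row.quantity_on_hand = previous_row.quantity_on_hand + current_row.stock_variation;
    } else {

        // first row, or product/store changed: restart from the initial stock
        current_row.quantity_on_hand = current_row.initial_stock;
    }
}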

This related question didn't help me.

Any ideas to solve my problem would be appreciated.

Solution 4

Thanks to all, I've got the solution to my problem.

I've combined all your suggestions and I've used the Analytic Query, Modified Javascript and Group by steps.

(Image: all steps of the solution)

Although the question wasn't very well formulated, the problem I had was calculating the stock level on each row (there was one row for each combination of product, date and store).

First (obviously after sorting the rows by product_id, store_id and date ascending), I used the Analytic Query step to group by product_id and store_id. This step gave me a new field, previous_date, which identifies the first row of each group (previous_date = null on the row with the oldest date in the group).
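
As a rough illustration of what the Analytic Query step contributes here, the sketch below emulates "one row back within the group" in plain JavaScript. It is not PDI code: the function name `addPreviousDate` and the sample rows are hypothetical, and only the field names come from this answer.

```javascript
// Emulates a "LAG 1 row back" within each [product_id, store_id] group.
// Assumes the rows are already sorted by product_id, store_id and date.
function addPreviousDate(rows) {
  let prev = null; // the row seen just before the current one
  return rows.map((row) => {
    const sameGroup =
      prev !== null &&
      prev.product_id === row.product_id &&
      prev.store_id === row.store_id;
    // Outside the group (or on the very first row) there is no previous date.
    const out = { ...row, previous_date: sameGroup ? prev.date : null };
    prev = row;
    return out;
  });
}

// Hypothetical sample data, only to show the shape of the result.
const sampleRows = [
  { product_id: 1, store_id: 10, date: "2013-01-01" },
  { product_id: 1, store_id: 10, date: "2013-01-02" },
  { product_id: 1, store_id: 20, date: "2013-01-01" },
];
const lagged = addPreviousDate(sampleRows);
// previous_date is null on the first row of each group.
```

A null previous_date is then enough to tell the first row of a group apart from the rest.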

(Image: Analytic Query step)

Then I needed to calculate quantity_on_hand on the first row of each [product, store] group (the first date of the group, because the rows are sorted by date), since initial_stock is different for each group. This is necessary because (sum(quantity_received) - sum(quantity_sold)) != quantity_on_hand.

(Image: Modified Javascript step)

Finally (and this was the key), I used the Group by step as @andtorg suggested, configured as the next image shows.

(Image: Group by step)

The link that @andtorg suggested was very useful. It even includes two .ktr example files.

Thank you so much for your help!

OTHER TIPS

May I ask you to reconsider the Group By step? It seems suitable for your scenario. If you sort the stream according to your date/store/article combination, you can calculate a cumulative sum of the sold/received quantity. This way you get a running total of the inventory variation that is reset for each group.

Also take a look at both this blog post and the forum post it quotes.
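
To make the idea concrete, here is a minimal JavaScript sketch of a cumulative sum that restarts whenever the group changes. It only models the behaviour and is not the Group By step itself. The name `runningVariation`, the output field `cumulative_variation` and the sample rows are assumptions made for the illustration.

```javascript
// Running total of stock_variation that restarts for every
// [product_id, store_id] group. Assumes the rows are sorted by group.
function runningVariation(rows) {
  let currentKey = null;
  let total = 0;
  return rows.map((row) => {
    const key = `${row.product_id}|${row.store_id}`;
    if (key !== currentKey) {
      // New group: reset the running total.
      currentKey = key;
      total = 0;
    }
    total += row.stock_variation;
    return { ...row, cumulative_variation: total };
  });
}

// Hypothetical sample data.
const totals = runningVariation([
  { product_id: 1, store_id: 10, stock_variation: 5 },
  { product_id: 1, store_id: 10, stock_variation: -2 },
  { product_id: 1, store_id: 20, stock_variation: 4 },
]).map((r) => r.cumulative_variation);
// totals is [5, 3, 4]
```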

I doubt you need to go to JavaScript for this. Check out the Analytic query step. That will allow you to bring a value from the previous row into the current.

The JavaScript step gives you tremendous flexibility, but if you can do it with the regular transform steps, it will typically be much faster.
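
If you do stay in JavaScript, one possible pattern is to keep the previous row's values in variables that outlive a single row. The sketch below models that with a closure in plain JavaScript and follows the rule from the question's pseudocode. It assumes the step keeps script-level variables alive from one row to the next, which is worth verifying in your PDI version. `makeRowProcessor` and `processRow` are hypothetical names, not part of any Pentaho API.

```javascript
// The outer function plays the role a start script might play:
// it declares the state that has to survive from one row to the next.
function makeRowProcessor() {
  let prevProduct = null;
  let prevStore = null;
  let prevOnHand = null;

  // The inner function plays the role of the per-row transform script.
  return function processRow(row) {
    const sameGroup =
      row.id_product === prevProduct && row.id_store === prevStore;
    // Same product and store: carry the stock over from the previous row.
    // Otherwise: restart from the row's initial_stock.
    const onHand = sameGroup
      ? prevOnHand + row.stock_variation
      : row.initial_stock;

    // Remember this row for the next call.
    prevProduct = row.id_product;
    prevStore = row.id_store;
    prevOnHand = onHand;
    return { ...row, quantity_on_hand: onHand };
  };
}

// Hypothetical sample data, sorted by product, store and date.
const processRow = makeRowProcessor();
const onHand = [
  { id_product: 1, id_store: 10, initial_stock: 100, stock_variation: 0 },
  { id_product: 1, id_store: 10, initial_stock: 100, stock_variation: -5 },
  { id_product: 1, id_store: 10, initial_stock: 100, stock_variation: 20 },
  { id_product: 2, id_store: 10, initial_stock: 50, stock_variation: -1 },
].map((row) => processRow(row).quantity_on_hand);
// onHand is [100, 95, 115, 50]
```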

Use Analytic Query. With this step you can access the previous or next record. In fact, you are not limited to the adjacent records: you can read N rows forward or N rows backward.

Check the following URLs for a clearer explanation:
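
As a loose model of reading N rows forward or backward, the helper below looks up a field at a signed offset from the current row and returns null when that position falls outside the data or outside the current group. It is plain JavaScript, not the step's implementation, and the name `valueAtOffset` and its parameters are hypothetical.

```javascript
// Value of `field` from the row `offset` positions away from `index`
// (negative = backward, positive = forward), limited to the same group.
// Assumes the rows are sorted so that each group is contiguous.
function valueAtOffset(rows, index, offset, field, groupFields) {
  const target = rows[index + offset];
  if (target === undefined) return null; // ran off either end of the data
  const sameGroup = groupFields.every((f) => target[f] === rows[index][f]);
  return sameGroup ? target[field] : null;
}

// Hypothetical sample data: three rows in group "a", one in group "b".
const offsetRows = [
  { g: "a", v: 1 },
  { g: "a", v: 2 },
  { g: "a", v: 3 },
  { g: "b", v: 4 },
];
const twoBack = valueAtOffset(offsetRows, 2, -2, "v", ["g"]); // 1
const oneForward = valueAtOffset(offsetRows, 2, 1, "v", ["g"]); // null, next row is in group "b"
```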

  1. http://wiki.pentaho.com/display/EAI/Analytic+Query
  2. http://www.nicholasgoodman.com/bt/blog/2009/01/30/the-death-of-prevrow-rowclone/
Licensed under: CC-BY-SA with attribution