Question

I'm parsing delimited files to get information that I will put into a database table.

Now I have a file where I have to merge every two rows to get the information for a single record (one row of the database table): line 1 has some fields of a database row, and line 2 has the remaining fields for the same row.

How can I process two rows at a time?

For example, assume I have a file with 6 rows, corresponding to 3 entries in my database table, which has 9 columns. From the "odd" lines I get columns 1, 3, 4, 5, 8 and 9; from the "even" lines I get the remaining info (columns 2, 6 and 7):

IN  | COLUMN1 | xxxxxxx | COLUMN3 | COLUMN4 | COLUMN5 | xxxxxxx | xxxxxxx | COLUMN8
OUT | xxxxxxx | COLUMN2 | xxxxxxx | xxxxxxx | xxxxxxx | COLUMN6 | COLUMN7 | xxxxxxx
IN  | COLUMN1 | xxxxxxx | COLUMN3 | COLUMN4 | COLUMN5 | xxxxxxx | xxxxxxx | COLUMN8
OUT | xxxxxxx | COLUMN2 | xxxxxxx | xxxxxxx | xxxxxxx | COLUMN6 | COLUMN7 | xxxxxxx
IN  | COLUMN1 | xxxxxxx | COLUMN3 | COLUMN4 | COLUMN5 | xxxxxxx | xxxxxxx | COLUMN8
OUT | xxxxxxx | COLUMN2 | xxxxxxx | xxxxxxx | xxxxxxx | COLUMN6 | COLUMN7 | xxxxxxx
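Outside Talend, the two-rows-at-a-time logic itself can be sketched in plain Python (the `|` delimiter and the field positions are assumptions taken from the example above):

```python
# A sketch of merging each IN/OUT line pair into one record.
# Delimiter and column positions are assumptions from the example.
import csv
import io

SAMPLE = """\
IN|C1|x|C3|C4|C5|x|x|C8
OUT|x|C2|x|x|x|C6|C7|x
"""

def merge_pairs(lines):
    """Yield one merged record per consecutive IN/OUT pair."""
    reader = csv.reader(lines, delimiter="|")
    # zipping the same iterator with itself consumes two rows per step
    for in_row, out_row in zip(reader, reader):
        yield [in_row[1], out_row[2], in_row[3], in_row[4],
               in_row[5], out_row[6], out_row[7], in_row[8]]

records = list(merge_pairs(io.StringIO(SAMPLE)))
```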

Solution

You could try splitting the file into the two types of rows and then using a tMap to join them.

To clarify: you'll want to split the file depending on whether each row is an IN or an OUT row, then use a tMap to join the columns as per your needs.

I've modified your example data slightly, so it looks like this:

+----+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|IN1 |ROW1COLUMN1|xxxxxxx    |ROW1COLUMN3|ROW1COLUMN4|ROW1COLUMN5|xxxxxxx    |xxxxxxx    |ROW1COLUMN8|
|OUT1|xxxxxxx    |ROW1COLUMN2|xxxxxxx    |xxxxxxx    |xxxxxxx    |ROW1COLUMN6|ROW1COLUMN7|xxxxxxx    |
|IN2 |ROW2COLUMN1|xxxxxxx    |ROW2COLUMN3|ROW2COLUMN4|ROW2COLUMN5|xxxxxxx    |xxxxxxx    |ROW2COLUMN8|
|OUT2|xxxxxxx    |ROW2COLUMN2|xxxxxxx    |xxxxxxx    |xxxxxxx    |ROW2COLUMN6|ROW2COLUMN7|xxxxxxx    |
|IN3 |ROW3COLUMN1|xxxxxxx    |ROW3COLUMN3|ROW3COLUMN4|ROW3COLUMN5|xxxxxxx    |xxxxxxx    |ROW3COLUMN8|
|OUT3|xxxxxxx    |ROW3COLUMN2|xxxxxxx    |xxxxxxx    |xxxxxxx    |ROW3COLUMN6|ROW3COLUMN7|xxxxxxx    |
+----+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+

The only real addition is that there is now a join key appended to the IN or OUT in the first column, indicating which pairs belong together.

First, you'll want to split the data into its IN and OUT parts with a tMap set up like this:

[Image: Split the file using a tMap]

This simply sends the data down one of two paths depending on whether the Id field begins with "IN" or "OUT".

After this you'll want to recombine it with another tMap set up like:

[Image: Recombine the flows using a tMap]

This joins the two flows on the key extracted from the Id field and uses the appropriate columns from each side in the combined output.

Unfortunately, you can't split a flow with a tMap and then feed it straight back into another tMap. The best bet is to write the two flows to separate places (either database tables or temporary CSV files) and then, once that subjob is complete, read those back in and recombine them with the second tMap.
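Sketched outside Talend, the overall flow is: route each row by its Id prefix into one of two intermediate stores, then join the two stores on the extracted key. The dicts below stand in for the temporary tables or CSV files; none of this is Talend API:

```python
# Route rows on the Id prefix, then join on the extracted key.
# The dicts stand in for the two intermediate outputs.
rows = [
    ["IN1",  "A1", "x",  "A3"],
    ["OUT1", "x",  "A2", "x"],
    ["IN2",  "B1", "x",  "B3"],
    ["OUT2", "x",  "B2", "x"],
]

ins, outs = {}, {}
for row in rows:
    # first tMap stage: split by prefix and extract the numeric key
    if row[0].startswith("IN"):
        ins[row[0][2:]] = row
    else:
        outs[row[0][3:]] = row

# second tMap stage: inner join on the key, picking columns per side
merged = [[ins[k][1], outs[k][2], ins[k][3]] for k in sorted(ins)]
```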

An example job might look like:

[Image: Example job setup]

If you don't have a natural key to join on, you could generate one by taking each output of the first tMap and adding a column whose value is a Numeric.sequence expression.
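As a rough illustration (not Talend code), a per-flow counter plays the role of Numeric.sequence here: the Nth row of each flow gets key N, so matching positions join up:

```python
# Attach a synthetic sequence key to each flow, then join on it.
from itertools import count

in_rows  = [["a1"], ["b1"]]   # illustrative outputs of the first tMap
out_rows = [["a2"], ["b2"]]

in_keyed  = [[seq] + row for seq, row in zip(count(1), in_rows)]
out_keyed = [[seq] + row for seq, row in zip(count(1), out_rows)]

# rows sharing the same generated key are joined together
joined = [ik + ok[1:] for ik, ok in zip(in_keyed, out_keyed)]
```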

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow