Talend — one row to many, variable number of output rows

https://stackoverflow.com/questions/7920570

java
talend

15-02-2021

Question

Background: It's common in Talend to use something like tSplitRow to map one row with many fields into multiple rows. A row with fields:

Date | Name | MorningPhone | DayPhone | EveningPhone

...could be split into:

Date | Name | Phone

...and you'll always have 3 resulting rows from one row.

Question: What if I want a variable number of rows, depending on how many fields the input contains?

I have a schema: UniqueID | FieldSet, where FieldSet is a delimited column whose number of fields is always a multiple of nine. If there are 45 fields in this delimited column, I want 5 rows; 81 fields => 9 rows.

I'm trying to use tJavaRow to parse the fields, but I don't know how to combine that with tSplitRow to generate the appropriate number of rows.

Ideas? Thanks!

Solution

I used a custom tJavaRow that turns the delimited string into tab-separated rows of nine fields each, all collected in a single output column (BULK). Sort of a hack, but it worked.

// Decode the URL-encoded source column; if decoding fails, input stays empty.
String input = "";
try {
      input = java.net.URLDecoder.decode(input_row.CustomField16, "ASCII");
} catch (java.io.UnsupportedEncodingException e) {
      e.printStackTrace();
}

// Only the part before the first ';' holds the pipe-delimited field set.
String[] pieces = input.split(";");
String[] allfields = pieces[0].split("\\|");

int fieldnum = 9; // number of fields per output row
StringBuilder out = new StringBuilder();

// Walk the fields in groups of nine; an incomplete trailing group is skipped.
for (int start = 0; start + fieldnum <= allfields.length; start += fieldnum) {

      StringBuilder xrow = new StringBuilder(allfields[start]);
      for (int j = start + 1; j < start + fieldnum; j++) {
            xrow.append("\t").append(allfields[j]);
      }

      // Prefix each group with the columns shared by every output row.
      out.append(input_row.LoadTime + "\t"
                  + input_row.minutepart + "\t" + input_row.TXID
                  + "\t" + input_row.SessionString + "\t" + xrow + "\n");
}

output_row.BULK = out.toString();
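The grouping step can also be tried outside Talend. The following is a minimal standalone sketch, not Talend-generated code: the class name FieldChunker and the method chunk are hypothetical, and it assumes a pipe-delimited input where an incomplete trailing group should be dropped, as in the loop above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FieldChunker {
    // Splits a pipe-delimited string into tab-joined rows of groupSize fields each.
    // A trailing group with fewer than groupSize fields is dropped.
    static List<String> chunk(String fieldSet, int groupSize) {
        // The -1 limit keeps trailing empty fields, which a plain split("\\|") would discard.
        String[] fields = fieldSet.split("\\|", -1);
        List<String> rows = new ArrayList<>();
        for (int start = 0; start + groupSize <= fields.length; start += groupSize) {
            String[] group = Arrays.copyOfRange(fields, start, start + groupSize);
            rows.add(String.join("\t", group));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Six fields with a group size of three give two rows.
        for (String row : chunk("a|b|c|d|e|f", 3)) {
            System.out.println(row);
        }
    }
}
```

With a group size of 9, a FieldSet of 45 fields would produce 5 rows and one of 81 fields would produce 9. Note that the tJavaRow code above uses a plain split, so it treats trailing empty fields differently from this sketch.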

OTHER TIPS

Talend has evolved since this question was asked, and a much better way of doing this is to use the tNormalize component.

First, we use a file like this as input:

pepe|123|123
juan|454|2423|34343|5454

We read this file using the tFileInputRegex component, for which we have to define a regular expression and a schema. The regular expression will be:

"^([^|]+)\\|(.+)"

The schema will have two columns, one for each capture group in the regular expression: the leading key and the rest of the line.

Then, we connect tFileInputRegex with a tNormalize. We set the separator to:

"\\|"

Finally, we use the output as needed.
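As a rough illustration of what this flow should produce for the sample file, here is a plain-Java sketch that mimics the two steps. It is not Talend code: the class NormalizeSketch and the method normalize are hypothetical names, and the sketch assumes tNormalize emits one row per delimited item while repeating the other column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizeSketch {
    // Same expression as in the tFileInputRegex step: the key, then everything after the first '|'.
    static final Pattern LINE = Pattern.compile("^([^|]+)\\|(.+)");

    // Returns one {key, item} pair per delimited item, or an empty list if the line does not match.
    static List<String[]> normalize(String line) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return rows;
        }
        String key = m.group(1);
        // Split the second capture group on the same separator given to tNormalize.
        for (String item : m.group(2).split("\\|")) {
            rows.add(new String[] { key, item });
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] input = { "pepe|123|123", "juan|454|2423|34343|5454" };
        for (String line : input) {
            for (String[] row : normalize(line)) {
                System.out.println(row[0] + "\t" + row[1]);
            }
        }
    }
}
```

Under that assumption, the first sample line yields two rows and the second yields four. Each row carries a single value, so grouping values nine at a time, as the question asks, would still need a further step.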

Licensed under: CC-BY-SA with attribution