Question

I have the following sample data. Current_balance is a numeric field, and some records contain bad values that need to be replaced.

Accno,Cust_id,gender,DOB,Current_balance
0008647447654709299,87128110,M,29/02/1960,184126.23
0008650447626799299,143500723,F,4/18/1967,165198.85
0008651447674209299,479941323,M,5/5/1979,NULL
0008653447693589299,687746622,M,18-08-1981,#20
0008654447606469299,890134223,M,18-08-1983,0
0008655447659179299,684451923,F,10/9/1982,142.25
0008658447686789299,57470921,F,25-02-1978,458518.25
0008669447629759299,57470925,M,23-01-1981,xx

I need to validate the data in Pentaho and want the output to look like this:

Accno,Cust_id,gender,DOB,Current_balance
0008647447654709299,87128110,M,29/02/1960,184126.23
0008650447626799299,143500723,F,4/18/1967,165198.85
0008651447674209299,479941323,M,5/5/1979,
0008653447693589299,687746622,M,18-08-1981,
0008654447606469299,890134223,M,18-08-1983,0
0008655447659179299,684451923,F,10/9/1982,142.25
0008658447686789299,57470921,F,25-02-1978,458518.25
0008669447629759299,57470925,M,23-01-1981,

That is, the validator should pass the good rows through unchanged and replace the bad values with null. Can anyone suggest how I can do this?

Solution

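I'm not sure about Pentaho, but to point you in the right direction, you can use the following regex in multi-line mode:

,(?=[^,]+$)(?!\d+(\.\d{2})?$).*$

The first lookahead restricts the match to the comma before the last field on a line. The negative lookahead then skips that field when it is a valid number: digits, optionally followed by a decimal point and two digits, so both 0 and 142.25 pass. If you replace all matches with ',' you should have the desired output. Apply it to the data rows only: the last field of the header row (Current_balance) is not numeric either and would be blanked as well.

You can try the pattern on RegexPal.

RegexPlanet translates this into the following Java regex (it looks like you just need to escape the backslashes):

,(?=[^,]+$)(?!\\d+(\\.\\d{2})?$).*$

So in Java I guess you'd use something like:

str = str.replaceAll("(?m),(?=[^,]+$)(?!\\d+(\\.\\d{2})?$).*$", ",");

The (?m) at the start is the multi-line flag mentioned above. Note that replaceAll returns a new string, so the result has to be assigned.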
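
To make this easier to try outside Pentaho, here is a minimal, self-contained sketch. It assumes the whole file is available as a single string with the header on the first line. The class name BalanceCleaner and the method clean are hypothetical names chosen for illustration. The pattern used here treats the decimal part as optional and anchors the number at the end of the line, so that a balance of 0 is kept, and the header line is left out of the replacement so that Current_balance is not blanked.

```java
import java.util.regex.Pattern;

class BalanceCleaner {
    // Matches the last field of a line (including its leading comma) unless
    // that field is a plain number: digits, optionally followed by ".dd".
    private static final Pattern BAD_LAST_FIELD =
            Pattern.compile("(?m),(?=[^,]+$)(?!\\d+(\\.\\d{2})?$).*$");

    // Blanks non-numeric balances in every row after the header line.
    // Assumption: the first line of csv is the header.
    static String clean(String csv) {
        int headerEnd = csv.indexOf('\n');
        if (headerEnd < 0) {
            return csv; // header only, nothing to clean
        }
        String header = csv.substring(0, headerEnd + 1);
        String rows = csv.substring(headerEnd + 1);
        return header + BAD_LAST_FIELD.matcher(rows).replaceAll(",");
    }

    public static void main(String[] args) {
        String csv = String.join("\n",
                "Accno,Cust_id,gender,DOB,Current_balance",
                "0008651447674209299,479941323,M,5/5/1979,NULL",
                "0008653447693589299,687746622,M,18-08-1981,#20",
                "0008654447606469299,890134223,M,18-08-1983,0",
                "0008655447659179299,684451923,F,10/9/1982,142.25");
        System.out.println(clean(csv));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which is what String.replaceAll would do.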

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow