Question

I am not sure how to call a function from within a nawk command. I have shown below what I want from the input and the output. The function should validate column 3 and return true or false. If the column satisfies the condition, the line should go to the good file; if not, it should go to the bad file. Can you help me modify the nawk command to do this?

I know the length check can be done in a single statement, but my validate function is just sample code. I want to check more than the length in the real validate function.

input.txt:

1 | I | 123  | KK
3 | U | 3456 | JJ
6 | B | 241  | YH

outputgood.txt:

3 | U | 3456 | JJ

outputbad.txt:

1 | I | 123  | KK
6 | B | 241  | YH

Script:

#!/bin/sh
#function validation

function validate(){
in = $1
if length(in) > 3
  return true
else
 return false
}

nawk -F '|' 'function validate($3){print}' input.txt > outputgood.txt

Solution 2

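First off, you've got a shell function which you're trying to call from within your awk script. That can't work: awk runs as a separate program and knows nothing about functions defined in the shell that started it. The body of your function is also written in awk syntax rather than shell syntax, so /bin/sh couldn't run it either.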

If your validation must be in shell, then perhaps you can do the whole thing in shell.

#!/bin/sh

while IFS= read -r line; do                     # -r keeps backslashes, IFS= keeps edge spaces

  var=${line#* | }                              # take off first field
  var=${var#* | }                               # take off second field
  var=${var% | *}                               # take off fourth field
  var=`expr "$var" : ' *\(.*[^ ]\) *$'`         # trim whitespace

  if [ ${#var} -gt 3 ]; then
    printf '%s\n' "$line" >> outputgood.txt
  else
    printf '%s\n' "$line" >> outputbad.txt
  fi

done < input.txt
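To see exactly what each expansion in the loop above strips off, here is a small trace for the first sample line from the question. It only prints the intermediate values; the `[...]` brackets are there so leading and trailing spaces are visible.

```shell
#!/bin/sh
# Trace the parameter expansions from the loop on one sample line.
line='1 | I | 123  | KK'

var=${line#* | }          # remove shortest prefix matching "* | "
printf '[%s]\n' "$var"    # [I | 123  | KK]
var=${var#* | }           # same again for the second field
printf '[%s]\n' "$var"    # [123  | KK]
var=${var% | *}           # remove shortest suffix matching " | *"
printf '[%s]\n' "$var"    # [123 ]  (one space left, hence the trim)
```

The leftover trailing space is why the `expr` trim step is needed before measuring the length.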

We're splitting the line using parameter expansion because $IFS doesn't let us handle a variable amount of whitespace around the pipes. Alternatively, you could still do this with positional parameters, which also gives you easier access to the other fields. Note that you still need to trim whitespace if you're using field length as a condition.

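#!/bin/sh

IFS="|"                                         # split fields on the pipe
set -f                                          # don't glob-expand field text
while read -r line; do

  set -- $line                                  # $1..$4 are now the fields
  var=`expr "$3" : ' *\(.*[^ ]\) *$'`           # trim whitespace

  if [ ${#var} -gt 3 ]; then
    printf '%s\n' "$line" >> outputgood.txt
  else
    printf '%s\n' "$line" >> outputbad.txt
  fi

done < input.txt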
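A quick sketch of what `set --` actually hands you when IFS is a pipe, using the first sample line from the question. It shows that the fields keep their surrounding spaces, which is why the trim is still needed.

```shell
#!/bin/sh
# Show how "set --" splits one sample line when IFS is a pipe.
line='1 | I | 123  | KK'
IFS='|'
set -f                          # no globbing while splitting
set -- $line
printf 'fields: %s\n' "$#"      # fields: 4
printf '[%s]\n' "$3"            # [ 123  ]  (spaces kept)
trimmed=`expr "$3" : ' *\(.*[^ ]\) *$'`
printf '[%s]\n' "$trimmed"      # [123]
unset IFS                       # restore default splitting
set +f                          # restore globbing
```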

If what you're really interested in is whether the third field is 1000 or greater, then test for that rather than for the length of the field. Clarity in programming is like clarity in anything else: don't obfuscate if you can avoid it.
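A minimal sketch of that numeric test in plain sh, built on the positional-parameter loop. It assumes a valid third field is a non-negative integer with no sign or decimal point; anything else (empty, letters) is sent to the bad file. The sample data is written to input.txt only so the sketch runs on its own; drop that line to use your real file.

```shell
#!/bin/sh
# Route each row on a numeric test of field 3 instead of its length.
# Assumption: a valid field 3 is a plain non-negative integer.

# Sample data from the question (remove to use your real input.txt).
printf '%s\n' '1 | I | 123  | KK' '3 | U | 3456 | JJ' '6 | B | 241  | YH' > input.txt
: > outputgood.txt                              # truncate the outputs so
: > outputbad.txt                               # reruns don't keep appending

IFS='|'
set -f                                          # no globbing of field text
while read -r line; do
  set -- $line
  var=`expr "$3" : ' *\(.*[^ ]\) *$'`           # trim whitespace
  case $var in
    ''|*[!0-9]*) dest=outputbad.txt ;;          # empty or not all digits
    *) if [ "$var" -ge 1000 ]; then dest=outputgood.txt
       else dest=outputbad.txt
       fi ;;
  esac
  printf '%s\n' "$line" >> "$dest"
done < input.txt
unset IFS                                       # restore default splitting
set +f
```

The `case` guard runs first so that `[ ... -ge ... ]` never sees a non-numeric string, which would make `test` print an error.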

Note that we could do this with a little less code in bash, but your question just specified "shell" so I'm assuming /bin/sh.

OTHER TIPS

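If that's your real data, splitting on the pipe alone will leave whitespace in the field values, so the field separator below absorbs the spaces around each pipe. Also, the function has to be defined inside the nawk program itself; awk can't see functions from the surrounding shell script.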

Let's try this:

nawk -F'[ \t]*[|][ \t]*' '
    # "in" is an awk keyword, so the parameter needs another name
    function validate(s) {
        return (length(s) > 3)
    }
    {
        if (validate($3)) {
            print > "outputgood.txt"
        } else {
            print > "outputbad.txt"
        }
    }
' input.txt

Much simpler, if the real test is numeric:

nawk -F '|' '{if ($3 + 0 >= 1000) print > "outputgood.txt"; else print > "outputbad.txt"}' input.txt
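Since the question says the real validate function will do more than a length check, here is a sketch of how several rules can live in one awk function. The three rules (digits only, more than 3 characters, an upper bound of 999999) are hypothetical examples, not requirements from the question. It calls plain `awk`; on Solaris use `nawk` instead, since /usr/bin/awk there is the old awk without user-defined functions. The sample input is written only so the sketch runs on its own.

```shell
#!/bin/sh
# Sample data from the question (remove to use your real input.txt).
printf '%s\n' '1 | I | 123  | KK' '3 | U | 3456 | JJ' '6 | B | 241  | YH' > input.txt

# Use nawk on Solaris; any POSIX awk works elsewhere.
awk -F'[ \t]*[|][ \t]*' '
    # Hypothetical multi-rule check: every rule must pass.
    function validate(s) {
        if (s !~ /^[0-9]+$/) return 0      # digits only
        if (length(s) <= 3)  return 0      # more than 3 characters
        if (s + 0 > 999999)  return 0      # example upper bound
        return 1
    }
    {
        if (validate($3)) print > "outputgood.txt"
        else              print > "outputbad.txt"
    }
' input.txt
```

Returning early on the first failed rule keeps each rule on its own line, so adding or removing a check does not touch the others.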
Licensed under: CC-BY-SA with attribution