Function call in Nawk command

Question 1

First off, you've got a shell function which you're trying to call from within your awk script. That can't work: awk runs as a separate program and knows nothing about functions defined in the calling shell.

If your validation must be in shell, then perhaps you can do the whole thing in shell.

#!/bin/sh

while IFS= read -r line; do

  var=${line#* | }                              # take off first field
  var=${var#* | }                               # take off second field
  var=${var%% | *}                              # take off everything after the third field
  var=`expr "x$var" : 'x *\(.*[^ ]\) *$'`       # trim whitespace

  if [ ${#var} -gt 3 ]; then
    echo "$line" >> outputgood.txt
  else
    echo "$line" >> outputbad.txt
  fi

done < input.txt

We're splitting the line using parameter expansion because $IFS can't describe a delimiter made of a pipe surrounded by a variable amount of whitespace. You could alternatively do this using positional parameters, potentially giving you easier access to the other fields as well. Note that you still need to trim if you're using field length as a condition.

#!/bin/sh

IFS="|"
set -f                                          # no globbing when $line is split
while read -r line; do

  set -- $line
  var=`expr "x$3" : 'x *\(.*[^ ]\) *$'`         # trim whitespace

  if [ ${#var} -gt 3 ]; then
    echo "$line" >> outputgood.txt
  else
    echo "$line" >> outputbad.txt
  fi

done < input.txt

If what you're really interested in is whether the third field is greater than 1000, then that would be a better thing to test for than the length of the field; a length test also accepts any non-numeric text longer than three characters, and it treats 1000 itself as a match. Clarity in programming is like clarity in anything else. Don't obfuscate if you can avoid it.
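As a rough sketch of testing the value rather than the length, something like this could work. It assumes the third field is supposed to hold a non-negative integer and that fields are padded with spaces only; third_field_gt_1000 is just a name made up for illustration, not anything from your script.

```shell
#!/bin/sh

# Hypothetical helper: succeeds when the third pipe-delimited field of $1
# is a plain non-negative integer greater than 1000.
third_field_gt_1000() {
  var=${1#*|}                      # drop first field
  var=${var#*|}                    # drop second field
  var=${var%%|*}                   # keep only the third field
  var=${var#"${var%%[! ]*}"}       # strip leading spaces
  var=${var%"${var##*[! ]}"}       # strip trailing spaces
  case $var in
    ''|*[!0-9]*) return 1 ;;       # empty, or contains a non-digit
  esac
  [ "$var" -gt 1000 ]
}
```

In the first script, `if third_field_gt_1000 "$line"; then` would stand in for both the field extraction and the `${#var}` test.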

Note that we could do this with a little less code in bash, but your question just specified "shell" so I'm assuming /bin/sh.
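For comparison, here is roughly what the bash flavour might look like. This is only a sketch and needs bash, not plain sh; classify_line is a made-up name, and it keeps the same "longer than 3 characters" rule as the scripts above.

```shell
#!/bin/bash

# Hypothetical helper: prints "good" or "bad" for one input line.
classify_line() {
  local var
  IFS='|' read -r _ _ var _ <<< "$1"      # third pipe-delimited field
  IFS=$' \t\n' read -r var <<< "$var"     # read trims surrounding whitespace
  if (( ${#var} > 3 )); then
    echo good
  else
    echo bad
  fi
}
```

Inside a `while IFS= read -r line` loop you could then branch on `$(classify_line "$line")` to pick the output file.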

Question 2

If that's your real data, splitting on the pipe alone will leave the surrounding whitespace in the field's value. Also, the nawk function must actually be defined in the body of the nawk script.
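To see the leftover whitespace for yourself, a quick check along these lines may help. It uses plain awk as a stand-in for nawk, the sample line is invented, and both function names are hypothetical.

```shell
#!/bin/sh

# Show the third field exactly as awk sees it when splitting on "|" only.
show_third_field() {
  printf '%s\n' "$1" | awk -F'|' '{ print "[" $3 "]" }'
}

# Same, but with leading and trailing blanks removed from a copy of the field.
show_trimmed_third_field() {
  printf '%s\n' "$1" |
    awk -F'|' '{ f = $3; gsub(/^[ \t]+|[ \t]+$/, "", f); print "[" f "]" }'
}

show_third_field 'a | b | 1234 | d'           # prints [ 1234 ]
show_trimmed_third_field 'a | b | 1234 | d'   # prints [1234]
```

The brackets make the padding visible: the untrimmed field is six characters long, so a length test on it would give the wrong answer.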

Let's try this:

nawk -F'[ \t]*[|][ \t]*' '
    function validate(s) {
        return (length(s) > 3)
    }
    {
        if (validate($3)) {
            print > "good"
        } else {
            print > "bad"
        }
    }
' input.txt

Much simpler, if all you care about is whether the third field is greater than 1000:

nawk -F '|' '{if ($3 > 1000) print > "good"; else print > "bad"}' input.txt