Question

I'm fairly new to awk, and I'm writing a script that reads a file, processes each line, and appends the result to one of a few files based on the outcome. The script works on a file containing about 100 lines but fails on a file containing 125k lines. I'm not sure whether it's an issue with how I'm doing things here, because I've seen awk work fine with larger files.

Here's my code: FileSplitting.awk

    BEGIN { print "Splitting file " }
    {
        print NR
        r = int($2/1024)
        if (r > 5)  { print $0 >> "testFile" }
        if (r <= 5) { print $0 >> "testFile2" }
    }
    END { print "Done" }

I'm invoking the script like this:

    awk -F"," -f FileSplitting.awk test.csv

Solution

The issue is that you're using the wrong output redirection operator: you should be using >, not >>. Awk does not treat these two operators the same way the shell does. See man awk for how they work in awk, and change your script to:

    BEGIN { print "Splitting file " }
    {
        print NR
        r = int($2/1024)
        if (r > 5)  { print $0 > "testFile" }
        if (r <= 5) { print $0 > "testFile2" }
    }
    END { print "Done" }

to get it to work, and then clean it up to:

BEGIN { print "Splitting file " }
{ print NR; print > ("testFile" (int($2/1024)>5?"":"2")) }
END { print "Done" }
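As a sanity check, the cleaned-up script can be exercised against a tiny hand-made CSV. This is a minimal sketch: the file name test.csv and the assumption that field 2 holds a size in bytes come from the question, and the sample rows are invented for illustration.

```shell
# Hypothetical 3-line CSV; field 2 is assumed to be a size in bytes.
printf 'a,10240\nb,2048\nc,9216\n' > test.csv
rm -f testFile testFile2

awk -F"," '
BEGIN { print "Splitting file " }
{ print NR; print > ("testFile" (int($2/1024)>5?"":"2")) }
END { print "Done" }
' test.csv

# Rows a (10240/1024 = 10) and c (9216/1024 = 9) exceed 5 -> testFile
# Row  b (2048/1024  = 2)  does not                       -> testFile2
grep -c '' testFile     # 2
grep -c '' testFile2    # 1
```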

You do NOT need to close the files after every write.

In response to @Aryan's comment below, here are the > and >> awk vs shell equivalents:

1) awk's >

awk:
    { print > "foo" }

shell equivalent:

    > foo
    while IFS= read -r var
    do
        printf "%s\n" "$var" >> foo
    done

2) awk's >>

awk:
    { print >> "foo" }

shell equivalent:

    while IFS= read -r var
    do
        printf "%s\n" "$var" >> foo
    done
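The practical difference shows up when the same awk program is run twice over the same input. A minimal sketch (file names in.txt, out_trunc, and out_append are invented for illustration):

```shell
printf 'a\nb\nc\n' > in.txt
rm -f out_trunc out_append

# ">" truncates the file at the first write of each awk run, then appends
awk '{ print > "out_trunc" }' in.txt
awk '{ print > "out_trunc" }' in.txt

# ">>" never truncates, so a second run keeps appending
awk '{ print >> "out_append" }' in.txt
awk '{ print >> "out_append" }' in.txt

grep -c '' out_trunc    # 3
grep -c '' out_append   # 6
```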
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow