Question

I have two columns in a file, and I want to automate summing both values per row

for example

read write
5    6
read write
10   2
read write
23   44

I then want to sum the "read" and "write" of each row. After summing, I find the maximum sum and put that value in a file. I feel like I have to use grep -v to get rid of the column header that precedes each row, which, as noted in the answers, makes the code inefficient since I'm grepping the entire file just to read a single line.

I currently have this in a bash script (within a for loop where $x is the file name) to sum the columns line by line

lines=`grep -v READ $x|wc -l | awk '{print $1}'`
line_num=1
arr_num=0


while [ $line_num -le $lines ]
do

    arr[$arr_num]=`grep -v READ $x |  sed $line_num'q;d' | awk '{print $2 + $3}'`
    echo $line_num
    line_num=$[$line_num+1]
    arr_num=$[$arr_num+1]

done

However, the file to be summed has 270,000+ rows. The script has been running for a few hours now, and it is nowhere near finished. Is there a more efficient way to write this so that it does not take so long?


Solution

Use awk instead and take advantage of the modulus operator:

awk '!(NR%2){print $1+$2}' infile
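
If the end goal is to write only the maximum sum to a file, the same one-liner can be extended to track the maximum as it goes; a minimal sketch, assuming infile/outfile are placeholder names and the sums are never negative:

awk '!(NR%2) && $1+$2 > max { max = $1+$2 } END { print max }' infile > outfile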

Other tips

awk is probably faster, but the idiomatic way to do this is something like:

while read -a line; do      # read each line one-by-one, into an array
                            # use arithmetic expansion to add col 1 and 2
    echo "$(( ${line[0]} + ${line[1]} ))"
done < <(grep -v READ input.txt)

Note that the input file is only read once (by grep), and the number of externally forked programs is kept to a minimum (just grep, called only once for the whole input file); the rest of the commands are bash builtins.

The <( ) process substitution is used in case variables set inside the while loop are needed after the loop; otherwise a | pipe could be used.
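
If you also want the maximum written out, the loop can carry it along; a minimal sketch, assuming non-negative sums and max.txt as a placeholder output name:

max=0
while read -a line; do
    sum=$(( ${line[0]} + ${line[1]} ))   # add the two columns
    (( sum > max )) && max=$sum          # keep the largest sum seen so far
done < <(grep -iv read input.txt)        # -i drops the header lines whatever their case
echo "$max" > max.txt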

Your question is pretty verbose, yet your goal is not clear. The way I read it, your numbers are on every second line, and you want only to find the maximum sum. Given that:

awk '
    NR%2 == 1 {next} 
    NR == 2 {max = $1+$2; next} 
    $1+$2 > max {max = $1+$2}
    END {print max}
' filename
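
To drop this into the question's for loop and end up with the max in a file, something along these lines would work (the *.txt glob and the $x.max output name are only illustrative):

for x in *.txt; do                     # $x is the file name, as in the question
    awk 'NR%2 == 1 {next}
         NR == 2 {max = $1+$2; next}
         $1+$2 > max {max = $1+$2}
         END {print max}' "$x" > "$x.max"
done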

You could also use a pipeline with tools that implicitly loop over the input like so:

grep -v read INFILE | tr -s ' ' + | bc | sort -rn | head -1 > OUTFILE

This assumes there are spaces between your read and write data values.
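
To see what the middle stages do, take one data row from the example: tr squeezes the run of spaces into a single +, and bc then evaluates the resulting expression:

echo '10   2' | tr -s ' ' +         # prints 10+2
echo '10   2' | tr -s ' ' + | bc    # prints 12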

Why not run:

awk 'NR==1 { print "sum" } /^read/ { next } { print $1 + $2 }' "$x"

You can afford to run it on the file while the other script is still running. It'll be complete in a few seconds at most (prediction). When you're confident it's right, you can kill the other process.

You can use Perl or Python instead of awk if you prefer.
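
For example, you could dump the per-row sums into a scratch file (sums.txt is an illustrative name) and compare it against the slow script's output once that finishes:

awk 'NR==1 { print "sum" } /^read/ { next } { print $1 + $2 }' "$x" > sums.txt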

Your code runs grep, sed and awk once for every line of the input file, re-reading and re-filtering the whole file each time; that's damnably expensive. And it isn't even writing the data to a file; it is building an array in Bash's memory that will still need to be printed to the output file later.

Assuming that it's always one 'header' row followed by one 'data' row:

awk '
  BEGIN{ max = 0 }
  {
    if( NR%2 == 0 ){
      sum = $1 + $2;
      if( sum > max ) { max = sum }
    }
  }
  END{ print max }' input.txt
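
One small caveat: BEGIN{ max = 0 } assumes the sums are never negative, which is fine for read/write counts; if that could ever change, seed max from the first data row instead, as the earlier answer does with NR == 2 { max = $1+$2; next }.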

Or simply trim out all lines that do not conform to what you want:

grep '^[0-9]\+\s\+[0-9]\+$' input.txt | awk '
  BEGIN{ max = 0 }
  {
    sum = $1 + $2;
    if( sum > max ) { max = sum }
  }
  END{ print max }'
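
Note that \s in a basic regular expression is a GNU grep extension; [[:space:]] is the portable equivalent if this needs to run with other grep implementations.
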
Licensed under: CC-BY-SA with attribution