Bash/Nawk whitespace problems

Question 1

The following script should do what you want. The only thing that might not work yet is the choice of delimiters. In your original script you seem to have tabs. My solution assumes spaces. But changing that should not be a problem.

It simply pipes all files sequentially into the nawk without counting the files first. I understand that this is not required. Instead of trying to keep track of positions in the file it uses arrays to store seperate statistical data for each step. In the end it iterates over all step indexes found and outputs them. Since the iteration is not sorted there is another pipe into a Unix sort call which handles this.

#!/bin/bash
# pipe the data of all files into the nawk processor
cat *.data | nawk ' 
BEGIN { 
  FS=" "                         # set the delimiter for the columns
} 
{
  step = $1                      # step is in column 1
  temp = $2*$2 + $3*$3 + $4*$4

  # use arrays indexed by step to store data
  calc[step] = calc[step] + sqrt (temp)
  fluc[step] = fluc[step] + calc[step]*calc[step]
  count[step] = count[step] + 1   # count the number of samples seen for a step
}
END {
  # iterate over all existing steps (this is not sorted!)
  for (i in count) {
    stddev = sqrt((calc[i] * calc[i]) + (fluc[i] * fluc[i]))
    print i" "calc[i]/count[i]" "fluc[i]/count[i]" "stddev
  }
}' | sort -n -k 1 # that' why we sort here: first column "-k 1" and numerically "-n"

EDIT

As sugested by @edmorton awk can take care of loading the files itself. The following enhanced version removes the call to cat and instead passes the file pattern as parameter to nawk. Also, as suggested by @NictraSavios the new version introduces a special handling for the output of the statistics of the last step. Note that the gathering of the statistics is still done for all steps. It's a little difficult to suppress this during the reading of the data since at that point we don't know yet what the last step will be. Although this can be done with some extra effort you would probably loose a lot of robustness of your data handling since right now the script does not make any assumptions about:

the number of files provided,
the order of the files processed,
the number of steps in each file,
the order of the steps in a file,
the completeness of steps as a range without "holes".

Enhanced script:

#!/bin/bash
nawk ' 
BEGIN { 
  FS=" "   # set the delimiter for the columns (not really required for space which is the default)
  maxstep = -1
} 
{
  step = $1                      # step is in column 1
  temp = $2*$2 + $3*$3 + $4*$4

  # remember maximum step for selected output
  if (step > maxstep)
    maxstep = step

  # use arrays indexed by step to store data
  calc[step] = calc[step] + sqrt (temp)
  fluc[step] = fluc[step] + calc[step]*calc[step]
  count[step] = count[step] + 1   # count the number of samples seen for a step
}
END {
  # iterate over all existing steps (this is not sorted!)
  for (i in count) {
    stddev = sqrt((calc[i] * calc[i]) + (fluc[i] * fluc[i]))
    if (i == maxstep)
      # handle the last step in a special way
      print i" "calc[i]/count[i]" "fluc[i]/count[i]" "stddev
    else
      # this is the normal handling
      print i" "calc[i]/count[i]
  }
}' *.data | sort -n -k 1 # that' why we sort here: first column "-k 1" and numerically "-n"

Question 2

You could also use:

awk -f c.awk *.data

where c.awk is

{
    j=FNR
    temp=$2*$2+$3*$3+$4*$4
    calc[j]=calc[j]+sqrt(temp)
    fluc[j]=fluc[j]+calc[j]*calc[j]
}

END {
    N=ARGIND
    for (i=1; i<=FNR; i++) {
        stdev=sqrt(fluc[i]-calc[i]*calc[i])
        print i-1,calc[i]/N,fluc[i]/N,stdev
    }
}