Question

I have the following bash script, which pulls a list of numbers from a file. I want to keep a log of the order in which they were pulled (that is important information). With some help (possibly from an example I found on here), I ended up loading the values into an array, sorting it, and printing the result.

if [ ! -z "$sort" ]; then
  if [[ $sort == ascending ]]; then
    gawk '/SCF Done/\
           {c++; list[$5]=c}
           END {
                 asorti(list,energies);
                 for (i=1;i<=c;i++)
                 printf("%s%s%d\n",energies[i]," - Optimization Step #",list[energies[i]])
                 print "Total Optimization Steps: "c}
           ' "$1"

The only issue is that the value in field $5 can repeat. When the array is built with list[$5]=c, a repeated value is not a unique key, so the previously stored value of c gets overwritten. I've thought of a few workarounds (multiplying the value of $5 by some random number and dividing it back out afterwards), but I would not be surprised if there is an established (and more efficient) way of dealing with this problem that I'm unaware of.
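A minimal sketch of the collision, assuming the sample lines from this question are saved in a hypothetical file named scf_sample.log. It uses the same list[$5]=c keying as the script and then counts how many keys survive; the variable names summary and keys are made up for illustration.

```shell
# Hypothetical sample file holding the "SCF Done" lines from the question.
cat > scf_sample.log <<'EOF'
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   41 cycles
 SCF Done:  E(UM11L) =  -1267.64771239     A.U. after   43 cycles
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   39 cycles
 SCF Done:  E(UM11L) =  -1267.67892578     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892051     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892201     A.U. after   22 cycles
EOF

summary=$(awk '
    # Same keying as the script: the energy is the key, the step is the value.
    /SCF Done/ { c++; list[$5] = c }
    END {
        # Count the keys that are left after all lines were read.
        for (k in list) keys++
        printf "steps: %d, distinct keys: %d, step stored for %s: %d\n",
               c, keys, "-1267.67892101", list["-1267.67892101"]
    }
' scf_sample.log)
printf '%s\n' "$summary"
```

Steps 1 and 3 share the same energy, so the entry written for step 1 is replaced by step 3 and one of the six steps disappears from the array.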

Here is the output of grep "SCF Done":

 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   41 cycles
 SCF Done:  E(UM11L) =  -1267.64771239     A.U. after   43 cycles
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   39 cycles
 SCF Done:  E(UM11L) =  -1267.67892578     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892051     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892201     A.U. after   22 cycles

The whole reason I switched to gawk was that I want to pull those middle numbers and also produce formatted output like the following. I originally used a simple grep "SCF Done", but adding the formatting, the sorting, and so on made the command rather cumbersome to write. The requirement is unchanged: I want to sort by those numbers while keeping the association between each number and its optimization step (as shown below), and the numbers are not always unique.

-1267.67892101 - Optimization Step #1
-1267.64771239 - Optimization Step #2
-1267.67892101 - Optimization Step #3
-1267.67892578 - Optimization Step #4
-1267.67892051 - Optimization Step #5
-1267.67892201 - Optimization Step #6

Solution

Why are you sorting with gawk instead of sort?

I don't quite get what you're trying to accomplish from your code snippet, but perhaps:

grep 'SCF Done' "$1" | awk '{print $5}' | cat -n | sort -k 2

I see. How about calling out to sort instead of using awk's array sorting?

awk '
    /SCF Done/ {
        printf "%s - Optimization step #%d\n", $5, ++n | "sort"
    } 
    END {
        close("sort")
        print "total steps:", n
    }
' file

which would look like:

-1267.64771239 - Optimization step #2
-1267.67892051 - Optimization step #5
-1267.67892101 - Optimization step #1
-1267.67892101 - Optimization step #3
-1267.67892201 - Optimization step #6
-1267.67892578 - Optimization step #4
total steps: 6
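The plain sort above compares the lines as text. If numeric order is wanted instead (most negative energy first), one possible variation is to sort on the first field numerically. This is a sketch under assumptions: the file name scf_sample.log and the variable sorted are hypothetical, LC_ALL=C is set so the decimal point is parsed the same way in every locale, and -s (stable, available in GNU and BSD sort) keeps equal energies in step order.

```shell
# Hypothetical sample file holding the "SCF Done" lines from the question.
cat > scf_sample.log <<'EOF'
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   41 cycles
 SCF Done:  E(UM11L) =  -1267.64771239     A.U. after   43 cycles
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   39 cycles
 SCF Done:  E(UM11L) =  -1267.67892578     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892051     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892201     A.U. after   22 cycles
EOF

# awk numbers the steps in file order; sort then orders by the energy only.
sorted=$(awk '/SCF Done/ { printf "%s - Optimization step #%d\n", $5, ++n }' scf_sample.log |
    LC_ALL=C sort -s -k1,1n)
printf '%s\n' "$sorted"
```

With the sample data, step #4 comes first and step #2 last, and the two equal energies stay in the order #1, #3.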

Other tips

Am I missing where the sort is coming into play? If you are worried about repeating lines, simply skip the line if it was the same as your previous line:

$ awk 'END { print "total steps: " count }
     /SCF Done/ {
        if ( prev5 == $5 ) {
             next  # Skip duplicate line ("continue" is only valid inside a loop)
        }
        count++
        printf "%s - Optimization step #%d\n", $5, count
        prev5 = $5
    }' file

If you really don't want a value to ever repeat, use the value of $5 as an array key. Then you can check the array to see whether you have already seen that value. All arrays in awk are associative arrays (hashes):

$ awk 'END { print "total steps: " count }
     {
        if ( $0 ~ /SCF Done/  ) {
            if ( prev[$5] == 1 ) {
                next  # Seen that value of $5 before. Skip
            }
            count++
            printf "%s - Optimization step #%d\n", $5, count
            prev[$5] = 1  # Mark that you've printed $5 out
        }
    }' file
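The same "seen before" test is often written in awk with the !seen[key]++ idiom, which is true only the first time a key appears. A sketch, assuming the question's sample lines are stored in a hypothetical file scf_sample.log; the names seen and deduped are illustrative.

```shell
# Hypothetical sample file holding the "SCF Done" lines from the question.
cat > scf_sample.log <<'EOF'
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   41 cycles
 SCF Done:  E(UM11L) =  -1267.64771239     A.U. after   43 cycles
 SCF Done:  E(UM11L) =  -1267.67892101     A.U. after   39 cycles
 SCF Done:  E(UM11L) =  -1267.67892578     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892051     A.U. after   24 cycles
 SCF Done:  E(UM11L) =  -1267.67892201     A.U. after   22 cycles
EOF

deduped=$(awk '
    # seen[$5]++ is 0 (false) on first sight, so !seen[$5]++ passes only once per value.
    /SCF Done/ && !seen[$5]++ {
        printf "%s - Optimization step #%d\n", $5, ++count
    }
    END { print "total steps: " count }
' scf_sample.log)
printf '%s\n' "$deduped"
```

Note that skipped lines are not counted, so with the sample data the third line is dropped and the later steps are numbered #3 to #5 rather than #4 to #6.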
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow