I'm running a bunch (~320) of computational chemistry experiments and I need to pull a small amount of the data out of each of the output files so that I can do some work on it in MatLab.

I'm pretty sure I can use sed to make this work, but try as I might I don't seem to be able to do so.

I need all of the data starting at the line beginning with "1 1" and ending with the line starting with "33 33".

 I  J      FI(I,J)      k(I,J)       K(I,J)

 1  1       -337.13279    -0.06697    -0.00430
 2  2       3804.89120     8.52972     0.54787
 3  3       3195.69653     6.01702     0.38648
 4  4       3189.18684     5.99253     0.38490
 5  5       3183.73262     5.97205     0.38359
 6  6       3174.47525     5.93737     0.38136
 7  7       3167.88746     5.91275     0.37978
 8  8       1628.80868     1.56311     0.10040
 9  9       1623.56055     1.55306     0.09975
10 10       1518.21620     1.35806     0.08723
11 11       1476.93012     1.28520     0.08255
12 12       1341.24087     1.05990     0.06808
13 13       1312.30373     1.01466     0.06517
14 14       1264.73004     0.94242     0.06053
15 15       1185.62592     0.82822     0.05320
16 16       1175.54013     0.81419     0.05230
17 17       1170.41211     0.80710     0.05184
18 18       1090.20196     0.70027     0.04498
19 19       1039.29190     0.63639     0.04088
20 20       1015.00116     0.60699     0.03899
21 21       1005.05773     0.59516     0.03823
22 22        986.55965     0.57345     0.03683
23 23        917.65537     0.49615     0.03187
24 24        842.93089     0.41863     0.02689
25 25        819.00146     0.39520     0.02538
26 26        758.39720     0.33888     0.02177
27 27        697.11173     0.28632     0.01839
28 28        628.75684     0.23292     0.01496
29 29        534.75856     0.16849     0.01082
30 30        499.35579     0.14692     0.00944
31 31        422.01320     0.10493     0.00674
32 32        409.30255     0.09870     0.00634
33 33        227.12411     0.03039     0.00195

  33 2nd derivatives larger than 0.371D-04 over     561

MatLab is not a fan of text, so I'd like to not use text delimiters (though there are some in the header of this data section) and keep the data contained to only the numeric lines.

The data files contain a lot of other numbers as well, so I need to match the occurrence of "1 1" at the start of the line and "33 33" as the end of the copy. These 'indices' exist only in this block of info.

I attempted to use

% sed -n /"1 1"/,/"33 33"/p input.file > output.file

But I get a WHOLE BUNCH of data in the output file, as it copies everything that shows up between any "1" and "33".

Is there any way to do what I'm looking for?

Also, I'm using tcsh, as that is what my servers run.


Solution

How about using awk?

awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{t=0}' file

As recommended by @mklement0, if there is only one block, you can avoid processing the remainder of the file by changing the command to:

awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{exit}' file
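Since the question mentions roughly 320 files, the awk command above could be wrapped in a small script and run over all of them. This is only a sketch: the script name extract.sh, the helper name extract_block, and the choice to write each result to the input name plus ".dat" are assumptions, not anything from the original post. It is written for sh, so from tcsh it would be started as `sh extract.sh *.out` (with whatever glob matches your files).

```shell
#!/bin/sh
# extract_block FILE: print only the lines from "1 1" through "33 33".
# Same logic as the awk one-liner above: set a flag on the first row,
# print while the flag is set, and stop reading after the last row.
extract_block() {
    awk '$1 == "1" && $2 == "1" { t = 1 }
         t
         $1 == "33" && $2 == "33" { exit }' "$1"
}

# Process every file named on the command line.
# Assumption: writing FILE.dat next to each input FILE is acceptable.
for f in "$@"; do
    [ -f "$f" ] || continue
    extract_block "$f" > "$f.dat"
done
```

Because awk compares whole fields here, a row such as "11 11" cannot be mistaken for "1 1", whatever the spacing between the columns.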

Other tips

Your problem is twofold. First, there are two blanks between the ones, but your regex only allows for one (judging from the now indented code). Second, you are probably not precise enough; the /1 1/ pattern matches 11 11, for example, and 111 111 and so on.

So, you should consider:

sed -n -e '/^ *1  *1 /,/^33  *33 /p' -e '/^33 33 /q' input.file > output.file

The patterns are anchored to the start of the line by the ^ (caret). The numbers are separated by one or more blanks, written here as a blank followed by " *" (there are other, longer-winded ways of writing that in standard sed; the + quantifier is not part of standard sed's basic regular expressions, though it is widely available). Each pair of numbers is terminated by a blank. Chances are that the first expression alone will give you what you want. The second expression quits as soon as it recognizes the 33 33 input line, which can save a significant amount of file I/O, and hence processing time, if the input file is big enough.
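To see the difference the anchoring makes, here is a small self-contained comparison. The sample file is made up for illustration (a few rows borrowed from the question plus two invented lines of surrounding noise), and the variable names are arbitrary; the strict command is the one given above.

```shell
#!/bin/sh
# Made-up miniature of a data file, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 energy 1 1 0.5
 I  J      FI(I,J)
 1  1       -337.13279
11 11       1476.93012
33 33        227.12411
  33 2nd derivatives larger than 0.371D-04 over     561
 trailing 33 33 junk
EOF

# Unanchored, single-blank patterns: the range opens on the stray "1 1"
# in the first line, so unwanted lines are copied too.
loose=$(sed -n '/1 1/,/33 33/p' "$sample" | wc -l | tr -d ' ')

# Anchored patterns from the answer: only the indexed rows are copied,
# and sed quits once it has printed the "33 33" row.
strict=$(sed -n -e '/^ *1  *1 /,/^33  *33 /p' -e '/^33 33 /q' "$sample" | wc -l | tr -d ' ')

echo "loose: $loose lines, strict: $strict lines"
rm -f "$sample"
```

On this sample the loose version copies five lines, including the noise line and the header, while the anchored version copies just the three data rows.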

If the lines with ID numbers in the hundreds have some different format, then it should be fairly straightforward to tweak the regexes to match what is used. If the data contains tabs instead of (or as well as) blanks, you can tweak the regexes to manage that, too.
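One possible way to handle tabs, sketched under the assumption that your sed supports POSIX bracket classes and interval expressions, is to replace each literal blank with [[:blank:]] (space or tab). The function name extract_ws and the helper variables are made up for this example.

```shell
#!/bin/sh
# extract_ws FILE: like the sed command above, but the indices may be
# preceded and separated by any mix of spaces and tabs.
extract_ws() {
    b='[[:blank:]]'                  # matches one space or one tab
    first="^$b*1$b\{1,\}1$b"         # optional indent, 1, blanks, 1, blank
    last="^$b*33$b\{1,\}33$b"        # optional indent, 33, blanks, 33, blank
    sed -n -e "/$first/,/$last/p" -e "/$last/q" "$1"
}

# Usage (input.file and output.file as in the question):
#   extract_ws input.file > output.file
```

The \{1,\} interval is the standard basic-regex spelling of "one or more", so this avoids relying on the + quantifier.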

If your data is all formatted exactly the same as this file, then you can use sed to just read the 3rd through the 35th lines (rows 1 1 through 33 33). This is a lot easier than parsing the values, but it does require that the files have a standard format:

sed -n 3,35p data.txt

Another cheap way would be to grep for only numeric lines, and take only the first 33:

grep "^[0-9 ][0-9 .-]*$" data.txt | head -n 33
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow