Question

Good day members,

I have an input file with rows of numeric digits (close to 2000 rows). For every row, I want to extract the second to the eighth digit from the right into a separate file, with each result followed by a comma, as shown below.

Example: input.txt

00000000000001303275310752
00000000000001827380519015
00000000000000800081610361
00000000000000449481894004
00000000000000449481894004
00000000000001812612607514

Expected result: newfile.txt

7531075,
8051901,
8161036,
8189400,
8189400,
1260751,

I'm guessing something like 'sed' can be used to solve my problem, but I'm not quite sure how to go about it. I'm connected to a machine running Solaris 5.10. I'd appreciate it if someone could guide me, with a brief explanation.

Regards,

novice.


Solution

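For fixed-width input, try:

cut -c19-25 input.txt | sed 's/$/,/'

That is, cut extracts characters 19 through 25 of each line of input.txt (the second to eighth digits from the right, given 26-character lines like the sample), and sed replaces the end of each line ($) with a comma, which simply appends one.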
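To double-check the column arithmetic, here is a small Python sketch. It assumes every line is exactly 26 characters long, as in the sample. cut numbers columns from 1 and includes both ends of the range, so columns 19 through 25 correspond to the Python slice [18:25]. The helper name cut_columns is hypothetical.

```python
def cut_columns(line: str, start: int, end: int) -> str:
    """Mimic cut -cSTART-END: 1-based column numbers, both ends inclusive."""
    return line[start - 1:end]


sample = "00000000000001303275310752"  # first row of input.txt (26 characters)
assert len(sample) == 26

# Columns 19-25 hold the 2nd to 8th digits counted from the right.
print(cut_columns(sample, 19, 25) + ",")  # prints 7531075,
# Extending the range to column 26 would also pick up the last digit.
print(cut_columns(sample, 19, 26))        # prints 75310752
```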

If your lines vary in length, fixed column positions won't work; you'll need to count from the right-hand end of each line instead.
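Counting from the right can be sketched in Python as follows. This assumes each line holds only digits plus an optional line ending; the function name second_to_eighth_from_right is hypothetical, and it rejects lines shorter than eight characters rather than guessing what to do with them.

```python
def second_to_eighth_from_right(line: str) -> str:
    """Return the 2nd to 8th characters counted from the right end of a line."""
    digits = line.rstrip("\r\n")  # ignore the line ending, if any
    if len(digits) < 8:
        raise ValueError(f"line too short: {digits!r}")
    # [-8:-1] starts 8 characters from the end and stops before the last one.
    return digits[-8:-1]


for row in ["00000000000001303275310752\n", "1303275310752", "12345678"]:
    print(second_to_eighth_from_right(row) + ",")
```

All three rows print 7531075, 7531075, and 1234567 (each with a comma), showing that the leading part of the line no longer matters.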

OTHER TIPS

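You can strip the leading zeros with:

sed 's/^0*//'

(The g flag is not needed, because the pattern is anchored to the start of the line and can only match once.) To append the comma in the same sed call:

sed -e 's/^0*//' -e 's/$/,/' input.txt

Note that this removes the leading zeros and appends a comma, but it keeps all of the remaining digits (for example 1303275310752, for the first row) rather than extracting only the second to eighth digits from the right.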
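To make the difference from the expected output visible, here is a rough Python analogue of the two sed substitutions (remove leading zeros, then append a comma). The function name strip_zeros_add_comma is hypothetical.

```python
def strip_zeros_add_comma(line: str) -> str:
    """Python analogue of: sed -e 's/^0*//' -e 's/$/,/'"""
    return line.rstrip("\n").lstrip("0") + ","


print(strip_zeros_add_comma("00000000000001303275310752\n"))  # 1303275310752,
print(strip_zeros_add_comma("00000000000000800081610361\n"))  # 800081610361,
```

Only the leading zeros go away; zeros inside the number are kept, and the whole remaining number is printed.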

Try:

perl -pe 's/^.*(\d{7})\d$/$1,/' < input.txt
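The same regular expression carries over to Python's re module almost unchanged, which may help when reading it: the greedy ^.* consumes as much as it can, then backs off just far enough that (\d{7}) captures the seven digits immediately before the final \d. A sketch; the function name regex_extract is hypothetical.

```python
import re

# Same pattern as the Perl one-liner: keep the 7 digits preceding the last digit.
PATTERN = re.compile(r"^.*(\d{7})\d$")


def regex_extract(line: str) -> str:
    """Replace the whole line with the captured digits plus a comma.

    Like Perl's s///, a line that does not match is returned unchanged.
    """
    return PATTERN.sub(r"\1,", line.rstrip("\n"))


print(regex_extract("00000000000001827380519015\n"))  # 8051901,
print(regex_extract("123"))                           # 123 (no match)
```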

Or if you don't like regular expressions:

perl -pe '$_ = substr($_,-9,-2) . ",\n"' < input.txt

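Both commands work for fixed- or variable-length lines. The substr version, however, assumes every line (including the last one) ends with a newline; the regex version does not.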
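Perl's substr($_, -9, -2) starts 9 characters from the end and leaves 2 characters off the end; on a newline-terminated line those two are the last digit and the newline. For lines at least 9 characters long, the Python slice [-9:-2] behaves the same way, which makes the dependence on the trailing newline easy to see. The helper name substr_neg is hypothetical.

```python
def substr_neg(s: str, offset: int, length: int) -> str:
    """Slice matching Perl substr(s, offset, length) when both are negative."""
    return s[offset:length]


with_newline = "00000000000001303275310752\n"
without_newline = "00000000000001303275310752"

print(substr_neg(with_newline, -9, -2))     # 7531075  (the wanted digits)
print(substr_neg(without_newline, -9, -2))  # 2753107  (shifted by one)
```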

Here is a solution in Python, which should be intuitive:

$ cat data2
00000000000001303275310752
00000000000001827380519015
00000000000000800081610361
00000000000000449481894004
00000000000000449481894004
00000000000001812612607514

$ cat digits.py
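import sys
# Print the 2nd to 8th digits from the right of each line, plus a comma.
# Stripping the line ending first means a missing final newline is harmless.
for line in sys.stdin:
    print(line.rstrip('\r\n')[-8:-1] + ',')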

$ python digits.py < data2
7531075,
8051901,
8161036,
8189400,
8189400,
1260751,
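The question asks for the results in a separate file, newfile.txt. From the shell you can simply redirect the output (python digits.py < input.txt > newfile.txt); if you'd rather do it inside Python, a minimal sketch might look like this. The function name convert is hypothetical, and blank lines are skipped on the assumption that they carry no data.

```python
def convert(in_path: str, out_path: str) -> int:
    """Write the 2nd to 8th digits from the right of each row, plus a comma.

    Returns the number of rows written. Blank lines are skipped.
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            digits = line.strip()  # drop the line ending and stray whitespace
            if not digits:
                continue  # skip blank lines
            dst.write(digits[-8:-1] + ",\n")
            count += 1
    return count
```

Usage: convert("input.txt", "newfile.txt") writes one result per row, for example 7531075, for the first row of the sample.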
Licensed under: CC-BY-SA with attribution