Unable to separate codes in one file to many files in AWK/Python

https://stackoverflow.com/questions/632958

python
awk

08-07-2019

Question

I need to split the different pieces of code in one file into many files. The file is apparently shared by AWK's creators at their homepage. The file is also here for easy use.

My attempt at the problem

I can print the first word of each line, which shows where each piece of code is located, with

awk '{ print $1 }'

However, I do not know how

to get the exact line numbers so that I can use them
to collect the code between the specific lines so that the first word of each line is ignored
to put these separate pieces of code into new files, each named by the first word of its lines

I am sure that the problem can be solved with AWK, and with Python too. Perhaps we need to use them together.

[edit] after the first answer

I get the following error when I try to execute it with awk

$awk awkcode.txt 
awk: syntax error at source line 1
 context is
     >>> awkcode <<< .txt
awk: bailing out at source line 1

Solution

Did you try to:

Create a file unbundle.awk with the following content:

$1 != prev { close(prev); prev = $1 }
           { print substr($0, index($0, " ") + 1) >$1 }

Remove the following lines from the file awkcode.txt:

# unbundle - unpack a bundle into separate files

$1 != prev { close(prev); prev = $1 }
           { print substr($0, index($0, " ") + 1) >$1 }

Run the following command:

awk -f unbundle.awk awkcode.txt
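To make the two awk rules easier to follow, here is a minimal Python sketch of the same grouping idea. It is not part of the original answer. The function name `unbundle` and the sample file names `a.awk` and `b.awk` are hypothetical, chosen only for illustration. The sketch assumes that every non-blank line starts with a file name followed by a space, and that the lines of each file are contiguous, as they appear to be in the bundle. It returns the contents in a dictionary instead of writing files, so it can be tried without touching the disk.

```python
def unbundle(lines):
    """Group bundle lines by their first word (the target file name).

    Returns a dict mapping each file name to its text. For each line,
    everything up to and including the first space is dropped, which is
    what substr($0, index($0, " ") + 1) does in the awk program when the
    line contains a space.
    """
    files = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # empty or whitespace-only line: no file name available
        name, _, text = line.partition(" ")
        files.setdefault(name, []).append(text)
    return {name: "\n".join(body) + "\n" for name, body in files.items()}


# Hypothetical sample in the bundle format: "<file name> <original line>".
sample = [
    "a.awk { print $1 }\n",
    "a.awk END { print NR }\n",
    "b.awk { n++ }\n",
]
result = unbundle(sample)
```

Writing each entry of `result` to a file named by its key would then reproduce what the awk program does with its `>$1` redirection.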

OTHER TIPS

Are you trying to unpack a file in that format? It's a kind of shell archive. For more information, see http://en.wikipedia.org/wiki/Shar

If you execute that program with awk, awk will create all those files. You don't need to write or rewrite much. You can simply run that awk program, and it should still work.

First, view the file in "plain" format. http://dpaste.com/12282/plain/

Second, save the plain version of the file as 'awkcode.shar'

Third, I think you need to use the following command.

awk -f awkcode.shar

If you want to replace it with a Python program, it would be something like this.

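import urllib.request

# Python 3 version: download the bundle and split it into the files
# named by the first word of each line.
data = urllib.request.urlopen("http://dpaste.com/12282/plain/")
currName, currFile = None, None
for raw in data:
    line = raw.decode("utf-8").rstrip("\r\n")
    if not line.strip():
        continue  # a blank line carries no file name
    fileName, _, text = line.partition(' ')
    if fileName != currName:
        # A new name starts a new output file; close the previous one first.
        if currFile is not None:
            currFile.close()
        currName = fileName
        currFile = open(currName, "w")
    # Write only the text after the file name, for every line of the block.
    currFile.write(text + "\n")
if currFile is not None:
    currFile.close()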

The file awkcode.txt must not contain any blank lines. If the awk program encounters a blank line, it fails: on a blank line $1 is empty, so the output redirection has no file name to write to, and the code has no check to filter such lines out. I only found this out after several days of struggle.
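One way to check for this before running awk is sketched below in Python. This is only an illustration and not part of the original tip. The helper names `blank_line_numbers` and `drop_blank_lines` and the sample lines are hypothetical. It assumes that blank lines in the bundle are stray and can be dropped safely, which should be verified against the actual file.

```python
def blank_line_numbers(lines):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [n for n, line in enumerate(lines, start=1) if not line.strip()]


def drop_blank_lines(lines):
    """Return the lines with empty and whitespace-only lines removed."""
    return [line for line in lines if line.strip()]


# Hypothetical sample: two bundle lines with a stray blank line between them.
sample = ["a.awk { print $1 }\n", "\n", "b.awk { n++ }\n"]
blanks = blank_line_numbers(sample)
cleaned = drop_blank_lines(sample)
```

With a real file, the same functions could be applied to the list returned by `open("awkcode.txt").readlines()`, and the cleaned lines written to a new file that is then passed to awk.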

Licensed under: CC-BY-SA with attribution
