How do you split a file based on a token?

https://stackoverflow.com/questions/290503

08-07-2019

Question

Let's say you have a file containing texts (from 1 to N) separated by a $. How can I split the file so that the end result is N files?

text1 with newlines $
text2 $etc... $
textN

I'm thinking of something with awk or sed, but is there an existing Unix tool that already performs this kind of task?

Solution

Maybe split -p pattern?

Hmm. That may not be exactly what you want. It doesn't split a line, it only starts a new file when it sees the pattern. And it seems to be supported only on BSD-related systems.

You could use something like:

awk 'BEGIN {RS = "$"} { ... }'

edit: You might find some inspiration for the { ... } part here:

http://www.gnu.org/manual/gawk/html_node/Split-Program.html

edit: Thanks to dmckee for the comment about csplit, but csplit also seems to copy the whole line on which the pattern occurs.
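
To illustrate the csplit behaviour described above, here is a small sketch. The file name input.txt, the part prefix and the sample text are made up for the demo. It only shows that csplit breaks at line boundaries, so the whole line containing the $ ends up in the second piece.

```shell
# Work in a scratch directory so the demo does not touch real files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample: the "$" sits in the middle of the second line.
printf 'one\ntwo $ three\nfour\n' > input.txt

# -s suppresses the byte counts, -f sets the output prefix (part00, part01).
# The pattern /\$/ matches the first line that contains a literal "$".
csplit -s -f part input.txt '/\$/'

first=$(cat part00)    # everything before the matching line
second=$(cat part01)   # the matching line, kept whole, plus the rest

printf 'part00:\n%s\n' "$first"
printf 'part01:\n%s\n' "$second"
```

So csplit is a fit when every separator sits on a line of its own, but not when texts share a line.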

OTHER TIPS

awk 'BEGIN { RS = "$"; ORS = "" } { textNumber++; print $0 > ("text" textNumber ".out") }' fileName

Thanks to Bill Karwin for the idea.

Edit: Added ORS="" to avoid printing a newline at the end of each file.
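
As a rough worked example of the awk approach, assuming a throwaway input file (input.txt and its contents are invented here), the following creates one output file per $-separated text. The output expression is wrapped in parentheses because some awk implementations parse an unparenthesized concatenation after > differently.

```shell
# Work in a scratch directory so the demo does not clobber real files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample: three texts separated by "$", the first spanning two lines.
printf 'alpha\nbeta$gamma$delta\n' > input.txt

# RS="$" makes "$" the record separator, so each text becomes one record.
# ORS="" stops print from appending a newline to each record.
awk 'BEGIN { RS = "$"; ORS = "" }
     { textNumber++; print $0 > ("text" textNumber ".out") }' input.txt

ls text*.out
```

Note that the last piece keeps the newline that ended the input file, because that newline is part of the final record.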

If I'm reading this right, the UNIX cut command can be used for this.

cut -d $ -f 1- filename

I might have the syntax slightly off, but that should tell cut that you're using $ separated fields and to return fields 1 through the end.

You may need to escape the $.
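
A quick way to check how cut treats the delimiter, using an invented sample line: quoting the $ in single quotes avoids any need for escaping. Keep in mind that cut works line by line and writes to standard output, so by itself it does not produce one file per text.

```shell
# Hypothetical one-line sample with "$" as the field delimiter.
line='alpha$beta$gamma'

# Single quotes keep the shell from treating "$" specially.
second=$(printf '%s\n' "$line" | cut -d '$' -f 2)
all=$(printf '%s\n' "$line" | cut -d '$' -f 1-)

printf 'second field: %s\n' "$second"
printf 'fields 1-:    %s\n' "$all"
```

Selecting fields 1- returns the line unchanged, delimiters included.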

awk -v RS='$' '{ print $0 > ("text" t++ ".out") }' ORS="" file
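
If the input contains many texts, some awk implementations can run out of open file descriptors. A possible variant, sketched here with a made-up input file and numbering that starts at 1 rather than 0, closes each output file as soon as it has been written:

```shell
# Work in a scratch directory so the demo does not clobber real files.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample: five short texts separated by "$".
printf 'a$b$c$d$e' > input.txt

# Build the file name once, write the record, then close the file
# so that only one output file is open at any time.
awk -v RS='$' -v ORS='' '{
    out = "text" (++n) ".out"
    print $0 > out
    close(out)
}' input.txt

count=$(ls text*.out | wc -l | tr -d ' ')
printf 'files written: %s\n' "$count"
```

Each file name is used only once, so closing the file right away does not risk overwriting earlier output.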

With the split command we can split on strings, but the csplit command also lets you split files based on regular expressions.

Licensed under: CC-BY-SA with attribution
