Reformat a large text file into one line strings (via BASH)
Question
File1:
hello
- dictionary definitions:
hi
hello
hallo
greetings
salutations
no more hello for you
-
world
- dictionary definitions:
universe
everything
the globe
the biggest tree
planet
cess pool of organic life
-
I need to format this (for a huge list of words) into a term to definition format (one line per term). How can one achieve this? None of the words are the same, only the structure seen above is. The resultant file would look something like this:
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you -
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life -
Awk/Sed/Grep/Cat are the usual contenders.
Solution
awk 'BEGIN {FS="\n"; RS="-\n"} {for(i=1;i<=NF;i++) if($i!="") printf("%s ",$i); if($1!="") print "-"}' dict.txt
outputs:
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you -
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life -
OTHER TIPS
And who says only Perl can do it elegantly? :)
$ gawk -vRS="-\n" '{gsub(/\n/," ")}1' file
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life
OR
$ gawk 'BEGIN{RS="-\n";FS="\n";OFS=" "}{$1=$1}1' file
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life
A perl one-liner:
perl -pe 'chomp;s/^-$/\n/;print " "' File1
gives
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life
This is 'something like' your required output.
Not sure which scripting language you will be using, so here is some pseudocode:
for each line
    if line is "-"
        create new line
    else
        append separator to previous line
        append line to previous line
    end if
end for loop
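One way to turn the pseudocode above into something runnable is a small Bash function. This is only a sketch: the function name join_entries is made up for illustration, it assumes the separator is a line containing exactly "-", and it puts " -" back at the end of each entry so the result matches the output asked for in the question.

```bash
#!/usr/bin/env bash

# join_entries (hypothetical name): read entries from stdin and
# print each one as a single space-separated line.
join_entries() {
    local line current=""
    # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$line" = "-" ]; then
            # A lone "-" closes the entry: emit it and start a new one.
            printf '%s -\n' "$current"
            current=""
        elif [ -z "$current" ]; then
            # First line of an entry (the term itself).
            current=$line
        else
            # Append separator, then the line.
            current="$current $line"
        fi
    done
    # Emit a last entry that was never closed by "-".
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
}

# Example use (File1 is the input file from the question):
#   join_entries < File1 > File1.oneline
```

A shell read loop is slow on really big inputs, so for a huge word list the awk versions will probably be the better choice.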
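Try this one-liner. It works on the condition that there are always exactly 6 definition lines per word (9 lines per entry, counting the word, the "- dictionary definitions:" line and the closing "-"):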
sed 'N;N;N;N;N;N;N;N;s/\n/ /g' test_3
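A fixed-count approach misaligns everything after the first entry of a different length, so it may be worth checking the assumption first. A rough sketch, assuming each entry ends with a line containing only "-" (check_entry_length is a hypothetical helper name, and 9 is the per-entry line count of the sample data):

```bash
#!/usr/bin/env bash

# check_entry_length (hypothetical name): read the file on stdin and
# report every entry whose line count differs from $1.
# Exit status is 0 if all entries match, 1 otherwise.
check_entry_length() {
    awk -v want="$1" '
        /^-$/ {
            # Lines since the previous "-" (or since the start of the file).
            len = NR - last
            if (len != want) {
                printf "entry ending at line %d has %d lines, expected %d\n", NR, len, want
                bad = 1
            }
            last = NR
        }
        END {
            # Anything after the final "-" is an unterminated entry.
            if (NR != last) {
                printf "%d trailing line(s) after the last \"-\"\n", NR - last
                bad = 1
            }
            exit bad + 0
        }
    '
}

# Example use (test_3 is the file name from the sed command):
#   check_entry_length 9 < test_3 && sed 'N;N;N;N;N;N;N;N;s/\n/ /g' test_3
```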
sed -ne'1{x;d};/^-$/{g;s/\n/ /g;p;n;x;d};H'
awk -v'RS=\n-\n' '{gsub(/\n/," ")}1'
Licensed under: CC-BY-SA with attribution