Question

I have the following data.

1455931_at Chrna3 1420468_at Asb17 1445520_at −−− 1436717_x_at Hbb−y 1431788_at Fabp12 1458975_at −−−

With sed or VIM editor how can I change it to

1455931_at Chrna3 
1420468_at Asb17 
1445520_at −−− 
1436717_x_at Hbb−y 
1431788_at Fabp12 
1458975_at −−−

So every word ending in _at should start a new line. Each line consists of an _at identifier paired with its gene term.


Solution

In Vim, I would do this:

:%s/ /^M/g
:g/_at/j

Where the ^M is typed by pressing control-V (control-Q on Windows) followed by the Enter/Return key.

This assumes single spaces between tokens; as @Floris suggests, you can use s/ \+/^M/g to turn multiple consecutive spaces into a single newline. Or you could use s/\v\s+/^M/g to do the same thing with any consecutive whitespace including tabs as well as literal space characters.
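The same two-step idea (put every token on its own line, then join each _at line with the line after it) can be sketched outside Vim as well. This is only an illustration under stated assumptions: the function name split_and_join and the shortened sample line are made up for the example, and the input is assumed to end with a complete pair.

```shell
# Hypothetical sample: three pairs from the question on a single line.
input='1455931_at Chrna3 1420468_at Asb17 1431788_at Fabp12'

# Step 1 (tr): turn every run of blanks into one newline, one token per line.
# Step 2 (sed): on each line ending in _at, append the next line (N) and
#               replace the embedded newline with a space.
split_and_join() {
  tr -s '[:blank:]' '\n' | sed '/_at$/{N;s/\n/ /;}'
}

printf '%s\n' "$input" | split_and_join
# 1455931_at Chrna3
# 1420468_at Asb17
# 1431788_at Fabp12
```

Like the :g/_at/j step, the join is keyed on the _at suffix, so it does not depend on counting tokens.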

OTHER TIPS

Amazing but true:

sed 's/\([^ ]*\) \(.[^ ]* \)/\1 \2\
> /g' <<<"1455931_at Chrna3 1420468_at Asb17 1445520_at −−− 1436717_x_at Hbb−y 1431788_at Fabp12 1458975_at −−−"
1455931_at Chrna3 
1420468_at Asb17 
1445520_at −−− 
1436717_x_at Hbb−y 
1431788_at Fabp12 
1458975_at −−−

In other words, the sed string I used had a literal newline in it (the > is the shell's continuation prompt, not something I typed):

sed 's/\([^ ]*\) \(.[^ ]* \)/\1 \2\
> /g'

You could experiment a bit with other expressions: right now I'm assuming balanced pairs, but you could also match the _at at the end of the first string specifically.

Using sed: s/ /\n/g; s/_at\n/_at /g

There might be a more elegant solution, but this one will do.
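As a runnable sketch of that two-substitution script: it assumes GNU sed, because \n in the replacement text is a GNU extension, and it assumes single spaces between tokens. The function name pairs_gnu_sed and the shortened sample line are made up for the example.

```shell
# Assumes GNU sed: \n in the replacement part is not portable to every sed.
# First substitution: every space becomes a newline.
# Second substitution: the newline right after an _at token becomes a space again.
pairs_gnu_sed() {
  sed 's/ /\n/g; s/_at\n/_at /g'
}

printf '%s\n' '1455931_at Chrna3 1420468_at Asb17 1431788_at Fabp12' | pairs_gnu_sed
# 1455931_at Chrna3
# 1420468_at Asb17
# 1431788_at Fabp12
```

Both substitutions act on the same pattern space, which is why the second one can see the newlines the first one just inserted.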

For your example:

sed -e 's/\(_at [0-9a-zA-Z−]*\) /\1\n/g'

sed 's/\(_at[[:blank:]]\{1,\}[^[:blank:]]\{1,\}\)\([[:blank:]]\)/\1\
\2/g' YourFile

This allows any blank (space or tab) as the separator, in runs of one or more, and adds no newline after the last pair. It takes one "word" after any portion of the string ending in _at, rather than strictly alternating words (that is my interpretation of the question).

It does not guarantee that two consecutive _at tokens end up on two separate lines (the case where a word is missing or empty).
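If missing gene names are a real possibility, one option is to key on the _at suffix instead of on position. The following awk sketch is an illustration only, not any of the answers above: the function name split_on_at and the sample line (with the second gene name deliberately left out) are made up for the example.

```shell
# Start a new output line at every token ending in _at; any other token is
# appended to the current line. An ID with no gene name stays alone on its line.
split_on_at() {
  awk '{
    line = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /_at$/ && line != "") { print line; line = $i }  # flush previous pair
      else if (line == "")           line = $i                  # first token on the line
      else                           line = line " " $i         # gene term(s)
    }
    if (line != "") print line                                  # flush the last pair
  }'
}

printf '%s\n' '1455931_at Chrna3 1420468_at 1431788_at Fabp12' | split_on_at
# 1455931_at Chrna3
# 1420468_at
# 1431788_at Fabp12
```

Because awk splits on runs of spaces and tabs by default, this also tolerates irregular spacing.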

Here is an awk solution:

awk '{for (i=1;i<=NF;i+=2) print $i,$(i+1)}' file
1455931_at Chrna3
1420468_at Asb17
1445520_at −−−
1436717_x_at Hbb−y
1431788_at Fabp12
1458975_at −−−

This prints the fields two at a time, one pair per line.

Another version:

awk '{printf "%s%s", $0, FS; getline; print}' RS=" " file

You can use this to find all two-word pairs where the first word ends with "_at" (the -P option requires GNU grep):

grep -oP '\S+_at\s+\S+' file

or, to put a newline after every 2nd word:

tr -s '[:blank:]' '\n' < file | paste -d " " - -
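
To see how the tr and paste pipeline behaves on irregular spacing, here is a small sketch reading from standard input. The function name pair_tokens and the sample line (one tab and a run of three spaces) are made up for the example.

```shell
# tr -s squeezes each run of blanks (spaces or tabs) into a single newline,
# so every token lands on its own line; paste then joins lines two at a time
# with a single space.
pair_tokens() {
  tr -s '[:blank:]' '\n' | paste -d ' ' - -
}

printf '1455931_at\tChrna3   1420468_at Asb17\n' | pair_tokens
# 1455931_at Chrna3
# 1420468_at Asb17
```

The pairing here is purely positional, so it relies on the input really being balanced ID and gene pairs.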
Licensed under: CC-BY-SA with attribution