Question

I have a file that I need to parse into an array, but I only want a short portion of each line, and only for the first 84 lines. Sometimes the line may be:

>MT gi...

And I would just want the MT to be entered into the array. Other times it might be something like this:

>GL000207.1 dn...

And I would need the GL000207.1.

I was thinking that you might be able to set two delimiters (one being the '>' and the other being the ' ' whitespace), but I am not sure how you would go about it. I have read other people's posts about the internal field separator, but I am really not sure how that would work. I would think perhaps something like this might work, though?

desiredArray=$(echo file.whatever | tr ">" " ")
for x in $desiredArray
do
   echo > $x
done

Any suggestions?


Solution

How about:

head -n 84 <file> | awk '{print $1}' | tr -d '>'

head takes only the first 84 lines of the file, awk prints the first whitespace-delimited field (dropping the first space and everything after it), and tr deletes the '>'.
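To get the result into an array, as the question asks, the pipeline's output can be read line by line. This is a minimal sketch, assuming bash 4 or later (for mapfile); sample.fa is a hypothetical stand-in for the real file, and desiredArray is the name used in the question.

```shell
# Hypothetical sample input standing in for the real file.
printf '%s\n' '>MT gi...' '>GL000207.1 dn...' > sample.fa

# Take the first 84 lines, keep the first field, delete '>',
# and store each resulting line as one array element.
mapfile -t desiredArray < <(head -n 84 sample.fa | awk '{print $1}' | tr -d '>')

# Print one element per line.
printf '%s\n' "${desiredArray[@]}"
```

Reading with mapfile -t keeps each line as a single element, rather than relying on word splitting of an unquoted variable.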

OTHER TIPS

You can also do it with sed:

head -n 84 <file> | sed 's/>\([^ ]*\).*/\1/'
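
The two-delimiter idea from the question can also be sketched in awk alone, by giving it a field separator that matches either '>' or a space. This is only an illustration under assumptions: sample.txt is a hypothetical file name, and every line is assumed to start with '>'.

```shell
# Hypothetical sample input standing in for the real file.
printf '%s\n' '>MT gi...' '>GL000207.1 dn...' > sample.txt

# -F'[> ]' splits on either '>' or a space. Because each line starts
# with '>', field 1 is empty and field 2 holds the wanted name.
# NR <= 84 limits processing to the first 84 lines.
names=$(awk -F'[> ]' 'NR <= 84 {print $2}' sample.txt)

printf '%s\n' "$names"
```

This replaces head, the field extraction, and tr with a single awk call.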
Licensed under: CC-BY-SA with attribution