Question

I have data in FASTQ format:

@HISEQ:157:C11RCACXX:6:1101:1522:2491 2:N:0:CGTACG
GTGCCNNNNNNNNNNNNNNNNNNNNNNNTGCGNNNNNNNNNNNNNNCNNGCAGATACTCGTANNNNNNNNNGNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
+
@BCFF###########################################################################
#####################
@HISEQ:157:C11RCACXX:6:1101:1668:2494 2:N:0:CGTACG
TCTTTNNNNNNNNNNNNNNNNNNNNNNNATTGNNNNNNNNNNNNNNTTNTGTTTTACGGTTTNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
+
C@CFF###########################################################################
#####################
@HISEQ:157:C11RCACXX:6:1101:2557:2492 2:N:0:CGTACG
CCTCTNNNNNNNNNNNNNNNNNNNNNNNGTTGNNNNNNNNNNNNNNCNNCAACACACTCCTCNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
+
CCCFF###########################################################################
#####################

I tried to split each read at the "+" line with an awk command, but it didn't work. Is there a simple sed/awk command that can convert it into FASTA format?

The expected output should be:

>HISEQ:157:C11RCACXX:6:1101:1522:2491 2:N:0:
CGTACGGTGCCNNNNNNNNNNNNNNNNNNNNNNNTGCGNNNNNNNNNNNNNNCNNGCAGATACTCGTANNNNNNNNNGNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
>HISEQ:157:C11RCACXX:6:1101:1668:2494 2:N:0:
CGTACGTCTTTNNNNNNNNNNNNNNNNNNNNNNNATTGNNNNNNNNNNNNNNTTNTGTTTTACGGTTTNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
>HISEQ:157:C11RCACXX:6:1101:2557:2492 2:N:0:
CGTACGCCTCTNNNNNNNNNNNNNNNNNNNNNNNGTTGNNNNNNNNNNNNNNCNNCAACACACTCCTCNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN

Thanks a lot!

Solution

You may try the following:

awk -f conv.awk input.txt

where input.txt is your input data file and conv.awk contains:

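# Header line: turn "@" into ">", drop the barcode after the last colon, start printing
/^@HISEQ/ { p=1; sub(/^@/,">"); sub(/:[^:]*$/,":"); print; next }
# "+" separator: stop printing until the next header (this skips the quality lines)
/^\+/ {p=0}
# Sequence lines between the header and the "+"
p==1 { print }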
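
Note that conv.awk removes the barcode from the header but does not move it to the start of the sequence, which the expected output in the question does. A minimal sketch of a variant that does both follows. It assumes every header starts with @HISEQ and that the barcode is whatever follows the last colon of the header. The function name fastq_to_fasta and the shortened sample record are hypothetical, used only for illustration.

```shell
# Hypothetical wrapper: converts wrapped FASTQ to FASTA and prepends the barcode.
fastq_to_fasta() {
    awk '
    /^@HISEQ/ {                  # header line: start of a new record
        tag = $0
        sub(/^.*:/, "", tag)     # keep only the text after the last colon (the barcode)
        sub(/^@/, ">")           # FASTQ "@" becomes FASTA ">"
        sub(/:[^:]*$/, ":")      # drop the barcode from the header
        print
        p = 1; first = 1
        next
    }
    /^\+/ { p = 0 }              # separator: skip everything up to the next header
    p {
        if (first) { print tag $0; first = 0 }   # barcode goes in front of the first sequence line
        else       { print }
    }' "$@"
}

# Usage with a shortened, made-up record (reads stdin when no file is given).
printf '%s\n' \
    '@HISEQ:157:C11RCACXX:6:1101:1522:2491 2:N:0:CGTACG' \
    'GTGCCNNNN' 'NNNN' '+' '@BCFF####' '####' | fastq_to_fasta
```

With a real file, call it as fastq_to_fasta input.txt. Because the state flag p is cleared at the "+" line, a quality line that happens to start with "@" is not mistaken for a header unless it also starts with @HISEQ.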

Other tips

awk '((/@/&&$0!~/#/)||$0!~/#/)&&$0!~/\+/' your_file

Tested below:

> cat temp2
@HISEQ:157:C11RCACXX:6:1101:1522:2491 2:N:0:CGTACG
GTGCCNNNNNNNNNNNNNNNNNNNNNNNTGCGNNNNNNNNNNNNNNCNNGCAGATACTCGTANNNNNNNNNGNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
+
@BCFF###########################################################################
#####################
@HISEQ:157:C11RCACXX:6:1101:1668:2494 2:N:0:CGTACG
TCTTTNNNNNNNNNNNNNNNNNNNNNNNATTGNNNNNNNNNNNNNNTTNTGTTTTACGGTTTNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
+
C@CFF###########################################################################
#####################
@HISEQ:157:C11RCACXX:6:1101:2557:2492 2:N:0:CGTACG
CCTCTNNNNNNNNNNNNNNNNNNNNNNNGTTGNNNNNNNNNNNNNNCNNCAACACACTCCTCNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
+
CCCFF###########################################################################
#####################
>
> nawk '((/@/&&$0!~/#/)||$0!~/#/)&&$0!~/\+/' temp2
@HISEQ:157:C11RCACXX:6:1101:1522:2491 2:N:0:CGTACG
GTGCCNNNNNNNNNNNNNNNNNNNNNNNTGCGNNNNNNNNNNNNNNCNNGCAGATACTCGTANNNNNNNNNGNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
@HISEQ:157:C11RCACXX:6:1101:1668:2494 2:N:0:CGTACG
TCTTTNNNNNNNNNNNNNNNNNNNNNNNATTGNNNNNNNNNNNNNNTTNTGTTTTACGGTTTNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
@HISEQ:157:C11RCACXX:6:1101:2557:2492 2:N:0:CGTACG
CCTCTNNNNNNNNNNNNNNNNNNNNNNNGTTGNNNNNNNNNNNNNNCNNCAACACACTCCTCNNNNNNNNGCNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
>
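
The condition in this one-liner has the form (A && B) || B, which is logically just B, so it prints every line that contains neither "#" nor "+". As the test output shows, the headers still start with "@". A minimal sketch that keeps the same filter and also rewrites the header marker to ">" is given here. It assumes, as in the sample, that every quality line contains "#" and that no header or sequence line contains "#" or "+". Real quality strings often contain no "#", so this is fragile. The name filter_to_fasta is hypothetical.

```shell
# Hypothetical wrapper around the simplified filter.
filter_to_fasta() {
    awk '
    !/#/ && !/\+/ {        # same lines the original condition keeps
        sub(/^@/, ">")     # header marker: FASTQ "@" to FASTA ">"
        print
    }' "$@"
}

# Usage with a shortened, made-up record.
printf '%s\n' \
    '@HISEQ:157:C11RCACXX:6:1101:1522:2491 2:N:0:CGTACG' \
    'GTGCCNNNN' 'NNNN' '+' '@BCFF####' '####' | filter_to_fasta
```

Unlike a state-based script, this keeps the barcode in the header and leaves the sequence lines unchanged.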
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow