Question

I have bunch of fastq files in the directory and i want to trim the sequence by 2 nucleotides and quality(if the read has 51 base pairs and ends-with CTG or TTG).

here is what i wrote as shell script but i am getting some errors,need help as i am new to shell scripting

Input:

@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTTTG
+
#0<BFFFFFFFFF<BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGCTG
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFIIFF
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFI

output:

@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTT
+
#0<BFFFFFFFFF<BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGC
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFII
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFI

script:

for sample in *.fastq;do
    name=$(echo ${sample} | sed 's/.fastq//')
    while read line;do
        if [ ${line:0:1} == "@" ] ; then
                head="${line}"
                $echo $head
        elif [ "${head}" ] && [ "${line}" ] ; then
                length=${#line}
                if [ "${length}" = 51 -a "${line}" =~ *CTG|*TTG ] ; then
                        sequence= substr($line,0,49)
                        #echo $sequence
                fi
        elif [ ${line:0:1} == "+" ] ; then
                plus="${line}"
                #echo $plus
        elif [ "${plus}" ] && [ "${line}" ] ; then
                quality= substr($line,0,49)
                #echo $quality
        fi
        print "${head}\n${sequence}\n${plus}\n${quality}" > ${name}_new.fq
   done < $sample
done
Was it helpful?

Solution

Don't 100% understand what you're doing, but fixed a few things. Try below

#!/bin/bash
for sample in *.fastq; do
  name="${sample/.fastq/}"
  while read -r line; do
    if [[ $line == '@'* ]]; then
      head="$line" && echo "$head" >> "${name}_new.fq"
    elif [[ -n $head && ${#line} == 51 && $line =~ (CTG|TTG)$ ]]; then
      sequence="${line:0:49}" && echo "$sequence" >> "${name}_new.fq"
    elif [[ $line == '+'* ]]; then
      plus="$line" && echo "$line" >> "${name}_new.fq"
    else
      quality="$line" && echo "$quality" >> "${name}_new.fq"
    fi
  done < "$sample"
done

Example output

> cat sample_new.fq

> cat sample.fastq
@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTTTG
+
#0<BFFFFFFFFF<BFFFIIFFFFFIIIBFFFFFIIFIIIIIFFBFFFFFF
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGCTG
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFIIIFIIFIIFFFFIIFF
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
#0<BFFFFFFFFFFIIBFFIIIIIIFIIIFFIIFI

> ./abovescript

> cat sample_new.fq
@HWI-ST1072:187:C35YUACXX:7:1101:1609:1983 1:N:0:ACAGTG
NGGAGAAAGAGAGTGTGTTTTTAGGGGGAGATTTTTAAAATGGTTGTTT
+
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTATTCGGGAGGTTGAGC
+
@HWI-ST1072:187:C35YUACXX:7:1101:9351:2210 1:N:0:ACAGTG
CGGTTTTGTTTTATTTTGTATGATTAGGAGGGTTTTGGAGGTTTAGTTACC
+
BBBFFFFFFFFFFIIIIIFFIIFIIIIIIIIIFFIIFIFIIFFIIIFIIII
@HWI-ST1072:187:C35YUACXX:7:1101:1747:1995 1:N:0:ACAGTG
NGGTTGTGGTGGTGGGTATTTGTAGTTTTATTTAT
+
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top