Question

This is part of a samtools mpileup result:

chr7    55241514        G       2786    .....................
chr7    55241515        C       2786    .....................
chr7    55241516        C       2786    .....................
chr7    55241517        G       2786    .....................
chr7    55241518        T       2786    .....................
chr7    55241519        G       2786    .$.$.$.$.$.$.$.$.$.$.
chr7    55241520        G       2776    .....................
chr7    55241521        C       2776    .....................
chr7    55241522        T       2776    .....................
chr7    55241523        G       2774    .....................
chr7    55241524        C       2774    .....................
chr7    55241525        T       2774    .....................
chr7    55241526        G       2723    .....................
chr7    55241527        G       2723    .$.$.$.$.$.$.$.$.$.$.
chr7    55241609        C       7999    ......^F.^F.^F.^F.^F.
chr7    55241610        C       7999    .....................
chr7    55241611        C       7999    .....................
chr7    55241612        A       7999    .....................
chr7    55241613        G       7999    .....................
chr7    55241614        C       7999    .....................
chr7    55241615        T       7999    .....................
chr7    55241616        T       7999    .....................

I don't know the meaning of "^F". I consulted the help for the mpileup command, which says: a symbol ‘^’ marks the start of a read. The ASCII of the character following ‘^’ minus 33 gives the mapping quality. A symbol ‘$’ marks the end of a read segment. It doesn't say anything about "F". Does anybody know what "F" means in this result?


Solution

You almost found the answer by yourself:

... a symbol ‘^’ marks the start of a read. The ASCII of the character following ‘^’ minus 33 gives the mapping quality.

So 'F' encodes the mapping quality of one read starting at that position (I think Steve talks about base call qualities instead). Qualities are Phred scores, i.e. log-scaled error probabilities: P = 10^(-Q/10). You can derive the numeric value of your quality by looking up the character in an ASCII table (e.g. man ascii) and subtracting 33. 'F' has ASCII code 70, which gives a mapping quality of 70 - 33 = 37. The definition of mapping quality varies per aligner, but in theory this means that there is a 10^(-37/10) ≈ 0.0002 = 0.02% chance that the one read starting in that column is misaligned.
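The decoding described above can be sketched in a few lines of Python. This is only an illustration, not part of samtools: the function names read_start_mapqs and phred_to_error_prob are made up for this example, and the scan assumes every '^' in the bases column is a read-start marker followed by exactly one mapping-quality character.

```python
def read_start_mapqs(bases):
    """Return the mapping quality of every read that starts in this pileup column.

    Assumption: each '^' is a read-start marker and the character right after
    it is the Phred+33 encoded mapping quality.
    """
    mapqs = []
    i = 0
    while i < len(bases):
        if bases[i] == "^" and i + 1 < len(bases):
            # ASCII code of the following character minus 33 is the mapping quality.
            mapqs.append(ord(bases[i + 1]) - 33)
            i += 2  # skip the quality character so it is never read as a marker
        else:
            i += 1
    return mapqs


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


column = "......^F.^F.^F.^F.^F."  # bases column from position 55241609 in the question
print(read_start_mapqs(column))           # [37, 37, 37, 37, 37]
print(f"{phred_to_error_prob(37):.4%}")   # 0.0200%
```

Stepping over the quality character matters because that character can itself be '^' (ASCII 94, i.e. mapping quality 61), which a naive search for '^' would misread as another read start.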

Andreas

OTHER TIPS

Welcome to the FASTQ format. Using an ASCII table, you will see that an 'F' has a decimal value of 70. Therefore, the quality score associated with an 'F' is 70 - 33, which gives you 37.

Assuming your pileup was generated using a dataset with Illumina 1.8+ encoding, the quality score range would be 0 to 41. So 37 is quite a high quality score for that position. From memory:

40 would give a base call accuracy of 99.99%
30 would give a base call accuracy of 99.90%
20 would give a base call accuracy of 99.00%
10 would give a base call accuracy of 90.00%
 0 would give a base call accuracy of 00.00%
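
The figures above follow from accuracy = 1 - 10^(-Q/10), so they can be recomputed rather than recalled from memory. A minimal sketch (the helper name phred_to_accuracy is made up for this example):

```python
def phred_to_accuracy(q):
    """Accuracy implied by a Phred quality Q: 1 - 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


# Print the accuracy for the scores listed above.
for q in (40, 30, 20, 10, 0):
    print(f"{q:>2} -> {phred_to_accuracy(q):.2%}")
```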
Licensed under: CC-BY-SA with attribution