Pull Alignment Character Position

https://stackoverflow.com/questions/10872710

12-06-2021

Question

I use pairwiseAlignment() to get the following:

> alignment <- pairwiseAlignment(pattern = canonical.protein, subject = protein.extracted)
> alignment
Global PairwiseAlignedFixedSubject (1 of 1)
pattern: [448]          DDWEIPDGQITVGQRIGSGSFGTVYKGKWHGDVAVKMLNVTAPTPQQLQAFKNEVGV...FMVGRGYLSPDLSKVRSNCPKAMKRLMAE  CLKKKRDERPLFPQILASIELLARSLPK 
subject:   [1]     DDWEIPDGQITVGQRIGSGSFGTVYKGKWHGDVAVKMLNVTAPTPQQLQAFKNEVGV...FMVGRGYLSPDLSKVRSNCPKAMKRLMAECLKKKRDERPLFPQILASIELLARSLPK 
score: -912.3752

I can then use:

toString(pattern(alignment))
toString(subject(alignment))

to get the full string sequence for both the pattern and the subject. However, how do I get the numbers 448 and 1 out of the object as integers? I need to use these numbers, but there doesn't seem to be a way to get at them.

Solution

I believe these are the start positions of the aligned regions within the original sequences, so

start(pattern(alignment))

Your question would be clearer with a fully reproducible example, e.g.,

library(Biostrings)
example(pairwiseAlignment)
aln <- pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
    substitutionMatrix = "BLOSUM50", gapOpening = 0, gapExtension = -8)

Then

> aln
Global PairwiseAlignedFixedSubject (1 of 1)
pattern: [1] PA--W-HEAE
subject: [2] EAGAWGHE-E
score: 1
> start(subject(aln))
[1] 2
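As a minimal sketch building on the example above, the start positions of both sequences can be collected into one named vector. The names `starts` and `ends` are illustrative and not from the original answer; `end()` is used as the counterpart of `start()` on the same objects. On recent Bioconductor releases, where I believe the pairwise alignment functions have moved to the pwalign package, `library(pwalign)` may be needed in addition to (or instead of) `library(Biostrings)`.

```r
library(Biostrings)  # assumption: a Biostrings version that still provides pairwiseAlignment()

aln <- pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
    substitutionMatrix = "BLOSUM50", gapOpening = 0, gapExtension = -8)

# Start positions of the aligned regions, one value per sequence
starts <- c(pattern = start(pattern(aln)), subject = start(subject(aln)))

# End positions of the same aligned regions
ends <- c(pattern = end(pattern(aln)), subject = end(subject(aln)))

# Both are plain vectors, so they can be used in ordinary arithmetic or indexing
starts[["subject"]]
```

The exact values may depend on the Biostrings version, so none are shown here beyond what the transcript above already reports.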

Also, the Bioconductor mailing list is more appropriate for these questions; no subscription required.

OTHER TIPS

Since you can convert the alignment to a string, you can use R's string functions on it. For example, substr(toString(pattern(alignment)), 448, 448) returns the 448th character of that string. I'm not familiar with that library, so there might be a built-in way that I don't know of. See http://www.statmethods.net/management/functions.html for string functions in R.
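A small base R sketch of the string approach follows. The variable `aligned` is a hypothetical stand-in for the result of toString(pattern(alignment)); here it is just the first stretch of the sequence shown in the question.

```r
# Hypothetical stand-in for toString(pattern(alignment)); any character string works
aligned <- "DDWEIPDGQITVGQRIGSGSFGTVYKGKWHGDVAVKMLNVTAPTPQQLQAFKNEVGV"

# substr(x, start, stop) with start == stop extracts a single character
fifth <- substr(aligned, 5, 5)

# strsplit() breaks the string into one-character pieces for vector-style indexing
chars <- strsplit(aligned, "")[[1]]
stopifnot(identical(chars[5], fifth))
```

Note that positions here are counted within the string itself, and substr() returns an empty string when the requested position is past the end. The bracketed offsets printed with the alignment are not part of that string, so they still have to come from the alignment object.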

Licensed under: CC-BY-SA with attribution
