I have two groups, A and B, of strings made up of the letters "AGTE", and I'd like to find a way of comparing them to see whether they are statistically similar. Group A contains real-world observations; group B contains predictions. There are 400 or so strings in each group. For example:

**A**
GTAATEGTTTEAAA
TTEAGE
...

**B**
AGTEAAAAGT
TAT
GGATEAATGGGTEAATG
....

I'd also like to be able to visualise these in some way, mainly for presentation purposes. Do you have any ideas how I might do that?


Solution

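I'd suggest computing the Levenshtein distance between the strings, that is, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. You can then plot these inter-string distances. Larger values indicate strings that are more dissimilar.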

If you don't want to implement the Levenshtein distance calculation yourself, check out these submissions on the File Exchange.
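
For anyone who does want to write it by hand, here is a minimal sketch of the calculation in Python (my choice of language, not something the question specifies). The names `levenshtein` and `distance_matrix` are just illustrative, and the sample strings are the ones shown in the question. It uses the standard two-row dynamic programming approach and builds a matrix of distances between every string in A and every string in B.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t."""
    # Keep the shorter string in t so each row is as short as possible.
    if len(s) < len(t):
        s, t = t, s
    # previous[j] is the distance between the first i-1 characters of s
    # and the first j characters of t.
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distance_matrix(group_a, group_b):
    """Rows correspond to strings in group_a, columns to strings in group_b."""
    return [[levenshtein(a, b) for b in group_b] for a in group_a]


# Sample strings taken from the question; the real groups have ~400 each.
group_a = ["GTAATEGTTTEAAA", "TTEAGE"]
group_b = ["AGTEAAAAGT", "TAT", "GGATEAATGGGTEAATG"]
matrix = distance_matrix(group_a, group_b)
```

The resulting `matrix` can be passed to whatever plotting tool you prefer, for example as a heatmap. Because the strings differ a lot in length, you may want to divide each distance by the length of the longer string before plotting, though that normalisation is my suggestion rather than part of the original answer.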

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow