鉴于这些意见:

my $init_seq = "AAAAAAAAAA" #length 10 bp 
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

我想生成:

  1. 一千个长度 - 10个标签

  2. 标签中每个位置的替代率为0.003

  3. 产生如下输出:

    AAAAAAAAAA
    AATAACAAAA
    .....
    AAGGAAAAGA # 1000th tags
    

    在Perl中有一种简洁的方法吗?

    我坚持将此脚本的逻辑作为核心:

    #!/usr/bin/perl
    
    my $init_seq = "AAAAAAAAAA" #length 10 bp 
    my $sub_rate = 0.003;
    my $nof_tags = 1000;
    my @dna = qw( A C G T );
    
        $i = 0;
        while ($i < length($init_seq)) {
            $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4
    
            if ($roll == 1) {$base = A;}
            elsif ($roll == 2) {$base = T;}
            elsif ($roll == 3) {$base = C;}
            elsif ($roll == 4) {$base = G;};
    
            print $base;
        }
        continue {
            $i++;
        }
    
有帮助吗?

解决方案

作为一项小型优化,请替换:

    $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

    if ($roll == 1) {$base = A;}
    elsif ($roll == 2) {$base = T;}
    elsif ($roll == 3) {$base = C;}
    elsif ($roll == 4) {$base = G;};

    $base = $dna[int(rand 4)];

其他提示

编辑:假设替代率在0.001到1.000之间:

$ roll 一样,生成[1..1000]范围内的另一个(伪)随机数,如果它小于或等于(1000 * $ sub_rate),则执行替换,否则什么都不做(即输出'A')。

请注意,除非知道随机数生成器的属性,否则可能会引入微妙的偏差。

不完全是你想要的,但我建议你看一下BioPerl的 Bio :: SeqEvolution :: DNAPoint 模块。但它并不以突变率作为参数。相反,它询问序列同一性的下限与你想要的原始内容。

use strict;
use warnings;
use Bio::Seq;
use Bio::SeqEvolution::Factory;

my $seq = Bio::Seq->new(-seq => 'AAAAAAAAAA', -alphabet => 'dna');

my $evolve = Bio::SeqEvolution::Factory->new (
   -rate     => 2,      # transition/transversion rate
   -seq      => $seq
   -identity => 50      # At least 50% identity with the original
);


my @mutated;
for (1..1000) { push @mutated, $evolve->next_seq }

所有1000个突变序列都将存储在@mutated数组中,可以通过 seq 方法访问它们的序列。

如果发生替换,您希望排除当前基础

my @other_bases = grep { 

如果发生替换,您希望排除当前基础

<*>

另请参阅 Mitch Wheat的回答如何实现替代率。

ne substr($init_seq, $i, 1) } @dna; $base = @other_bases[int(rand 3)];

另请参阅 Mitch Wheat的回答如何实现替代率。

我不知道我是否理解正确,但我会做这样的事情(伪代码):

digits = 'ATCG'
base = 'AAAAAAAAAA'
MAX = 1000
for i = 1 to len(base)
  # check if we have to mutate
  mutate = 1+rand(MAX) <= rate*MAX
  if mutate then
    # find current A:0 T:1 C:2 G:3
    current = digits.find(base[i])
    # get a new position 
    # but ensure that it is not current
    new = (j+1+rand(3)) mod 4        
    base[i] = digits[new]
  end if
end for
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top