MATLAB: Computing the average over each 1-minute interval of a time series

https://stackoverflow.com/questions/2323031

22-09-2019

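Question

I have a bunch of time series, each described by two components: a vector of timestamps (in seconds) and a vector of measured values. The time vector is non-uniform (i.e., sampled at irregular intervals).

I am trying to compute the mean/SD of the values in each 1-minute interval (take an X-minute interval, compute its mean, take the next interval, ...).

My current implementation uses a loop. This is a sample of what I have so far: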

t = (100:999)' + rand(900,1);       %' non-uniform time
x = 5*rand(900,1) + 10;             % x(i) is the value at time t(i)

interval = 1;         % 1-min interval
tt = ( floor(t(1)):interval*60:ceil(t(end)) )';  %' stopping points of each interval
N = length(tt)-1;

mu = zeros(N,1);
sd = zeros(N,1);

for i=1:N
    indices = ( tt(i) <= t & t < tt(i+1) ); % find t between tt(i) and tt(i+1)
    mu(i) = mean( x(indices) );
    sd(i) = std( x(indices) );
end

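I am wondering whether there is a faster, vectorized solution. This matters because I have a large number of time series to process, each much longer than the sample shown above.

Any help is welcome.

Thank you all for the feedback.

I corrected the way t is generated so that it is always monotonically increasing (sorted); this was not really an issue.

Also, I may not have stated this clearly, but my intention was to have a solution for any interval length in minutes (1 minute was just an example).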

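Solution

The only logical solution seems to be...

OK. I find it funny that to me there is only one logical solution, yet many others find other solutions. Regardless, the solution does seem simple. Given the vectors x and t, and a set of equally spaced break points tt,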

t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
x = 5*rand(900,1) + 10;             % x(i) is the value at time t(i)

tt = ( floor(t(1)):1*60:ceil(t(end)) )';

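(Note that I sorted t above.)

I would do this in three fully vectorized lines of code. First, if the breaks were arbitrary and potentially unequal in spacing, I would use histc to determine which interval each data point falls in. Given that they are uniform, just do this: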

int = 1 + floor((t - t(1))/60);

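Again, if the elements of t were not known to be sorted, I would have used min(t) instead of t(1). Having done that, use accumarray to reduce the results to a mean and a standard deviation.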

mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);
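
The answer mentions histc for breaks that are not evenly spaced but does not show it. The following is a minimal sketch of that case on toy data, not the answerer's code. The `edges` vector and the sample values are made up for illustration, and the fifth argument of accumarray is used to mark empty intervals with NaN instead of the default 0.

```matlab
% Toy data: sorted, non-uniform sample times (seconds) and measured values.
t = [0.5; 2.0; 9.0; 12.0; 30.0; 31.5; 58.0];
x = [1;   3;   5;   7;    2;    4;    10];

% Hypothetical, unevenly spaced breakpoints (not from the original post).
edges = [0; 10; 15; 25; 40; 60];
nBins = numel(edges) - 1;

% bin(k) = j means edges(j) <= t(k) < edges(j+1); bin(k) is 0 for samples
% outside the edges. histc is used because the answer names it; newer
% MATLAB releases recommend histcounts or discretize instead.
[~, bin] = histc(t, edges);

% Keep only samples inside a proper interval (histc puts a sample that is
% exactly equal to edges(end) into an extra final bin, which is dropped here).
inRange = bin > 0 & bin <= nBins;

% Reduce per interval; intervals without samples get NaN.
mu = accumarray(bin(inRange), x(inRange), [nBins 1], @mean, NaN);
sd = accumarray(bin(inRange), x(inRange), [nBins 1], @std,  NaN);
```

Here the interval [15, 25) contains no samples, so mu(3) and sd(3) come out as NaN, which is easier to tell apart from a genuine zero than the default fill value.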

Other tips

You could try creating a cell array and applying mean and std via cellfun. It is ~10% slower than your solution for 900 entries, but ~10x faster for 90000 entries.

[t,sortIdx]=sort(t); %# we only need to sort in case t is not monotonically increasing
x = x(sortIdx);

tIdx = floor(t/60); %# convert seconds to minutes - can also convert to 5 mins by dividing by 300
tIdx = tIdx - min(tIdx) + 1; %# tIdx now is a vector of indices - i.e. it starts at 1, and should go like your iteration variable.

%# the next few commands are to count how many 1's 2's 3's etc are in tIdx
dt = [tIdx(2:end)-tIdx(1:end-1);1]; 
stepIdx = [0;find(dt>0)];
nIdx = stepIdx(2:end) - stepIdx(1:end-1); %# number of times each index appears

%# convert to cell array
xCell = mat2cell(x,nIdx,1);

%# use cellfun to calculate the mean and sd
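%# stepIdx(1:end-1)+1 is the first sample of each run; stepIdx(end)+1 would index past the end of tIdx
mu(tIdx(stepIdx(1:end-1)+1)) = cellfun(@mean,xCell); %# the indexing is like that since there may be missing steps
sd(tIdx(stepIdx(1:end-1)+1)) = cellfun(@std,xCell);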

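Note: my solution does not give exactly the same results as yours, because you skip a few time values at the end (1:60:90 is [1,61]), and because the start of the intervals is not exactly the same.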
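
To make the remark about 1:60:90 concrete, here is a small sketch on made-up data. It shows that edges built with the colon operator stop at the last full step, while index-based binning keeps the trailing partial window. The variable names `w`, `idx`, `nWin` and `ttFull` are illustrative and do not appear in the posts above.

```matlab
t = [1; 20; 59; 61; 75; 90];   % toy sample times in seconds
x = [2; 4;  6;  8;  10; 12];   % toy values
w = 60;                        % window length in seconds

% Colon-built edges stop at the last full step, so samples at or after the
% final edge fall into no [tt(i), tt(i+1)) interval in the loop version.
tt = (floor(t(1)):w:ceil(t(end)))';     % [1; 61]

% Index-based binning keeps the trailing partial window.
idx = 1 + floor((t - t(1)) / w);        % [1; 1; 1; 2; 2; 2]
mu  = accumarray(idx, x, [], @mean);    % [4; 10]

% One possible way to make edge-based code cover the same samples:
% derive the number of windows from the data and add the closing edge.
nWin   = max(idx);
ttFull = t(1) + w * (0:nWin)';          % [1; 61; 121]
```

With `tt = [1; 61]` the loop from the question computes a single interval and never looks at the samples at 61, 75 and 90, which is why the two approaches can return vectors of different length.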

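Here is a way that uses binary search. It is 6-10x faster for 9900 elements and about 64x faster for 99900 elements. It was hard to get reliable timings with only 900 elements, so I am not sure which is faster at that size. It uses almost no extra memory if you consider building tx directly from the generated data. Other than that, it only needs four extra scalar variables (prevind, first, mid, and last).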

% Sort the data so that we can use binary search (takes O(N logN) time complexity).
tx = sortrows([t x]);

prevind = 1;

for i=1:N
    % First do a binary search to find the end of this section
    first = prevind;
    last = size(tx,1) + 1;  % one past the end, so the final sample is kept when every remaining t is below tt(i+1)
    while first ~= last
        mid = floor((first+last)/2);
        if tt(i+1) > tx(mid,1)
            first = mid+1;
        else
            last = mid;
        end;
    end;
    mu(i) = mean( tx(prevind:last-1,2) );
    sd(i) = std( tx(prevind:last-1,2) );
    prevind = last;
end;

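It uses all of the variables you had originally. I hope it suits your needs. It is faster because binary search finds each boundary in O(log N), whereas finding the indices the way you were doing it takes O(N) per interval.

You can compute indices for all intervals at once using bsxfun: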

indices = ( bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)') );

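This is faster than the loop, but requires storing all of them at once (a time vs. space tradeoff).

Disclaimer: I worked this out on paper, but have not yet had the chance to check it "in silico"...

You may be able to avoid loops or cell arrays by doing some tricky cumulative sums and indexing, and computing the means and standard deviations yourself. Here is some code that I believe will work, although I am not sure how it compares speed-wise with the other solutions: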
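
The bsxfun answer stops at the logical matrix. One possible way to turn that matrix into per-interval statistics without a loop is to treat it as 0/1 weights and use matrix products. This is only a sketch on toy data; the names `M`, `n`, `s1` and `s2` are illustrative, and the standard deviation uses the sum-of-squares form rather than a call to std.

```matlab
t  = [0; 10; 50; 70; 110; 130];   % toy sample times (seconds)
x  = [1; 2;  3;  4;  6;   9];     % toy values
tt = [0; 60; 120; 180];           % interval edges

% indices(k, j) is true when tt(j) <= t(k) < tt(j+1)
indices = bsxfun(@ge, t, tt(1:end-1)') & bsxfun(@lt, t, tt(2:end)');

M  = double(indices);   % numel(t)-by-N matrix of 0/1 weights
n  = sum(M, 1)';        % number of samples in each interval
s1 = M' * x;            % per-interval sum of x
s2 = M' * (x.^2);       % per-interval sum of x.^2

mu = s1 ./ n;                                % mean (NaN for empty intervals)
sd = sqrt((s2 - s1.^2 ./ n) ./ (n - 1));     % sample std (NaN when n < 2)
```

The matrix has numel(t) rows and one column per interval, so memory grows with the product of the two. Note also that an interval holding a single sample yields NaN here, whereas std returns 0 for a single value.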

[t,sortIndex] = sort(t);  %# Sort the time points
x = x(sortIndex);         %# Sort the data values
interval = 60;            %# Interval size, in seconds

intervalIndex = floor((t-t(1))./interval)+1;  %# Collect t into intervals
nIntervals = max(intervalIndex);              %# The number of intervals
mu = zeros(nIntervals,1);                     %# Preallocate mu
sd = zeros(nIntervals,1);                     %# Preallocate sd

%# t and x are column vectors, so concatenate vertically
sumIndex = [find(diff(intervalIndex)); ...
            numel(intervalIndex)];  %# Find indices of the interval ends
n = diff([0; sumIndex]);            %# Number of samples per interval
xSum = cumsum(x);                   %# Cumulative sum of x
xSum = diff([0; xSum(sumIndex)]);   %# Sum per interval
xxSum = cumsum(x.^2);               %# Cumulative sum of x^2
xxSum = diff([0; xxSum(sumIndex)]); %# Squared sum per interval

intervalIndex = intervalIndex(sumIndex);  %# Find index into mu and sd
mu(intervalIndex) = xSum./n;                             %# Compute mean
sd(intervalIndex) = sqrt((xxSum-xSum.*xSum./n)./(n-1));  %# Compute std dev

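The standard deviation above is computed using the simplified formula found on this Wikipedia page.

Same answer as above, but with the interval as a parameter (window_size). The problem with the vector lengths is solved as well.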
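
For reference, the expression in the last line of that code corresponds to the usual sum-of-squares form of the sample standard deviation, written here for one interval with n samples (xSum is the sum of the x values and xxSum the sum of their squares):

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} x_i^{2} \;-\; \frac{1}{n}\Bigl(\sum_{i=1}^{n} x_i\Bigr)^{2}}{n-1}}
```

Two caveats apply to this form: for n = 1 it evaluates to 0/0 (NaN in MATLAB), and when the mean is large compared with the spread the subtraction in the numerator can lose precision.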

window_size = 60; % but it can be any value 60 5 0.1, which wasn't described above

t = sort((100:999)' + 3*rand(900,1));     % non-uniform time
x = 5*rand(900,1) + 10;                   % x(i) is the value at time t(i)

int = 1 + floor((t - t(1))/window_size);
tt = ( floor(t(1)):window_size:ceil(t(end)) )';



% mean val and std dev of the accelerations at speed
mu = accumarray(int,x,[],@mean);
sd = accumarray(int,x,[],@std);

% resolve size mismatches between tt and mu (e.g. for window_size = 1 instead of 60)
while ( sum(size(tt) > size(mu)) > 0 ) 
  tt(end)=[]; 
end

errorbar(tt,mu,sd);

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow