The data set has 11719 points. The continuous segments can be detected like this
d = load('summerRunningMean.csv');
% detect continuous parts
s = sqrt(sum(diff(d)' .^ 2));
ind = [0, find(s > 100 * mean(s)), size(d, 1)];
which results in 6 segments with lengths from 534 to 4255 points.
Plotting the tracks superimposed in space
for i = 1 : 6
x = d(ind(i) + 1 : ind(i + 1), 1);
y = d(ind(i) + 1 : ind(i + 1), 2);
plot(x, y)
hold all
end
xlabel x
ylabel y
axis equal
gives this result
and shows that the tracks have roughly the same shape, and that all but one of them start at the same place.
Superimposition of the time series however shows that the movement along that shape occurs at different speeds:
The problem with doing statistics for this data is therefore that there is no common reference point to assign data points from different tracks to each other.
In the following I excluded track #6 because it has a different starting point than the others.
One approach to define such a common reference is to compute the curve length along the track:
l = [0 ; cumsum(sqrt(diff(x) .^ 2 + diff(y) .^ 2))];
Plotting over this length instead of over time makes the x and y coordinates more similar and therefore comparable:
From this result, one can compute the mean and other statistics. To actually do it, the data have to be reparametrized by interpolation:
li = 0 :0.001: l(end);
xi = interp1(l, x, li);
yi = interp1(l, y, li);
Now, for each of the tracks we have a common reference and can store the transformed data in common data matrices:
n = numel(li);
xp(1 : n, i) = xi;
yp(1 : n, i) = yi;
where i
is the track index. The matrices xp
and yp
have to be initialized to NaNs because the tracks have different lengths. Then, statistics can be computed, e.g. the mean:
xm = nanmean(xp, 2);
ym = nanmean(yp, 2);
The resulting mean track together with the original tracks:
To further improve the agreement between the tracks, one could smooth them in order to reduce random variations (span = 5000
seems to work well):
xs = smooth(xi, span);
ys = smooth(yi, span);
After that, the curve length parameter has to be recomputed:
l = [0 ; cumsum(sqrt(diff(xs) .^ 2 + diff(ys) .^ 2))];
The result:
The agreement underlying the common reference has clearly been improved. The data have again to be reparametrized
li = 0 :0.001: l(end);
xsi = interp1(l, xs, li);
ysi = interp1(l, ys, li);
the result stored in the common data matrices
n = numel(li);
xp(1 : n, i) = xsi;
yp(1 : n, i) = ysi;
and the mean computed
xm = nanmean(xp, 2);
ym = nanmean(yp, 2);
The resulting mean track together with the original tracks:
The result looks much smoother. However, the smoothing has reduced the circumference of the large loop at the end of two of the tracks, and as a result the "mean track" does no longer lie between the original tracks. The trade-off between smoothness and closeness to the original tracks can be regulated via the value of span
.
The full code to generate the last figure is included here; in order to reproduce the unsmoothed version, set span = 1;
.
d = load('summerRunningMean.csv');
% detect tracks as continuous segments
s = sqrt(sum(diff(d)' .^ 2));
ind = [0, find(s > 100 * mean(s)), size(d, 1)];
% remove 6th track because itis an outlier
ind = ind(1 : end - 1);
N = size(ind, 2) - 1;
% smoothing parameter
span = 5000;
xp = nan(13000, N); % I know, hardcoded, not nice.
yp = nan(13000, N);
for i = 1 : N
% extract data
x = d(ind(i) + 1 : ind(i + 1), 1);
y = d(ind(i) + 1 : ind(i + 1), 2);
% determine length along curve
% to use as curve parameter
l = [0 ; cumsum(sqrt(diff(x) .^ 2 + diff(y) .^ 2))];
% reparametrize by interpolation
li = 0 :0.001: l(end);
xi = interp1(l, x, li);
yi = interp1(l, y, li);
% smooth to remove small deviations
xs = smooth(xi, span);
ys = smooth(yi, span);
% determine length along smoothed curve
% as improved curve parameter
l = [0 ; cumsum(sqrt(diff(xs) .^ 2 + diff(ys) .^ 2))];
% again, reparametrize by interpolation
li = 0 :0.001: l(end);
xsi = interp1(l, xs, li);
ysi = interp1(l, ys, li);
% store
n = numel(li);
xp(1 : n, i) = xsi;
yp(1 : n, i) = ysi;
plot(x, y)
axis equal
hold all
end
% compute mean
xm = nanmean(xp, 2);
ym = nanmean(yp, 2);
% plot mean
plot(xm, ym, 'k', 'LineWidth', 2)
xlabel x
ylabel y