So initially a mixed Gaussian approach looked really promising. The issue is that, as well as the data being noisy, the signal source actually falls into a couple of distinct cases, so I'd often find that a combination of Gaussians which worked on one data set would fail (drastically) on another.
Getting around this was possible, but the more general solutions introduced drift/bias into the approximations, and its impact was inconsistent, depending on both the noise and the underlying case.
After faffing with this for a while, I opted to try MATLAB's curvedspline instead. This ended up being a much better approach, which I then combined with some multidimensional cluster analysis to pick out places where the spline fitting had clearly gone awry. That meant that rather than fitting to bad data (i.e. data which deviated seriously from the bulk of the data) I was able to discard these outliers. Specifically, I used domain knowledge to work out cases where, by definition, outliers were the result of a poor fit and not sample variance. This actually only led to a couple of data points per sample being discarded (1-2 out of 20) and gave pretty clean results in the end.
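
To give a rough idea of the discard step, here is a minimal sketch in Python rather than MATLAB. It is not the cluster analysis or the domain-knowledge checks described above; it swaps in a much simpler median-absolute-deviation (MAD) test on the fit residuals, just to show the "flag points that sit far from the bulk, then drop them" idea. The function name `flag_outliers`, the threshold of 3.5, and the toy data are all made up for illustration.

```python
from statistics import median


def flag_outliers(observed, fitted, threshold=3.5):
    """Return indices of points whose fit residual is far from the bulk.

    Hypothetical helper: uses a modified z-score built on the median
    absolute deviation (MAD), as a stand-in for a real cluster analysis.
    """
    residuals = [o - f for o, f in zip(observed, fitted)]
    centre = median(residuals)
    mad = median([abs(r - centre) for r in residuals])
    if mad == 0:
        return []  # no spread in the residuals, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    return [
        i
        for i, r in enumerate(residuals)
        if 0.6745 * abs(r - centre) / mad > threshold
    ]


# Toy sample of 20 points: pretend `fitted` came from the spline fit.
fitted = [0.5 * i for i in range(20)]
noise = [0.02, -0.01, 0.03, -0.02, 0.01, 0.00, -0.03, 0.02, -0.01, 0.01,
         0.02, -0.02, 0.01, -0.01, 0.03, -0.03, 0.00, 0.02, -0.02, 0.01]
observed = [f + n for f, n in zip(fitted, noise)]
observed[4] += 1.5    # two points where the fit has clearly gone wrong
observed[13] -= 2.0

flagged = flag_outliers(observed, fitted)
kept = [(i, y) for i, y in enumerate(observed) if i not in flagged]
print(flagged)    # [4, 13]
print(len(kept))  # 18
```

A median-based score is used here because the bad points themselves would inflate a mean and standard deviation, which can hide exactly the outliers you are trying to find. With only about 20 points per sample, that matters.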