Dynamic editing within blockmean
Target version: Candidate for next minor release
Platform: Mac OS X
Blockmean and its cousins, blockmedian and blockmode, are terrific tools for rapidly computing grid cell means (or medians and modes) from arbitrarily located x,y,z triples.
Unfortunately, these tools lack any means of outlier detection. For example, I would welcome a feature that lets the user input a sigma scale level (e.g., 3-sigma, 2-sigma) and have blockmean compute new means from only the points within that sigma level in each cell.
There is, as far as I'm aware, no "easy" way to do this short of working with each cell individually. First you pass the original data through blockmean to compute the means and sigmas (with the -S option). Then the original data need to be passed through two calls of gmtselect: one to capture the data for each cell, and a second with the -Z option to select only values within Z-bounds (e.g., set to +/- 3-sigma), saving the data that passed the test and recomputing the cell mean. This process takes a long time since it has to be run on each individual cell. And you may need to pass the data through twice or more!
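To make the per-cell procedure above concrete, here is a minimal Python sketch of the same idea done in one sweep: bin the x,y,z triples into cells, then compute a sigma-edited mean for each cell. It is not GMT code; the function names (bin_points, clipped_mean, block_clipped_means), the grid origin and spacing arguments, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def bin_points(points, x0, y0, dx, dy):
    """Group z values by the (column, row) index of the cell containing (x, y)."""
    cells = defaultdict(list)
    for x, y, z in points:
        col = math.floor((x - x0) / dx)
        row = math.floor((y - y0) / dy)
        cells[(col, row)].append(z)
    return cells


def clipped_mean(values, k=3.0):
    """Mean of the values lying within k standard deviations of the plain mean."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (an assumption)
    kept = [v for v in values if abs(v - m) <= k * s]
    # Fall back to the plain mean if a very small k rejects everything.
    return mean(kept) if kept else m


def block_clipped_means(points, x0, y0, dx, dy, k=3.0):
    """Single-pass sigma-edited mean for every occupied cell."""
    cells = bin_points(points, x0, y0, dx, dy)
    return {cell: clipped_mean(zs, k) for cell, zs in cells.items()}


# Hypothetical data: one cell with a wild value, one clean cell.
points = [(0.5, 0.5, 10.0)] * 20 + [(0.5, 0.5, 1000.0)]
points += [(1.5, 0.5, 1.0), (1.5, 0.5, 2.0), (1.5, 0.5, 3.0)]
result = block_clipped_means(points, x0=0.0, y0=0.0, dx=1.0, dy=1.0)
```

Because every cell is handled in the same sweep over the data, there is no need to extract each cell separately. Note that this only does a single edit per cell.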
Any thoughts? Has anyone come up with a faster/better way to edit out outliers for the blockmean command?
Thanks - John
#3 Updated by Paul over 6 years ago
- Status changed from New to Feedback
In their way, both blockmedian and blockmode provide outlier protection by virtue of giving a robust estimate of the central location. Presumably, it is data within +/- robust 3-sigma of that location you want to keep, and hence blockmean is not the tool for this job. Unlike the other two, blockmean does not keep arrays of points for each block. I think it may be much easier to add such a feature to blockmedian: upon finishing reading and placing the points in block arrays, it would simply determine the robust (median) location, ignore values outside the robust 3-sigma range (block* already computes this range), and then compute mean and stdev estimates from what remains. This is like the reweighted least squares approach of Rousseeuw and Leroy, and that is something I think would be valuable. But not a complete rewrite of blockmean. A similar option could be added to blockmode, based on its central location and spread.
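As a rough illustration of the approach described above, the following Python sketch takes the values in one block, finds the median, rejects anything outside the robust 3-sigma range, and then computes the ordinary mean and standard deviation of the survivors. It is not how blockmedian is implemented; the function name robust_block_estimate is hypothetical, and the robust sigma is assumed here to be 1.4826 times the median absolute deviation (MAD), a common convention for normally distributed data.

```python
from statistics import mean, median, stdev

# Scales the MAD to a standard deviation for normally distributed data.
MAD_TO_SIGMA = 1.4826


def robust_block_estimate(values, k=3.0):
    """Return (mean, stdev, count) of the values within k robust sigmas of the median."""
    values = list(values)
    loc = median(values)
    mad = median([abs(v - loc) for v in values])
    robust_sigma = MAD_TO_SIGMA * mad
    kept = [v for v in values if abs(v - loc) <= k * robust_sigma]
    if not kept:  # only possible for very small k; keep everything in that case
        kept = values
    m = mean(kept)
    s = stdev(kept) if len(kept) > 1 else 0.0
    return m, s, len(kept)


# Hypothetical block: five consistent values plus two outliers of different size.
block = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0, 30.0]
m, s, n = robust_block_estimate(block)
```

In this example both outliers are rejected in a single pass, because the median and MAD are barely affected by them, whereas a plain mean and standard deviation would be.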
#4 Updated by John over 6 years ago
Thanks for the considerations, R & P.
Remko, in my experience, a two-pass 3-sigma edit tends to catch multiple outliers in most sample sets. If you have one outlier that is WAY out there, and a few others that aren't so dispersed, then with a one-time edit you'll catch the really bad value, but the not-so-bad ones will remain.
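The masking effect described above can be shown with a small Python sketch of a repeated 3-sigma edit on the values of a single cell. The function name sigma_clip, the passes parameter, and the use of the population standard deviation are assumptions for illustration, not anything taken from GMT.

```python
from statistics import mean, pstdev


def sigma_clip(values, k=3.0, passes=2):
    """Repeatedly drop values more than k standard deviations from the mean."""
    kept = list(values)
    for _ in range(passes):
        m, s = mean(kept), pstdev(kept)
        survivors = [v for v in kept if abs(v - m) <= k * s]
        # Stop early once a pass removes nothing (or would remove everything).
        if len(survivors) == len(kept) or not survivors:
            break
        kept = survivors
    return kept


# Hypothetical cell: 20 good values, one wild outlier and one milder one.
data = [9.9, 10.1] * 10 + [1000.0, 30.0]
one_pass = sigma_clip(data, passes=1)   # removes 1000.0 but keeps 30.0
two_pass = sigma_clip(data, passes=2)   # removes 30.0 as well
```

On the first pass the wild value inflates the standard deviation so much that 30.0 still falls inside the 3-sigma range. Once 1000.0 is gone, the recomputed sigma is small enough for the second pass to reject 30.0.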
Paul, interesting idea about using blockmedian. I have to confess I'm not as familiar with the capabilities of blockmedian and blockmode, and I'm certainly not familiar with their inner workings. I need to learn more about them. It still would be nice to have some way to do some kind of editing inside each cell. - JR