Auto-download data needed for doc examples, tests, and examples in man pages

Added by Paul 7 months ago

Auto-download for certain files

Over the years I have received many complaints about the GMT documentation. Take the man pages (now in rst): we don't have many quick examples of how to run a module, and when we do (other than for modules that take no input), the examples cannot be run because the input data are imaginary and do not in fact exist. We also sometimes get comments that GMT lacks useful data for even basic tasks, such as making a topography map. To cure these issues, and especially to make our documentation more dynamic, I have implemented a new GMT defaults setting called GMT_AUTO_DOWNLOAD. When it is on (the default), the following features are enabled:

  1. If a GMT module references a file called earth_relief_<res>.grd, then, if it is not already present, the file will be downloaded for you and placed in your ~/.gmt data directory. [At the moment, res is one of 30s, 1m, 2m, 5m, 10m, 30m, and 60m.]
  2. If a GMT module references a file called @filename, then, if it is not already present, the file will be downloaded for you and placed in your ~/.gmt/cache subdirectory.
  3. If a GMT module references a file by its full URL, then, if it is not already present, the file will be downloaded for you and placed in your ~/.gmt/cache subdirectory.
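To make the three naming rules concrete, here is a minimal sketch of how an input name could be sorted into the three cases. The names `classify_input`, `source_kind`, and the `SRC_*` constants are hypothetical and not taken from the GMT sources; the list of resolutions is the one given in item (1), and treating `http://`, `https://`, and `ftp://` prefixes as "full URL" is an assumption.

```c
#include <string.h>

/* Hypothetical classification of an input name into the auto-download
 * cases described above; a sketch, not GMT's actual code. */
typedef enum {
    SRC_LOCAL = 0,      /* ordinary file: no auto-download */
    SRC_EARTH_RELIEF,   /* earth_relief_<res>.grd -> ~/.gmt */
    SRC_CACHE,          /* @filename -> ~/.gmt/cache */
    SRC_URL             /* full URL -> ~/.gmt/cache */
} source_kind;

/* Resolutions currently offered for earth_relief_<res>.grd */
static const char *relief_res[] = {"30s", "1m", "2m", "5m", "10m", "30m", "60m"};

static int is_earth_relief(const char *name) {
    const char *prefix = "earth_relief_", *suffix = ".grd";
    size_t lp = strlen(prefix), ls = strlen(suffix), n = strlen(name);
    if (n <= lp + ls) return 0;
    if (strncmp(name, prefix, lp) != 0) return 0;
    if (strcmp(name + n - ls, suffix) != 0) return 0;
    size_t lr = n - lp - ls;  /* length of the <res> part */
    for (size_t k = 0; k < sizeof relief_res / sizeof relief_res[0]; k++)
        if (strlen(relief_res[k]) == lr && strncmp(name + lp, relief_res[k], lr) == 0)
            return 1;
    return 0;
}

/* Assumed set of URL schemes that count as a "full URL" */
static int is_url(const char *name) {
    return strncmp(name, "http://", 7) == 0 || strncmp(name, "https://", 8) == 0
        || strncmp(name, "ftp://", 6) == 0;
}

source_kind classify_input(const char *name) {
    if (is_earth_relief(name)) return SRC_EARTH_RELIEF;
    if (name[0] == '@' && name[1] != '\0') return SRC_CACHE;
    if (is_url(name)) return SRC_URL;
    return SRC_LOCAL;
}
```

With this sketch, `earth_relief_5m.grd` is recognized, while an unsupported resolution such as `earth_relief_7m.grd` falls through to an ordinary local file and is never fetched.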

Item (1) enables automatic access to the only truly general-purpose datasets I could think of that are useful to as many GMT users as possible. Making a color map of topography/bathymetry is probably a common denominator for all, right up there with a coastline map. Item (2) would allow us to say that any command-line example in the documentation can be copied and pasted to the user's command line and run; if a dataset is involved, it will be downloaded (once). This would replace any previous plans we had for a manually downloaded test data directory. Item (3) allows us to download (once) any file accessible via a URL.

Technical Details

  1. The Earth Relief grids are derived from the SRTM30+ grid (30s); the rest are either ETOPO1m or ETOPO2m, or spherically grdfilter'ed versions of a higher-resolution grid. These grids are stored as chunked netCDF-4 grids of short integers, with file sizes of 778 MB, 214 MB, 51 MB, 11 MB, 2.9 MB, 0.35 MB, and 110 kB, respectively.
  2. The auto-download decision is implemented as a function called at the lowest level and thus affects all input. If GMT_AUTO_DOWNLOAD is on, we first check whether the file already exists; if it does, the function returns. We then check whether the filename matches one of the three formats given above. If it does, we attempt to download the data via functions in libcurl (there are no system calls to curl).
  3. We automatically create a demo directory in ~/.gmt (i.e., GMT_USERDIR) if none exists.
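The decision order in item 2 can be sketched as follows. This is not GMT's implementation: `resolve_input` and `fetch_fn` are hypothetical names, the `auto_download` flag stands in for GMT_AUTO_DOWNLOAD and `userdir` for GMT_USERDIR, and the actual transfer is left to a callback that a real version would implement with libcurl's easy interface. The earth_relief test here is a simplified prefix match, and directory creation is omitted.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* access() */

/* Hypothetical download hook. A real implementation would use libcurl's
 * easy interface (curl_easy_init, curl_easy_setopt, curl_easy_perform)
 * rather than a system call to curl. Returns 0 on success. */
typedef int (*fetch_fn)(const char *remote, const char *local_path);

/* Sketch of the lowest-level auto-download decision (not GMT's code).
 * On return, `out` holds the path the caller should open.
 * Returns 0 when no download is needed (auto-download off, or file present)
 * or the download succeeded; returns 1 when the file is still missing.
 * Passing fetch == NULL gives a dry run: the target path is computed
 * but nothing is downloaded. */
int resolve_input(const char *name, int auto_download, const char *userdir,
                  fetch_fn fetch, char *out, size_t outlen) {
    const char *local_name, *subdir;

    /* Step 1: auto-download off, or file already present: use the name as-is
     * (with auto-download off, the caller reports any open error itself). */
    if (!auto_download || access(name, R_OK) == 0) {
        snprintf(out, outlen, "%s", name);
        return 0;
    }
    /* Step 2: only the special name patterns qualify for a download. */
    if (strncmp(name, "earth_relief_", 13) == 0) {
        local_name = name;          /* stored directly in userdir */
        subdir = "";
    } else if (name[0] == '@' && name[1] != '\0') {
        local_name = name + 1;      /* strip the leading @ */
        subdir = "/cache";
    } else if (strstr(name, "://") != NULL) {
        local_name = strrchr(name, '/') + 1;   /* basename of the URL */
        subdir = "/cache";
        if (*local_name == '\0') {             /* URL ends in '/': no file name */
            snprintf(out, outlen, "%s", name);
            return 1;
        }
    } else {
        snprintf(out, outlen, "%s", name);     /* ordinary missing file */
        return 1;
    }
    /* Step 3: reuse an earlier download; otherwise fetch it (once). */
    snprintf(out, outlen, "%s%s/%s", userdir, subdir, local_name);
    if (access(out, R_OK) == 0) return 0;
    if (fetch == NULL) return 1;
    return fetch(name, out) == 0 ? 0 : 1;
}
```

Checking the local copy in step 3 before calling the hook is what makes each file download only once: a second reference to `@demo_junk.txt` finds `<userdir>/cache/demo_junk.txt` and never touches the network.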

Discussion items

  1. I think earth_relief_*.grd are good names for these data. I initially had a gmt_ prefix as well to make them even more distinct, but I don't think that is needed.
  2. Some examples apply a UNIX tool (e.g., awk, grep) to a data set, and those tools obviously will not download anything. For those cases we need another way to obtain the file. I have modified gmtwhich to take a new option meaning "download this file if it cannot be found here". E.g., "gmt which -G demo_junk.txt" would get that file from the server if it is not present locally and then return the path to that file.
  3. Currently, our svn repository is full of odd data files used by test scripts and examples. These would all be renamed to the @* scheme and placed in the GMT cache directory on the SOEST ftp server. This will shrink our svn footprint by 40 MB now, and by much more when the test directory grows to 10,000 scripts. Most people using svn do not build and run the tests, and those who do will download the files once, the first time they need them.
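The lookup behind "gmt which -G" can be pictured as a search over local directories with a server fallback. The sketch below is only an illustration: `which_file`, its return codes, and the `server_base` parameter are hypothetical, and no real server address is implied. It strips a leading @ (the cache naming scheme) before searching and before forming the remote location.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* access() */

/* Hypothetical sketch of a "which -G"-style lookup; not GMT's code.
 * Returns 0 and the local path in `out` if `name` is found in one of `dirs`;
 * returns 1 and the remote location in `out` if it is missing and `download`
 * is set; returns -1 (empty `out`) otherwise. A leading '@' is stripped
 * before searching and before building the remote location. */
int which_file(const char *name, const char *dirs[], size_t ndirs,
               int download, const char *server_base, char *out, size_t outlen) {
    const char *bare = (name[0] == '@') ? name + 1 : name;
    for (size_t k = 0; k < ndirs; k++) {
        snprintf(out, outlen, "%s/%s", dirs[k], bare);
        if (access(out, R_OK) == 0) return 0;    /* found locally */
    }
    if (!download) {
        out[0] = '\0';
        return -1;
    }
    /* Not found: report where it would be fetched from. A full version would
     * download it into the cache directory and print that local path instead. */
    snprintf(out, outlen, "%s/%s", server_base, bare);
    return 1;
}
```

Searching local directories first means that svn users who still have a file checked out, or who already fetched it once, never hit the server.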

Comments are encouraged!