GMT Work Flows

Introduction

GMT started out as a UNIX toolset, but early on it was also used on Windows via BAT scripts. More recently we have developed an API for MATLAB/Octave, an experimental Julia API is being tested, and we are funded to design a Python module for GMT. One fundamental problem is that work flows are written differently in each of these environments. The differences follow largely from OS differences, but also from design differences and from the strengths and weaknesses of the external environments. For testing purposes it is important to have a large number of test scripts so that we catch as many problems as possible before releases are announced. Our goal is to increase the number of such scripts by an order of magnitude.

To address these problems I propose we design a new macro language called GMT Work Flow (GWF). With this language one may write a single workflow stored in a file such as script.gwf. We may then need to run this script on the UNIX command line, on the Windows command line, or within MATLAB, Octave, Julia, and Python, with others to come. Imagine running

gmt workflow -Fclassic script.gwf > gmt_classic.sh
gmt workflow -Fmodern script.gwf  > gmt_modern.sh
gmt workflow -Fbat script.gwf     > gmt_windows.bat
gmt workflow -Fmex script.gwf     > gmt_mex.m
gmt workflow -Fjulia script.gwf   > gmt_julia.jl
gmt workflow -Fpython script.gwf  > gmt_python.py

and these scripts could be run in the corresponding environments and produce the same results. The benefits are numerous:

  1. We would only maintain one set of work flows (i.e., examples, tests) from which we auto-derive scripts for specific environments. This alone would save tons of work.
  2. GWF would define a standard and provide a structure for how portable GMT workflows should be designed by users.
  3. Users on different platforms and environments could collaborate seamlessly by exchanging GWF scripts.
  4. GWF would lower the barriers to scripting since knowledge of details (e.g., OS and redirection) would not be required to write scripts.
  5. Having a defined GWF language might inform how we approach designing external APIs and how they should work.

One can also imagine running the workflow directly via

gmt run script.gwf        # UNIX
gmt ('run script.gwf');   % MATLAB
gmt.run ('script.gwf');   # Python

A possible standard

Not much thought has gone into this yet but it is easier to tear apart a proposal than to build one from scratch. So here is one possible model. We would name all (or most) of the common command options:

-B: frame='auto|off|frame'
-J: projection='proj'  width='width' or scale='scale'
-R: region='region|world|exact|auto'
-U: stamp='on|off|command|string'
-V: verbose='quiet|normal|verbose|debug'
-X -Y: position=x/y
-i: select='columns'
-o: report='columns'

etc., etc.

For module-specific options we would still try to find common ground. For instance, concepts like fill, pen, increment, input, and output are common to most modules, so perhaps we can define

-G: fill='fill'
-W: pen='pen'
-I: increment=dx/dy
-G<grid>: output='file'
->file: output='file'
-<file: input='file'
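
The naming scheme above amounts to a lookup from long option names to the existing single-letter flags. A minimal sketch of that lookup follows, assuming Python as the implementation language. The table LONG_TO_SHORT and the function translate_options are hypothetical names used for illustration and are not part of GMT. Keyword values such as region=world or frame=auto would need their own translation step, which is not attempted here.

```python
# Hypothetical lookup table built from the option names proposed above.
# Neither the table nor translate_options() exists in GMT.
LONG_TO_SHORT = {
    "frame": "-B",
    "projection": "-J",
    "region": "-R",
    "stamp": "-U",
    "verbose": "-V",
    "select": "-i",
    "report": "-o",
    "fill": "-G",
    "pen": "-W",
    "increment": "-I",
}


def translate_options(pairs):
    """Turn name=value words into classic single-letter GMT options."""
    flags = []
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or name not in LONG_TO_SHORT:
            raise ValueError(f"unknown or malformed GWF option: {pair!r}")
        # The value is appended verbatim; keyword values are not interpreted.
        flags.append(LONG_TO_SHORT[name] + value)
    return flags


print(translate_options(["region=0/10/0/5", "pen=1p,red", "fill=brown"]))
# prints ['-R0/10/0/5', '-W1p,red', '-Gbrown']
```

Rejecting unknown names outright, rather than passing them through, is one possible design choice; it would surface typos early in test scripts.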

A simple example of a GWF script (my_map.gwf) may be written like this:

gwf begin product=PDF name=my_map
gwf pscoast region=world projection=Hammer width=9i frame=auto fill=brown pen=faint
gwf psxy input=mylines.txt,yourlines.txt pen=1p,red symbol=line
gwf end

Here, the first line tells us to start a new workflow (which removes past gmt.history files), what the product is (here a plot, since we stated PDF, in contrast to a grid, for instance) and what its name stem should be. Then we overlay some lines on a global map and finally produce the PDF via an implicit psconvert call when gwf end is called.
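
Any of the proposed translators would first have to split such a script into commands and their named options. The sketch below shows one way this could be done, assuming Python and its standard shlex module. The function parse_gwf is a hypothetical helper, and it only handles portable lines of the form "gwf command name=value ...".

```python
import shlex

SCRIPT = """\
gwf begin product=PDF name=my_map
gwf pscoast region=world projection=Hammer width=9i frame=auto fill=brown pen=faint
gwf psxy input=mylines.txt,yourlines.txt pen=1p,red symbol=line
gwf end
"""


def parse_gwf(text):
    """Return a list of (command, options) pairs; parse_gwf is hypothetical."""
    steps = []
    for raw in text.splitlines():
        words = shlex.split(raw, comments=True)  # honors quotes, drops # comments
        if not words:
            continue  # blank or comment-only line
        if words[0] != "gwf" or len(words) < 2:
            raise ValueError(f"not a GWF statement: {raw!r}")
        options = {}
        for word in words[2:]:
            name, sep, value = word.partition("=")
            if not sep:
                raise ValueError(f"expected name=value, got {word!r}")
            options[name] = value
        steps.append((words[1], options))
    return steps


for command, options in parse_gwf(SCRIPT):
    print(command, options)
```

Each back end (classic, modern, bat, mex, and so on) could then walk the same list of steps and emit syntax for its own environment.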

Targeted commands

Unfortunately, GWF cannot solve all the world's problems. For instance, someone writing a script may need to mix GMT use with capabilities offered by the calling environment. This could be a sorting task in Unix or a built-in function in MATLAB. To allow this capability we would need gwf commands that are platform specific. As a (not so well thought-out) example, consider

...
gwf unix sort -k 2 -g -r raw_file.txt > sorted_file.txt
gwf mex A = gmt ('read -Td raw_file.txt');
gwf mex B = sort (A);
gwf mex gmt ('write sorted_file.txt', B);
gwf psxy input=sorted_file.txt symbol=circle size=0.2c fill=red pen=faint
...
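
One way a translator could treat such targeted lines is sketched below, assuming Python: lines tagged for the requested environment are unwrapped and passed through verbatim, lines tagged for other environments are dropped, and portable lines are kept for later translation. The keywords unix and mex come from the example above. The other keywords and the function select_for_target are assumptions made for illustration.

```python
# 'unix' and 'mex' appear in the example above; the other keywords are assumed.
TARGET_KEYWORDS = {"unix", "bat", "mex", "julia", "python"}


def select_for_target(lines, target):
    """Keep portable lines as is and unwrap targeted lines that match target."""
    kept = []
    for line in lines:
        words = line.split(None, 2)
        targeted = (
            len(words) >= 2 and words[0] == "gwf" and words[1] in TARGET_KEYWORDS
        )
        if not targeted:
            kept.append(line)      # portable line, translated in a later step
        elif words[1] == target and len(words) == 3:
            kept.append(words[2])  # native command, passed through verbatim
        # targeted lines meant for other environments are dropped
    return kept


workflow = [
    "gwf unix sort -k 2 -g -r raw_file.txt > sorted_file.txt",
    "gwf mex A = gmt ('read -Td raw_file.txt');",
    "gwf psxy input=sorted_file.txt symbol=circle size=0.2c fill=red pen=faint",
]
for line in select_for_target(workflow, "unix"):
    print(line)
```

A script written this way is only portable if its author supplies an equivalent targeted block for every environment, which is the main cost of this approach.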

Alternatively, we could make GWF restrictive and only allow things that are portable across all environments. Between the options -i and -o (which can do simple awk-like transformations), -e (basic grepping), and gmt convert -N (sorting), we can do quite a bit within GMT itself.

Input and Output

A key concept that remains to be abstracted is input from and output to memory, which some environments support. Perhaps we need two flavors of the input and output statements, e.g., m_input and m_output for memory. We would then use m_input to indicate an input source: under MATLAB or Python it refers to the name of a variable, while in scripts generated for UNIX it becomes a file (i.e., m_input is the same as input). The same holds for m_output.
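
The rule just described can be stated as a small decision function. The sketch below assumes Python. The set MEMORY_TARGETS and the function resolve_source are hypothetical, and the split of targets into memory-capable and file-based ones is an assumption that goes beyond the MATLAB, Python, and UNIX cases named in the text.

```python
# Assumed set of targets that can exchange data through variables.
# The text only names MATLAB (mex) and Python explicitly.
MEMORY_TARGETS = {"mex", "python"}

IO_OPTIONS = {"input", "output", "m_input", "m_output"}


def resolve_source(option, value, target):
    """Return ('variable', name) or ('file', name) for an input/output option."""
    if option not in IO_OPTIONS:
        raise ValueError(f"not an input/output option: {option!r}")
    # m_input/m_output mean memory only where the target supports it;
    # everywhere else they fall back to plain file input/output.
    in_memory = option.startswith("m_") and target in MEMORY_TARGETS
    return ("variable" if in_memory else "file", value)


print(resolve_source("m_input", "track", "mex"))      # ('variable', 'track')
print(resolve_source("m_input", "track", "classic"))  # ('file', 'track')
```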