GMT Modern mode, 2.0

Added by Paul over 1 year ago

Given our experimentation with the earlier demo modern mode and learning about its benefits and remaining shortcomings, here is an updated proposal 2.0:

Modern mode initiation

  1. A modern gmt session starts with the command gmt begin. This step accomplishes several tasks:
    1. Obtains the parent process ID (PPID).
    2. Creates a subdirectory gmt5.PPID in the current temp dir (OS-dependent) where all hidden GMT files are written.
    3. Writes a clean gmt.conf to this directory; no gmt.history exists at this point.
    4. Enables modern mode; this is the only way to enable this mode from the command line.
    5. PostScript plots are written to file gmt.ps0 in /tmp/gmt5.PPID.
    6. All interactions with gmt.conf and gmt.history take place in this unique directory.
  2. A modern session ends with gmt end, which removes /tmp/gmt5.PPID.
  3. All gmt calls bracketed by begin - end will know they are running in modern mode by the presence of /tmp/gmt5.PPID. There will be no GMT_RUNMODE gmt.conf setting to misuse.
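
The initiation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not GMT's actual implementation: the function names (session_dir, begin, in_modern_mode, end) are hypothetical, and the empty gmt.conf merely stands in for a file of clean defaults.

```python
import os
import shutil
import tempfile
from pathlib import Path


def session_dir(ppid=None, tmp_root=None):
    """Return the session directory <tmp>/gmt5.<PPID> (hypothetical helper)."""
    ppid = os.getppid() if ppid is None else ppid
    root = Path(tmp_root) if tmp_root is not None else Path(tempfile.gettempdir())
    return root / f"gmt5.{ppid}"


def begin(ppid=None, tmp_root=None):
    """Mimic 'gmt begin': create the directory and write a clean gmt.conf."""
    path = session_dir(ppid, tmp_root)
    path.mkdir(parents=True, exist_ok=True)
    (path / "gmt.conf").write_text("")  # placeholder for clean defaults
    return path  # note: no gmt.history is created at this point


def in_modern_mode(ppid=None, tmp_root=None):
    """Modern mode is detected purely by the presence of the directory."""
    return session_dir(ppid, tmp_root).is_dir()


def end(ppid=None, tmp_root=None):
    """Mimic 'gmt end': remove the session directory."""
    shutil.rmtree(session_dir(ppid, tmp_root), ignore_errors=True)
```

Because every call derives the same path from the same PPID, no configuration setting is needed to tell a module which mode it is running in.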

The use of the session-specific temporary directory serves the same purpose as the classic isolation mode but transcends a particular OS [The implementation of isolation mode was strictly UNIX], is always enabled and does not require the user to do anything extra (beyond using gmt begin - end).

Rules regarding plot construction (-O -K)

These remain unrecognized options under modern mode. As before, the first plotter is identified by the lack of a gmt.ps0 file, while subsequent plotters will append to this existing file. The plot is finalized when gmt psconvert is called, at which point an output filename is required.
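
As a rough sketch of the rule above, the first plotter can be told apart from later ones by testing for gmt.ps0. The helper name add_layer and the header string are assumptions made for illustration; GMT's real PostScript header is different.

```python
from pathlib import Path


def add_layer(session: Path, postscript: str) -> bool:
    """Append a layer to gmt.ps0; return True if this call started the plot."""
    ps_file = session / "gmt.ps0"
    is_first = not ps_file.exists()  # no file yet means this is the first plotter
    with ps_file.open("a") as handle:
        if is_first:
            handle.write("%!PS-Adobe-3.0\n")  # placeholder header, written once
        handle.write(postscript + "\n")
    return is_first
```

The file itself plays the role that -O and -K played in classic mode, so the user never has to state whether a command starts or continues a plot.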

Rules regarding the region setting (-R)

Modern mode recognizes that the concept of "region" has always had two meanings in GMT:

  1. Specifying the map/plot domain. Apart from being a valid region, there are no further requirements on the values of the domain limits.
  2. Specifying the domain and organization of an equidistant grid in concert with conforming increments and node registration.

Yet, in classic mode there is no distinction between these meanings as far as GMT history goes. This will change in modern mode. For data processing modules that define and create grids the -R rules regarding the history are as follows:

  1. The first data-processing module must specify the grid region, its increment(s), and optionally node registration. These three settings will be stored in gmt.history's new -RG slot (for grid domain).
  2. Subsequent data processing will default to the history in -RG unless overridden on the command line. Any or all of the three components can be overridden on the command line and -RG will be updated accordingly.
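
The two grid rules can be illustrated with a small sketch. The history is modeled as a plain dictionary with an "RG" key, and update_grid_history is a hypothetical name; falling back to gridline registration when none is given is an assumption of this sketch.

```python
def update_grid_history(history, region=None, increment=None, registration=None):
    """Resolve grid settings from the command line and the stored -RG slot."""
    stored = history.get("RG", {})
    given = {"region": region, "increment": increment, "registration": registration}
    # Command-line values override whatever the history holds.
    resolved = {**stored, **{k: v for k, v in given.items() if v is not None}}
    missing = [k for k in ("region", "increment") if k not in resolved]
    if missing:
        # Rule 1: the first data-processing module must supply these.
        raise ValueError(f"first grid module must specify: {', '.join(missing)}")
    resolved.setdefault("registration", "gridline")  # assumed default
    history["RG"] = resolved  # Rule 2: the slot is updated on every override
    return resolved
```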

Next, here are the -R rules for plot-producing modules:

  1. A new plot will require a complete -R specification on the command line. This setting will be stored in gmt.history's new -RP slot (for plot region).
  2. Subsequent plot overlays will default to the history in -RP unless overridden on the command line (which also resets the history of -RP).
  3. Two new flavors of -R will be available to plot modules:
    1. The -Re (for exact) option will determine a tight region from the input data (tables or grids).
    2. The -Ra (for auto) will take the region from -Re and round it outwards using a suitable increment.

We have had something like an implicit -Re in classic mode, for instance when a plotting module takes its plot region from the grid domain. Now this is explicit and extends to all data sets. The -Ra mode is new and will make it much easier to generate maps with reasonable boundaries.
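
To make the difference between the two flavors concrete, here is a minimal sketch for a table of (x, y) points. The function names are hypothetical, and how a "suitable increment" would be chosen is not specified in this proposal, so the sketch simply takes it as a parameter.

```python
import math


def exact_region(points):
    """Tight bounding box (xmin, xmax, ymin, ymax), as described for -Re."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs), min(ys), max(ys))


def auto_region(points, increment):
    """Round the exact region outward to multiples of increment, as described for -Ra."""
    xmin, xmax, ymin, ymax = exact_region(points)

    def down(value):
        return math.floor(value / increment) * increment

    def up(value):
        return math.ceil(value / increment) * increment

    return (down(xmin), up(xmax), down(ymin), up(ymax))
```

For example, with the points (1.2, -3.7) and (8.9, 4.1) and an increment of 5, the exact region is 1.2/8.9/-3.7/4.1 and the rounded region is 0/10/-5/5.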

Benefits

  1. The use of the gmt5.PPID temporary directory extends the benefit of isolation mode to all of GMT, across all platforms.
  2. Plot modules gain new capabilities for selecting the plot region with -Re and -Ra.
  3. The gmt begin and end pair encapsulates a single GMT workflow [http://gmt.soest.hawaii.edu/projects/gmt/wiki/GWF] and greatly reduces the potential for rogue gmt history values. It provides needed structure for new (and old) GMT users.
  4. The region history of grid generation can inform about plot regions but not the other way around.
  5. There can be no confusion between classic and modern mode since users cannot inadvertently switch to modern by messing with gmt.conf.

Replies (22)

RE: GMT Modern mode, 2.0 - Added by Remko over 1 year ago

Suggestions for modification and questions

  1. I would suggest /tmp/gmt.PPID instead of /tmp/PPID ... this makes it easier to find those temporary directories in case they are left behind, and avoids clobbering by other programs that may have created /tmp/PID in the parent script.
  2. What would the temporary directory be named when calling the API from a program? In that case it should be /tmp/gmt.PID (the PID of the program), no?
  3. We could potentially create PS files named gmt_000.ps, gmt_001.ps, etc., for the different layers. This makes it faster to change the header (only modification to gmt_000.ps). Of course, all gmt_???.ps need to be piped to ghostscript, which may not be trivial to do in an OS-independent way.

RE: GMT Modern mode, 2.0 - Added by Paul over 1 year ago

  1. Excellent, agreed, and updated.
  2. Probably. Getting the PPID can be tricky in that case (think Matlab) since there may be many threads and PPID can change. So yes, likely PID. We will learn more when trying that.

RE: GMT Modern mode, 2.0 - Added by Paul over 1 year ago

I swear there was no 3 earlier!
3. I fail to see the utility of this. Remember, modern mode adds restrictions to simplify life and we are not allowing people to look under the hood to pull out a particular layer. That being said, we did discuss at one point the ability to save a partially baked PostScript cake so that a plotting sequence could jump-start to that point without having to rerun lots of GMT commands (i.e., a case where many plots share the same tedious background map, made once). That capability I think we can accommodate with a special psconvert option to "get" and "put" the half-baked cake in/out of the temp directory.

RE: GMT Modern mode, 2.0 - Added by Remko over 1 year ago

There was no number 3 before, indeed. I added it later.
Fair enough, so forget about it (I did a strike-through on that one).

RE: GMT Modern mode, 2.0 - Added by Remko over 1 year ago

But then, one more thing:

4. Instead of /tmp/, it would be better to use $TMPDIR/, which is defined on some OSes differently from /tmp. If not defined, then default to /tmp/
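
The fallback described in point 4 amounts to a one-line lookup. A minimal sketch, where temp_root is a hypothetical name and the environment is passed in as a mapping so the behavior is easy to check:

```python
import os


def temp_root(environ=None):
    """Return $TMPDIR if it is set and non-empty, otherwise /tmp."""
    environ = os.environ if environ is None else environ
    return environ.get("TMPDIR") or "/tmp"
```

Python's own tempfile.gettempdir() performs a similar, more thorough search (it also consults TEMP and TMP), which is closer to what an OS-independent implementation needs.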

RE: GMT Modern mode, 2.0 - Added by Paul over 1 year ago

Yes, we actually use API->tmpdir that is set depending on OS, etc. I was just using /tmp as a shorthand.

RE: GMT Modern mode, 2.0 - Added by Leonardo over 1 year ago

Hi Paul, this is great!
So the Python side will also need to start with a "gmt.begin()" and "gmt.end()" right?
We could get rid of "gmt.show()" by patching our "gmt.end()" to pull up a png and return it to the notebook.
Just to be clear, there is still need for "gmt psconvert" before "end", correct?

RE: GMT Modern mode, 2.0 - Added by Leonardo over 1 year ago

Actually, we could even do away with "begin" and "end" altogether in Python.
That could be replaced by a context manager (https://docs.python.org/3/reference/datamodel.html#context-managers):

with gmt.figure():
    gmt.pscoast(...)
    ...
    gmt.psconvert(...)

The "with" statement will handle calling "begin" and "end" when entering and exiting.
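
One way such a context manager might be written is with contextlib. In this sketch begin and end are stand-ins that only record the call order; they are not the real wrapper functions, and figure is the hypothetical name used above.

```python
from contextlib import contextmanager

calls = []  # records the order of calls, for illustration only


def begin():
    calls.append("begin")  # stand-in for starting a modern session


def end():
    calls.append("end")  # stand-in for ending the session


@contextmanager
def figure():
    """Call begin on entry and end on exit of the with block."""
    begin()
    try:
        yield
    finally:
        end()  # runs even if a command inside the block raises
```

The try/finally is the main design benefit: the session is closed even when a plotting command fails partway through the block.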

RE: GMT Modern mode, 2.0 - Added by Paul over 1 year ago

Yes, but remember not everything may be resulting in a figure. Some people may be making a grid only, so we should not think of begin/end to necessarily be graphics related. They just encapsulate a specific workflow I think.

RE: GMT Modern mode, 2.0 - Added by Leonardo over 1 year ago

Right, for figure generation we could use "gmt.figure()" (which needs to produce a figure for the notebook in the end) but if not, then "with gmt.begin()" could be used (without needing a gmt.end() later).

RE: GMT Modern mode, 2.0 - Added by Paul over 1 year ago

That seems reasonable.

RE: GMT Modern mode, 2.0 - Added by Andreas 7 months ago

After playing with GMT6's modern mode since July, this is what my /tmp looks like:

/tmp$ ls | grep gmt
gmt6.10630
gmt6.1086
gmt6.1141
gmt6.12275
gmt6.1338
gmt6.14111
gmt6.161
gmt6.212
gmt6.213
gmt6.2282
gmt6.236
gmt6.241
gmt6.254
gmt6.265
gmt6.272
gmt6.285
gmt6.304
gmt6.316
gmt6.355
gmt6.36
gmt6.39
gmt6.4078
gmt6.4097
gmt6.43
gmt6.505
gmt6.5756
gmt6.620
gmt6.651
gmt6.6581
gmt6.6587
gmt6.6594
gmt6.694
gmt6.72
gmt6.74
gmt6.8435
gmt6.9747
gmt6.9911
gmt6.9993

A bunch of 'dead' instances, because of crashes, mistakes in script etc. Since the (P)PID is not very unique (after all, it's just a number), I've experienced several cases where a GMT modern instance thinks it already has a process going on and gets confused (since the current (P)PID was identical to a now dead (P)PID with an associated gmt6.(P)PID folder). I mentioned this in an issue post, https://gmt.soest.hawaii.edu/issues/1167#note-1 and https://gmt.soest.hawaii.edu/issues/1167#note-4. I'm on Ubuntu bash for Windows 10, which I guess (still) is kind of an oddball case. Nonetheless, the issue is applicable(?)

Would an idea be to concatenate a bunch of stuff (e.g. (P)PID+user+date+time), and run it through a hash function? That way the folder name of the modern mode instance would for sure be unique.

RE: GMT Modern mode, 2.0 - Added by Joaquim 7 months ago

Hmm, a bit puzzled with this. PPID should be a quite unique number (but maybe we are wrong) and, contrary to what Paul says, Windows gives unique PPIDs. It's the Unix imitators on Windows that are not able to do that.
And mind you that Bash for Win is a single core system.

RE: GMT Modern mode, 2.0 - Added by Andreas 7 months ago

I am not an expert in this, so I may be wrong!

RE: GMT Modern mode, 2.0 - Added by Remko 7 months ago

This does not have to do with unique PPIDs as Joaquim refers to. Andreas mentions that these are temporary directories left behind from crashed or failed processes (i.e. those that did not make it to "gmt end").
The solution would be to get unique temporary directory names, like the Unix command mktemp -d does.

RE: GMT Modern mode, 2.0 - Added by Paul 7 months ago

Leo and I have been discussing this as well. We will be working on a few remedies:

  1. We will replace TMPDIR with a new default setting, GMT_WORKDIR (or similar name), for the session directory, which will default to ~/.gmt/sessions. The reason for this is that /tmp is not really a safe directory (it is not owned by the user) and very long-lived GMT jobs might find their workdir removed by the system's garbage collectors.
  2. Currently, we have not made any special effort to remove the directory if gmt end fails to convert. I still need to do some more diagnostics but I believe gmt end fails to convert flawed PS files due to earlier errors and then it simply exits without removing the temp directory. However, gmt end should always remove that directory since there is nothing to salvage if the PS is rotten anyway. Furthermore, any data files created during the session are placed in the user's working directory and not in the hidden PPID directory.
  3. We will add a new item to the gmt clear module which will remove all sessions. This will either require answering an "Are you sure" question or take a -F force option. This is needed because user scripts may set -e to exit upon first error and then gmt end is not even run.

RE: GMT Modern mode, 2.0 - Added by Leonardo 7 months ago

Using mktemp probably wouldn't work because each program between begin and end needs to find the exact same working directory. So it needs to be a common thing between all commands running in a given script or shell. That's where the PPID idea came in. This also applies to Andreas' idea of hashing the time stamp.
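
The point can be shown with a short sketch comparing the two naming schemes. Both function names are hypothetical, and SHA-1 is just an arbitrary choice of hash for the illustration.

```python
import hashlib
import os


def ppid_name(ppid=None):
    """Name derived only from the parent PID: identical for every command in a shell."""
    ppid = os.getppid() if ppid is None else ppid
    return f"gmt6.{ppid}"


def hashed_name(ppid, user, timestamp):
    """Name derived from a hash that includes the time: differs from call to call."""
    digest = hashlib.sha1(f"{ppid}:{user}:{timestamp}".encode()).hexdigest()
    return f"gmt6.{digest[:12]}"
```

Two commands started from the same shell compute the same ppid_name without sharing any state, whereas hashed_name changes as soon as the timestamp does, so a later command could not rediscover the directory unless the name were stored somewhere else.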

RE: GMT Modern mode, 2.0 - Added by Remko 7 months ago

Leonardo, you are right. mktemp would indeed not work. So the solution would be to have gmt begin remove gmt6.$PPID when it already exists.
Note, that on top of the possibility that Paul mentioned (gmt end fails), a script may fail or be killed before it gets to gmt end.

RE: GMT Modern mode, 2.0 - Added by Remko 7 months ago

A few other thoughts:

  1. On Mac OS $TMPDIR is actually owned by the user. And I think it is actually good that it is pruned regularly, otherwise files will just accumulate.
  2. Of course, a user could set $GMT_TMPDIR (my preference) to $TMPDIR
  3. Instead of creating a directory gmt6.$PPID, we could create a directory gmt6.###### (where ###### is some hash) in $GMT_TMPDIR and add a link ~/.gmt/sessions/gmt6.$PPID to that directory. gmt end would remove both the directory and the link, but gmt begin would only remove the link gmt6.$PPID if it exists. Then the gmt6.###### can still be viewed.

RE: GMT Modern mode, 2.0 - Added by Paul 6 months ago

We certainly cannot guarantee that there won't be gmt6.PPID directories left in tmp, whatever the tmp dir is, for the reasons given above. That means we should have a gmt clear sessions command to help deal with this. We could recycle $GMT_TMPDIR as a way to select the directory for sessions. The main arguments I have against that are (a) we have GMT default settings for DIR_CACHE, DIR_DATA, DIR_GSHHG, DIR_DCW, so it goes a bit against the grain to use environmental parameters for this directory, (b) GMT_TMPDIR is already described in the context of isolation mode for classic GMT, so it complicates documentation to have it mean two different (but related) things (in modern mode, all sessions are isolated), (c) setting environmental parameters is a UNIX thing more than a Windows thing, I believe, but it certainly can be done, and (d) links are certainly more a Unix thing than Windows. Given all that, I think the best course of action is:

  1. Introduce DIR_SESSION with a default of /tmp. This is where sessions PPID directories are created.
  2. Let gmt begin be hard-ass and delete any existing gmt6.PPID, which 99% of the time is the result of a failed GMT modern script started in the same terminal. It is too annoying to get warnings or errors about this.
  3. Let gmt end be ruthless in deleting the session dir, regardless of session success.
  4. Add sessions as a new target in gmt clear.

This keeps things simple and after implementing it (I did 3 already) we can see how it goes for a while before any more elaborate scheme is considered.

RE: GMT Modern mode, 2.0 - Added by Remko 6 months ago

Regarding the points above
  1. I think the default for DIR_SESSION should be $TMPDIR (if set), otherwise /tmp.
  2. Yes, when gmt6.$PPID already exists, gmt begin should remove it silently.
  3. Agreed.
  4. Agreed.

RE: GMT Modern mode, 2.0 - Added by Paul 6 months ago

Turned out a few things needed to change. We cannot use a GMT default like DIR_SESSION since we cannot have people doing gmt set DIR_SESSION <newdir> in mid-stream. While we could catch these under modern sessions, it defeats the purpose of making it a GMT default. Given most people will not need or want to change our default for this directory, I have decided to go back to an optional environmental parameter. I do not want to use TMPDIR for this since we use that already to place temporary files created by various modules, regardless of classic vs modern, and having other files being written in the sessions dir complicates cleanup. Thus, I ended up implementing the session dir via $DIR_GMTSESSION, with default to the user's ~/.gmt/sessions directory. In addition, I make sure gmt end removes the current session dir, and added gmt clear sessions and tested it. The gmt begin already does quietly delete and recreate a directory that already exists, but will complain if the directory is a file or if it cannot be read. See what you think. r19939.
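
The behavior described in this post could look roughly like the following sketch. It is not the code from r19939: the function names are hypothetical, and naming each session directory gmt6.PPID inside the sessions directory is an assumption carried over from earlier in the thread.

```python
import os
import shutil
from pathlib import Path


def sessions_root(environ=None):
    """Return $DIR_GMTSESSION if set, otherwise ~/.gmt/sessions."""
    environ = os.environ if environ is None else environ
    custom = environ.get("DIR_GMTSESSION")
    return Path(custom) if custom else Path.home() / ".gmt" / "sessions"


def begin(root, ppid):
    """Quietly replace a stale session directory; complain if the path is a file."""
    path = Path(root) / f"gmt6.{ppid}"
    if path.is_dir():
        shutil.rmtree(path)  # leftover from a failed script: delete without warning
    elif path.exists():
        raise NotADirectoryError(f"{path} exists but is not a directory")
    path.mkdir(parents=True)
    return path


def end(root, ppid):
    """Always remove the current session directory, whether or not the session succeeded."""
    shutil.rmtree(Path(root) / f"gmt6.{ppid}", ignore_errors=True)


def clear_sessions(root):
    """Remove every session directory under root; return how many were removed."""
    removed = 0
    for path in Path(root).glob("gmt6.*"):
        if path.is_dir():
            shutil.rmtree(path)
            removed += 1
    return removed
```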
