Feature #523

Improve speed when reading netCDF table data (no grids)

Added by Paul over 3 years ago. Updated about 3 years ago.

Status:         Closed
Start date:     2014-02-24
Priority:       High
Due date:
Assignee:       Paul
% Done:         100%
Category:       -
Target version: Candidate for next minor release
Platform:

Description

Because GMT (a) processes table data record-by-record and (b) must do this uniformly across ASCII, native binary, and netCDF tables, reading netCDF tables is considerably slower since only a single row is accessed at a time. A rewrite would take advantage of the new GMT->hidden.mem_coord[col] arrays and read entire columns at once (or as many rows as the preallocated memory can hold, then read the next section of rows when required). The rec-by-rec machinery can then "read" from these in-memory columns, and the initial read will be much faster since entire columns are processed in one call.

History

#1 Updated by Paul about 3 years ago

  • Status changed from New to Resolved
  • Assignee set to Paul
  • % Done changed from 0 to 100

Implemented in 5.2 branch, r13341. May need some more testing.

#2 Updated by Remko about 3 years ago

  • Status changed from Resolved to Closed

Paul wrote:

(or as many rows as the preallocated memory can hold, then read the next section of rows when required).

In fact, GMT_prep_tmp_arrays will always allocate at least as many rows as the major dimension of the netCDF file that is to be read.
To be blunt, GMT_prep_tmp_arrays should be given a limit on what it can allocate in the line:

while (row >= GMT->hidden.mem_rows) GMT->hidden.mem_rows <<= 1; /* Double up until enough */

Currently mem_rows has no upper bound, so it can overflow the shift or grow to hideous proportions.

I tested the current implementation on a reasonably sized file and was pleased with the increased performance. Hence I close this issue.
