Feature #523
Improve speed when reading netCDF table data (no grids)
| Status: | Closed | Start date: | 2014-02-24 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | Paul | % Done: | 100% |
| Category: | - | | |
| Target version: | Candidate for next minor release | | |
| Platform: | | | |
Description
Because GMT (a) processes table data record-by-record and (b) must do so uniformly across ASCII, native binary, and netCDF tables, reading netCDF tables is considerably slower: only a single row is accessed at a time. A rewrite would take advantage of the new GMT->hidden.mem_coord[col] arrays and read in entire columns (or as many rows as the preallocated memory can hold, then read the next section of rows when required). The rec-by-rec machinery can then "read" from these in-memory columns, and the initial read will be much faster since entire columns are processed at once.
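The idea above (bulk column reads cached in preallocated memory, with record-by-record access served from the cache) can be sketched roughly as follows. This is a hypothetical illustration, not GMT's actual implementation: `read_column_chunk` stands in for a real per-column netCDF read such as `nc_get_vara_double()`, and `col_cache` plays the role of one `mem_coord[col]` buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bulk per-column netCDF read (e.g.
 * nc_get_vara_double()): fill buf with `count` values of column `col`
 * starting at row `start`.  Here the data are synthetic. */
static void read_column_chunk(size_t col, size_t start, size_t count, double *buf)
{
    for (size_t k = 0; k < count; k++)
        buf[k] = (double)(col * 1000 + start + k);
}

/* One cached window of a column; rec-by-rec access refills it only
 * when the requested row falls outside the window. */
typedef struct {
    double *mem;      /* preallocated column buffer (mem_coord analogue) */
    size_t first, n;  /* rows currently cached: [first, first + n) */
    size_t capacity;  /* rows the buffer can hold */
} col_cache;

static double get_value(col_cache *c, size_t col, size_t row, size_t n_rows)
{
    if (row < c->first || row >= c->first + c->n) {  /* cache miss: read next section */
        size_t count = n_rows - row;
        if (count > c->capacity) count = c->capacity;
        read_column_chunk(col, row, count, c->mem);
        c->first = row;
        c->n = count;
    }
    return c->mem[row - c->first];  /* serve the record from memory */
}
```

Sequential record access then triggers one bulk read per `capacity` rows instead of one netCDF access per row, which is where the speedup comes from.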
History
#1
Updated by Paul over 6 years ago
- Status changed from New to Resolved
- Assignee set to Paul
- % Done changed from 0 to 100
Implemented in 5.2 branch, r13341. May need some more testing.
#2
Updated by Remko over 6 years ago
- Status changed from Resolved to Closed
Paul wrote:
(or as many rows as the preallocated memory can hold, then read the next section of rows when required).
In fact, GMT_prep_tmp_arrays will always allocate at least as many rows as the major dimension of the netCDF file that is to be read. To be blunt, GMT_prep_tmp_arrays should be given a limit on what it can allocate in the line:

    while (row >= GMT->hidden.mem_rows) GMT->hidden.mem_rows <<= 1; /* Double up until enough */

Currently, mem_rows can overflow or grow to hideous proportions.
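Both concerns (shift overflow and unbounded growth) could be addressed by clamping the doubling at a caller-supplied ceiling. The helper below is a hypothetical sketch of such a capped loop, not GMT's actual GMT_prep_tmp_arrays; the name `grow_mem_rows` and the `limit` parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capped variant of the doubling loop: grow `alloc` by
 * powers of two until it covers `row`, but never past `limit`.  The
 * `alloc > limit / 2` check prevents the shift from overflowing or
 * jumping past the cap. */
static size_t grow_mem_rows(size_t alloc, size_t row, size_t limit)
{
    if (alloc == 0) alloc = 1;
    while (row >= alloc && alloc < limit) {
        if (alloc > limit / 2)
            alloc = limit;   /* doubling would exceed the cap: clamp */
        else
            alloc <<= 1;     /* double up until enough */
    }
    return alloc;
}
```

With this shape the caller can bound the working set (e.g. to the netCDF file's major dimension, or to a memory budget) while keeping the amortized-doubling behavior of the original line.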
I tested the current implementation on a reasonably sized file and was pleased with the increased performance. Hence I close this issue.