NuSDaS User's Guide

$Date: 2003/03/07 08:42:43 $ TOYODA Eizi, NPD/JMA

NuSDaS (Numerical Prediction Standard Dataset System) is a data I/O library for meteorological gridded data, developed at the Numerical Prediction Division, Japan Meteorological Agency.

This document describes the concept, the data file structure, and the definition file format. Consult the Reference Manual for usage of the C and Fortran interfaces.

Notation Note

Introduction

NuSDaS is designed to store gridded data for NWP (numerical weather prediction). It classifies various data and stores them in a structured directory tree. Since it is neither a multi-purpose I/O library (like netCDF or HDF) nor a database management system, its interface is specialized for NWP use.

Data Model

In NuSDaS, all data consist of records, each of which is a two-dimensional array of numbers. The numbers may be integers or floating-point numbers *1. In most cases, a record corresponds to the grid on a horizontal plane. Records are identified by the following dimensions (see the Reference Manual for the detailed computational expression):

type

A 16-character *2 string that identifies the dataset.

base time

Analysis time, initial time of forecast, or map time of observation.

member

Member of ensemble forecast.

valid time

Time to which the forecast or observation relates. Data for a time span (such as an average or accumulation) are identified by a pair of valid times.

plane

Location of the two-dimensional grid represented by the record. For example, "SURF△△" and "500△△△" denote the surface and the 500hPa plane, respectively. In most cases, horizontal grids are identified by vertical coordinate. Data for a layer are identified by a pair of planes.

element

Physical quantity name. For example, "T△△△△△" and "P△△△△△" denote temperature and pressure, respectively.

The elements of the record identifier are called `dimensions', after the metaphor that treats all accessible data as a single huge array. However, a storage design based on a simple array would be inefficient for these reasons:

Thus records are designed to be stored in a particular file in a particular directory determined by the data identifier. (The rule is described below.)

Outlook

Extensions like the following are expected:

Data File Structure

NRD and Directory Structure

All data files are located in directories called NuSDaS root directories (hereafter NRD). An NRD has the following structure:

The NuSDaS interface searches for NRDs as directories named NUSDASnn, where nn is a number from 01 to 99. In the operational suites of JMA, by convention an NRD is first given a lowercase name ending with .nus (such as gf_fcst_p.nus), and a symbolic link to it (say NUSDAS30) is then created.
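The naming rule above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the directory in which the library actually looks for NUSDASnn is not stated in this document, so it is passed in as the assumed parameter parent_dir.

```python
import os


def nrd_candidate_names():
    """Return the names NUSDAS01 .. NUSDAS99 in ascending order."""
    return [f"NUSDAS{nn:02d}" for nn in range(1, 100)]


def find_nrds(parent_dir):
    """List candidate names that exist as directories under parent_dir.

    os.path.isdir() follows symbolic links, so a link such as
    NUSDAS30 -> gf_fcst_p.nus is reported as well.
    parent_dir is an assumption of this sketch, not a NuSDaS rule.
    """
    return [name for name in nrd_candidate_names()
            if os.path.isdir(os.path.join(parent_dir, name))]
```

For example, find_nrds(".") lists the NRDs visible from the current directory under that assumption.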

An NRD may have many definition files. A definition file corresponds to a particular data type. It describes the subdirectory structure and can refer to many data files. Therefore there is an inclusion relation:

record ∈ data file ∈ definition file = data type ∈ NRD

Records are grouped into files by the following rules:

File Formats

There are two formats for data files: the v1.0 File Format and the ES File Format. ES (Extended Storage) is a RAM filesystem on the HITACHI SR8000 system, and the ES File Format is used only on that machine.

v1.0 File Format

A NuSDaS data file in v1.0 format is a sequence of records. Each record has the common structure shown in Table 1. All fields noted as 'integer' are written in big-endian byte order. Note that the structure is similar to, but DIFFERENT *3 from, a sequential unformatted file of Fortran.

Possible values of the 'kind of record' field are as follows:

NUSD

A record of this kind is the first mandatory record of a data file. It contains metadata for system administration purposes. The payload of the NUSD record is described in Table 2.

CNTL

A record of this kind is the second mandatory record of a data file. It contains metadata commonly defined for all NuSDaS datasets, such as identification (data type and base time), internal structure (lists of members, valid times, planes, and elements), and grid/geometry structure. The payload of the CNTL record is described in Table 3.

INDX

A record of this kind is the third mandatory record of a data file. The payload of the record is an array of 32-bit integers, which are interpreted as the byte positions of DATA records. The array is indexed by member, valid time, plane, and element: the i-th element of the array (for simplicity, using C-style indices starting at 0) gives the position of the DATA record for the e-th element, p-th plane, v-th valid time, and m-th member, where i = (e + E * (p + P * (v + V * m))), E is the total number of elements, P the total number of planes, and V the total number of valid times.

SUBC

A record of this kind is optional: there may be no SUBC record, or there may be many. SUBC records contain various metadata, including vertical grid information, time integration span, and radar operation status. The format of the record is determined by a 4-character field (called the 'group name') at byte offset 16. See Record Format for details of SUBC records.

INFO

A record of this kind is optional: there may be no INFO record, or there may be many. INFO records contain user-defined metadata. The four bytes starting at byte offset 16 are called the 'group name' and are reserved for classification of INFO records. The rest of the record is not defined.

DATA

These records contain two-dimensional array data. The eight characters starting at byte offset 56 specify the encoding scheme: the packing scheme (4 characters) followed by the missing-value scheme (4 characters). See Record Format for details of DATA records.

END△

A record of this kind is the last mandatory record of a data file.
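The indexing rule given for the INDX record can be written as a small function. The sketch below is in Python; the function name and argument names are hypothetical, and n_vt, n_lv, n_el stand for V, P, E in the formula.

```python
def indx_position(m, v, p, e, n_vt, n_lv, n_el):
    """Return the 0-based index i into the INDX array.

    Implements i = e + E * (p + P * (v + V * m)) with
    V = n_vt, P = n_lv, E = n_el, so the element index varies
    fastest and the member index slowest.
    """
    return e + n_el * (p + n_lv * (v + n_vt * m))
```

The entry found at that index is then read as the byte position of the corresponding DATA record.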

ES File Format

ES is a memory-based filesystem of the HITACHI SR8000. Since it has a special I/O interface, the NuSDaS interface had to support it separately. The file format for ES differs considerably. It is a direct-access file of fixed-length records, and has no NUSD, INDX, or END records. Exactly one SUBC record and one INFO record must exist at the beginning of the file.

The author is not sure about further details of ES.

Definition File Format

A NuSDaS definition file is a plain text file that describes the structure of a NuSDaS dataset. The definition file looks like free format. More precisely, the file is interpreted line by line. A line starting with a keyword (listed below, case insensitive) starts a statement. Following lines that do not begin with a keyword are continuation lines and are interpreted as part of the statement begun by the starting line.

Statements can be omitted unless noted otherwise ('This statement cannot be omitted'). There are restrictions on the order of the statements. Since they are not (and cannot easily be) documented, the author recommends writing statements in the order of the following description.
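The line-by-line rule above (a keyword starts a statement, other lines continue it) can be sketched in Python as follows. This is not the library's parser: the keyword set is simply collected from the statements described in this document, the function name is hypothetical, and the handling of text before the first keyword is an assumption.

```python
# Keywords collected from the statement descriptions in this document.
KEYWORDS = {
    "nusdas", "path", "filename", "creator", "type1", "type2", "type3",
    "member", "memberlist", "basetime", "validtime", "validtime1",
    "validtime2", "plane", "plane1", "plane2", "element", "elementmap",
    "size", "basepoint", "distance", "standard", "others", "value",
    "packing", "missing", "information", "subcntl", "forcedlen",
}


def split_statements(text):
    """Group the words of a definition file into statements.

    Each statement is a list of words whose first item is the
    lowercased keyword.
    """
    statements = []
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # blank line
        if words[0].lower() in KEYWORDS:
            # A keyword at the top of the line starts a new statement.
            statements.append([words[0].lower()] + words[1:])
        elif statements:
            # Otherwise the line continues the previous statement.
            statements[-1].extend(words)
        # Assumption: words before the first keyword are ignored.
    return statements


sample = """\
TYPE1 _GSM LL PP
type2 FC SV
type3 STD1
plane 3
plane1 SURF 1000
       950
"""
```

Calling split_statements(sample) joins the indented line "950" to the plane1 statement and lowercases the keyword TYPE1.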

nusdas version

Version of NuSDaS. If present, it must be 1.0. Future versions of NuSDaS may extend the definition file incompatibly, and this statement will declare which version of NuSDaS you are using.

path ...

Specifies the directory in which data files will be located, as a path relative to the NRD. One of the following forms is used:

path relative_path template

The relative path is given by template. See Pathname Expansion for special symbols. This form is assumed by default, with ``/_model/_attribute/_space/_time/_name'' as the template.

path nwp_path_s

Equivalent to statements ``path relative_path /_3d_name'' and ``filename _validtime''.

path nwp_path_vm

Equivalent to statements ``path relative_path /_3d_name'' and ``filename _member''.

path nwp_path_m

Equivalent to statements ``path relative_path /_3d_name/_member'' and ``filename _validtime''.

path nwp_path_bs

Equivalent to statements ``path relative_path /_3d_name/_basetime'' and ``filename _validtime''.

path nwp_esf

In this special case, internal file I/O will be done through ES interface, not by standard C library.

filename filename

Name of data file will be filename. See Pathname Expansion for special symbols. By default, _basename is assumed.

creator creator

Specifies information on the creator of the data. It will be written in the NUSD record after the user name and host name are prepended.

type1 _model _2d _3d

This statement cannot be omitted. Word _model is four name characters (alphabet, number, and underline) representing model name or creation process. Word _2d is two name characters representing horizontal grid name. Word _3d is two name characters representing vertical grid name. See Reference Manual for table of possible values.

type2 _attribute _time

This statement cannot be omitted. Word _attribute is two name characters representing data attribute. Word _time is two name characters representing time attribute. See Reference Manual for table of possible values.

type3 _name

This statement cannot be omitted. Word _name is four name characters. You can use arbitrary name for this field; it does not affect behavior of library nor conventional meaning. Name "STD1" is used for the most typical operational dataset.

member n_dc inout

Word n_dc is the number of members (1 is assumed by default). When inout is in, records for different members are stored in one file; when inout is out, records for different members are stored in separate files.

memberlist member member ...

Lists the members.

basetime YYYYmmddHHMM

This statement is omitted in most cases. It specifies the base time. The format of YYYYmmddHHMM is the same as %Y%m%d%H%M in UNIX date(1) or strftime(3).

validtime n_vt inout unit

This statement cannot be omitted. It specifies the number of valid times n_vt and unit, the unit of the numbers in the following validtime1 and validtime2 statements. Word unit should be one of min, hour, day, pen, mon, week, jun. When inout is in, records for different valid times are stored in one file; when inout is out, records for different valid times are stored in separate files.

validtime1 arithmetic initial step
validtime1 all_list vt1 vt2 vt3 ...

This statement cannot be omitted. Exactly one of the two forms above should appear. This statement specifies the list of the first part of the valid time, called valid1 in the Reference Manual. When the second word is arithmetic, valid1 is an arithmetic series with the specified initial value and step. When the second word is all_list, the following words are interpreted as the list of valid times. Usually the list is written in ascending order. All of the arguments initial, step, vt1, ... are in the unit declared in the preceding validtime statement.

validtime2 ft1 ft2 ft3 ...
validtime2 -dt

When this statement is present, exactly one of the two forms above should appear. This statement specifies the list of the second part of the valid time, called valid2 in the Reference Manual. When the former form is used, the list of valid2 will be (vt1 + ft1), (vt2 + ft2), (vt3 + ft3), and so on. Usually the list is written in ascending order. When the latter form is used, the list of valid2 will be (vt1 + dt), (vt2 + dt), (vt3 + dt), and so on. All of the arguments dt, ft1, ft2, ... are in the unit declared in the preceding validtime statement. If this statement is omitted, the special value -1 is assumed as valid2.
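The valid time rules of validtime1 and validtime2 can be illustrated with the following Python sketch. The function names are hypothetical, the values are plain numbers in the unit of the validtime statement, and parsing of the statement text itself (including the leading '-' of the -dt form) is left out.

```python
def valid1_arithmetic(n_vt, initial, step):
    """valid1 list for 'validtime1 arithmetic initial step'."""
    return [initial + step * k for k in range(n_vt)]


def valid2_from_offsets(valid1, offsets):
    """valid2 list for 'validtime2 ft1 ft2 ...': (vt1 + ft1), (vt2 + ft2), ..."""
    if len(offsets) != len(valid1):
        raise ValueError("one ft value is needed per valid time")
    return [vt + ft for vt, ft in zip(valid1, offsets)]


def valid2_from_dt(valid1, dt):
    """valid2 list for the dt form: (vt1 + dt), (vt2 + dt), ..."""
    return [vt + dt for vt in valid1]


def valid2_default(valid1):
    """valid2 list when validtime2 is omitted: the special value -1."""
    return [-1] * len(valid1)
```

For instance, valid1_arithmetic(4, 0, 6) gives the series 0, 6, 12, 18.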

plane n_lv

This statement cannot be omitted. Specifies the number of planes.

plane1 name name name ...

This statement cannot be omitted. Specifies the list of first planes. The list should have n_lv items. Usually the list is written in ascending order of height; this looks like descending order if a pressure coordinate is used (e.g. SURF 1000 950 900 ...).

plane2 name name name ...

Specifies the list of second planes. The list should have n_lv items. If this statement is omitted, the same list as in plane1 is assumed.

element n_el

This statement cannot be omitted. Specifies the number of elements.

elementmap elemname elementmap

This statement cannot be omitted, and must appear n_el times. It describes where the element elemname is allowed to be written. See section Elementmap for details.

size nx ny

This statement cannot be omitted. It indicates that the number of grid points is nx in the X direction and ny in the Y direction. In most cases X is taken eastward and Y northward, although that depends on the coordinate system (_2d in the type1 statement) you use.

basepoint ix iy lon lat

This statement indicates that the grid point numbered (ix, iy) is located at (lon, lat). Both ix and iy must be real numbers, lon must be a real number with 'E' or 'W' appended, and lat must be a real number with 'N' or 'S' appended. Note that this statement keeps the geographical meaning shown above even if the 2D grid is taken vertically. To describe vertical grid point locations, a SUBC record might be used.

distance dx dy

Indicates the horizontal distance (in the X and Y directions) between adjacent grid points. The unit is degrees when the grid is a latitude-longitude grid, and meters when a map projection is applied. When the 2D grid is taken vertically, one of dx, dy is ignored. Note that the meridional grid distance dy is taken southward. It is positive in most JMA models: grid points with the smallest Y index are located at the northern end of the 2D grid. Conversely, if dy is negative, grid points with the smallest Y index are located at the southern end of the 2D grid.
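As an illustration of how basepoint and distance combine, the following Python sketch computes the location of a grid point on a latitude-longitude grid. It assumes that X is taken eastward, that longitudes and latitudes have already been converted to signed degrees (E and N positive), and that grid indices follow the same numbering as ix, iy; the function name is hypothetical and map projections are not handled.

```python
def grid_point_lonlat(i, j, ix, iy, lon0, lat0, dx, dy):
    """Return (lon, lat) in degrees of grid point (i, j).

    (ix, iy) is the basepoint index located at (lon0, lat0).
    dx is applied eastward; dy is applied southward, so a positive
    dy makes latitude decrease as the Y index grows.
    """
    lon = lon0 + (i - ix) * dx
    lat = lat0 - (j - iy) * dy
    return lon, lat
```

With basepoint (1, 1) at 120.0E 50.0N, dx = 0.5 and dy = 0.25, grid point (3, 5) lies two steps east and four steps south of the basepoint.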

standard lon lat lon2 lat2

Specifies the standard longitudes/latitudes. They are parameters of the map projection, and only some of them are used in some cases. Whether this statement is required depends on the horizontal grid style. See the following description of others.

others lon3 lat3 lon4 lat4

Specifies the 3rd and 4th longitudes/latitudes. The meaning of the parameters depends on the projection. Whether this statement is required also depends on the horizontal grid style.

LM

The Lambert conformal projection has 3 parameters; use "standard LoV Latin1 LoV Latin2", where LoV is the Y-axis longitude, and Latin1 and Latin2 are the first and second latitudes where the secant cone cuts the earth. In most cases at JMA, it looks like

standard 140.0E 30.0N 140.0E 60.0N

PS

The polar stereographic projection has 2 parameters; use "standard LoV LaD 0E 0N", where LoV is the Y-axis longitude, and LaD is the latitude at which the grid point distance is defined. In most cases at JMA, it looks like

standard 140.0E 30.0N 0E 0N

MR

The Mercator projection has one parameter; use "standard 0E LaD 0E 0N", where LaD is the latitude at which the grid point distance is defined.

OL

The Lambert conformal projection has 3 parameters; use "standard LoV Latin1 LoV Latin2" and "others LoP LaP RotAngE 0N", where LoV is the Y-axis longitude, (Latin1, Latin2) are the first and second latitudes where the secant cone cuts the earth, (LoP, LaP) is the longitude/latitude of the southern pole of the projection, and RotAng is the angle of rotation after projection. Unfortunately, practice at JMA has failed to write this parameter properly, and you may have data in which the corresponding fields are zero-filled (as of 2003-03-07).

other horizontal grids

Since there are no projection parameters, neither the standard nor the others statement should be written.

value representation

Describes how the gridded data represent the field. Word representation should be one of the following:

PVAL

values at grid point. This is the default.

MEAN

average over volume/area around grid point

REPR

representative value obtained with another method

packing pack_mode

Describes the encoding scheme to be used in DATA records. See the Reference Manual for the table of possible values. By default, 2PAC is assumed.

missing miss_mode

Describes how missing values are to be represented. Word miss_mode should be one of the following:

NONE

There is no method for missing value in this case. This is the default.

UDFV

A certain value is designated as the missing value, and grid points with that value should be regarded as missing.

MASK

Grid points with valid data are indicated by a bitmap in each DATA record. See NUSDAS_MAKE_MASK() in the Reference Manual for details.

information group filename

If the definition file has this statement, an INFO record will be written at the time of data file creation. The statement can be repeated as many times as needed. The size and contents of the INFO record will be those of the file specified by the relative path filename. Word group should be a four-character name that identifies the INFO record.

subcntl num group size group size ...

If the definition file has this statement, SUBC records are allocated at the time of data file creation. Each SUBC record is specified by a pair of group (a four-character name that identifies the SUBC record) and size (the size of the SUBC record). Word num specifies the number of group-size pairs.

forcedlen size

This statement is required if you use the ES interface. If the definition file has this statement, each record in the data file will have size bytes. Padding of (size - (payload size)) bytes is placed after the record payload. An error occurs if a record exceeds the specified size. By default, records are laid out contiguously (without padding between the record payload and the 4-byte record trailer).

Pathname Expansion

The pathname of a data file is determined by the path and filename statements in the definition file, after substitution of the following keywords with values of the data identifier.

Table 5: Keywords of Pathname Expansion
keyword      meaning
_model       model name, first 4 characters of type1
_2d          2D grid structure, 5th and 6th characters of type1
_3d          2D grid positioning, 7th and 8th characters of type1
_attribute   first two characters of type2
_time        time attribute, last two characters of type2
_name        type3
_space       equivalent to '_2d_3d'
_base        base time
_valid       valid time
_member      member

Note that plane and element are not used in pathname expansion, since they cannot 'split' files. Similarly, using '_valid' or '_member' will cause malfunction if you declare 'validtime ... in' or 'member ... in' respectively. On the other hand, if you declare 'validtime ... out' or 'member ... out', you must use '_valid' or '_member' respectively in the path or filename statement; otherwise data files for different valid times or members will collide (have the same name, which may cause malfunction).
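The substitution described by Table 5 can be sketched in Python as follows. The function name is hypothetical, only the keywords listed in Table 5 are handled, and the textual form of base time, valid time, and member in real pathnames is not specified here, so they are passed in as ready-made strings.

```python
import re

# Keywords of Table 5; the whole template is scanned in a single pass
# so that substituted values are never expanded again.
_KEYWORD_RE = re.compile(
    r"_(attribute|member|model|space|valid|base|name|time|2d|3d)")


def expand_pathname(template, type1, type2, type3,
                    base="", valid="", member=""):
    """Expand Table 5 keywords in a path or filename template.

    type1 is 8 characters, type2 and type3 are 4 characters each.
    """
    values = {
        "model": type1[0:4],
        "2d": type1[4:6],
        "3d": type1[6:8],
        "space": type1[4:8],
        "attribute": type2[0:2],
        "time": type2[2:4],
        "name": type3,
        "base": base,
        "valid": valid,
        "member": member,
    }
    return _KEYWORD_RE.sub(lambda mo: values[mo.group(1)], template)
```

With the data type _GSMLLPP.FCSV.STD1 mentioned in footnote *2, the default template /_model/_attribute/_space/_time/_name expands to /_GSM/FC/LLPP/SV/STD1 under these assumptions.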

Elementmap

Elementmap defines whether a certain element is allowed for each combination of member, valid time, and plane. To understand elementmap, first think of a bitmap of size M * V * P (or a Fortran logical array with DIMENSION(P, V, M)), where M, V, P are the total numbers of members, valid times, and planes. For each bit, '1' declares that the element is allowed, and '0' that it is not. The elementmap written in the definition file is this bitmap in a kind of run-length-encoding (RLE) compression.

The syntax of elementmap is written in BNF as follows:

They are interpreted as follows:

The author admits that the rule above is far from easy for humans to understand. Indeed, the terms vtime_loop and member_loop are hardly ever used. If you are not sure, declare elements with contiguous_line. It will look like the following:

element 4
elementmap PSEA 0
elementmap T 0
elementmap U 0
elementmap V 0

Allowing more data records than necessary does not increase the data file size or degrade data access speed/latency. Thus you can safely declare elements with 'no limitation' settings.

Record Format Tables

Records of a NuSDaS data file have a common beginning and ending (shown in Table 1). The following tables describe the PAYLOAD part.

Table 1: NuSDaS v1.0 Record Structure
Offset (byte)  Length (byte)  Type       Description
0              4              integer    n: record size
4              4              character  kind of record
8              4              integer    m: payload size
12             4              integer    creation date and time in time_t value
16             m - 8          ---        PAYLOAD of record; see Table 2--6 for detail
8 + m          n - m - 8      ---        padding; should be ignored
n - 4          4              integer    n: record size

Note that the `Type' column is deliberately written in an unusual notation. The entries should NOT be directly interpreted as type names of a particular programming language, such as C or Fortran.

character

Byte values should be interpreted as character codes of ISO 646 IRV. The meaning of a byte whose MSB is set is currently undefined.

integer

A certain number (usually 4) of bytes represents a signed integer value. Negative values are represented in two's complement. Note that big-endian byte ordering is always used in NuSDaS data files.

unsigned integer

A certain number (usually 4) of bytes represents an unsigned integer value.

floating

Bits in 4 or 8 bytes are used to compose IEEE 754 floating point value.

Some fields are arrays, which is indicated in C-like notation. For example, a field noted character [2][n_lv][6] is equivalent to the memory image of unsigned char [2][n_lv][6] in C or CHARACTER(LEN = 6), DIMENSION(N_LV, 2) in Fortran. However, the one-dimensional array notation '[size]' for scalar character fields is omitted for simplicity.
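The common beginning of a record in Table 1 can be decoded as in the following Python sketch, which uses the standard struct module. The function name and dictionary keys are hypothetical; the field order, sizes, and big-endian byte order are taken from Table 1 and the type notes above.

```python
import struct

# Bytes 0..15 of every v1.0 record: n, kind, m, creation time.
# ">" selects big-endian byte order without alignment padding.
RECORD_HEAD = struct.Struct(">i4sii")


def parse_record_head(buf, offset=0):
    """Decode the first 16 bytes of the record starting at offset."""
    n, kind, m, created = RECORD_HEAD.unpack_from(buf, offset)
    return {
        "record_size": n,              # n in Table 1
        "kind": kind.decode("ascii"),  # e.g. "NUSD", "CNTL", "DATA"
        "payload_size": m,             # m in Table 1
        "created": created,            # time_t value
    }


# A hand-made 16-byte header used only to exercise the parser.
sample_head = RECORD_HEAD.pack(132, b"NUSD", 108, 1046940163)
```

Decoding with "ascii" raises an error for bytes whose MSB is set, which matches the note that such bytes are currently undefined.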

NUSD Record

Table 2: NuSDaS v1.0 NUSD Record Format (only Payload shown)
Offset (byte)  Length (byte)  Type              Description
16             80             character         creator host and user name.
96             4              integer           NuSDaS version: currently 1
100            4              unsigned integer  total number of bytes in file
104            4              integer           number of records in file
108            4              integer           number of INFO records in file
112            4              integer           number of SUBC records in file
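The NUSD payload in Table 2 can be read with a single struct format, as in the following Python sketch. The names are hypothetical, and stripping trailing spaces and NUL bytes from the creator field is an assumption, since the padding character is not stated in this document.

```python
import struct

# Payload fields of Table 2, located at byte offsets 16..115.
NUSD_PAYLOAD = struct.Struct(">80siIiii")


def parse_nusd_payload(record):
    """Decode the NUSD payload from a whole record (offsets as in Table 2)."""
    (creator, version, total_bytes,
     n_records, n_info, n_subc) = NUSD_PAYLOAD.unpack_from(record, 16)
    return {
        # Assumption: the 80-byte field is padded with spaces or NULs.
        "creator": creator.decode("ascii").rstrip(" \0"),
        "version": version,
        "total_bytes": total_bytes,
        "n_records": n_records,
        "n_info": n_info,
        "n_subc": n_subc,
    }


def make_sample_nusd():
    """Build a toy record: 16 placeholder header bytes plus a payload."""
    payload = NUSD_PAYLOAD.pack(b"host user test".ljust(80), 1, 4096, 7, 1, 2)
    return b"\x00" * 16 + payload
```

The sample builder exists only so that the parser can be tried without a real data file.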

CNTL Record

Table 3: NuSDaS v1.0 CNTL Record Format (only Payload shown)
Offset (byte)  Length (byte)  Type                    Description
16             16             character               data type
32             12             character               base time in format like "date +%Y%m%d%H%M"
44             4              integer                 base time in sequential minutes from 1801-01-01T00:00Z
48             4              character               time unit for valid times
52             4              integer                 n_dc: number of members
56             4              integer                 n_vt: number of valid times
60             4              integer                 n_lv: number of planes
64             4              integer                 n_el: number of elements
68             4              character               map projection
72             2 * 4          integer [2]             number of grid points in X and Y directions
80             2 * 4          floating [2]            grid index of reference point
88             2 * 4          floating [2]            latitude/longitude of reference point
96             2 * 4          floating [2]            latitude/longitude distance between grid points
104            2 * 4          floating [2]            1st STD latitude/longitude of map projection
112            2 * 4          floating [2]            2nd STD latitude/longitude of map projection
120            2 * 4          floating [2]            3rd STD latitude/longitude of map projection
128            2 * 4          floating [2]            4th STD latitude/longitude of map projection
136            4              character               PVAL: representation method of grid
140            2 * 4          ---                     reserved for future use of map projection
148            6 * 4          ---                     reserved for future use
172            n_dc * 4       character [n_dc][4]     list of member name
(1)            n_vt * 8       integer [2][n_vt]       list of valid time pair
(2)            n_lv * 12      character [2][n_lv][6]  list of plane pair
(3)            n_el * 6       character [n_el][6]     list of element name

Offsets marked (1) to (3):
  1. 172 + 4 * n_dc
  2. 172 + 4 * n_dc + 8 * n_vt
  3. 172 + 4 * n_dc + 8 * n_vt + 12 * n_lv
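The variable offsets (1) to (3) of Table 3, and the base time stored as sequential minutes, can be computed as in the following Python sketch. The function names and dictionary keys are hypothetical, and the minute count is assumed to start at midnight of 1801-01-01.

```python
from datetime import datetime, timedelta


def cntl_list_offsets(n_dc, n_vt, n_lv, n_el):
    """Byte offsets of the four lists at the end of the CNTL record."""
    members = 172
    valid_times = members + 4 * n_dc   # offset (1)
    planes = valid_times + 8 * n_vt    # offset (2)
    elements = planes + 12 * n_lv      # offset (3)
    end = elements + 6 * n_el          # first byte after the element list
    return {"members": members, "valid_times": valid_times,
            "planes": planes, "elements": elements, "end": end}


def base_time_from_minutes(minutes):
    """Convert sequential minutes since 1801-01-01 00:00 to a datetime."""
    return datetime(1801, 1, 1) + timedelta(minutes=minutes)
```

For example, with n_dc = 1, n_vt = 2, n_lv = 3, and n_el = 4 the element list starts at byte 228 of the record.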

SUBC Records

SUBC ETA/SIGM

This kind of SUBC record is employed to describe vertical grid structure. You can get pressure by p[k] = b[k] * (p_surface - c) + a[k], where k is the index of vertical plane and p_surface the surface pressure.

Table 4a: NuSDaS v1.0 SUBC ETA/SIGM Record Format (only Payload shown)
Offset (byte)  Length (byte)   Type              Description
16             4               character         "ETA△" or "SIGM"
20             4               integer           number of planes
24             (n_lv + 1) * 4  float [n_lv + 1]  parameter a
...            (n_lv + 1) * 4  float [n_lv + 1]  parameter b
...            4               float             parameter c
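The pressure formula p[k] = b[k] * (p_surface - c) + a[k] for the ETA/SIGM record can be evaluated as in the following Python sketch. The function name is hypothetical, and the parameters are assumed to have already been read from the record as plain lists of numbers; units are whatever the dataset uses.

```python
def level_pressures(a, b, c, p_surface):
    """Return p[k] = b[k] * (p_surface - c) + a[k] for every index k.

    a and b are the parameter lists of the SUBC ETA/SIGM record
    (n_lv + 1 values each); c is the scalar parameter.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return [bk * (p_surface - c) + ak for ak, bk in zip(a, b)]
```

With a = [0, 100], b = [1, 0.5], c = 0 and a surface pressure of 1000, the two values are 1000 and 600.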

SUBC Z*

Table 4b: NuSDaS v1.0 SUBC Z* Record Format (only Payload shown)
Offset (byte)  Length (byte)   Type              Description
16             4               character         "Z*△△"
20             2 * 4           integer           nx and ny: number of grid points in X and Y directions
28             4               integer           number of planes
32             (n_lv + 1) * 4  float [n_lv + 1]  z-star location for each plane
...            4               float             height of model top
...            (nx * ny) * 4   float [nx * ny]   surface height

SUBC TDIF

This kind of SUBC record is employed for time integration/average product. The size of SUBC TDIF record depends on parameters n_dc (members) and n_vt described in CNTL record.

Table 4c: NuSDaS v1.0 SUBC TDIF Record Format (only Payload shown)
Offset (byte)  Length (byte)    Type                  Description
16             4                character             "TDIF"
32             n_dc * n_lv * 4  integer [n_dc][n_lv]  difference between accurate valid time and nominal valid time
...            n_dc * n_lv * 4  float [n_dc][n_lv]    integration time in seconds

SUBC RADR

This kind of SUBC record is used for datasets of radar observation. The size of SUBC RADR record depends on parameters n_dc (members), n_vt, n_lv, and n_el described in CNTL record.

Table 4d: NuSDaS v1.0 SUBC RADR Record Format (only Payload shown)
Offset (byte)  Length (byte)                  Type                              Description
16             4                              character                         "RADR"
32             n_dc * n_vt * n_lv * n_el * 4  integer [n_dc][n_vt][n_lv][n_el]  flags

The flag values have the following meanings:

0

ND.

1

Echo exists.

2

No echo exists.

3

No operation.

SUBC ISPC

This kind of SUBC record is used for datasets of synthesized multiple radar observations. The size of SUBC ISPC record depends on parameters n_vt, n_lv, and n_el described in CNTL record.

Table 4e: NuSDaS v1.0 SUBC ISPC Record Format (only Payload shown)
Offset (byte)  Length (byte)             Type                             Description
16             4                         character                        "ISPC"
32             n_vt * n_lv * n_el * 512  integer [n_vt][n_lv][n_el][128]  flags

DATA Records

DATA NONE

Table 5a: NuSDaS v1.0 DATA NONE Record Format (only Payload shown)
Offset (byte)  Length (byte)  Type              Description
16             4              character         member name
20             8              integer [2]       valid times
28             12             character [2][6]  plane names
40             6              character         element name
46             2              ---               reserved
48             2 * 4          integer [2]       nx and ny: number of grid points in X and Y directions
56             4              character         packing scheme such as "2PAC"
60             4              character         "NONE"
64             ...            ...               PACKED DATA: see following description

DATA UDFV

Table 5b: NuSDaS v1.0 DATA UDFV Record Format (only Payload shown)
Offset (byte)  Length (byte)  Type              Description
16             4              character         member name
20             8              integer [2]       valid times
28             12             character [2][6]  plane names
40             6              character         element name
46             2              ---               reserved
48             2 * 4          integer [2]       nx and ny: number of grid points in X and Y directions
56             4              character         packing scheme such as "2PAC"
60             4              character         "UDFV"
64             (various)      integer/floating  missing value
...            ...            ...               PACKED DATA: see following description

DATA MASK

Table 5c: NuSDaS v1.0 DATA MASK Record Format (only Payload shown)
Offset (byte)  Length (byte)  Type              Description
16             4              character         member name
20             8              integer [2]       valid times
28             12             character [2][6]  plane names
40             6              character         element name
46             2              ---               reserved
48             2 * 4          integer [2]       nx and ny: number of grid points in X and Y directions
56             4              character         packing scheme such as "2PAC"
60             4              character         "MASK"
64             4              integer           n_ms: number of bytes used for mask bitmap
68             n_ms           bitmap            mask bitmap
...            ...            ...               PACKED DATA: see following description
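Tables 5a to 5c share the same fields from byte offset 16 to 63. The following Python sketch decodes that common part with the standard struct module; the names are hypothetical, and the parts that differ between NONE, UDFV, and MASK (from offset 64 onward) are left to the caller.

```python
import struct

# Common fields at byte offsets 16..63 of a DATA record.
# "2x" skips the two reserved bytes at offset 46.
DATA_FIXED = struct.Struct(">4s2i12s6s2x2i4s4s")


def parse_data_fixed(record):
    """Decode the fields shared by DATA NONE, UDFV, and MASK records."""
    (member, valid1, valid2, planes, element,
     nx, ny, packing, missing) = DATA_FIXED.unpack_from(record, 16)
    return {
        "member": member.decode("ascii"),
        "valid_times": (valid1, valid2),
        # character [2][6]: first plane name, then second plane name
        "planes": (planes[:6].decode("ascii"), planes[6:].decode("ascii")),
        "element": element.decode("ascii"),
        "nx": nx,
        "ny": ny,
        "packing": packing.decode("ascii"),
        "missing": missing.decode("ascii"),
        "next_offset": 16 + DATA_FIXED.size,  # 64: scheme-specific part
    }


def make_sample_data_record():
    """Build a toy record: 16 placeholder header bytes plus fixed fields."""
    fixed = DATA_FIXED.pack(b"    ", 360, -1, b"SURF  SURF  ", b"PSEA  ",
                            4, 3, b"2PAC", b"NONE")
    return b"\x00" * 16 + fixed
```

The "missing" field tells the caller which of the three layouts applies to the bytes that follow.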

Packed Data Format

When the packing scheme is 1PAC, 2PAC, or 2UPC, two 4-byte floating-point fields, base and amp, are followed by an array of the packed type. See the Reference Manual for the packed type. A value is unpacked by multiplying the packed value by amp and then adding base.

When the packing scheme is 4PAC, the layout is similar to 2PAC, but base and amp are 8-byte floating-point values.

When the packing scheme is RLEN, three 4-byte integer fields nbit, maxv, num are followed by an octet stream containing the compressed bit stream.

When the packing scheme is GRIB, the GRIB octet stream itself will be the packed data, although this feature is not implemented yet.

Otherwise, the packed data is an array of the packed type. Note that if the packing scheme is 'N1I2', the packed value is 10 times the unpacked value.
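The unpacking rules above can be sketched in Python as follows. The function names are hypothetical. The packed integers are assumed to have been decoded already, because the packed type of each scheme is defined in the Reference Manual and not in this document.

```python
import struct


def read_base_amp(buf, offset, packing):
    """Read base and amp at offset; return (base, amp, next_offset).

    4PAC stores two 8-byte values; 1PAC, 2PAC, and 2UPC store two
    4-byte values. Both are big-endian.
    """
    fmt = ">dd" if packing == "4PAC" else ">ff"
    base, amp = struct.unpack_from(fmt, buf, offset)
    return base, amp, offset + struct.calcsize(fmt)


def unpack_values(packed, base, amp):
    """Unpack by multiplying each packed value by amp, then adding base."""
    return [q * amp + base for q in packed]


def unpack_n1i2(packed):
    """N1I2: the packed value is 10 times the unpacked value."""
    return [q / 10 for q in packed]
```

For example, with base = 1.5 and amp = 0.5 the packed values 0, 1, 2 unpack to 1.5, 2.0, 2.5.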

END Record

Table 6: NuSDaS v1.0 END Record Format (only Payload shown)
Offset (byte)  Length (byte)  Type              Description
16             4              unsigned integer  total number of bytes in file
20             4              integer           number of records in file

*1 See the User Data Array Types table at the bottom of the Reference Manual for available data types.
*2 The Pandora data server and some tools use a notation with periods (such as _GSMLLPP.FCSV.STD1) for readability.
*3 In NuSDaS, the size of the record size fields (4 + 4 = 8 bytes) is INCLUDED in the record size itself, while it is NOT INCLUDED in Fortran files. This may change in future versions. Also note that a Fortran file is usually written in the native byte order of the computer that creates it.