BUFR template mapping¶

The mapping between the input CSV data and the output BUFR data is specified in a JSON file. The csv2bufr module validates the mapping file against the schema shown at the bottom of this page before attempting the transformation to BUFR. The required array in that schema determines which properties are mandatory (it also lists conformsTo and metadata, and does not list wigos_station_identifier). The 8 primary properties are:

inputDelayedDescriptorReplicationFactor - array of integers, values for the delayed descriptor replication factors to use
inputShortDelayedDescriptorReplicationFactor - array of integers, values for the short delayed descriptor replication factors to use
inputExtendedDelayedDescriptorReplicationFactor - array of integers, values for the extended delayed descriptor replication factors to use
number_header_rows - integer, the number of header rows in the file before the first data, including the row with column names.
column_names_row - integer, the row number that gives the column names.
wigos_station_identifier - either constant WIGOS station identifier (e.g. const:0-20000-0-123) or column from csv data file containing the WSI (e.g. data:WSI_column).
header - array of objects (see below), header section containing metadata
data - array of objects (see below), section mapping from the CSV columns to the BUFR elements

The header and data sections contain arrays of bufr_element objects mapping to either the different fields in the header sections of the BUFR message or to the data section respectively. More information is provided below. In both cases the field eccodes_key from the bufr_element object is used to indicate the BUFR element mapped rather than the 6 digit BUFR FXXYYY code. The field value specifies where the data to encode comes from. This can be one of the following:

data: this specifies that the data should come from the data file.
const: this specifies that a constant value should be used

For example, the code block below shows how the pressure reduced to mean sea level would be mapped from the column “mslp” in the CSV file to the BUFR element indicated by the eccodes key “pressureReducedToMeanSeaLevel” (FXXYYY = 010051).

{
    "data":[
        {"eccodes_key": "pressureReducedToMeanSeaLevel", "value": "data:mslp"}
    ]
}

The code block below gives examples for both constant values and a value read from the data file:

{
    "header":[
        {"eccodes_key": "dataCategory", "value": "const:0"}
    ],
    "data":[
        {"eccodes_key": "latitude", "value": "const:46.2234923"},
        {"eccodes_key": "longitude", "value": "const:6.1475485"},
        {"eccodes_key": "pressureReducedToMeanSeaLevel", "value": "data:mslp"}
    ]
}

In this example, the dataCategory field in BUFR section 1 (see Anatomy of a BUFR4 message) is mapped to the constant value 0; the latitude and longitude to the value specified; and the pressureReducedToMeanSeaLevel to the data from the “mslp” column in the CSV file.

The keys used for the header elements are listed on the Anatomy of a BUFR4 message page, with the mandatory keys highlighted in red. The list of keys can also be found at:

https://confluence.ecmwf.int/display/ECC/BUFR+headers

Similarly the keys for the different data elements can be found at:

https://confluence.ecmwf.int/display/ECC/WMO%3D37+element+table

input<Short|Extended>DelayedDescriptorReplicationFactor¶

Because of the way eccodes works, any delayed replication factors must be known before encoding and must be included in the mapping file. This currently limits the delayed replication factors to static values for a given mapping: every data file that uses a given mapping file must have, for example, the same optional elements present or the same number of levels in an atmospheric profile.

For sequences that do not include delayed replications, inputDelayedDescriptorReplicationFactor and the related short and extended properties must still be included but may be set to empty arrays, e.g.

{
    "inputDelayedDescriptorReplicationFactor": [],
    "inputShortDelayedDescriptorReplicationFactor": [],
    "inputExtendedDelayedDescriptorReplicationFactor": []
}

bufr_element¶

Each item in the header and data arrays of the mapping template must conform to the definition of the bufr_element object specified in the schema shown below. This object contains an eccodes_key field specifying the BUFR element the data are being mapped to, as described above, and up to 3 other pieces of information:

the source of the data (value)
valid range information (valid_min, valid_max)
simple scaling and offset parameters (scale, offset)

Only one source can be mapped; if multiple sources are specified, the validation of the mapping file by csv2bufr will fail. As noted at the start of this page, the value field maps the data to one of:

data: this specifies that the data should come from the data file
const: this specifies that a constant value should be used

and takes the form "value": "<keyword>:<column|value>" where <keyword> is the string data or const. <column|value> can specify either the column name from the data file or it can specify a constant value to use.
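The "<keyword>:<column|value>" form can be split with a few lines of Python. The sketch below is illustrative only and is not taken from the csv2bufr source; the helper names parse_value_spec and resolve are hypothetical, and a CSV row is assumed to be available as a dict keyed by column name.

```python
def parse_value_spec(spec: str) -> tuple[str, str]:
    """Split '<keyword>:<column|value>' into (keyword, payload)."""
    # partition() splits on the first colon only, so payloads that
    # themselves contain ':' are kept intact.
    keyword, sep, payload = spec.partition(":")
    if not sep or keyword not in ("data", "const"):
        raise ValueError(
            f"expected 'data:<column>' or 'const:<value>', got {spec!r}"
        )
    return keyword, payload


def resolve(spec: str, row: dict) -> str:
    """Return the constant, or the value of the named column in `row`."""
    keyword, payload = parse_value_spec(spec)
    return row[payload] if keyword == "data" else payload
```

With a row such as {"mslp": "1013.2"}, resolve("data:mslp", row) looks up the column, while resolve("const:0", row) returns the constant as written in the mapping file.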

The valid_min and valid_max fields are optional and can be used to perform basic quality control of numeric fields. The values to use are specified in the same way as for the value element, coming either from a constant or from the data file. If these fields are specified, the csv2bufr module checks the value extracted from the source against the indicated valid minimum and maximum. If the value lies outside the range, it is set to missing. This check is applied before any scaling of the data.
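The range check described above might be sketched as follows. This is a hedged illustration rather than the csv2bufr implementation: the function name range_check is hypothetical, None is assumed to stand in for "missing", and the bounds are assumed to be inclusive, which the text above does not state.

```python
def range_check(value, valid_min=None, valid_max=None):
    """Return `value` if it lies within the optional bounds, else None.

    Assumptions: None represents a missing value, and both bounds are
    inclusive. Either bound may be omitted.
    """
    if value is None:
        return None
    if valid_min is not None and value < valid_min:
        return None
    if valid_max is not None and value > valid_max:
        return None
    return value
```

Because the check runs before scaling, the bounds are expressed in the units of the source data, for example hPa if the CSV column holds hPa.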

The scale and offset fields are conditionally optional: either both can be omitted or both can be included. Including only one will result in a failed validation of the mapping file. These fields allow simple unit conversions to be performed, for example from degrees Celsius to Kelvin or from hectopascals to Pascals. Again, the values are specified in the same way as for the other fields. The scaled values are calculated as:

$\mbox{scaled\_value} = \mbox{value} \times 10^{\mbox{scale}} + \mbox{offset}$

The scaled value is then used to set the indicated BUFR element. For example:

{
    "data":[
        {
            "eccodes_key": "pressureReducedToMeanSeaLevel",
            "value": "data:mslp",
            "scale": "const:2",
            "offset": "const:0"
        }
    ]
}

This would convert the value contained in the “mslp” column of the CSV file from hPa to Pa by multiplying by 100 and adding 0.
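The scaling formula can be checked numerically with a short sketch. The function name apply_scaling is hypothetical and the code only restates the formula given above; it is not the csv2bufr implementation.

```python
def apply_scaling(value: float, scale: float, offset: float) -> float:
    """scaled_value = value * 10**scale + offset, as in the formula above."""
    return value * 10 ** scale + offset


# hPa to Pa: scale = 2, offset = 0
pressure_pa = apply_scaling(1013.25, 2, 0)

# degrees Celsius to Kelvin: scale = 0, offset = 273.15
temperature_k = apply_scaling(20.0, 0, 273.15)
```

Note that scale is an exponent of 10, not a multiplier, so a factor of 100 is expressed as scale 2.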

For each of the above elements (value, valid_min, valid_max, scale, offset) null values must be excluded from the mapping file.

An individual BUFR descriptor can occur multiple times within a single BUFR message. To allow the indexing of the descriptors within a particular message, and the inclusion of multiple descriptors or keys with the same name, eccodes prepends an index number to the key. For the first occurrence the index number can be omitted, but for all other cases it should be included. The index is written into the value of the eccodes_key field in the form #index#key, for example #1#pressureReducedToMeanSeaLevel. An example is given below.

{
    "data":[
        {
            "eccodes_key": "#1#pressureReducedToMeanSeaLevel",
            "value": "data:mslp",
            "scale": "const:2",
            "offset": "const:0"
        }
    ]
}
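When the same key occurs several times, the indexed names can be generated rather than typed by hand. The sketch below is an assumption-laden illustration: index_keys is a hypothetical helper, and it always writes the index, including #1# for the first occurrence, which the text above permits but does not require.

```python
from collections import Counter


def index_keys(keys):
    """Prefix each key with its occurrence number in the form '#n#key'."""
    seen = Counter()
    indexed = []
    for key in keys:
        seen[key] += 1  # 1 for the first occurrence, 2 for the second, ...
        indexed.append(f"#{seen[key]}#{key}")
    return indexed


indexed = index_keys(["airTemperature", "pressure", "airTemperature"])
```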

Units¶

The units of the data to be encoded into BUFR must match those specified in BUFR table B (e.g. see https://confluence.ecmwf.int/display/ECC/WMO%3D37+element+table), i.e. Kelvin for temperatures, Pascals for pressure etc. Simple conversions between units are possible using the scale and offset fields, as specified above. Some additional examples are given below.

{
    "data":[
        {
            "eccodes_key": "airTemperature",
            "value": "data:AT-fahrenheit",
            "scale": "const:-0.25527",
            "offset": "const:255.372"
        },
        {
            "eccodes_key": "airTemperature",
            "value": "data:AT-celsius",
            "scale": "const:0",
            "offset": "const:273.15"
        },
        {
            "eccodes_key": "pressure",
            "value": "data:pressure-hPa",
            "scale": "const:2",
            "offset": "const:0"
        }
    ]
}
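Any linear conversion of the form target = source × a + b maps onto the scaling formula with scale = log10(a) and offset = b. The sketch below derives the two parameters; scale_offset_for is a hypothetical helper, not part of csv2bufr. For Fahrenheit to Kelvin, K = (F + 459.67) × 5/9 = F × 5/9 + 255.372 (approximately), so a = 5/9 and b = 459.67 × 5/9.

```python
import math


def scale_offset_for(factor: float, addend: float) -> tuple[float, float]:
    """Express target = source * factor + addend as (scale, offset).

    The scale is a power of 10, so the factor must be positive.
    """
    if factor <= 0:
        raise ValueError("factor must be positive to take log10")
    return math.log10(factor), addend


# Fahrenheit to Kelvin: K = F * 5/9 + 459.67 * 5/9
f_scale, f_offset = scale_offset_for(5 / 9, 459.67 * 5 / 9)

# Sanity check: 32 degrees Fahrenheit is the freezing point of water
freezing_k = 32.0 * 10 ** f_scale + f_offset
```

The same helper gives scale 0 and offset 273.15 for Celsius to Kelvin, and scale 2 and offset 0 for hPa to Pa.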

Schema¶

{
    "$id": "csv2bufr.wis2.0.node.wis",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "conformsTo": {},
        "metadata": {
            "type": "object",
            "required": ["label","description","version","author","editor","dateCreated","dateModified","id"],
            "properties": {
                "label": {
                    "type": "string"
                },
                "description": {
                    "type": "string"
                },
                "version": {
                    "type": "string"
                },
                "author": {
                    "type": "string"
                },
                "editor": {
                    "type": "string"
                },
                "dateCreated": {
                    "type": "string",
                    "format": "date"
                },
                "dateModified": {
                    "type": "string",
                    "format": "date"
                },
                "id": {
                    "type": "string",
                    "format": "uuid4"
                }
            }
        },

        "inputShortDelayedDescriptorReplicationFactor": {
            "type": "array",
            "items": {"type": "integer"}
        },
        "inputDelayedDescriptorReplicationFactor": {
            "type": "array",
            "items": {"type": "integer"}
        },
        "inputExtendedDelayedDescriptorReplicationFactor": {
            "type": "array",
            "items": {"type": "integer"}
        },
        "number_header_rows": {
            "type": "integer",
            "description": "Number of header rows in file before the data"
        },
        "column_names_row": {
            "type": "integer",
            "description": "Which header line the column names are given on"

        },
        "wigos_station_identifier": {
            "type": "string",
            "description": "Either the WIGOS station identifier for the data or the column in the CSV file containing the identifier"
        },
        "delimiter": {
            "type": "string",
            "description": "The delimiter used to separate fields in the input csv file, must be one of ',', ';', '|' or [tab]"
        },
        "quoting": {
            "type": "string",
            "description": "CSV quoting method to use, must be one of QUOTE_NONNUMERIC, QUOTE_ALL, QUOTE_MINIMAL or QUOTE_NONE"
        },
        "quotechar": {
            "type": "string",
            "description": "quote character to use, e.g. \", ' etc"
        },
        "header":{
            "type": "array",
            "items": {"$ref": "#/$defs/bufr_element"},
            "description": "Contents of header sections of BUFR message"
        },
        "data": {
            "type": "array",
            "items": {"$ref": "#/$defs/bufr_element"},
            "description": "mapping from CSV file (or metadata json file) to BUFR"
        }
    },
    "required" : [
        "conformsTo", "metadata",
        "inputShortDelayedDescriptorReplicationFactor",
        "inputDelayedDescriptorReplicationFactor",
        "inputExtendedDelayedDescriptorReplicationFactor",
        "column_names_row","number_header_rows","header","data"],

    "$defs":{
        "bufr_element": {
            "type": "object",
            "properties": {
                "eccodes_key": {
                    "type": "string",
                    "description": "eccodes key used to set the value in the BUFR data"
                },
                "value": {
                    "type": [
                        "string"
                    ],
                    "description": "where to extract the value from, can be one of 'data','metadata','const','array' followed by the value or column header"
                },
                "valid_min": {
                    "type": "string",
                    "description": "Minimum valid value for parameter if set"
                },
                "valid_max": {
                    "type": "string",
                    "description": "Maximum valid value for the parameter if set"
                },
                "scale": {
                    "type": "string",
                    "description": "Value used to scale the data by before encoding using the same conventions as in BUFR"
                },
                "offset": {
                    "type": "string",
                    "description": "Value added to the data before encoding to BUFR following the same conventions as BUFR"
                }
            },
            "required": ["eccodes_key", "value"],
            "allOf": [
                {
                    "dependentRequired": {"scale": ["offset"]}
                },
                {
                    "dependentRequired": {"offset": ["scale"]}
                }
            ]
        }
    }
}
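A few of the constraints in the schema can be checked with the Python standard library alone before running csv2bufr. The sketch below is a partial, illustrative pre-check and not a substitute for full JSON Schema validation: check_mapping is a hypothetical helper that covers only the top-level required list, the required fields of each bufr_element, and the scale/offset pairing.

```python
REQUIRED = [
    "conformsTo", "metadata",
    "inputShortDelayedDescriptorReplicationFactor",
    "inputDelayedDescriptorReplicationFactor",
    "inputExtendedDelayedDescriptorReplicationFactor",
    "column_names_row", "number_header_rows", "header", "data",
]


def check_mapping(mapping: dict) -> list[str]:
    """Return a list of problems found; an empty list means none found."""
    problems = [f"missing property: {name}"
                for name in REQUIRED if name not in mapping]
    for section in ("header", "data"):
        for i, element in enumerate(mapping.get(section, [])):
            for field in ("eccodes_key", "value"):
                if field not in element:
                    problems.append(f"{section}[{i}]: missing {field}")
            # dependentRequired: scale and offset must appear together
            if ("scale" in element) != ("offset" in element):
                problems.append(f"{section}[{i}]: scale and offset must be paired")
    return problems
```

A mapping would typically be loaded with json.load and passed to this function; json.load itself rejects syntax errors such as trailing or missing commas.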

Built in templates and search path¶

Several preconfigured templates are available from the csv2bufr-templates repository:

https://github.com/wmo-im/csv2bufr-templates

By default, csv2bufr searches in the current working directory and the /opt/csv2bufr/templates directory. Additional search paths can be added by setting the CSV2BUFR_TEMPLATES environment variable.
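The search behaviour can be pictured with the sketch below. It is not the csv2bufr implementation and rests on several assumptions that the text above does not confirm: that CSV2BUFR_TEMPLATES holds a single directory, that the extra directory is searched last, and that templates are located by file name. The function find_template is hypothetical.

```python
import os
from pathlib import Path


def find_template(filename, environ=os.environ):
    """Return the first matching template path, or None if not found.

    Assumed search order: current working directory, then
    /opt/csv2bufr/templates, then the directory named by
    CSV2BUFR_TEMPLATES (assumed to be a single directory).
    """
    dirs = [Path.cwd(), Path("/opt/csv2bufr/templates")]
    extra = environ.get("CSV2BUFR_TEMPLATES")
    if extra:
        dirs.append(Path(extra))
    for directory in dirs:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```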