BUFR template mapping

The mapping between the input CSV data and the output BUFR data is specified in a JSON file. The csv2bufr module validates the mapping file against the schema shown at the bottom of this page before attempting the transformation to BUFR. This schema specifies 6 primary properties:

  • inputDelayedDescriptorReplicationFactor - array of integers, values for the delayed descriptor replication factors to use

  • number_header_rows - integer, the number of header rows in the file before the data rows

  • names_on_row - integer, which row the column names appear on

  • header - array of objects (see below), header section containing metadata

  • data - array of objects (see below), section mapping from the CSV columns to the BUFR elements

  • wigos_station_identifier - object (see below), section to contain the WIGOS station identifier

Of these, only inputDelayedDescriptorReplicationFactor, header and data are mandatory; the unexpandedDescriptors described on the previous page are included in the header section. Both number_header_rows and names_on_row default to one if not specified.
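The mandatory properties and defaults described above can be sketched in Python. This is only an illustration under stated assumptions, not csv2bufr's own code: the helper name apply_mapping_defaults is hypothetical, and in practice the schema validation performs the required-property check.

```python
# Hypothetical helper, not part of csv2bufr: checks the three mandatory
# properties and fills in the documented defaults for the optional ones.
REQUIRED = ("inputDelayedDescriptorReplicationFactor", "header", "data")
DEFAULTS = {"number_header_rows": 1, "names_on_row": 1}


def apply_mapping_defaults(mapping: dict) -> dict:
    """Return a copy of the mapping with default values filled in.

    Raises ValueError if a mandatory property is missing.
    """
    missing = [key for key in REQUIRED if key not in mapping]
    if missing:
        raise ValueError(f"mapping is missing required properties: {missing}")
    # Values given in the mapping take precedence over the defaults.
    return {**DEFAULTS, **mapping}


mapping = apply_mapping_defaults(
    {"inputDelayedDescriptorReplicationFactor": [], "header": [], "data": []}
)
print(mapping["number_header_rows"], mapping["names_on_row"])  # 1 1
```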

The header and data sections contain arrays of bufr_element objects mapping to either the different fields in the header sections of the BUFR message or to the data section respectively. More information is provided below. In both cases the field eccodes_key is used to indicate the BUFR element being mapped to, rather than the 6-digit FXXYYY code. For example, the code block below shows how the pressure reduced to mean sea level would be mapped from the column “mslp” in the CSV file to the BUFR element indicated by the eccodes key “pressureReducedToMeanSeaLevel” (FXXYYY = 010051).

{
    "data":[
        {"eccodes_key": "pressureReducedToMeanSeaLevel", "csv_column": "mslp"}
    ]
}

In addition to mapping to the CSV columns, constant values and values from the JSON metadata file can be mapped using the “value” and “jsonpath” fields. Building on the prior example:

{
    "header":[
        {"eccodes_key": "dataCategory", "value": 0}
    ],
    "data":[
        {"eccodes_key": "latitude", "jsonpath": "$.locations[0].latitude"},
        {"eccodes_key": "longitude", "jsonpath": "$.locations[0].longitude"},
        {"eccodes_key": "pressureReducedToMeanSeaLevel", "csv_column": "mslp"}
    ]
}

This would map: the dataCategory field in BUFR section 1 (see Anatomy of a BUFR4 message) to the constant value 0; the latitude and longitude to the values found by resolving the jsonpath in the metadata file; and the pressureReducedToMeanSeaLevel to the data from the “mslp” column in the CSV file.
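How the three sources might be resolved for a single element can be sketched as follows. This is an assumption-laden illustration rather than the csv2bufr implementation: resolve_source and resolve_jsonpath are hypothetical names, the resolver handles only dotted names and integer indices (not full JSONPath), and the metadata and CSV values are made up for the example.

```python
import re


def resolve_jsonpath(metadata, path):
    """Resolve a simple path such as '$.locations[0].latitude'.

    Hypothetical minimal resolver: supports only .name and [index] steps.
    """
    if not path.startswith("$"):
        raise ValueError(f"unsupported path: {path}")
    node = metadata
    # Each match is either a .name step or an [index] step.
    for name, index in re.findall(r"\.([A-Za-z_]\w*)|\[(\d+)\]", path[1:]):
        node = node[name] if name else node[int(index)]
    return node


def resolve_source(element, csv_row, metadata):
    """Return the raw value for one bufr_element from the source it names."""
    if "value" in element:
        return element["value"]  # constant value
    if "csv_column" in element:
        return csv_row[element["csv_column"]]  # value from the CSV row
    return resolve_jsonpath(metadata, element["jsonpath"])  # metadata file


# Illustrative inputs (assumed, not taken from a real station).
metadata = {"locations": [{"latitude": 46.2, "longitude": 6.1}]}
csv_row = {"mslp": 1013.2}

for element in (
    {"eccodes_key": "dataCategory", "value": 0},
    {"eccodes_key": "latitude", "jsonpath": "$.locations[0].latitude"},
    {"eccodes_key": "pressureReducedToMeanSeaLevel", "csv_column": "mslp"},
):
    print(element["eccodes_key"], resolve_source(element, csv_row, metadata))
```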

The keys used for the header elements are listed on the Anatomy of a BUFR4 message page, with the mandatory keys highlighted in red. The list of keys can also be found at:

Similarly the keys for the different data elements can be found at:

inputDelayedDescriptorReplicationFactor

Due to the way that eccodes works, any delayed replication factors need to be specified before encoding and included in the mapping file. This currently limits the use of delayed replication factors to static values for a given mapping. For example, every data file that uses a given mapping file must have the same optional elements present or the same number of levels in an atmospheric profile.

For sequences that do not include delayed replications the inputDelayedDescriptorReplicationFactor must still be included but may be set to an empty array, e.g.

{
    "inputDelayedDescriptorReplicationFactor": []
}

bufr_element

Each item in the header and data arrays of the mapping template must conform to the definition of the bufr_element object specified in the schema shown below. This object contains an eccodes_key field specifying the BUFR element the data are being mapped to, as described above, and up to 3 other pieces of information:

  • the source of the data (value, csv_column, jsonpath)

  • valid range information (valid_min, valid_max)

  • simple scaling and offset parameters (scale, offset)

Only one source can be mapped; if multiple sources are specified, the validation of the mapping file by csv2bufr will fail. The value source maps a constant value to the indicated BUFR element. The csv_column source maps the indicated column from the CSV file to the indicated BUFR element. The jsonpath source maps the value found by resolving the JSON path in the metadata file to the indicated BUFR element.
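The single-source rule corresponds to the oneOf constraint in the schema. A minimal stand-alone check, assuming a hypothetical helper named has_single_source, might look like this:

```python
SOURCES = ("value", "csv_column", "jsonpath")


def has_single_source(element: dict) -> bool:
    """True when exactly one of value, csv_column or jsonpath is present."""
    return sum(key in element for key in SOURCES) == 1


ok = {"eccodes_key": "pressureReducedToMeanSeaLevel", "csv_column": "mslp"}
bad = {"eccodes_key": "pressureReducedToMeanSeaLevel",
       "csv_column": "mslp", "value": 101325}

print(has_single_source(ok))   # True
print(has_single_source(bad))  # False
```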

The valid_min and valid_max fields are optional and can be used to perform basic quality control of numeric fields. If these fields are specified, the csv2bufr module checks the value extracted from the source against the indicated valid minimum and maximum values. If the value is outside of the range, it is set to missing.
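The range check can be sketched as below. Two assumptions are made that the text does not state: the bounds are treated as inclusive, and Python's None stands in for the BUFR missing value. The function name apply_valid_range is hypothetical.

```python
def apply_valid_range(value, valid_min=None, valid_max=None):
    """Return value, or None (standing in for 'missing') if out of range.

    Assumes inclusive bounds; non-numeric values are passed through unchanged.
    """
    if not isinstance(value, (int, float)):
        return value
    if valid_min is not None and value < valid_min:
        return None
    if valid_max is not None and value > valid_max:
        return None
    return value


print(apply_valid_range(1013.2, valid_min=500, valid_max=1100))  # 1013.2
print(apply_valid_range(1200.0, valid_min=500, valid_max=1100))  # None
```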

The scale and offset fields are conditionally optional: either both can be omitted or both can be included. Including only one will result in a failed validation of the mapping file. These allow simple unit conversions to be performed, for example from degrees Celsius to Kelvin or from hectopascals to Pascals. The scaled values are calculated as:

\mbox{scaled\_value} =
    \mbox{value} \times 10^{\mbox{scale}} + \mbox{offset}

The scaled value is then used to set the indicated BUFR element. For example:

{
    "data":[
        {
            "eccodes_key": "pressureReducedToMeanSeaLevel",
            "csv_column": "mslp",
            "scale": 2,
            "offset": 0
        }
    ]
}

This would convert the value contained in the “mslp” column of the CSV file from hPa to Pa by multiplying by 100 and adding 0.
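The formula above translates directly into code. The sketch below uses a hypothetical function name, scale_value, and is meant only to make the arithmetic concrete.

```python
def scale_value(value, scale, offset):
    """Compute scaled_value = value * 10**scale + offset."""
    return value * 10 ** scale + offset


# hPa to Pa: scale 2 multiplies by 100, offset 0 adds nothing.
print(scale_value(1013.2, 2, 0))

# Degrees Celsius to Kelvin: scale 0 multiplies by 1, offset adds 273.15.
print(scale_value(20.0, 0, 273.15))
```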

For each of the above elements (value, csv_column, jsonpath, valid_min, valid_max, scale, offset) null values must be excluded from the mapping file.
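When a mapping is generated programmatically, unset fields often end up as None and would be written as null. A small clean-up step, sketched here with the hypothetical helpers drop_nulls and scale_offset_paired, removes them before the file is written and confirms that scale and offset still appear together.

```python
import json

OPTIONAL_FIELDS = ("value", "csv_column", "jsonpath",
                   "valid_min", "valid_max", "scale", "offset")


def drop_nulls(element: dict) -> dict:
    """Remove optional fields set to None so they are not written as null."""
    return {key: val for key, val in element.items()
            if not (key in OPTIONAL_FIELDS and val is None)}


def scale_offset_paired(element: dict) -> bool:
    """True when scale and offset are both present or both absent."""
    return ("scale" in element) == ("offset" in element)


element = {"eccodes_key": "airTemperature", "csv_column": "AT-celsius",
           "valid_min": None, "valid_max": None,
           "scale": 0, "offset": 273.15}
cleaned = drop_nulls(element)
print(json.dumps(cleaned))
print(scale_offset_paired(cleaned))  # True
```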

An individual BUFR descriptor can occur multiple times within a single BUFR message. To allow the indexing of the descriptors within a particular message, and the inclusion of multiple descriptors or keys with the same name, eccodes prepends an index number to the eccodes_key. For the first occurrence the index number can be omitted, but for all other cases it should be included. The index is indicated within the eccodes_key using #index#eccodes_key; an example is given below.

{
    "data":[
        {
            "eccodes_key": "#1#pressureReducedToMeanSeaLevel",
            "csv_column": "mslp",
            "scale": 2,
            "offset": 0
        }
    ]
}
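The #index#eccodes_key convention can be unpacked with a small helper. The sketch below is illustrative only; split_indexed_key is a hypothetical name, and treating an unprefixed key as index 1 follows the statement above that the index may be omitted for the first occurrence.

```python
import re

# Matches keys of the form '#<index>#<name>'.
INDEXED_KEY = re.compile(r"^#(\d+)#(.+)$")


def split_indexed_key(eccodes_key: str):
    """Split '#2#airTemperature' into (2, 'airTemperature').

    A key without an index prefix is treated as the first occurrence.
    """
    match = INDEXED_KEY.match(eccodes_key)
    if match:
        return int(match.group(1)), match.group(2)
    return 1, eccodes_key


print(split_indexed_key("#2#airTemperature"))              # (2, 'airTemperature')
print(split_indexed_key("pressureReducedToMeanSeaLevel"))  # (1, 'pressureReducedToMeanSeaLevel')
```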

Units

The units of the data to be encoded into BUFR should match those specified in BUFR table B (e.g. see https://confluence.ecmwf.int/display/ECC/WMO%3D37+element+table), i.e. Kelvin for temperatures, Pascals for pressure, etc. Simple conversions between units are possible, as specified above, using the scale and offset fields. Some additional examples are given below.

{
    "data":[
        {
            "eccodes_key": "airTemperature",
            "csv_column": "AT-fahrenheit",
            "scale": -0.25527,
            "offset": 255.372
        },
        {
            "eccodes_key": "airTemperature",
            "csv_column": "AT-celsius",
            "scale": 0,
            "offset": 273.15
        },
        {
            "eccodes_key": "pressure",
            "csv_column": "pressure-hPa",
            "scale": 2,
            "offset": 0
        }
    ]
}
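Because the scale is an exponent of 10, a conversion with a multiplicative factor that is not a power of 10 needs a fractional scale. Assuming the formula scaled_value = value × 10^scale + offset given earlier, the sketch below shows how the scale and offset for a Fahrenheit to Kelvin conversion can be derived; the helper name linear_to_scale_offset is hypothetical. Note that the offset is added after scaling, so it must be expressed in the target unit (459.67 × 5/9 ≈ 255.372).

```python
import math


def linear_to_scale_offset(factor: float, shift: float):
    """Express new = factor * old + shift as (scale, offset) for
    scaled_value = value * 10**scale + offset. The factor must be positive."""
    return math.log10(factor), shift


# Fahrenheit to Kelvin: K = (F + 459.67) * 5/9 = F * 5/9 + 459.67 * 5/9
scale, offset = linear_to_scale_offset(5 / 9, 459.67 * 5 / 9)
print(round(scale, 5), round(offset, 3))  # -0.25527 255.372

# Check with the freezing point of water: 32 F should be about 273.15 K.
print(round(32 * 10 ** scale + offset, 2))  # 273.15
```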

Schema

{
    "$id": "csv2bufr.wis2.0.node.wis",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "inputDelayedDescriptorReplicationFactor": {
            "type": "array",
            "items": {"type": "integer"}
        },
        "number_header_rows": {
            "type": "integer",
            "description": "Number of header rows in the file"
        },
        "names_on_row": {
            "type": "integer",
            "description": "Which row the column names appear on"
        },
        "header":{
            "type": "array",
            "items": {"$ref": "#/$defs/bufr_element"},
            "description": "Contents of header sections of BUFR message"
        },
        "data": {
            "type": "array",
            "items": {"$ref": "#/$defs/bufr_element"},
            "description": "mapping from CSV file (or metadata json file) to BUFR"
        },
        "wigos_station_identifier": {
            "type": "object",
            "description": "Field to contain WIGOS identifier (currently unused)",
            "properties": {
                "csv_column": {"type": "string"},
                "jsonpath": {"type": "string"},
                "value": {"type": "string"}
            },
            "oneOf": [
                        {"required": ["value"]},
                        {"required": ["csv_column"]},
                        {"required": ["jsonpath"]}
                    ]
        }
    },
    "required" : ["inputDelayedDescriptorReplicationFactor","header","data"],
    "$defs":{
        "bufr_element": {
            "type": "object",
            "properties": {
                "eccodes_key": {
                    "type": "string",
                    "description": "eccodes key used to set the value in the BUFR data"
                },
                "value": {
                    "type": [
                        "boolean", "object", "array", "number", "string", "integer"
                    ],
                    "description": "fixed value to use for all data using this mapping"
                },
                "csv_column": {
                    "type": "string",
                    "description": "column from the CSV file to map to the BUFR element indicated by eccodes_key"
                },
                "jsonpath": {
                    "type": "string",
                    "description": "json path to the element in the JSON metadata file"
                },
                "valid_min": {
                    "type": "number",
                    "description": "Minimum valid value for parameter if set"
                },
                "valid_max": {
                    "type": "number",
                    "description": "Maximum value for the parameter if set"
                },
                "scale": {
                    "type": "number",
                    "description": "Value used to scale the data by before encoding using the same conventions as in BUFR"
                },
                "offset": {
                    "type": "number",
                    "description": "Value added to the data before encoding to BUFR following the same conventions as BUFR"
                }
            },
            "required": ["eccodes_key"],
            "allOf": [
                {
                    "oneOf": [
                        {"required": ["value"]},
                        {"required": ["csv_column"]},
                        {"required": ["jsonpath"]}
                    ]
                },
                {
                    "dependentRequired": {"scale": ["offset"]}
                },
                {
                    "dependentRequired": {"offset": ["scale"]}
                }
            ]
        }
    }
}