
codelists

Format the usage codelist into a convenient JSON file for the JavaScript app.

Functions:

Name | Description
format_codelist_json | Helper script to format the CSV codelist into a more convenient JSON file.

format_codelist_json(input_csv_path, output_json_path)

Helper script to format the CSV codelist into a more convenient JSON file.

Parameters:

Name | Type | Description | Default
input_csv_path | pathlib.Path | The input CSV path to the codelist. | required
output_json_path | pathlib.Path | The output JSON path of the formatted codelist. | required

Raises:

Type | Description
RuntimeError | If a code is duplicated in the input.
RuntimeError | If a CSV row does not have a code.
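
To make the two error conditions and the added "Implied by" field concrete, here is a minimal in-memory sketch of the same logic. It assumes the rows have already been parsed into (code, implies) pairs. The name `build_codelist` and the sample codes are hypothetical and not part of the project; the real function obtains its rows from `csv_read_attributes`.

```python
from collections import defaultdict


def build_codelist(rows):
    """Hypothetical stand-in: rows is an iterable of (code, implies_list) pairs."""
    codes_attributes = {}
    codes_implied_by = defaultdict(list)
    for code, implies in rows:
        if code == "":
            raise RuntimeError("One row misses its code.")
        if code in codes_attributes:
            raise RuntimeError(f"The code '{code}' is duplicated in the input.")
        codes_attributes[code] = {"Implies [list,str]": list(implies)}
        # Record the reverse edge: each implied code remembers who implies it.
        for implied in implies:
            codes_implied_by[implied].append(code)
    # Every known code gets an "Implied by" list, empty if nothing implies it.
    for code, attributes in codes_attributes.items():
        attributes["Implied by"] = codes_implied_by.get(code, [])
    return codes_attributes


# Made-up sample codes, for illustration only.
result = build_codelist([("A", ["B", "C"]), ("B", ["C"]), ("C", [])])
print(result["C"]["Implied by"])  # ['A', 'B']
```

Codes that are only mentioned in an "Implies" list but never defined as a row do not get an entry of their own in this sketch, which matches the loop over known codes in the source.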

Source code in python/src/data_pipeline/utils/codelists.py
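import json
from collections import defaultdict
from pathlib import Path

# csv_read_attributes is a project helper; its import is not shown in this excerpt.


def format_codelist_json(input_csv_path: Path, output_json_path: Path):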
    """
    Helper script to format the CSV codelist into a more convenient JSON file.

    Parameters
    ----------
    input_csv_path : Path
        The input CSV path to the codelist.
    output_json_path : Path
        The output JSON path of the formatted codelist.

    Raises
    ------
    RuntimeError
        If a code is duplicated in the input.
    RuntimeError
        If a CSV row does not have a code.
    """
    specific_columns = ("Code [str]", "Implies [list,str]")
    attributes_all, specific_values_all = csv_read_attributes(
        csv_path=input_csv_path, specific_columns=specific_columns
    )
    codes_attributes = {}
    codes_implied_by = defaultdict(list)
    for attributes, specific_values in zip(attributes_all, specific_values_all):
        _, code = specific_values[0]
        implies_column, implies_value = specific_values[1]
        if code in codes_attributes:
            raise RuntimeError(f"The code '{code}' is duplicated in the input.")
        if code == "":
            raise RuntimeError("One row misses its code.")
        attributes[implies_column] = implies_value
        codes_attributes[code] = attributes

        # Store the implication backwards
        for code_implied in implies_value:
            codes_implied_by[code_implied].append(code)

    # Attach the reverse implications to every code (empty list if none)
    for code in codes_attributes:
        codes_attributes[code]["Implied by"] = codes_implied_by[code]

    output_json_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_json_path, "w", encoding="utf-8") as f:
        json.dump(codes_attributes, f)
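
The function creates any missing parent directories before writing the JSON. The write step can be sketched in isolation as follows, using a temporary directory so nothing is left behind. The helper name `write_json`, the payload, and the nested path are all made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_json(payload, output_json_path: Path) -> None:
    # Create missing parent directories, then serialise the mapping.
    output_json_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_json_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "dir" / "codelist.json"  # made-up path
    write_json({"A": {"Implied by": []}}, target)
    # Read the file back to confirm it round-trips.
    with open(target, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded)  # {'A': {'Implied by': []}}
```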