declearn.dataset.examples.load_heart_uci

Load and/or download a pre-processed UCI Heart Disease dataset.

See [https://archive.ics.uci.edu/dataset/45/heart+disease] for information on the UCI Heart Disease dataset.

Arguments

name: str
    Name of the center whose dataset to return. One of "cleveland", "hungarian", "switzerland" or "va".
folder: str or None, default=None
    Optional path to a folder where output csv files are written. If the pre-processed file data_{name}.csv already exists in that folder, it is read from there instead.

Returns:

| Name | Type | Description |
|------|------|-------------|
| `data` | `pd.DataFrame` | Pre-processed dataset from the `name` center. May be passed as `data` of a declearn `InMemoryDataset`. |
| `target` | `str` | Name of the target column in `data`. May be passed as `target` of a declearn `InMemoryDataset`. |
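A minimal usage sketch of the call described above, assuming declearn is installed and that `InMemoryDataset` can be imported from `declearn.dataset` (this import path is an assumption, not stated on this page). The helper `load_center` and the `CENTERS` tuple are hypothetical names introduced for illustration. The declearn imports are placed inside the function so that defining it requires neither the library nor network access.

```python
from typing import Optional

# Center names accepted by `load_heart_uci`, taken from its signature.
CENTERS = ("cleveland", "hungarian", "switzerland", "va")


def load_center(name: str, folder: Optional[str] = None):
    """Hypothetical helper: load one center into an InMemoryDataset."""
    if name not in CENTERS:
        raise ValueError(f"Unknown center {name!r}; expected one of {CENTERS}.")
    # Imported lazily: calling this requires declearn, and may download
    # the source data unless a pre-processed csv is found in `folder`.
    from declearn.dataset import InMemoryDataset  # assumed import path
    from declearn.dataset.examples import load_heart_uci

    data, target = load_heart_uci(name, folder=folder)
    # `data` and `target` are passed on as the page describes.
    return InMemoryDataset(data, target=target)
```

Calling `load_center("cleveland", folder="data")` would then return a dataset for the Cleveland center, where `"data"` is a placeholder folder name.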

Source code in declearn/dataset/examples/_heart_uci.py
def load_heart_uci(
    name: Literal["cleveland", "hungarian", "switzerland", "va"],
    folder: Optional[str] = None,
) -> Tuple[pd.DataFrame, str]:
    """Load and/or download a pre-processed UCI Heart Disease dataset.

    See [https://archive.ics.uci.edu/dataset/45/heart+disease] for
    information on the UCI Heart Disease dataset.

    Arguments
    ---------
    name: str
        Name of the center whose dataset to return.
    folder: str or None, default=None
        Optional path to a folder where to write output csv files.
        If the file already exists in that folder, read from it.

    Returns
    -------
    data: pd.DataFrame
        Pre-processed dataset from the `name` center.
        May be passed as `data` of a declearn `InMemoryDataset`.
    target: str
        Name of the target column in `data`.
        May be passed as `target` of a declearn `InMemoryDataset`.
    """
    # If the pre-processed file already exists, read and return it.
    if folder is not None:
        path = os.path.join(folder, f"data_{name}.csv")
        if os.path.isfile(path):
            data = pd.read_csv(path)
            return data, "num"
    # Download (and optionally save) or read from the source zip file.
    source = get_heart_uci_zipfile(folder)
    # Extract the target shard and preprocess it.
    data = extract_heart_uci_shard(name, source)
    data = preprocess_heart_uci_dataframe(data)
    # Optionally save the preprocessed shard to disk.
    if folder is not None:
        path = os.path.join(folder, f"data_{name}.csv")
        data.to_csv(path, sep=",", encoding="utf-8", index=False)
    return data, "num"
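The control flow of the function above (read the cached file if it exists, otherwise build the data and optionally save it) can be sketched with the standard library alone. This is an illustrative stand-in, not declearn code: `load_or_build`, `fake_build` and the toy row are hypothetical, and the `csv` module replaces pandas so that the sketch runs without dependencies.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List, Optional

Rows = List[Dict[str, str]]


def load_or_build(
    name: str, folder: Optional[str], build: Callable[[str], Rows]
) -> Rows:
    """Return cached rows for `name` if present, else build and cache them."""
    path = None if folder is None else os.path.join(folder, f"data_{name}.csv")
    # If the cached file already exists, read and return it.
    if path is not None and os.path.isfile(path):
        with open(path, newline="", encoding="utf-8") as file:
            return list(csv.DictReader(file))
    # Otherwise, build the rows (stand-in for download and preprocessing).
    rows = build(name)
    # Optionally save the result so that the next call can skip the build.
    if path is not None and rows:
        with open(path, "w", newline="", encoding="utf-8") as file:
            writer = csv.DictWriter(file, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return rows


calls: List[str] = []


def fake_build(name: str) -> Rows:
    """Hypothetical builder that records each call and returns a toy row."""
    calls.append(name)
    return [{"age": "63", "num": "1"}]


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build("cleveland", tmp, fake_build)   # builds and saves
    second = load_or_build("cleveland", tmp, fake_build)  # reads the cache
```

In this sketch the builder runs only on the first call; the second call is served from the csv file written to the temporary folder.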