Load and/or download a pre-processed UCI Heart Disease dataset.
See [https://archive.ics.uci.edu/dataset/45/heart+disease] for
information on the UCI Heart Disease dataset.
Arguments
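name: str
Name of the center whose dataset to return: one of "cleveland", "hungarian", "switzerland" or "va".
folder: str or None, default=None
Optional path to a folder where output CSV files are written.
If the pre-processed file (`data_{name}.csv`) already exists in that folder, it is read from there instead of being regenerated. The downloaded source zip file may also be stored there.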
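The signature annotates `name` as `Literal["cleveland", "hungarian", "switzerland", "va"]`, a static type hint that Python does not enforce at runtime. A caller who wants an early, explicit error on a misspelled center could check the value first. The sketch below is a hypothetical helper, not part of declearn: `HeartUciCenter` and `check_center_name` are illustrative names, and the alias merely mirrors the annotation.

```python
from typing import Literal, get_args

# Hypothetical alias mirroring the `name` annotation of `load_heart_uci`.
HeartUciCenter = Literal["cleveland", "hungarian", "switzerland", "va"]


def check_center_name(name: str) -> str:
    """Return `name` unchanged if it is a valid center, else raise ValueError."""
    # get_args() on a Literal returns the tuple of its allowed values.
    valid = get_args(HeartUciCenter)
    if name not in valid:
        raise ValueError(f"Invalid center name {name!r}; expected one of {valid}.")
    return name
```

Deriving the valid values from the `Literal` alias keeps the runtime check and the type hint in sync, so adding a center only requires editing one place.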
Returns

| Name | Type | Description |
|---|---|---|
| data | pd.DataFrame | Pre-processed dataset from the `name` center. May be passed as `data` of a declearn `InMemoryDataset`. |
| target | str | Name of the target column in `data`. May be passed as `target` of a declearn `InMemoryDataset`. |
Source code in declearn/dataset/examples/_heart_uci.py
```python
# Module-level imports this function relies on (shown for completeness).
import os
from typing import Literal, Optional, Tuple

import pandas as pd

# `get_heart_uci_zipfile`, `extract_heart_uci_shard` and
# `preprocess_heart_uci_dataframe` are declearn helpers not shown here.


def load_heart_uci(
    name: Literal["cleveland", "hungarian", "switzerland", "va"],
    folder: Optional[str] = None,
) -> Tuple[pd.DataFrame, str]:
    """Load and/or download a pre-processed UCI Heart Disease dataset.

    See [https://archive.ics.uci.edu/dataset/45/heart+disease] for
    information on the UCI Heart Disease dataset.

    Arguments
    ---------
    name: str
        Name of a center, the dataset from which to return.
    folder: str or None, default=None
        Optional path to a folder where to write output csv files.
        If the file already exists in that folder, read from it.

    Returns
    -------
    data: pd.DataFrame
        Pre-processed dataset from the `name` center.
        May be passed as `data` of a declearn `InMemoryDataset`.
    target: str
        Name of the target column in `data`.
        May be passed as `target` of a declearn `InMemoryDataset`.
    """
    # If the pre-processed file already exists, read and return it.
    if folder is not None:
        path = os.path.join(folder, f"data_{name}.csv")
        if os.path.isfile(path):
            data = pd.read_csv(path)
            return data, "num"
    # Download (and optionally save) or read from the source zip file.
    source = get_heart_uci_zipfile(folder)
    # Extract the target shard and preprocess it.
    data = extract_heart_uci_shard(name, source)
    data = preprocess_heart_uci_dataframe(data)
    # Optionally save the preprocessed shard to disk.
    if folder is not None:
        path = os.path.join(folder, f"data_{name}.csv")
        data.to_csv(path, sep=",", encoding="utf-8", index=False)
    return data, "num"
```
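`load_heart_uci` follows a read-through cache pattern: when `folder` is given, it first looks for `data_{name}.csv` and returns its contents if the file exists; otherwise it downloads, extracts and preprocesses the shard, then writes the CSV so later calls can skip that work. The sketch below reproduces only this caching logic, using the standard-library `csv` module instead of pandas. `load_or_build`, `Rows`, `build`, `fake_build` and `demo` are hypothetical names, the `build` callable stands in for the download, extraction and preprocessing steps, and the sample row is made up.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List, Optional

# Hypothetical row type: one dict per CSV record (values read back as strings).
Rows = List[Dict[str, str]]


def load_or_build(
    name: str, build: Callable[[], Rows], folder: Optional[str] = None
) -> Rows:
    """Read `data_{name}.csv` from `folder` if present, else build (and save) rows."""
    path = os.path.join(folder, f"data_{name}.csv") if folder is not None else None
    # Cache hit: read the previously saved file and skip the build step.
    if path is not None and os.path.isfile(path):
        with open(path, newline="", encoding="utf-8") as file:
            return list(csv.DictReader(file))
    # Cache miss: build the rows (stands in for download + extract + preprocess).
    rows = build()
    # Persist non-empty results so that later calls hit the cache.
    if path is not None and rows:
        with open(path, "w", newline="", encoding="utf-8") as file:
            writer = csv.DictWriter(file, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return rows


def demo() -> int:
    """Load the same shard twice from a temporary folder; return how often `build` ran."""
    calls = []

    def fake_build() -> Rows:
        calls.append(1)
        return [{"age": "63", "num": "0"}]

    with tempfile.TemporaryDirectory() as tmp:
        first = load_or_build("cleveland", fake_build, tmp)
        second = load_or_build("cleveland", fake_build, tmp)
    # The cached copy round-trips to the same rows as the freshly built ones.
    assert first == second
    return len(calls)
```

Calling `demo()` runs the build step only once: the second call finds the saved CSV and reads it back. As in `load_heart_uci`, passing `folder=None` disables caching entirely and rebuilds on every call.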