`declearn.metrics.BinaryRocAUC`

Bases: Metric[AurocState]

ROC AUC metric for binary classification.

This metric applies to a binary classifier. It accumulates the (optionally weighted) counts of true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN) predictions over successive updates, at each of a set of decision thresholds; the TP rate, FP rate and ROC AUC are then derived from these counts.

Computed metrics are the following:

tpr: 1-d numpy.ndarray True-positive rate values for a variety of thresholds. Formula: TP / (TP + FN), i.e. P(pred=1|true=1)
fpr: 1-d numpy.ndarray False-positive rate values for a variety of thresholds. Formula: FP / (FP + TN), i.e. P(pred=1|true=0)
thresh: 1-d numpy.ndarray Array of decision thresholds indexing the FPR and TPR.
roc_auc: float ROC AUC score, i.e. area under the receiver operating characteristic (ROC) curve.
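
As a rough illustration of how these outputs relate to one another, the sketch below derives the two rates and a trapezoidal AUC from threshold-wise confusion counts. It is plain Python with made-up counts, and the helper name `roc_from_counts` is hypothetical; the class itself works on numpy arrays and delegates the area computation to `sklearn.metrics.auc`, as its source shows.

```python
def roc_from_counts(tpos, tneg, fpos, fneg):
    """Derive (fpr, tpr, auc) from threshold-wise confusion counts.

    Counts are expected in order of decreasing threshold, so that
    both rates are non-decreasing along the lists.
    """
    # Map undefined 0/0 rates to 0.0.
    tpr = [tp / (tp + fn) if (tp + fn) else 0.0 for tp, fn in zip(tpos, fneg)]
    fpr = [fp / (fp + tn) if (fp + tn) else 0.0 for fp, tn in zip(fpos, tneg)]
    # Trapezoidal rule over consecutive (fpr, tpr) points.
    auc = sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
        for i in range(1, len(fpr))
    )
    return fpr, tpr, auc


# Hypothetical counts at thresholds (1.0, 0.5, 0.0), for a toy
# dataset made of 2 positive and 2 negative samples.
tpos = [0.0, 2.0, 2.0]
fneg = [2.0, 0.0, 0.0]
fpos = [0.0, 1.0, 2.0]
tneg = [2.0, 1.0, 0.0]
fpr, tpr, auc = roc_from_counts(tpos, tneg, fpos, fneg)
print(fpr, tpr, auc)
```

With these toy counts the curve goes through (0, 0), (0.5, 1) and (1, 1), which gives an area of 0.75.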

Note that this class supports aggregating states from another BinaryRocAUC instance with different hyper-parameters into it, unless its bound parameter is set. In that case, thresholds cannot be updated dynamically, either when processing samples or when aggregating states.

Source code in declearn/metrics/_roc_auc.py

class BinaryRocAUC(Metric[AurocState]):
    """ROC AUC metric for binary classification.

    This metric applies to a binary classifier, and computes the (opt.
    weighted) amount of true positives (TP), true negatives (TN), false
    positives (FP) and false negatives (FN) predictions over time around
    a variety of thresholds; from which TP rate, FP rate and finally ROC
    AUC metrics are eventually derived.

    Computed metrics are the following:

    * tpr: 1-d numpy.ndarray
        True-positive rate values for a variety of thresholds.
        Formula: TP / (TP + FN), i.e. P(pred=1|true=1)
    * fpr: 1-d numpy.ndarray
        False-positive rate values for a variety of thresholds.
        Formula: FP / (FP + TN), i.e. P(pred=1|true=0)
    * thresh: 1-d numpy.ndarray
        Array of decision thresholds indexing the FPR and TPR.
    * roc_auc: float
        ROC AUC score, i.e. area under the receiver operating
        characteristic (ROC) curve.

    Note that this class supports aggregating states from another
    BinaryRocAUC instance with different hyper-parameters into it,
    unless its `bound` parameter is set - in which case thresholds
    are not authorized to be dynamically updated, either at samples
    processing or states-aggregating steps.
    """

    name = "binary-roc"
    state_cls = AurocState

    def __init__(
        self,
        scale: float = 0.1,
        label: Union[int, str] = 1,
        bound: Optional[Tuple[float, float]] = None,
    ) -> None:
        """Instantiate the binary ROC AUC metric.

        Parameters
        ----------
        scale: float, default=.1
            Granularity of the set of threshold values around which
            to binarize input predictions for fpr/tpr estimation.
        label: int or str, default=1
            Value of the positive labels in input true-label arrays.
        bound: (float, float) tuple or None, default=None
            Optional lower and upper bounds for threshold values. If
            set, disable adjusting the scale based on input values.
            If None, start with (0, 1) and extend the scale on both
            ends when input values exceed them.

        Notes
        -----
        - Using the default `bound=None` enables the thresholds at which
          the ROC curve points are computed to vary dynamically based on
          inputs, but also based on input states to the `agg_states`
          method, that may come from a metric with different parameters.
        - Setting up explicit boundaries prevents thresholds from being
          adjusted at update time, and a ValueError will be raised by the
          `agg_states` method if inputs are adjusted to a distinct set
          of thresholds.
        """
        self.scale = scale
        self.label = label
        self.bound = bound
        super().__init__()

    def get_config(
        self,
    ) -> Dict[str, Any]:
        return {"scale": self.scale, "label": self.label, "bound": self.bound}

    @property
    def prec(self) -> int:
        """Numerical precision of threshold values."""
        return int(f"{self.scale:.1e}".rsplit("-", 1)[-1])

    def build_initial_states(
        self,
    ) -> AurocState:
        if self.bound is None:
            bounds = (0.0, 1.0)
            aggcls = AurocStateUnbound  # type: Type[AurocState]
        else:
            bounds = self.bound
            aggcls = AurocState
        thresh = self._build_thresholds(*bounds)
        names = ("tpos", "tneg", "fpos", "fneg")
        states = {key: np.zeros_like(thresh) for key in names}
        return aggcls(**states, thresh=thresh)

    def _build_thresholds(
        self,
        lower: float,
        upper: float,
    ) -> np.ndarray:
        """Return a 1-d array of increasing threshold values."""
        t_min = np.floor(lower / self.scale)
        t_max = np.ceil(upper / self.scale)
        return (np.arange(t_min, t_max + 1) * self.scale).round(self.prec)

    def get_result(
        self,
    ) -> Dict[str, Union[float, np.ndarray]]:
        # Unpack state variables for code readability.
        tpos = self._states.tpos[::-1]
        tneg = self._states.tneg[::-1]
        fpos = self._states.fpos[::-1]
        fneg = self._states.fneg[::-1]
        # Compute true- and false-positive rates and derive AUC.
        with np.errstate(invalid="ignore"):
            tpr = np.nan_to_num(tpos / (tpos + fneg), copy=False)
            fpr = np.nan_to_num(fpos / (fpos + tneg), copy=False)
        auc = sklearn.metrics.auc(fpr, tpr)
        return {
            "tpr": tpr,
            "fpr": fpr,
            "thresh": self._states.thresh[::-1],
            "roc_auc": auc,
        }

    def update(
        self,
        y_true: np.ndarray,
        y_pred: np.ndarray,
        s_wght: Optional[np.ndarray] = None,
    ) -> None:
        # Set up the scaled set of thresholds at which to estimate states.
        thresh = self._states.thresh
        if self.bound is None:
            thresh = self._build_thresholds(
                min(y_pred.min(), thresh[0]),
                max(y_pred.max(), thresh[-1]),
            )
            aggcls = AurocStateUnbound  # type: Type[AurocState]
        else:
            aggcls = AurocState
        # Adjust inputs' shape if needed.
        y_pred = y_pred.reshape((-1, 1))
        y_true = y_true.reshape((-1, 1))
        s_wght = (
            np.ones_like(y_pred) if s_wght is None else s_wght.reshape((-1, 1))
        )
        # Compute threshold-wise prediction truthness values.
        pos = y_true == self.label
        tru = (y_pred >= thresh) == pos
        # Aggregate the former into threshold-wise TP/TN/FP/FN scores.
        states = aggcls(
            tpos=(s_wght * (tru & pos)).sum(axis=0),
            tneg=(s_wght * (tru & ~pos)).sum(axis=0),
            fpos=(s_wght * ~(tru | pos)).sum(axis=0),
            fneg=(s_wght * (~tru & pos)).sum(axis=0),
            thresh=thresh,
        )
        # Aggregate these scores into the retained states.
        self._states += states

    def set_states(
        self,
        states: AurocState,
    ) -> None:
        # Prevent bounded instances from assigning unmatching inputs.
        if self.bound:
            if isinstance(states, AurocStateUnbound):
                states = AurocState.from_dict(states.to_dict())
            if not (
                (len(self._states.thresh) == len(states.thresh))
                and np.all(self._states.thresh == states.thresh)
            ):
                raise TypeError(
                    f"Cannot assign '{self.__class__.__name__}' states with "
                    "unmatching thresholds to an instance with bounded ones."
                )
        # Prevent unbounded instances from switching to bounded states.
        elif self.bound is None and not isinstance(states, AurocStateUnbound):
            states = AurocStateUnbound.from_dict(states.to_dict())
        # Delegate assignment to parent call (that raises on wrong type).
        return super().set_states(states)
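
The threshold-wise counting performed by `update` can be pictured with the plain-Python sketch below. It is an illustration only: the function name `count_at_thresholds` and the toy inputs are hypothetical, and explicit loops stand in for the numpy broadcasting used in the source, but the four categories follow the same definitions (a sample is predicted positive at a threshold when its score is greater than or equal to it).

```python
def count_at_thresholds(y_true, y_pred, thresholds, label=1, weights=None):
    """Accumulate weighted TP/TN/FP/FN counts at each threshold."""
    if weights is None:
        weights = [1.0] * len(y_pred)
    names = ("tpos", "tneg", "fpos", "fneg")
    counts = {key: [0.0] * len(thresholds) for key in names}
    for truth, score, wgt in zip(y_true, y_pred, weights):
        is_pos = truth == label
        for idx, thr in enumerate(thresholds):
            pred_pos = score >= thr
            if pred_pos and is_pos:
                key = "tpos"
            elif pred_pos:
                key = "fpos"
            elif is_pos:
                key = "fneg"
            else:
                key = "tneg"
            counts[key][idx] += wgt
    return counts


# Toy inputs: two positive and two negative samples.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.4, 0.6, 0.1]
counts = count_at_thresholds(y_true, y_pred, thresholds=[0.0, 0.5, 1.0])
print(counts)
```

At threshold 0.0 every sample is predicted positive, while at threshold 1.0 none is, so the counts shift from the TP/FP lists to the FN/TN ones as the threshold increases.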

`prec: int` `property`

Numerical precision of threshold values.
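
To make the precision rule concrete, here is a standalone sketch that reuses the expression found in the property's source. The function name `threshold_precision` is hypothetical and the scales are arbitrary examples.

```python
def threshold_precision(scale: float) -> int:
    """Return the number of decimals implied by a threshold `scale`."""
    # "{:.1e}" formats 0.05 as "5.0e-02"; the digits that follow the
    # last "-" sign are the (absolute) base-10 exponent of the scale.
    return int(f"{scale:.1e}".rsplit("-", 1)[-1])


precisions = {scale: threshold_precision(scale) for scale in (0.1, 0.05, 0.001)}
print(precisions)
```

Note that this standalone function only handles scales below 1: for a scale of 1 or more, the formatted exponent carries a "+" sign, and the `int` conversion raises a ValueError.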

`__init__(scale=0.1, label=1, bound=None)`

Instantiate the binary ROC AUC metric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `scale` | `float` | Granularity of the set of threshold values around which to binarize input predictions for fpr/tpr estimation. | `0.1` |
| `label` | `Union[int, str]` | Value of the positive labels in input true-label arrays. | `1` |
| `bound` | `Optional[Tuple[float, float]]` | Optional lower and upper bounds for threshold values. If set, disable adjusting the scale based on input values. If None, start with (0, 1) and extend the scale on both ends when input values exceed them. | `None` |
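
The interplay between `scale` and `bound` can be sketched as follows. This is a plain-Python approximation of the `_build_thresholds` logic shown in the class source, with `math` and the built-in `round` standing in for their numpy counterparts; the function name `build_thresholds` and the out-of-range values are hypothetical.

```python
import math


def build_thresholds(lower, upper, scale=0.1):
    """Return an increasing list of thresholds covering [lower, upper]."""
    # Number of decimals to round to, derived from the scale.
    prec = int(f"{scale:.1e}".rsplit("-", 1)[-1])
    # Snap the bounds outwards to integer multiples of the scale.
    t_min = math.floor(lower / scale)
    t_max = math.ceil(upper / scale)
    return [round(idx * scale, prec) for idx in range(t_min, t_max + 1)]


# Default grid, as when `bound=None` and no input was processed yet.
default_grid = build_thresholds(0.0, 1.0)
# Extended grid, assuming predictions ranging from -0.25 to 1.25.
extended_grid = build_thresholds(
    min(-0.25, default_grid[0]),
    max(1.25, default_grid[-1]),
)
print(len(default_grid), len(extended_grid))
```

With the default scale of 0.1, the initial grid holds 11 thresholds from 0.0 to 1.0, and the extended one holds 17 thresholds from -0.3 to 1.3.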

Notes

- Using the default `bound=None` enables the thresholds at which the ROC curve points are computed to vary dynamically based on inputs, but also based on input states to the `agg_states` method, that may come from a metric with different parameters.
- Setting up explicit boundaries prevents thresholds from being adjusted at update time, and a ValueError will be raised by the `agg_states` method if inputs are adjusted to a distinct set of thresholds.
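
The kind of check that a bounded instance applies can be sketched as below. The sketch mirrors the threshold comparison visible in the `set_states` method of the class source, which raises a TypeError; it does not reproduce the ValueError attributed to `agg_states` above, whose implementation is not shown here. The function name `check_matching_thresholds` is hypothetical.

```python
def check_matching_thresholds(own, other):
    """Raise a TypeError if `other` thresholds differ from `own` ones."""
    same_size = len(own) == len(other)
    if not (same_size and all(a == b for a, b in zip(own, other))):
        raise TypeError(
            "Cannot assign states with unmatching thresholds "
            "to an instance with bounded ones."
        )


own = [0.0, 0.5, 1.0]
# Identical thresholds are accepted silently.
check_matching_thresholds(own, [0.0, 0.5, 1.0])
# A finer grid over the same range is rejected.
try:
    check_matching_thresholds(own, [0.0, 0.25, 0.5, 0.75, 1.0])
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
print(mismatch_rejected)
```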

Source code in declearn/metrics/_roc_auc.py

def __init__(
    self,
    scale: float = 0.1,
    label: Union[int, str] = 1,
    bound: Optional[Tuple[float, float]] = None,
) -> None:
    """Instantiate the binary ROC AUC metric.

    Parameters
    ----------
    scale: float, default=.1
        Granularity of the set of threshold values around which
        to binarize input predictions for fpr/tpr estimation.
    label: int or str, default=1
        Value of the positive labels in input true-label arrays.
    bound: (float, float) tuple or None, default=None
        Optional lower and upper bounds for threshold values. If
        set, disable adjusting the scale based on input values.
        If None, start with (0, 1) and extend the scale on both
        ends when input values exceed them.

    Notes
    -----
    - Using the default `bound=None` enables the thresholds at which
      the ROC curve points are computed to vary dynamically based on
      inputs, but also based on input states to the `agg_states`
      method, that may come from a metric with different parameters.
    - Setting up explicit boundaries prevents thresholds from being
      adjusted at update time, and a ValueError will be raised by the
      `agg_states` method if inputs are adjusted to a distinct set
      of thresholds.
    """
    self.scale = scale
    self.label = label
    self.bound = bound
    super().__init__()
