
declearn.optimizer.modules.L2GlobalClipping

Bases: OptiModule

Fixed-threshold, global-L2-norm gradient clipping module.

This module implements the following algorithm:

Init(max_norm):
    assign max_norm
Step(grads):
    norm = euclidean_norm(flatten_and_stack(grads))
    clip = min(max_norm / norm, 1.0)
    grads *= clip

In other words, (batch-averaged) gradients are clipped based on the L2 (euclidean) norm of their concatenated values: if that norm exceeds the selected max_norm threshold, all gradients are scaled down by the same factor, max_norm / norm, so that the resulting global norm equals max_norm.

This is equivalent to the tensorflow.clip_by_global_norm and torch.nn.utils.clip_grad_norm_ utilities. If you would rather clip gradients on a per-parameter basis, use the L2Clipping module.

This may be used to bound the sensitivity of gradient-based model updates, and/or to prevent exploding-gradient issues.
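
To make the clipping rule concrete, the following minimal sketch reproduces the same computation with plain numpy arrays, independently of declearn's Vector types; the array values and the global_l2_clip helper are purely illustrative.

import numpy as np

def global_l2_clip(grads, max_norm=1.0):
    """Illustrative re-implementation of global L2-norm clipping."""
    # L2 norm of all gradient values taken together, as if concatenated.
    norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    # Shrink everything by a common factor when the norm exceeds the threshold.
    clip = min(max_norm / norm, 1.0) if norm > 0 else 1.0
    return [g * clip for g in grads]

grads = [np.array([3.0, 0.0]), np.array([4.0])]  # global L2 norm = 5.0
clipped = global_l2_clip(grads, max_norm=1.0)
# Each array is scaled by 0.2, so the clipped global norm is exactly 1.0.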

Source code in declearn/optimizer/modules/_clipping.py (lines 90-152)
class L2GlobalClipping(OptiModule):
    """Fixed-threshold, global-L2-norm gradient clipping module.

    This module implements the following algorithm:

        Init(max_norm):
            assign max_norm
        Step(grads):
            norm = euclidean_norm(flatten_and_stack(grads))
            clip = min(max_norm / norm, 1.0)
            grads *= clip

    In other words, (batch-averaged) gradients are clipped based on
    the L2 (euclidean) norm of their *concatenated* values: if that
    norm exceeds the selected `max_norm` threshold, all gradients are
    scaled down by the same factor, `max_norm / norm`, so that the
    resulting global norm equals `max_norm`.

    This is equivalent to the `tensorflow.clip_by_global_norm` and
    `torch.nn.utils.clip_grad_norm_` utilities. If you would rather clip
    gradients on a per-parameter basis, use the `L2Clipping` module.

    This may be used to bound the sensitivity of gradient-based model
    updates, and/or to prevent exploding-gradient issues.
    """

    name: ClassVar[str] = "l2-global-clipping"

    def __init__(
        self,
        max_norm: float = 1.0,
    ) -> None:
        """Instantiate the L2-norm gradient-clipping module.

        Parameters
        ----------
        max_norm: float, default=1.0
            Clipping threshold of the L2 (euclidean) norm of
            concatenated input gradients.
        """
        self.max_norm = max_norm

    def run(
        self,
        gradients: Vector,
    ) -> Vector:
        # Handle the edge case of an empty input Vector.
        if not gradients.coefs:
            return gradients
        # Compute the total l2 norm of gradients.
        sum_of_squares = (gradients**2).sum()
        total_sum_of_squares = sum(
            type(gradients)({"norm": value})
            for value in sum_of_squares.coefs.values()
        )
        l2_norm = total_sum_of_squares**0.5
        # Compute and apply the associated scaling factor.
        scaling = (self.max_norm / l2_norm).minimum(1.0).coefs["norm"]
        return gradients * scaling

    def get_config(
        self,
    ) -> Dict[str, Any]:
        return {"max_norm": self.max_norm}
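
As a usage illustration, the hedged sketch below wraps two numpy gradient arrays in a declearn Vector, runs the module, and inspects the scaled output. The L2GlobalClipping API matches the source shown above, but the NumpyVector import path is an assumption that may differ across declearn versions.

import numpy as np

from declearn.model.sklearn import NumpyVector  # import path assumed
from declearn.optimizer.modules import L2GlobalClipping

# Two parameter groups whose concatenated values have an L2 norm of 5.
gradients = NumpyVector(
    {"weights": np.array([3.0, 0.0]), "bias": np.array([4.0])}
)

module = L2GlobalClipping(max_norm=2.0)
clipped = module.run(gradients)

# Both coefficients are scaled by the same factor (2 / 5 = 0.4 here),
# so the global L2 norm of the output equals max_norm.
print(clipped.coefs["weights"])  # approx. [1.2, 0.0]
print(clipped.coefs["bias"])     # approx. [1.6]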

__init__(max_norm=1.0)

Instantiate the L2-norm gradient-clipping module.

Parameters:

    max_norm : float, default=1.0
        Clipping threshold of the L2 (euclidean) norm of concatenated
        input gradients.
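
For instance, a custom threshold can be set at instantiation and read back through get_config (defined in the class source above):

from declearn.optimizer.modules import L2GlobalClipping

module = L2GlobalClipping(max_norm=5.0)
assert module.get_config() == {"max_norm": 5.0}
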
Source code in declearn/optimizer/modules/_clipping.py (lines 117-129)
def __init__(
    self,
    max_norm: float = 1.0,
) -> None:
    """Instantiate the L2-norm gradient-clipping module.

    Parameters
    ----------
    max_norm: float, default=1.0
        Clipping threshold of the L2 (euclidean) norm of
        concatenated input gradients.
    """
    self.max_norm = max_norm