
declearn.optimizer.modules.YogiModule

Bases: AdamModule

Yogi additive adaptive moment estimation module.

This module implements the following algorithm:

Init(beta_1, beta_2, eps):
    state_m = 0
    state_v = 0
Step(grads, step):
    state_m = beta_1*state_m + (1-beta_1)*grads
    sign_uv = sign(state_v - grads**2)
    state_v = state_v - sign_uv*(1-beta_2)*(grads**2)
    m_hat = state_m / (1 - beta_1**step)
    v_hat = state_v / (1 - beta_2**step)
    grads = m_hat / (sqrt(v_hat) + eps)

In other words, Yogi [1] implements the Adam [2] algorithm, but modifies the update rule of the 'v' state variable that is used to scale the learning rate.

Note that this implementation allows combining the Yogi modification of Adam with the AMSGrad [3] one.
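For illustration, below is a minimal standalone NumPy sketch of one such step, with the Adam counterpart of the second-moment update noted in a comment. The helper name yogi_step and the standalone setting are illustrative assumptions and do not reflect declearn's internal API, which operates on declearn's Vector abstraction rather than raw NumPy arrays.

    import numpy as np

    def yogi_step(grads, state_m, state_v, step,
                  beta_1=0.9, beta_2=0.99, eps=1e-7):
        """Run one Yogi step; return (adapted_grads, state_m, state_v)."""
        # First-moment EWMA, identical to Adam.
        state_m = beta_1 * state_m + (1 - beta_1) * grads
        # Yogi second-moment update: an additive, sign-controlled correction
        # that moves state_v towards grads**2. Adam would instead compute
        # state_v = beta_2*state_v + (1-beta_2)*grads**2.
        sign_uv = np.sign(state_v - grads ** 2)
        state_v = state_v - sign_uv * (1 - beta_2) * grads ** 2
        # Zero-debiasing of both states, as in Adam.
        m_hat = state_m / (1 - beta_1 ** step)
        v_hat = state_v / (1 - beta_2 ** step)
        # Adapted gradients, to be multiplied by the base learning rate.
        return m_hat / (np.sqrt(v_hat) + eps), state_m, state_v

    # Toy usage: iterate over gradient estimates, carrying the states along.
    state_m = state_v = np.zeros(3)
    for step, grads in enumerate(np.random.randn(5, 3), start=1):
        updates, state_m, state_v = yogi_step(grads, state_m, state_v, step)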

References

  • [1] Zaheer and Reddi et al., 2018. Adaptive Methods for Nonconvex Optimization.
  • [2] Kingma and Ba, 2014. Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980
  • [3] Reddi et al., 2018. On the Convergence of Adam and Beyond. https://arxiv.org/abs/1904.09237
Source code in declearn/optimizer/modules/_adaptive.py
class YogiModule(AdamModule):
    """Yogi additive adaptive moment estimation module.

    This module implements the following algorithm:

        Init(beta_1, beta_2, eps):
            state_m = 0
            state_v = 0
        Step(grads, step):
            state_m = beta_1*state_m + (1-beta_1)*grads
            sign_uv = sign(state_v - grads**2)
            state_v = state_v - sign_uv*(1-beta_2)*(grads**2)
            m_hat = state_m / (1 - beta_1**step)
            v_hat = state_v / (1 - beta_2**step)
            grads = m_hat / (sqrt(v_hat) + eps)

    In other words, Yogi [1] implements the Adam [2] algorithm,
    but modifies the update rule of the 'v' state variable that
    is used to scale the learning rate.

    Note that this implementation allows combining the Yogi
    modification of Adam with the AMSGrad [3] one.

    References
    ----------
    - [1]
        Zaheer and Reddi et al., 2018.
        Adaptive Methods for Nonconvex Optimization.
    - [2]
        Kingma and Ba, 2014.
        Adam: A Method for Stochastic Optimization.
        https://arxiv.org/abs/1412.6980
    - [3]
        Reddi et al., 2018.
        On the Convergence of Adam and Beyond.
        https://arxiv.org/abs/1904.09237
    """

    name: ClassVar[str] = "yogi"

    def __init__(
        self,
        beta_1: float = 0.9,
        beta_2: float = 0.99,
        amsgrad: bool = False,
        eps: float = 1e-7,
    ) -> None:
        """Instantiate the Yogi gradients-adaptation module.

        Parameters
        ----------
        beta_1: float
            Beta parameter for the momentum correction
            applied to the input gradients.
        beta_2: float
            Beta parameter for the (Yogi-specific) momentum
            correction applied to the adaptive scaling term.
        amsgrad: bool, default=False
            Whether to implement the Yogi modification on top
            of the AMSGrad algorithm rather than the Adam one.
        eps: float, default=1e-7
            Numerical-stability improvement term, added
            to the (divisor) adaptive scaling term.
        """
        super().__init__(beta_1, beta_2, amsgrad=amsgrad, eps=eps)
        self.ewma_2 = YogiMomentumModule(beta=beta_2)

__init__(beta_1=0.9, beta_2=0.99, amsgrad=False, eps=1e-07)

Instantiate the Yogi gradients-adaptation module.

Parameters:

  • beta_1 (float, default=0.9):
    Beta parameter for the momentum correction applied to the input gradients.
  • beta_2 (float, default=0.99):
    Beta parameter for the (Yogi-specific) momentum correction applied to the adaptive scaling term.
  • amsgrad (bool, default=False):
    Whether to implement the Yogi modification on top of the AMSGrad algorithm rather than the Adam one.
  • eps (float, default=1e-07):
    Numerical-stability improvement term, added to the (divisor) adaptive scaling term.
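For example, using only the signature documented above, the module can be instantiated with its defaults or combined with AMSGrad as mentioned in the class description:

    from declearn.optimizer.modules import YogiModule

    # Default hyper-parameters, as documented above.
    yogi = YogiModule()

    # Yogi modification applied on top of AMSGrad rather than plain Adam.
    yogi_amsgrad = YogiModule(beta_1=0.9, beta_2=0.99, amsgrad=True, eps=1e-7)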
Source code in declearn/optimizer/modules/_adaptive.py
def __init__(
    self,
    beta_1: float = 0.9,
    beta_2: float = 0.99,
    amsgrad: bool = False,
    eps: float = 1e-7,
) -> None:
    """Instantiate the Yogi gradients-adaptation module.

    Parameters
    ----------
    beta_1: float
        Beta parameter for the momentum correction
        applied to the input gradients.
    beta_2: float
        Beta parameter for the (Yogi-specific) momentum
        correction applied to the adaptive scaling term.
    amsgrad: bool, default=False
        Whether to implement the Yogi modification on top
        of the AMSGrad algorithm rather than the Adam one.
    eps: float, default=1e-7
        Numerical-stability improvement term, added
        to the (divisor) adaptive scaling term.
    """
    super().__init__(beta_1, beta_2, amsgrad=amsgrad, eps=eps)
    self.ewma_2 = YogiMomentumModule(beta=beta_2)