declearn.optimizer.modules.YogiMomentumModule

Bases: EWMAModule

Yogi-specific momentum gradient-acceleration module.

This module implements the following algorithm:

Init(beta):
    state = 0
Step(grads):
    state = state - sign(state-grads)*(1-beta)*grads
    grads = state

In other words, gradients are corrected in a fashion somewhat similar to the base momentum formula, but such that the magnitude of the state update is a function of the inputs only, rather than of both the inputs and the previous state [1].

Note that this module is actually meant to be used to compute a learning-rate adaptation term based on squared gradients.
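To make the update rule concrete, here is a minimal sketch using plain NumPy arrays in place of declearn Vector objects; the beta value and toy gradients are illustrative assumptions, not part of the declearn API:

import numpy as np

beta = 0.9           # EWMA-like decay factor (illustrative value)
state = np.zeros(3)  # Init(beta): state = 0

# Toy squared gradients, since the module is meant to accumulate grads ** 2.
for grads in (np.array([0.4, 0.1, 0.9]) ** 2,
              np.array([0.2, 0.3, 0.5]) ** 2):
    # Step(grads): move the state towards the inputs by a step whose
    # magnitude depends only on (1 - beta) * grads, not on the state.
    sign = np.sign(state - grads)
    state = state - sign * (1 - beta) * grads
    print(state)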

References

[1] Zaheer and Reddi et al., 2018. Adaptive Methods for Nonconvex Optimization.

Source code in declearn/optimizer/modules/_momentum.py
class YogiMomentumModule(EWMAModule):
    """Yogi-specific momentum gradient-acceleration module.

    This module implements the following algorithm:

        Init(beta):
            state = 0
        Step(grads):
            state = state - sign(state-grads)*(1-beta)*grads
            grads = state

    In other words, gradients are corrected in a fashion somewhat
    similar to the base momentum formula, but such that the
    magnitude of the state update is a function of the inputs only,
    rather than of both the inputs and the previous state [1].

    Note that this module is actually meant to be used to compute
    a learning-rate adaptation term based on squared gradients.

    References
    ----------
    [1] Zaheer and Reddi et al., 2018.
        Adaptive Methods for Nonconvex Optimization.
    """

    name: ClassVar[str] = "yogi-momentum"

    def run(
        self,
        gradients: Vector,
    ) -> Vector:
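        # Compute the element-wise sign of (state - grads).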
        sign = (self.state - gradients).sign()
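        # Take a fixed-magnitude step towards the inputs: the step size
        # depends only on (1 - beta) * grads, not on the previous state.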
        self.state = self.state - (sign * (1 - self.beta) * gradients)
        return self.state