SiaNet.Model.Optimizers Namespace
Classes
| Class | Description |
|---|---|
| AdaDelta | Adadelta is an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate. Instead of accumulating all past squared gradients, Adadelta restricts the window of accumulated past gradients to some fixed size w. |
| AdaGrad | Adagrad is an algorithm for gradient-based optimization that adapts the learning rate to the parameters, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones. |
| Adam | Adaptive Moment Estimation (Adam) is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients v_t, like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum. |
| Adamax | The v_t factor in the Adam update rule scales the gradient inversely proportionally to the ℓ2 norm of the past gradients (via the v_{t−1} term) and the current gradient; Adamax generalizes this scaling to the ℓ∞ norm. |
| BaseOptimizer | |
| MomentumSGD | Stochastic gradient descent optimizer with momentum. |
| RMSProp | RMSprop is an unpublished, adaptive learning rate method proposed by Geoff Hinton. This optimizer is usually a good choice for recurrent neural networks. |
| SGD | SGD is an optimisation technique. It is an alternative to standard (batch) gradient descent and other approaches such as BFGS. It still leads to fast convergence, with some advantages: it doesn't require storing all training data in memory (good for large training sets), and it allows adding new data in an "online" setting. |
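
Update Rules

For reference, sketches of the standard update rules behind these optimizers follow. The symbols (θ for the parameters, g_t for the gradient at step t, η for the learning rate, ε for a small smoothing constant) are the conventional ones from the optimization literature, not identifiers taken from the SiaNet API. Adadelta replaces Adagrad's ever-growing sum of squared gradients with decaying averages of squared gradients and squared updates (ρ is the decay rate):

```latex
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta\theta_t &= -\frac{\mathrm{RMS}[\Delta\theta]_{t-1}}{\mathrm{RMS}[g]_t}\, g_t,
  \qquad \mathrm{RMS}[x]_t = \sqrt{E[x^2]_t + \epsilon} \\
E[\Delta\theta^2]_t &= \rho\, E[\Delta\theta^2]_{t-1} + (1-\rho)\, \Delta\theta_t^2 \\
\theta_{t+1} &= \theta_t + \Delta\theta_t
\end{aligned}
```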
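The per-parameter Adagrad update makes the adaptive behaviour concrete: the accumulated squared gradient G_{t,ii} of parameter i divides the learning rate, so frequently updated parameters receive smaller steps.

```latex
\theta_{t+1,i} = \theta_{t,i} - \frac{\eta}{\sqrt{G_{t,ii} + \epsilon}}\, g_{t,i},
\qquad G_{t,ii} = \sum_{\tau=1}^{t} g_{\tau,i}^{2}
```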
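The standard Adam rule combines both decaying averages mentioned above, with bias correction for the zero-initialized moments (β1, β2 are the decay rates of the first and second moment estimates):

```latex
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{\,t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}} \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
```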
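Adamax swaps the ℓ2-norm-based v_t of Adam for an ℓ∞-based quantity u_t, which reduces to a running maximum; m̂_t is the bias-corrected first moment from the Adam sketch above:

```latex
\begin{aligned}
u_t &= \max\!\left(\beta_2\, u_{t-1},\; |g_t|\right) \\
\theta_{t+1} &= \theta_t - \frac{\eta}{u_t}\, \hat{m}_t
\end{aligned}
```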
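Momentum SGD accumulates a velocity term that dampens oscillations and accelerates progress along consistent gradient directions (γ is the momentum coefficient, typically around 0.9):

```latex
\begin{aligned}
v_t &= \gamma\, v_{t-1} + \eta\, \nabla_\theta J(\theta) \\
\theta &= \theta - v_t
\end{aligned}
```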
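RMSprop, as proposed in Hinton's lecture notes, divides the learning rate by a decaying average of squared gradients (γ is the decay rate; Hinton suggests γ = 0.9 and η = 0.001 as defaults):

```latex
\begin{aligned}
E[g^2]_t &= \gamma\, E[g^2]_{t-1} + (1-\gamma)\, g_t^2 \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t
\end{aligned}
```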
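Plain SGD performs one update per training example (or mini-batch), which is why it needs neither the full training set in memory nor a full pass over the data before each step:

```latex
\theta = \theta - \eta\, \nabla_\theta J\!\left(\theta;\; x^{(i)},\, y^{(i)}\right)
```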