Tuning configurations#
Parameters#
Training duration#
max_epochs: Optional[int] = None
max_steps: Optional[int] = None
The maximum number of epochs (full passes over the dataset) and of optimisation steps to train for, respectively. If both are set, training stops as soon as either limit is reached.
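The stopping rule can be illustrated with a minimal sketch. The function `should_stop` and its `epoch` and `step` counters are hypothetical names used for illustration only, not part of the configuration itself.

```python
from typing import Optional


def should_stop(
    epoch: int,
    step: int,
    max_epochs: Optional[int] = None,
    max_steps: Optional[int] = None,
) -> bool:
    """Return True once either configured limit has been reached.

    `epoch` and `step` are the numbers of completed epochs and
    optimisation steps (hypothetical counters for this sketch).
    """
    epochs_done = max_epochs is not None and epoch >= max_epochs
    steps_done = max_steps is not None and step >= max_steps
    # Whichever limit is hit first stops training.
    return epochs_done or steps_done
```

With both limits left at `None`, this sketch never requests a stop; how the actual trainer behaves in that case is not stated here.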
Batch size#
batch_size: int = 64
This is the number of samples in a forward-backward pass. If you use several devices and/or
device batches of size greater than \(1\), this must be a multiple of device_batch_size*total_devices.
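The divisibility constraint can be checked with a short sketch. The helper name `passes_per_batch` is hypothetical, and the idea that the quotient corresponds to a number of accumulated device passes is an assumption made for illustration, not something stated above.

```python
def passes_per_batch(
    batch_size: int, device_batch_size: int, total_devices: int
) -> int:
    """Check the batch size constraint and return batch_size divided by
    the number of samples processed in one pass over all devices."""
    samples_per_pass = device_batch_size * total_devices
    if batch_size % samples_per_pass != 0:
        raise ValueError(
            f"batch_size={batch_size} is not a multiple of "
            f"device_batch_size*total_devices={samples_per_pass}"
        )
    # Assumption: the quotient is how many device passes make up one batch.
    return batch_size // samples_per_pass
```

For example, the default batch_size of 64 is valid with 4 devices and device batches of size 8, but not with 4 devices and device batches of size 5.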
Adam parameters#
betas: Tuple[float, float] = (0.9, 0.98)
epsilon: float = 1e-8
learning_rate: float = 1e-4
weight_decay: Optional[float] = None
These are, respectively, the \(β\) and \(ε\) parameters and the base learning rate of the Adam optimizer [Kingma and Ba, 2014], and the weight decay rate. See the PyTorch documentation for more details.
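To show where each of these parameters enters the update, here is a minimal single-scalar sketch of one Adam step following Kingma and Ba. The function `adam_step` is hypothetical, and applying weight decay as an extra term added to the gradient is an assumption; the actual optimizer may handle it differently.

```python
import math
from typing import Optional, Tuple


def adam_step(
    param: float,
    grad: float,
    m: float,
    v: float,
    t: int,
    learning_rate: float = 1e-4,
    betas: Tuple[float, float] = (0.9, 0.98),
    epsilon: float = 1e-8,
    weight_decay: Optional[float] = None,
) -> Tuple[float, float, float]:
    """Apply one Adam update to a scalar parameter.

    `m` and `v` are the running first and second moment estimates and
    `t` is the 1-based step count. Returns the new (param, m, v).
    """
    if weight_decay is not None:
        # Assumption: weight decay is added to the gradient (L2 style).
        grad = grad + weight_decay * param
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised moment estimates.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v
```

On the first step the bias-corrected moments equal the gradient and its square, so the parameter moves by roughly learning_rate in the direction opposite to the gradient, whatever the gradient's magnitude.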
Gradient clipping#
gradient_clipping: Optional[Union[float, int]] = None
If non-None, this is the maximum allowed gradient norm. Gradients with a larger norm are rescaled to this norm, preserving their direction. See the PyTorch documentation for implementation details.
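As an illustration of norm-based clipping, the following sketch rescales a gradient given as a plain list of floats. The function `clip_by_norm` is hypothetical and uses the Euclidean norm; it is not the library's implementation.

```python
import math
from typing import List, Optional


def clip_by_norm(grad: List[float], max_norm: Optional[float]) -> List[float]:
    """Rescale `grad` so its Euclidean norm is at most `max_norm`.

    A `max_norm` of None disables clipping, mirroring
    gradient_clipping = None.
    """
    if max_norm is None:
        return list(grad)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    # Scaling every component by the same factor keeps the direction.
    scale = max_norm / norm
    return [g * scale for g in grad]
```

For instance, a gradient of `[3.0, 4.0]` has norm 5, so clipping it to a maximum norm of 1 multiplies each component by 0.2.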
Learning rate schedule#
lr_decay_steps: Optional[int] = None
warmup_steps: int = 0
These are the numbers of steps in the slanted triangular learning rate schedule [Howard and Ruder, 2018]: the learning rate increases linearly for warmup_steps steps up to learning_rate, then decays linearly to \(0\) in lr_decay_steps steps.
Note that setting lr_decay_steps overrides max_steps.
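The shape of the schedule can be sketched as a function of the step number. The function `learning_rate_at` is hypothetical, and three details are assumptions made for illustration: the warmup starts from \(0\), the decay is counted from the end of the warmup, and the rate stays constant after warmup when lr_decay_steps is None.

```python
from typing import Optional


def learning_rate_at(
    step: int,
    learning_rate: float = 1e-4,
    warmup_steps: int = 0,
    lr_decay_steps: Optional[int] = None,
) -> float:
    """Learning rate at a given 0-based step of the schedule."""
    if step < warmup_steps:
        # Linear warmup (assumed to start from 0) up to the base rate.
        return learning_rate * step / warmup_steps
    if lr_decay_steps is None:
        # Assumption: no decay configured means a constant rate.
        return learning_rate
    # Linear decay to 0, assumed to be counted from the end of warmup.
    progress = (step - warmup_steps) / lr_decay_steps
    return learning_rate * max(0.0, 1.0 - progress)
```

With a base rate of 1.0, 10 warmup steps, and 100 decay steps, this sketch reaches the base rate at step 10, falls to half of it at step 60, and hits \(0\) at step 110.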
Bibliography#
Jeremy Howard and Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 328–339. Association for Computational Linguistics, July 2018. doi:10.18653/v1/P18-1031.
Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations. December 2014. URL: http://arxiv.org/abs/1412.6980 (visited on 2019-03-04), arXiv:1412.6980.