# Tuning configurations

## Parameters

### Training duration

`max_epochs: Optional[int] = None`

`max_steps: Optional[int] = None`

Respectively, the maximum number of epochs (full passes over the dataset) or optimisation steps to train for. If both are set, training stops as soon as either limit is reached.
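
For illustration, the stopping rule amounts to the following check (a hypothetical `should_stop` helper written for this page, not part of the library):

```python
from typing import Optional


def should_stop(
    epoch: int, step: int, max_epochs: Optional[int], max_steps: Optional[int]
) -> bool:
    """Return True as soon as either limit is reached; a limit left at None is ignored."""
    epochs_done = max_epochs is not None and epoch >= max_epochs
    steps_done = max_steps is not None and step >= max_steps
    return epochs_done or steps_done
```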

### Batch size

`batch_size: int = 64`

This is the number of samples in a forward-backward pass. If you use several devices and/or device batches of a size bigger than \(1\), this **must** be a multiple of `device_batch_size*total_devices`.
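
This constraint presumably lets the trainer split the batch into a whole number of per-device accumulation passes; a minimal sketch with made-up example values:

```python
# Made-up example values, only to illustrate the divisibility constraint.
batch_size = 64          # samples contributing to one optimisation step
device_batch_size = 8    # samples per forward-backward pass on one device
total_devices = 2

assert batch_size % (device_batch_size * total_devices) == 0, (
    "batch_size must be a multiple of device_batch_size * total_devices"
)

# Number of forward-backward passes accumulated per device before stepping (4 here).
accumulation_passes = batch_size // (device_batch_size * total_devices)
```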

### Adam parameters

`betas: Tuple[float, float] = (0.9, 0.98)`

`epsilon: float = 1e-8`

`learning_rate: float = 1e-4`

`weight_decay: Optional[float] = None`

These are respectively the \(β\) and \(ε\) parameters, the base learning rate, and the weight decay rate for the Adam optimizer [Kingma and Ba, 2014]. See the PyTorch documentation for more details.
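
For reference, a minimal sketch of how these defaults map onto `torch.optim.Adam` (the stand-in `model` and the handling of a `None` weight decay as \(0\) are assumptions made for illustration):

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in model, for illustration only
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-4,            # learning_rate
    betas=(0.9, 0.98),  # betas
    eps=1e-8,           # epsilon
    weight_decay=0.0,   # weight_decay (assumed to fall back to 0 when left unset)
)
```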

### Gradient clipping

`gradient_clipping: Optional[Union[float, int]] = None`

If non-`None`, this is the maximum allowed gradient norm: gradients with a larger norm are clipped to this length, preserving their direction. See the PyTorch documentation for implementation details.
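
This matches the behaviour of PyTorch's `clip_grad_norm_` utility; a minimal sketch of where the clipping would sit in a training step (the surrounding loop and loss computation are assumed):

```python
import torch

model = torch.nn.Linear(16, 16)   # stand-in model, for illustration only
gradient_clipping = 1.0           # example value

# In the training loop, after loss.backward() and before optimizer.step():
if gradient_clipping is not None:
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=gradient_clipping)
```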

### Learning rate schedule

`lr_decay_steps: Optional[int] = None`

`warmup_steps: int = 0`

These are the numbers of steps in the slanted triangular learning rate schedule [Howard and Ruder, 2018]: the base learning rate follows an upward linear slope for `warmup_steps` steps up to `learning_rate`, then decays linearly to \(0\) over `lr_decay_steps` steps.

Note that setting `lr_decay_steps` overrides `max_steps`.
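
One way to realise such a schedule is with PyTorch's `LambdaLR`; the sketch below assumes the decay phase runs for `lr_decay_steps` steps after the warm-up, which may differ from the library's exact bookkeeping:

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in model, for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr is `learning_rate`

warmup_steps = 1_000
lr_decay_steps = 10_000


def slanted_triangular(step: int) -> float:
    """Multiplier applied to the base learning rate at a given optimisation step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warm-up from 0 to 1
    remaining = 1.0 - (step - warmup_steps) / max(1, lr_decay_steps)
    return max(0.0, remaining)  # linear decay down to 0


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=slanted_triangular)
```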

## Bibliography

Jeremy Howard and Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. In *Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics*, 328–339. Association for Computational Linguistics, July 2018. doi:10.18653/v1/P18-1031.

Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In *International Conference on Learning Representations*. December 2014. URL: http://arxiv.org/abs/1412.6980 (visited on 2019-03-04), arXiv:1412.6980.