Gluon Learning Rate Finder


Problem

Selecting a learning rate and a learning rate schedule is a problem every ML engineer needs to address. Picking a learning rate that is too low will cause experiments to run longer than needed (and experiments may get stuck in local minima), while picking one that is too large will cause experiments to diverge.

Goals/Use cases

The goal of this project is to provide an API to easily select the optimal learning rate and learning rate schedule.

Proposed Approach

Smith's approach to selecting the learning rate appears to be the most cited (https://arxiv.org/abs/1506.01186). The technique involves starting with a very small learning rate and increasing it gradually after every iteration while recording the validation loss (see the figure below).

[Figure: validation loss plotted against learning rate]

The learning rate to use is the point at which the validation loss decreases the fastest (somewhere between 0.001 and 0.01 in this example). We could get a more exact value by taking the derivative of the loss curve (after smoothing) and looking for its most negative value, which marks the steepest decrease.

[Figure: derivative of the smoothed validation loss]

The figure above indicates that this point is around 0.01.
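
As a sketch of how that suggested value could be computed automatically rather than read off the plot, the helper below smooths the recorded losses and picks the learning rate at the most negative slope. suggest_lr, lrs, and losses are illustrative names, not an existing API:

import numpy as np

def suggest_lr(lrs, losses, smoothing=5):
    """Return the learning rate at the steepest drop in smoothed loss."""
    # Smooth the raw losses with a simple moving average.
    losses = np.convolve(losses, np.ones(smoothing) / smoothing, mode="valid")
    lrs = np.asarray(lrs)[smoothing - 1:]  # align with the smoothed losses
    # Differentiate w.r.t. log(lr) and pick the most negative slope.
    grads = np.gradient(losses, np.log(lrs))
    return lrs[int(np.argmin(grads))]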

We currently have a tutorial for how to implement this with MXNet in our documentation (https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/learning_rate_finder.html); however, it is not exposed as an official package, nor is the implementation easy to use (the developer needs to implement four components: LRFinderStoppingCriteria, LRFinder, Learner, and ContinuousBatchSampler). Additionally, it only produces the figure above for the developer to interpret and does not return a suggested learning rate. We could provide an API (inspired by fast.ai) that offers a complete solution and returns a learning rate:

# Proposed usage; data_loader and ctx are assumed to be defined by the developer.
net = mx.gluon.model_zoo.vision.resnet18_v2(classes=10)
learner = Learner(net=net, data_loader=data_loader, ctx=ctx)
lr = learner.find_lr()

The find_lr method will take some time to run, since it iterates through the dataset until the validation loss diverges.
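
Internally, find_lr could run the range test described above. The loop below is a rough sketch under a few assumptions (an already-initialized classification net, SGD, and softmax cross-entropy loss); it is not an existing MXNet API:

import math
import mxnet as mx
from mxnet import autograd, gluon

def find_lr(net, data_loader, ctx, lr_start=1e-6, lr_factor=1.1, stop_factor=4.0):
    # Assumes net.initialize(ctx=ctx) has already been called.
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = gluon.Trainer(net.collect_params(), "sgd", {"learning_rate": lr_start})
    lrs, losses, best = [], [], float("inf")
    for data, label in data_loader:
        data, label = data.as_in_context(ctx), label.as_in_context(ctx)
        with autograd.record():
            loss = loss_fn(net(data), label)
        loss.backward()
        trainer.step(data.shape[0])
        avg = loss.mean().asscalar()
        lrs.append(trainer.learning_rate)
        losses.append(avg)
        best = min(best, avg)
        if math.isnan(avg) or avg > stop_factor * best:  # loss diverged: stop
            break
        trainer.set_learning_rate(trainer.learning_rate * lr_factor)
    return lrs, losses  # feed these to the smoothed-derivative step above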

Current Implementations

Fast.AI

Fast.AI has the most complete implementation for finding the optimal learning rate; however, it is built on top of PyTorch (https://github.com/fastai/fastai/blob/master/old/fastai/learner.py#L309).

Keras

A community implementation for Keras provides an API similar to the mxnet learning_rate_finder tutorial above (https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/learning_rate_finder.html), where the output is a figure and the developer is expected to select the learning rate after inspecting it.

https://github.com/surmenok/keras_lr_finder

Open Questions

  • Given a learning rate, which learning rate schedule generalizes best? A few candidate schedules are listed below.


  • Triangular
  • Cosine
  • Warm-up
  • Cool-down

MXNet provides an LRScheduler class that gives the developer the flexibility to implement these (see the sketch below). However, if our goal is to eliminate selecting the learning rate and schedule by hand, we should determine which scheduler generalizes best so that the developer does not need to try them all.
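
For example, a triangular schedule can be expressed by subclassing mx.lr_scheduler.LRScheduler, which the optimizer calls with the current update count. The class below is an illustrative sketch, not an existing MXNet scheduler:

import mxnet as mx

class TriangularSchedule(mx.lr_scheduler.LRScheduler):
    """Ramp the learning rate linearly from min_lr to max_lr and back."""
    def __init__(self, min_lr, max_lr, cycle_length):
        super(TriangularSchedule, self).__init__()
        self.min_lr, self.max_lr, self.cycle_length = min_lr, max_lr, cycle_length

    def __call__(self, num_update):
        # Position within the current cycle, in [0, 1).
        unit = (num_update % self.cycle_length) / float(self.cycle_length)
        # Rise for the first half of the cycle, fall for the second half.
        scale = 2 * unit if unit < 0.5 else 2 * (1 - unit)
        return self.min_lr + (self.max_lr - self.min_lr) * scale

# Attach it to a Trainer through the optimizer options:
trainer = mx.gluon.Trainer(net.collect_params(), "sgd",
                           {"lr_scheduler": TriangularSchedule(0.001, 0.01, 500)})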
