Kavli Affiliate: Jing Wang | First 5 Authors: Yunfei Teng, Jing Wang, Anna Choromanska | Summary: Modern deep learning (DL) architectures are trained using variants of the SGD algorithm that are run with a $\textit{manually}$ defined learning rate schedule, i.e., the learning rate is dropped at pre-defined epochs, typically when the training loss […]
Continue reading: AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop
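
For readers unfamiliar with what a manually defined learning rate schedule looks like in practice, below is a minimal sketch using PyTorch's built-in MultiStepLR scheduler. This illustrates the hand-tuned baseline that AutoDrop aims to automate, not AutoDrop itself; the milestone epochs, decay factor, and toy model are illustrative assumptions, not values from the paper.

```python
import torch

# Toy model and SGD optimizer; everything here is illustrative.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A *manually* defined schedule: drop the learning rate by 10x at
# hand-picked epochs 30 and 60. Choosing these milestones is exactly
# the manual tuning step the abstract says AutoDrop replaces with an
# automatic drop criterion.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1
)

for epoch in range(90):
    # ... one training epoch (forward, loss, backward) would go here ...
    optimizer.step()   # placeholder parameter update
    scheduler.step()   # advance the hand-defined schedule by one epoch
```

The point of the contrast: with a schedule like this, the drop epochs must be guessed in advance, whereas AutoDrop's stated goal is to decide when to drop the learning rate automatically during training.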