Learning-rate setting: the initial learning rate was set to 0.01. At the start of training, the initial loss was about 9.3, which matches -log(1/10575), as expected for 10575 classes. After a little training, however, the loss climbed above 80, indicating the learning rate was too large. It was lowered to 0.001 and decayed with the "inv" policy, after which the loss decreased steadily.
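The two facts used above can be checked numerically: the expected initial softmax loss for N classes is -log(1/N), and Caffe's "inv" policy decays the learning rate as base_lr * (1 + gamma * iter)^(-power). A minimal sketch (the gamma and power values below are illustrative, not the author's actual settings):

```python
import math

# With 10575 identities and random initialization, a softmax classifier
# assigns roughly 1/10575 probability to the true class, so the initial
# cross-entropy loss should be about -log(1/10575) ~= 9.27.
num_classes = 10575
expected_initial_loss = -math.log(1.0 / num_classes)
print(round(expected_initial_loss, 2))  # -> 9.27

# Caffe's "inv" learning-rate policy:
#   lr(iter) = base_lr * (1 + gamma * iter) ** (-power)
# gamma and power here are example values only.
def inv_lr(base_lr, iteration, gamma=1e-4, power=0.75):
    return base_lr * (1.0 + gamma * iteration) ** (-power)

print(inv_lr(0.001, 0))                        # -> 0.001 at iteration 0
print(inv_lr(0.001, 100000) < inv_lr(0.001, 1000))  # -> True (monotone decay)
```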
The fully connected layer's dropout ratio was set to 0.7. Different layers used different SGD hyperparameters: every layer except fc2 used momentum 0.9 and weight decay 5e-4, while fc2 used a weight decay of 5e-3 to reduce overfitting. The learning rate was decayed from 1e-3 down to 5e-5. Training took about two weeks on a GTX 980.
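In Caffe, a per-layer weight decay like fc2's is usually expressed through decay_mult in the layer's param block, which multiplies the solver's global weight_decay. A sketch of how the settings above might be written (the layer definition is an assumed illustration, not the author's actual prototxt):

```
# solver.prototxt (global settings)
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"

# train.prototxt: fc2 with 10x the global weight decay (5e-4 * 10 = 5e-3)
layer {
  name: "fc2"
  type: "InnerProduct"
  param { lr_mult: 1 decay_mult: 10 }  # weights
  param { lr_mult: 2 decay_mult: 0 }   # bias: no decay
}
```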
Testing: evaluated with LFW's pairs.txt. At 1.15 million iterations: 96%; at 2.4 million iterations: 98%.

First, you need to generate snapshots. This is done by specifying the snapshot options in the solver.prototxt file:
snapshot: 500
snapshot_prefix: "snapshot/"

This means a snapshot is taken every 500 iterations, and you will see the snapshots in the folder given by snapshot_prefix:
_iter_500.solverstate
_iter_500.caffemodel
_iter_1000.solverstate
_iter_1000.caffemodel
...

Once you have a snapshot, you can tell the training script to resume from it:
$ caffe train --solver="xxx.prototxt" --snapshot=cifar10_quick_iter_3000.solverstate

This resumes training from iteration 3000.
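The LFW pairs test mentioned earlier scores each face pair by the similarity of the two embeddings and classifies it as "same person" above a threshold; accuracy is the fraction of pairs classified correctly. A minimal sketch with toy data (cosine similarity and the threshold sweep are standard choices, not details given in the original post):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pair_accuracy(scores, labels, threshold):
    """Fraction of pairs classified correctly: predict 'same person'
    when the similarity score exceeds the threshold."""
    correct = sum((s > threshold) == bool(l) for s, l in zip(scores, labels))
    return correct / len(scores)

def best_threshold_accuracy(scores, labels):
    """Sweep candidate thresholds (the scores themselves) and keep the best."""
    return max(pair_accuracy(scores, labels, t) for t in scores)

# Toy data: 4 pairs with similarity scores; label 1 = same person.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
print(best_threshold_accuracy(scores, labels))  # -> 1.0
```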