Selected article for: "initial learning rate and learning rate"

Author: Xuehai He; Xingyi Yang; Shanghang Zhang; Jinyu Zhao; Yichen Zhang; Eric Xing; Pengtao Xie
Title: Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans
  • Document date: 2020_4_17
  • ID: l3f469ht_68
    Document: 2) Additional Experimental Settings: Following the same settings as MoCo, we added a 2-layer multi-layer perceptron (MLP) head with 2048 hidden units. The size of the dynamic dictionary was set to 512. Stochastic gradient descent (SGD) was used as the optimizer for self-supervised learning (SSL), with a minibatch size of 128, a weight decay of 0.0001, a momentum of 0.9, and an initial learning rate of 0.015. The learning rate was adjusted by a cosine learning rate scheduler. Training was conducted on 4 GPUs with data parallelism. We carefully designed the data augmentation methods that serve as the pretext tasks for the Self-Trans method. Specifically, the augmentations include random horizontal flipping, random cropping with a crop size of 0.2 of the image area, random color jittering (random brightness with a ratio of 0.4, random contrast of 0.4, random saturation of 0.4, and random hue of 0.1), Gaussian blur, and random gray-scale conversion.
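    For concreteness, the settings above can be sketched in PyTorch/torchvision as follows. The hyperparameter values (learning rate, momentum, weight decay, MLP width, crop scale, jitter ratios) come from the excerpt; the ResNet-50 backbone, the blur kernel size, the gray-scale probability, the 128-dimensional output, and the epoch count are illustrative assumptions, not details stated by the authors.

        import torch
        import torchvision.models as models
        import torchvision.transforms as T

        # Pretext-task augmentations described above: horizontal flip, crop down
        # to 0.2 of the image area, color jitter (0.4/0.4/0.4, hue 0.1), Gaussian
        # blur, and random gray-scale conversion.
        ssl_augment = T.Compose([
            T.RandomResizedCrop(224, scale=(0.2, 1.0)),
            T.RandomHorizontalFlip(),
            T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
            T.GaussianBlur(kernel_size=23),   # kernel size assumed; not stated
            T.RandomGrayscale(p=0.2),         # probability assumed; not stated
            T.ToTensor(),
        ])

        # Encoder: a backbone (ResNet-50 used here only as a stand-in) whose
        # classifier is replaced by the 2-layer MLP head with 2048 hidden units.
        encoder = models.resnet50()
        encoder.fc = torch.nn.Sequential(
            torch.nn.Linear(encoder.fc.in_features, 2048),
            torch.nn.ReLU(),
            torch.nn.Linear(2048, 128),       # 128-d output assumed (MoCo default)
        )

        # SGD with the reported momentum, weight decay, and initial learning rate,
        # decayed by a cosine schedule over the training epochs.
        optimizer = torch.optim.SGD(encoder.parameters(), lr=0.015,
                                    momentum=0.9, weight_decay=1e-4)
        num_epochs = 200                      # assumed; not given in this excerpt
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)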
