Formulas:

$\begin{aligned} & \hat{m}_t = \frac{m_t}{1-\beta_1^t} \\ & \hat{\nu}_t = \frac{\nu_t}{1-\beta_2^t} \\ & \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{\nu}_t} + \epsilon} \cdot \hat{m}_t \end{aligned}$
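
To see why the division by $1-\beta^t$ matters, here is a small worked example of the first step. It assumes the usual Adam recurrences $m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$ and $\nu_t = \beta_2 \nu_{t-1} + (1-\beta_2) g_t^2$ with $m_0 = \nu_0 = 0$, where $g_t$ is the gradient at step $t$. These recurrences are not stated in this section, and $\hat{m}_t$, $\hat{\nu}_t$ denote the bias-corrected estimates. With $\beta_1 = 0.9$ and $\beta_2 = 0.999$:

$\begin{aligned} & m_1 = 0.1\, g_1, \quad \hat{m}_1 = \frac{0.1\, g_1}{1-0.9} = g_1 \\ & \nu_1 = 0.001\, g_1^2, \quad \hat{\nu}_1 = \frac{0.001\, g_1^2}{1-0.999} = g_1^2 \\ & \theta_1 = \theta_0 - \eta \cdot \frac{g_1}{|g_1| + \epsilon} \approx \theta_0 - \eta \cdot \operatorname{sign}(g_1) \end{aligned}$

Without the correction, $m_1$ and $\nu_1$ would be strongly biased toward their zero initialization. With it, the first step has a size of roughly $\eta$ regardless of the gradient's scale.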

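Adaptive Moment Estimation (Adam) is another method that adapts a separate learning rate for each parameter. Like Adadelta and RMSprop, it keeps an exponentially decaying average of past squared gradients ($\nu_t$). The difference is that it also keeps an exponentially decaying average of the past gradients themselves ($m_t$), which decays in a way similar to momentum.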

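The paper suggests the default values $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The paper also compares Adam with several other adaptive learning rate methods and reports that Adam performs better than each of them.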
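
The update can be sketched in a few lines of Python. This is a minimal illustration for a single scalar parameter, not a reference implementation. The function names `adam_step` and `minimize_square` are made up for this example, the moment recurrences for `m` and `v` follow the usual Adam formulation (they are not spelled out in this section), and the default learning rate `lr=0.001` is an assumption, since these notes do not give a value for $\eta$.

```python
import math


def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    # Exponentially decaying averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction: compensates for m and v being initialized at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update, with epsilon added outside the square root.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_square(theta=1.0, steps=2000, lr=0.01):
    """Toy usage: minimize f(theta) = theta**2, whose gradient is 2 * theta."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta
```

For a model with many parameters, the same update is applied independently to each parameter, each with its own `m` and `v`. That per-parameter state is what makes the effective learning rate differ between parameters.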

Reference: https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650728879&idx=3&sn=897571064a4b367c08d1aaef6360832b&scene=21#wechat_redirect
