Formulas: Adadelta is an extension of Adagrad.
$$
\begin{aligned}
&g_t = \nabla J(\theta_{t-1}) \\
&G_t = \gamma G_{t-1} + (1-\gamma)\, g_t \odot g_t \\
&\nabla\theta_t = -\frac{\sqrt{\nabla_{t-1}+\epsilon}}{\sqrt{G_t+\epsilon}} \odot g_t \\
&\nabla_t = \gamma \nabla_{t-1} + (1-\gamma)\, \nabla\theta_t \odot \nabla\theta_t \\
&\theta_t = \theta_{t-1} + \nabla\theta_t
\end{aligned}
$$

Here $G_t$ is the exponentially decaying average of squared gradients, $\nabla\theta_t$ is the parameter update, $\nabla_t$ is the exponentially decaying average of squared updates, $\gamma$ is the decay rate, and $\epsilon$ is a small constant for numerical stability.
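The update rule can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `adadelta_step` and `run_demo`, the accumulator name `D` (standing in for $\nabla_t$), the defaults `gamma=0.9` and `eps=1e-6`, and the quadratic test objective are all assumptions made for the example.

```python
import math


def adadelta_step(theta, grad, G, D, gamma=0.9, eps=1e-6):
    """Apply one Adadelta update element-wise to lists of floats.

    theta: current parameters
    grad:  gradient g_t evaluated at theta
    G:     running average of squared gradients (G_{t-1})
    D:     running average of squared updates (the accumulator written
           as nabla_{t-1} in the formulas; the name D is hypothetical)
    Returns the updated (theta, G, D).
    """
    new_theta, new_G, new_D = [], [], []
    for th, g, G_prev, D_prev in zip(theta, grad, G, D):
        # G_t = gamma * G_{t-1} + (1 - gamma) * g_t^2
        G_t = gamma * G_prev + (1 - gamma) * g * g
        # update = -sqrt(D_{t-1} + eps) / sqrt(G_t + eps) * g_t
        step = -math.sqrt(D_prev + eps) / math.sqrt(G_t + eps) * g
        # D_t = gamma * D_{t-1} + (1 - gamma) * update^2
        D_t = gamma * D_prev + (1 - gamma) * step * step
        new_theta.append(th + step)
        new_G.append(G_t)
        new_D.append(D_t)
    return new_theta, new_G, new_D


def run_demo(steps=50):
    """Run a few updates on J(theta) = sum(theta_i^2), an assumed toy objective."""
    theta = [1.0, -2.0]
    G = [0.0, 0.0]
    D = [0.0, 0.0]
    for _ in range(steps):
        grad = [2.0 * th for th in theta]  # gradient of sum(theta_i^2)
        theta, G, D = adadelta_step(theta, grad, G, D)
    return theta
```

Note that no learning rate appears: the step size comes from the ratio of the two running averages, which is the main difference from Adagrad. With both accumulators initialized to zero, the first steps are small (on the order of $\sqrt{\epsilon}$) and grow as `D` accumulates.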