
Originally Posted by
Jarek
- The Hessian should be estimated online so that it tracks the local behavior of the function, preferably from the gradient sequence alone. This can be done e.g. by linear regression of gradients (the Hessian is their linear trend), which only requires updating four exponential moving averages: of g, x, gx, and xx. Sure, quasi-Newton methods also estimate it from gradients, but only from a few of them, which makes the estimate numerically unstable. Exponential moving averages combined with linear regression should be much better at extracting statistical trends from noisy gradients.
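
The regression idea above can be sketched in one dimension, where the Hessian is just the slope of g against x: H ≈ (mean(gx) - mean(g)·mean(x)) / (mean(xx) - mean(x)²), with all means taken as exponential moving averages. This is only a rough illustration, not the poster's actual implementation: the class name `EmaHessian1D`, the decay parameter `beta`, its default value, and the bias-correction weight `w` are assumptions made for the example.

```python
class EmaHessian1D:
    """Online 1D curvature estimate from (x, g) pairs via EMA linear regression."""

    def __init__(self, beta=0.9):
        self.beta = beta  # EMA decay; assumed value, tune per problem
        self.w = 0.0      # accumulated weight, used for bias correction
        self.x = 0.0      # EMA of x
        self.g = 0.0      # EMA of g
        self.gx = 0.0     # EMA of g * x
        self.xx = 0.0     # EMA of x * x

    def update(self, x, g):
        """Fold one (position, gradient) observation into the four averages."""
        b = self.beta
        self.w = b * self.w + (1.0 - b)
        self.x = b * self.x + (1.0 - b) * x
        self.g = b * self.g + (1.0 - b) * g
        self.gx = b * self.gx + (1.0 - b) * g * x
        self.xx = b * self.xx + (1.0 - b) * x * x

    def hessian(self, eps=1e-12):
        """Return the regression slope cov(g, x) / var(x), or None if undefined."""
        if self.w == 0.0:
            return None  # no observations yet
        # Dividing by w removes the bias from initializing the averages at zero.
        mx = self.x / self.w
        mg = self.g / self.w
        mgx = self.gx / self.w
        mxx = self.xx / self.w
        var_x = mxx - mx * mx
        if var_x <= eps:
            return None  # positions too close together to fit a slope
        return (mgx - mg * mx) / var_x
```

For example, feeding it gradients of f(x) = 1.5·(x - 1)², that is g = 3·(x - 1), at a handful of distinct points recovers a slope of 3 up to rounding. With noisy gradients the estimate fluctuates around the true curvature, and a `beta` closer to 1 averages over a longer history at the cost of slower adaptation. In more dimensions gx and xx become matrices, which this sketch does not cover.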