• 0 Posts
  • 2 Comments
Joined 1 year ago
Cake day: November 2nd, 2023

  • Multiple passes at lower learning rates isn’t supposed to produce different results.

    Oh yes it is. Gradient descent follows the slope of the loss surface step by step, so with smaller steps you trace a genuinely different trajectory than with bigger steps. And every pass moves you further along it.

    If the learning rate is only slightly too small, you will often just move more slowly along much the same path, but a learning rate that is too big makes you overshoot and skip over entire regions of the loss surface (see the sketch below).

    That seems to be what happened to OP on their first attempt.
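
    A minimal sketch of that effect (nothing from OP's setup; the toy function, starting point, and learning rates below are made up purely for illustration): plain gradient descent on a 1D curve with two minima. The small step size creeps into the nearby minimum, while the large one jumps past it on the first step and lands in a different basin entirely.

    ```python
    # Toy example: gradient descent on f(x) = x^4 - 3x^2 + x,
    # which has minima near x ~ 1.13 and x ~ -1.30.

    def grad(x):
        # derivative of f(x) = x^4 - 3x^2 + x
        return 4 * x**3 - 6 * x + 1

    def descend(x0, lr, steps=50):
        """Run plain gradient descent and return the visited points."""
        x = x0
        path = [x]
        for _ in range(steps):
            x -= lr * grad(x)
            path.append(x)
        return path

    start = 2.0
    small = descend(start, lr=0.01)  # creeps into the nearby minimum (~1.13)
    large = descend(start, lr=0.12)  # first step overshoots into the other basin (~-1.30)

    print(f"lr=0.01 ends near x = {small[-1]:.2f}")
    print(f"lr=0.12 ends near x = {large[-1]:.2f}")
    ```

    Same function, same starting point, only the learning rate differs, yet the two runs end up in different minima; that is the "totally different trajectory" point above.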