While there are some techniques for sub-quadratic transformer scaling (Longformer, LongNet, Retentive Network, and others), no state-of-the-art models have been trained with them, and they can't be retrofitted onto existing models, so they aren't really being used. As for why new models don't adopt them, it's either because they hurt performance too much, or because they're considered too risky and more proven architectures get used instead. If models trained with these techniques existed and were good, people would use them, but for now there just aren't any, so we're stuck with O(n^2) computation and memory.
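To make the O(n^2) concrete, here's a minimal NumPy sketch of vanilla scaled dot-product attention (the function name and shapes are my own illustration, not from any particular model). The (n, n) score matrix is the part that grows quadratically with sequence length, in both compute and memory:

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard scaled dot-product attention.

    q, k, v: (seq_len, d) arrays. The score matrix below is
    (seq_len, seq_len), which is where the O(n^2) time and
    memory cost of vanilla transformers comes from.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (n, n) -- quadratic in seq_len
    # Row-wise softmax (subtract the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n, d)

n, d = 1024, 64
q = k = v = np.random.randn(n, d).astype(np.float32)
out = naive_attention(q, k, v)
print(out.shape)  # (1024, 64)
# Doubling n quadruples this; at n=1024 one head's float32 score
# matrix is already ~4 MB:
print(n * n * 4 / 1e6, "MB for one attention head's score matrix")
```

The sub-quadratic methods mentioned above all attack that (n, n) matrix in some way (sparse/windowed attention, recurrent-style reformulations, etc.), which is exactly why they can't just be dropped into a model that was trained with full attention.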