Auto-regressive just means the model generates a sequence one step at a time, and each prediction depends on its own previous predictions.
So, when you predict a token at time t, you condition on the tokens you've already predicted.
Consider “the cat in the hat”. A transformer that generated it would have predicted it like this (assuming each word is one token bc I’m lazy):
-P(“the”|prompt) is highest
-P(“cat”|“the”,prompt) is highest
-P(“in”|“the”,“cat”,prompt) is highest
So each prediction depends on the ones before it: the token picked at one step becomes part of the input for the next step. This is what is meant by auto-regressive.
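If it helps to see it as code, here's a rough sketch of that loop. Everything in it is made up for illustration: `TOY_MODEL` is a hypothetical lookup table standing in for a real transformer, the probabilities are invented, and the prompt is left out to keep it short. It always picks the highest-probability token (greedy decoding), which is only one of several ways to choose the next token.

```python
# Toy stand-in for a model: maps the tokens generated so far to a
# probability distribution over the next token. All numbers are invented.
TOY_MODEL = {
    (): {"the": 0.9, "a": 0.1},
    ("the",): {"cat": 0.7, "hat": 0.3},
    ("the", "cat"): {"in": 0.8, "sat": 0.2},
    ("the", "cat", "in"): {"the": 0.9, "a": 0.1},
    ("the", "cat", "in", "the"): {"hat": 0.6, "cat": 0.4},
    ("the", "cat", "in", "the", "hat"): {"<eos>": 1.0},
}


def next_token_probs(generated):
    """Return P(next token | tokens generated so far) from the toy table."""
    return TOY_MODEL[tuple(generated)]


def greedy_decode(max_steps=10):
    """Generate tokens one at a time, feeding each choice back in as context."""
    generated = []
    for _ in range(max_steps):
        probs = next_token_probs(generated)
        token = max(probs, key=probs.get)  # pick the most likely next token
        if token == "<eos>":               # stop at the end-of-sequence marker
            break
        generated.append(token)            # this choice conditions the next step
    return generated


print(" ".join(greedy_decode()))
```

The important line is `generated.append(token)`: whatever the model just picked becomes part of what it conditions on at the next step.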
ermm, idk what you mean by any of those words.