lans_throwaway@alien.top to LocalLLaMA · English · 3 years ago

Lookahead decoding offers massive (~1.5x) speedup for inference

cross-posted to:
localllama

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding | LMSYS Org

TL;DR: We introduce lookahead decoding, a new, exact, and parallel decoding algorithm to accelerate LLM inference. Look...

_Lee_B_@alien.top · 3 years ago
Hmm, it looks like such a standard linear algebra optimisation that I’m surprised GPUs don’t do it automatically. But yep, looks good, either way.