BIITKN@alien.top to LocalLLaMA · English · 2 years ago

A new way to speed up transformer inference

Has anyone read this new paper on arXiv? https://arxiv.org/abs/2311.10770

It looks very promising: potential inference speedups of 30x in PyTorch and 117x with a native CUDA implementation, plus an estimated maximum speedup of 341x.
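
If I'm reading the paper right, the 341x ceiling is essentially a neuron-count ratio. Assuming the UltraFastBERT configuration it describes (each feedforward layer replaced by a balanced binary tree of depth 11, i.e. 4095 neurons, of which only the 12 on one root-to-leaf path are evaluated per token), the arithmetic is below. `tree_sizes` is just a helper name made up for illustration.

```python
def tree_sizes(depth: int) -> tuple[int, int]:
    """Return (total neurons, neurons evaluated per token) for a balanced binary tree.

    Assumption for illustration: one neuron per tree node, and inference
    visits exactly one node per level (a single root-to-leaf path).
    """
    total = 2 ** (depth + 1) - 1  # every node in the tree is one neuron
    used = depth + 1              # one neuron per level on the visited path
    return total, used


total, used = tree_sizes(11)
print(total, used, round(total / used, 2))  # 4095 12 341.25
```

Real speedups are lower than this ratio because conditional, per-token weight access is much harder to run efficiently than a dense matrix multiply.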

As far as I understand, this is achieved by replacing the standard feedforward layers with so-called fast feedforward (FFF) layers.
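
Here is a rough, pure-Python sketch of the idea as I understand it, not the paper's code: the neurons of a layer are arranged as a binary tree, and each visited neuron both adds GELU(logit) times its output weights to the result and, by the sign of its logit, picks which child to visit next. `dense_ff`, `fast_ff`, the tree indexing (children of node n at 2n+1 and 2n+2), and the random weights are all hypothetical choices for illustration.

```python
import math
import random


def gelu(z: float) -> float:
    # Exact GELU: z * Phi(z), where Phi is the standard normal CDF
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_ff(x, w_in, w_out):
    """Standard feedforward layer: every neuron is evaluated."""
    out = [0.0] * len(x)
    for wi, wo in zip(w_in, w_out):
        h = gelu(dot(wi, x))
        for j in range(len(out)):
            out[j] += h * wo[j]
    return out, len(w_in)


def fast_ff(x, w_in, w_out, depth):
    """Hypothetical fast feedforward sketch: neurons stored as a binary tree
    (node n has children 2n+1 and 2n+2); only one root-to-leaf path is evaluated."""
    out = [0.0] * len(x)
    node, evaluated = 0, 0
    for _ in range(depth + 1):
        logit = dot(w_in[node], x)
        h = gelu(logit)
        for j in range(len(out)):
            out[j] += h * w_out[node][j]
        evaluated += 1
        # The sign of the logit picks the child: left if <= 0, right if > 0
        node = 2 * node + 1 + (1 if logit > 0 else 0)
    return out, evaluated


random.seed(0)
dim, depth = 8, 11
n_neurons = 2 ** (depth + 1) - 1
w_in = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]
w_out = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]
x = [random.gauss(0, 1) for _ in range(dim)]

_, dense_count = dense_ff(x, w_in, w_out)
_, fast_count = fast_ff(x, w_in, w_out, depth)
print(dense_count, fast_count)  # 4095 12
```

With the same random weights the two layers compute different outputs. As far as I can tell, the tree-structured layer has to be trained that way (UltraFastBERT was trained as a new model), so this is not a drop-in change for existing checkpoints; the sketch only shows how few neurons each token touches.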

Is there anyone here with hands-on experience contributing to PyTorch or llama.cpp, or releasing open models? What do you make of this?

luxsteele@alien.top · 2 years ago

You might want to read the discussion here: https://www.reddit.com/r/MachineLearning/comments/1815a05/r_exponentially_faster_language_modelling/