I am just gonna do some bad maths.
For the price of a single 4090 you can get:
CPU and mainboard combo with 16 RAM slots: $1,320
16 x 32 GB DDR4 RAM: $888
Mistral 7B runs at around 7 tokens per second on a regular CPU, which is about 5 words per second.
With the 512 GB of RAM in the setup above we can fit a 512B-parameter model (assuming roughly one byte per parameter, i.e. 8-bit weights). If speed scales inversely with parameter count, it would run at 5*7/512 = 0.068 words per second with the current architecture. If this new architecture actually works and gives the 78x speedup, that becomes 5.3 words per second. For comparison, the average person's reading speed is around 4 words per second, and the average person's speaking speed is around 2 words per second.
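Here is the same bad math as a quick Python sketch, in case anyone wants to plug in their own numbers. The one-byte-per-parameter figure and the "speed scales inversely with parameter count" rule are my assumptions, not measurements, and the variable names are just for illustration.

```python
# Back-of-envelope estimate using the rough figures from this post.
BASE_PARAMS_B = 7        # Mistral 7B, in billions of parameters
BASE_WORDS_PER_S = 5     # ~7 tokens/s on a regular CPU, taken as ~5 words/s
RAM_GB = 16 * 32         # 16 slots x 32 GB = 512 GB
BYTES_PER_PARAM = 1      # assumption: 8-bit weights
SPEEDUP = 78             # claimed speedup of the new architecture


def words_per_second(params_b, speedup=1.0):
    """Scale the 7B baseline, assuming speed is inversely proportional to size."""
    return BASE_WORDS_PER_S * BASE_PARAMS_B / params_b * speedup


# Largest model (in billions of parameters) that fits in RAM under the assumption above.
max_params_b = RAM_GB / BYTES_PER_PARAM

current = words_per_second(max_params_b)
claimed = words_per_second(max_params_b, SPEEDUP)

print(f"current architecture: {current:.3f} words/s")
print(f"with 78x speedup:     {claimed:.1f} words/s")
```

Changing `BYTES_PER_PARAM` to 2 (16-bit weights) halves the model size that fits, which doubles the estimated speed for the largest model you can load.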
Fingers crossed this can put a small dent in Nvidia’s stock price.