Inferencing with AMD X3D Processors

ccbadd@alien.top · 2 years ago


tu9jn@alien.top · 2 years ago

V-Cache only helps when you want to access lots of small chunks of data that fit inside the 96-128 MB cache.

During inference you have to read the entire multi-GB model for each generated token, so your bottleneck is still RAM bandwidth.
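That argument reduces to a simple bound: if every weight has to be streamed from RAM once per generated token, tokens per second cannot exceed RAM bandwidth divided by model size. Below is a minimal Python sketch of that bound. It assumes bandwidth is the only limit and ignores compute, KV-cache traffic and any reuse from on-die cache. The function name and the example numbers are hypothetical, not measurements.

```python
def max_tokens_per_second(model_size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Upper bound on decode speed if every weight is read from RAM once per token.

    Assumption: memory bandwidth is the only limit (compute, KV-cache reads
    and on-die cache reuse are ignored).
    """
    if model_size_gb <= 0 or bandwidth_gb_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_gb_per_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical example numbers, chosen only for illustration.
    model_gb = 8.0    # a quantized model occupying about 8 GB
    ram_bw = 80.0     # a system with about 80 GB/s of RAM bandwidth
    cache_mb = 96.0   # low end of the 96-128 MB V-Cache range

    print(f"Upper bound: {max_tokens_per_second(model_gb, ram_bw):.1f} tokens/s")
    # Fraction of the model that could sit in the cache at any one time.
    print(f"Share of model that fits in cache: {cache_mb / (model_gb * 1024):.2%}")
```

With these numbers only about 1% of the weights fit in the cache, so almost every byte still has to come from RAM on every token.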

ccbadd@alien.top · 2 years ago

The article said that was the expected result, but the gains applied across the entire RAM disk, and the concept has now been proven. The test used a block of over 500 MB, so larger than the cache alone.

https://www.tomshardware.com/news/amd-3d-v-cache-ram-disk-182-gbs-12x-faster-pcie-5-ssd