Ok_Post_149@alien.top to LocalLLaMA (English) · 1 year ago

I scaled Mistral 7B to 200 GPUs in less than 5 minutes

I’ve been working on a project with my roommate to make it incredibly simple to run batch inference on LLMs while leveraging a massive amount of cloud resources. We finally got the tool working and created a tutorial on how to use it on Mistral 7B.

Also, if you’re a frequent HuggingFace user, you can easily adapt the code to run inference on other LLMs. Please test it out and provide feedback. I feel really good about how easy it is to use, but I want to figure out if anything is not intuitive. I hope the community is able to get some value out of it! Here is the link to the tutorial: https://docs.burla.dev/Example:%20Massively%20Parallel%20Inference%20with%20Mistral-7B
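For anyone who wants a feel for the general pattern before opening the tutorial, here is a rough local sketch of batch inference fanned out over parallel workers. It is not the tool's API: `chunk`, `fake_generate`, and `parallel_batch_inference` are hypothetical names made up for illustration, and a thread pool on one machine stands in for the cloud workers. In a real run you would swap `fake_generate` for a function that loads a model (for example through a HuggingFace pipeline) and generates completions for its batch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List


def chunk(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_generate(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a model call; returns one string per prompt."""
    return [f"completion for: {prompt}" for prompt in batch]


def parallel_batch_inference(
    prompts: List[str],
    generate: Callable[[List[str]], List[str]],
    batch_size: int = 4,
    workers: int = 8,
) -> List[str]:
    """Split prompts into batches, run each batch on a worker, flatten results."""
    batches = list(chunk(prompts, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input batches.
        results = list(pool.map(generate, batches))
    return [output for batch_output in results for output in batch_output]


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(10)]
    outputs = parallel_batch_inference(prompts, fake_generate, batch_size=3, workers=4)
    print(len(outputs), "outputs")
```

The only design point worth noting is that each worker gets a whole batch rather than a single prompt, so any per-worker cost (such as loading the model) is paid once per batch instead of once per input.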

Ok_Post_149@alien.top (OP)
1 year ago
This is really cool! We are more focused on lengthy workloads, such as running 500k inputs through an LLM in one batch, than on on-demand inference (though we are starting to support that). Right now the startup time is pretty long (2-5 minutes), but we are working on cutting it down.