noobgolang@alien.top to LocalLLaMA • Need help setting up a cost-efficient llama v2 inference API for my micro saas app • English • 1 · 1 year ago

For the CUDA version, you can use this link for the Linux build: https://github.com/janhq/nitro/releases/download/v0.1.17/nitro-0.1.17-linux-amd64-cuda.tar.gz. You need to make sure the system has the CUDA toolkit installed. I recommend following the exact steps in the quickstart docs here, https://nitro.jan.ai/quickstart, to make sure it works.
> Apple M1 models, and the main page mentions M2 models as well?
Yeah, the arm64 Mac build should run on all Macs, M1 and M2 included. We also have a CUDA version in the releases.
Also, the build is 100% done in public with the source code on the page; you can check the Actions tab to see it. There is nothing hidden here.
You can try https://nitro.jan.ai/; it's built for exactly this purpose.
I self-host it on my homelab and it works very well.
Disclosure: I'm the maintainer of the Nitro project.
We have a simple llama server, just a single binary that you can download and try right away here: https://github.com/janhq/nitro. It's a viable option if you want to set up an OpenAI-compatible endpoint to test out new models.
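Since the endpoint is OpenAI-compatible, a minimal client sketch might look like the following. This assumes Nitro's default host and port (localhost:3928) and the `/v1/chat/completions` route shown in the quickstart; the model name is a placeholder, and your running server may differ.

```python
import json
import urllib.request

# Assumed defaults from the Nitro quickstart; adjust for your deployment.
NITRO_URL = "http://localhost:3928/v1/chat/completions"


def build_chat_request(prompt, model="llama-2-7b-chat"):
    """Build the (url, payload) pair for an OpenAI-style chat completion.

    The model name is illustrative, not a value the server requires.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return NITRO_URL, payload


def send_chat_request(prompt):
    """Send the request; requires a Nitro server running locally."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Inspect the payload without needing a live server.
    url, payload = build_chat_request("Hello!")
    print(json.dumps(payload, indent=2))
```

Because the request shape follows the OpenAI chat format, the same sketch should work with any client that can POST JSON.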
I use this: https://nitro.jan.ai/