ctransformers VS llama-cpp-python which one should I use?

ZiadHAsan23@alien.top · 3 years ago

ctransformers VS llama-cpp-python which one should I use?

vatsadev@alien.top · 3 years ago

Also, AWQ has entire engines built for efficiency. Look into Aphrodite Engine, supposedly the fastest for AWQ.

mcmoose1900@alien.top · 3 years ago

vLLM is way faster, but it's pretty barebones and VRAM spikes hard.

vatsadev@alien.top · 3 years ago

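“Do I need to learn llama.cpp or C++ to deploy models using llama-cpp-python library?” No, you only write Python. The library is a set of Python bindings around llama.cpp, so the C++ part is built for you when you install it.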
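To make that concrete, here is a minimal sketch of what using llama-cpp-python looks like from the Python side. The model path in the usage comment is a placeholder, and `ask` and `extract_text` are hypothetical helper names, not part of the library. The sketch assumes you have already downloaded a GGUF model file.

```python
from typing import Any, Dict


def extract_text(completion: Dict[str, Any]) -> str:
    """Pull the generated text out of an OpenAI-style completion dict."""
    return completion["choices"][0]["text"].strip()


def ask(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Load a local GGUF model and return one completion for the prompt."""
    # Imported lazily so this file still loads where llama-cpp-python
    # is not installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    completion = llm(prompt, max_tokens=max_tokens, stop=["Q:"])
    return extract_text(completion)


# Usage (placeholder path, adjust to wherever your model file lives):
# print(ask("./models/model.gguf", "Q: What is the capital of France? A:"))
```

No C++ shows up anywhere in your own code. The only place it matters is at install time, when the bindings are compiled or a prebuilt wheel is pulled in.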

andershaf@alien.top · 3 years ago

Have you tried ollama?

randull@alien.top · 3 years ago

I use both, and they’re pretty interchangeable in my experience. It looks like it’s been 2 months since the last ctransformers update, and llama-cpp has always been a little more popular, with more contributors and stars on GitHub.
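A rough sketch of what "interchangeable" looks like in practice: the same prompt sent through either library behind one function. `generate` is a hypothetical wrapper, and `model_type="llama"` assumes a Llama-family model, so change it if yours is something else.

```python
def generate(backend: str, model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Run one completion with the chosen backend and return the text."""
    if backend == "llama-cpp-python":
        # Lazy import: only needed when this backend is selected.
        from llama_cpp import Llama

        llm = Llama(model_path=model_path, verbose=False)
        # llama-cpp-python returns an OpenAI-style dict.
        return llm(prompt, max_tokens=max_tokens)["choices"][0]["text"]

    if backend == "ctransformers":
        from ctransformers import AutoModelForCausalLM

        # Assumption: a Llama-family model file.
        llm = AutoModelForCausalLM.from_pretrained(model_path, model_type="llama")
        # ctransformers returns the generated string directly.
        return llm(prompt, max_new_tokens=max_tokens)

    raise ValueError(f"unknown backend: {backend!r}")
```

The main differences you hit day to day are the return type (dict vs. plain string) and the name of the token limit argument (`max_tokens` vs. `max_new_tokens`).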

_Lee_B_@alien.top · 3 years ago

Learn Docker Compose. Run Ollama as one of your Docker containers (it’s already available as a Docker container). Run your website server as another Docker container. Deploy it securely, and you’re done. If/when you want to scale it and make it more enterprisey, upgrade from Docker Compose to Kubernetes.
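For the website container talking to the Ollama container, a minimal sketch of the client side using only the standard library. It assumes the Ollama service is named `ollama` in your compose file (so it resolves at `http://ollama:11434`) and that a model called `llama2` has already been pulled. Both names are placeholders, and `build_request` and `ask_ollama` are hypothetical helpers.

```python
import json
import urllib.request


def build_request(prompt: str, model: str = "llama2",
                  base_url: str = "http://ollama:11434") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks for a single JSON object instead of a line stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt: str, **kwargs) -> str:
    """Send the prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage from the web server container (placeholder names):
# print(ask_ollama("Why is the sky blue?"))
```

When testing outside of compose, pass `base_url="http://localhost:11434"` instead, since the service name only resolves on the compose network.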