There are a bunch of examples in the repo: various Python scripts for doing inference and such, and now even a Colab notebook.
As for the “usual” Python/HF setup, ExLlama is kind of an attempt to get away from Hugging Face. It reads HF models but doesn’t rely on the framework. I’ve been meaning to write more documentation and maybe even a tutorial, but in the meantime there are those examples, the project itself, and a lot of other projects using it. TabbyAPI is coming along as a standalone OpenAI-compatible server to use with SillyTavern and in your own projects where you just want to generate completions from text-based requests, and ExUI is a standalone web UI for ExLlamaV2.
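For a rough idea of what a text-based completion request to an OpenAI-compatible server looks like, here is a minimal sketch using only the standard library. It assumes the server exposes the usual OpenAI-style `/v1/completions` route and returns the generated text under `choices[0].text`; the base URL, port, and bearer-token header are placeholders for illustration, not confirmed TabbyAPI defaults, so check the server's own docs before relying on them.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, max_tokens=64, api_key=None):
    """Build (but do not send) an OpenAI-style completion request."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the server accepts a bearer token, as OpenAI's API does.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running locally, something like `complete("http://localhost:5000", "Once upon a time", max_tokens=32)` would return the generated continuation, where the address is again just a placeholder.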
Yes, the model directory is just all the files from a HF model, in one folder. You can download them directly from the “files” tab of a HF model by clicking all the little download arrows, or there’s `huggingface-cli`. Also, `git` can be used to clone models if you’ve got `git-lfs` installed.

It specifically needs the following files:
But it may utilize other files in the future, such as `tokenizer_config.json`, so it’s best to just download all the files and keep them in one folder.
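If you'd rather script the download than click through the files tab, here is a minimal sketch that pulls a whole model repo into one folder with `snapshot_download` from the `huggingface_hub` package (the library behind `huggingface-cli`), plus a small helper that reports which expected files are absent. The helper `missing_files` and the repo ID in the usage note are hypothetical, and the list of required file names is left for you to supply.

```python
from pathlib import Path


def download_model(repo_id, local_dir):
    """Download every file in a Hugging Face model repo into local_dir."""
    # Imported lazily so missing_files() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def missing_files(model_dir, required):
    """Return the names in `required` that are not present as files in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

Usage would look something like `download_model("some-org/some-model", "models/some-model")` followed by `missing_files("models/some-model", required)`, where `required` is whatever list of file names you want to verify. An empty result means everything on your list is in the folder.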