Nice. A lightweight loader. It will free us from Gradio.
Gradio is a 70MB requirement FYI. It has become common to see people calling text-generation-webui “bloated”, when most of the installation size is in fact due to PyTorch and the CUDA runtime libraries.
> Gradio is a 70MB requirement
That doesn’t make it fast, just small. Inefficient code can be compact.
I think there is room for everyone. Text Gen is a piece of art; it’s the only thing in the whole space that always works and is reliable. However, if I’m building an agent and shipping a Docker build, I can’t afford to depend on something like text-gen.
Does anyone know if they expose all the good stuff that Guidance uses for its guided generation and speedups? This plus Guidance (KV cache reuse, grammar control, etc.) would be fast fast!
Thanks to the hard work of kingbri, Splice86, and turboderp, we have a new API loader for LLMs using the exllamav2 backend! It’s in a very alpha state, so if you want to test it, expect things to change.
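To give an idea of what using it looks like, here’s a minimal sketch of a completion request. The route, port, and auth header are assumptions based on the usual OpenAI-compatible convention, not confirmed details of this alpha; check the TabbyAPI repo for the actual endpoints and key setup.

```python
# Sketch of a completion request against a local TabbyAPI instance.
# Endpoint path, port, and the x-api-key header are assumed here --
# verify them against the TabbyAPI repo before use.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/v1/completions",  # assumed default host/port
    headers={"x-api-key": "YOUR_API_KEY"},   # assumed auth header
    json={
        "prompt": "Once upon a time",
        "max_tokens": 100,
        "temperature": 0.8,
    },
)
print(resp.json()["choices"][0]["text"])
```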
TabbyAPI also works with SillyTavern! It needs some special configuration, but it connects fine.
As a reminder, exllamav2 recently added mirostat, TFS, and min-p sampling, so if you were using the exllama_hf/exllamav2_hf loaders on ooba just for those samplers, they’re not needed anymore.
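For anyone driving exllamav2 directly from Python, here’s a sketch of setting those samplers. The model path is a placeholder, and the attribute names follow the exllamav2 example scripts at the time of writing, so double-check against the current repo.

```python
# Sketch of using the new samplers (mirostat, TFS, min-p) in exllamav2.
# Attribute names follow the exllamav2 examples at time of writing.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/your-exl2-model"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.min_p = 0.05        # min-p sampling
settings.tfs = 0.95          # tail-free sampling
settings.mirostat = True     # mirostat
settings.mirostat_tau = 5.0
settings.mirostat_eta = 0.1

print(generator.generate_simple("Once upon a time", settings, 128))
```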
Enjoy!