BetterGPT with the llama.cpp server and its OpenAI adapter. Sleek, supports editing past messages without truncating the history, swapping roles at any time, etc.
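For context, the setup is just llama.cpp's built-in server exposing its OpenAI-compatible endpoint, with the frontend (or any client) pointed at it. A minimal sketch in Python, assuming the server is already running locally on port 8080 (the port and model name here are placeholders, not the exact flags I use):

```python
# Minimal sketch: talking to a local llama.cpp server through its
# OpenAI-compatible endpoint. Port and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama.cpp's OpenAI-compatible route
    api_key="not-needed-locally",         # the local server ignores the key
)

resp = client.chat.completions.create(
    model="local-model",  # llama.cpp serves whatever model it was launched with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(resp.choices[0].message.content)
```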
Text Gen Web UI + Silly Tavern for me. Works like a charm.
TavernAI, because it’s simple and easy to use.
Don’t forget exui: https://github.com/turboderp/exui
Once it implements notebook mode, I am probably going to switch to that, as all my reasons for staying on text gen ui (the better samplers, notebook mode) will be pretty much gone, and (as said below) text gen ui has some performance overhead.
Notebook mode is almost ready. Probably I’ll release later today or early tomorrow.
BTW, one last thing on my wishlist (in addition to notebook mode) is prompt caching/scrolling.
I realized that the base exllamav2 backend in ooba (and not the HF hack) doesn’t cache prompts, so prompt processing with 50K+ context takes well over a minute on my 3090. I don’t know if that’s also the case in exui, as I did not try a mega context prompt in my quick exui test.
Well, it depends on the model and how you get to that 50k+ context. If it’s a single prompt, as in “Please summarize this novel: …”, that’s going to take however long it takes. But if the model’s context length is 8k, say, then ExUI is only ever going to do prompt processing on up to 8k tokens, and it will maintain a pointer that advances in steps (the configurable “chunk size”).
So when you reach the end of the model’s native context, it skips ahead e.g. 512 tokens, and then you’ll only have full context ingestion again after another 512 tokens of added context. As for that, though, you should never experience over a minute of processing time on a 3090. I don’t know of a model that fits in a 3090 and takes that much time to inference on. Unless you’re running into the NVIDIA swapping “feature” because the model doesn’t actually fit on the GPU.
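In other words, the cache only has to be rebuilt when the window skips ahead. A toy sketch of that scrolling behaviour, with made-up names and numbers (this is illustrative only, not ExUI's actual code):

```python
# Toy sketch of the "scrolling" prompt cache described above. Names,
# numbers, and structure are illustrative only, not ExUI's actual code.

MAX_CTX = 8192      # model's native context length
CHUNK_SIZE = 512    # configurable "chunk size" the window skips ahead by

window_start = 0    # where the current window begins in the full history
cached_len = 0      # tokens already ingested into the KV cache

def tokens_to_process(history_len: int) -> int:
    """Return how many tokens need prompt processing after the history grows."""
    global window_start, cached_len

    if history_len - window_start <= MAX_CTX:
        # Cached prefix is still valid: only the newly appended tokens
        # need to be processed.
        new = history_len - window_start - cached_len
        cached_len += new
        return new

    # Window is full: advance the pointer in CHUNK_SIZE steps, then
    # re-ingest the whole truncated context once. Full ingestion won't
    # happen again until another CHUNK_SIZE tokens have been added.
    while history_len - window_start > MAX_CTX:
        window_start += CHUNK_SIZE
    cached_len = history_len - window_start
    return cached_len
```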
“I don’t know of a model that fits in a 3090 and takes that much time to inference on”
Yi-34B-200K is the base model I’m using. Specifically the Capybara/Tess tunes.
I can squeeze 63K context on it at 3.5bpw. It’s actually surprisingly good at continuing a full-context story, referencing details throughout and such.
Anyway, I am on Linux, so no GPU swap like Windows. I am indeed using it in a chat/novel-style chat, so the context does scroll and get cached in ooba.
Notepad mode is up fwiw. It probably needs more features, but it’s functional.
I use Synology Chat.
KoboldCPP + Silly Tavern. I would use the KoboldAI frontend instead of Silly Tavern, if it weren’t for the fact that it is intended to create a dedicated system volume in order to work well. I personally find that creepy and unsettling, because I am uncomfortable with the technical aspects of computing. I can do intermediate stuff, but I still feel unhappy at the very idea of ever needing to troubleshoot.
Anyhow, I hope a commercial all-in-one LLM program gets made: one built for user privacy and roleplaying, approachable, open source, with content editors and an integrated marketplace for characters, rules, and other content. While the freeware efforts are neat, I am a boring person who wants things to Just Work, with only basic tinkering on my end.
At the moment, KoboldCPP + ST is probably the closest to being user-friendly without sacrificing privacy or requiring a subscription.
My own: https://github.com/knoopx/llm-workbench. Reasons: fast, private, lightweight, hackable.
You’re kidding me. I recently published my own UI with the same name. Damn it. -> https://github.com/sedwards2009/llm-workbench
ST. By far the most customizability.
I used to use Text Generation Web UI, but I changed to KoboldCpp because it’s more lightweight. Besides, I realized I didn’t use all the features of the textgen UI. KoboldCpp as the backend and SillyTavern as the frontend when I want to chat. KoboldCpp alone when I want to play with models by creating stories or something.
Text Gen Web UI. Lets me use all model formats depending on what I want to test at that moment.
The ROCm version of KoboldCPP on my AMD + Linux setup.
LM Studio - very clean UI and easy to use with GGUF.
Text Gen Web UI. Works great on Mac. I use GGUFs, since llama.cpp supports Metal.
Text Gen UI for general inference
llama.cpp server for multimodal
KoboldCPP. Double-click the Kobold icon, Load, select a preset, Launch. Ten or so seconds later you’re good to go. Easy, quick, efficient.