A bit related. I think all the tools mentioned here are for using an existing UI.
But what if you want to easily roll your own, preferably in Python? I know of some options:
Gradio https://www.gradio.app/guides/creating-a-custom-chatbot-with-blocks
Panel https://www.anaconda.com/blog/how-to-build-your-own-panel-ai-chatbots
Reflex (formerly Pynecone) https://github.com/reflex-dev/reflex-chat https://news.ycombinator.com/item?id=35136827
Solara https://news.ycombinator.com/item?id=38196008 https://github.com/widgetti/wanderlust
I like Streamlit (simple but not very versatile), and Reflex seems to have a richer set of features.
My questions: which of these do people like to use the most? And are the tools mentioned by OP also good for rolling your own UI on top of your own software?
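To give a sense of how small the "roll your own" version can be, here is a minimal Gradio sketch; the echo-style respond function is just a placeholder you'd swap for your own model or RAG call:

```python
import gradio as gr

def respond(message, history):
    # Placeholder: call your own model / RAG pipeline here.
    # `history` holds the previous chat turns (format depends on Gradio version).
    return f"You said: {message}"

# ChatInterface wires up the chat UI (textbox, history, submit button) for you.
demo = gr.ChatInterface(fn=respond, title="My Chatbot")

if __name__ == "__main__":
    demo.launch()
```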
Langroid has a DocChatAgent; you can see an example script here -
https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py
Every generated answer is accompanied by Source (doc link or local path), and Extract (the first few and last few words of the reference — I avoid quoting the whole sentence to save on token costs).
There are other RAG script variants in that same folder, like multi-agent RAG (doc-chat-2.py), where a master agent delegates smaller questions to a retrieval agent and rephrases them if it can't get an answer. There's also doc-chat-multi-llm.py, where the master agent is powered by GPT-4 and the RAG agent by a local LLM (which, after all, only needs to do extraction and summarization).
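Very roughly, the two-agent setup looks something like the sketch below. This is from memory of Langroid's API, so the import paths, config fields, and the doc path are assumptions; the linked examples repo has the canonical scripts:

```python
# Rough sketch of master-agent + retrieval-agent delegation (see doc-chat-2.py
# in the examples repo for the real version). API details here are assumed.
import langroid as lr
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig

# Retrieval (RAG) agent: answers from the indexed documents.
# In the multi-LLM variant, its config would point at a local LLM.
doc_agent = DocChatAgent(
    DocChatAgentConfig(
        doc_paths=["my-docs/"],  # hypothetical path
    )
)
doc_task = lr.Task(doc_agent, name="DocAgent", interactive=False)

# Master agent: decomposes the user's question and delegates to DocAgent.
master_agent = lr.ChatAgent(lr.ChatAgentConfig(name="Master"))
master_task = lr.Task(
    master_agent,
    system_message="Break the user's question into smaller questions, "
                   "send them to DocAgent, and combine its answers.",
)
master_task.add_sub_task(doc_task)
master_task.run("What do the documents say about giraffes' diet?")
```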
> intuitively it seems like you might be able to avoid calling a model at all b/c shouldn’t the relevant sentences just be closer to the search
Not really, as I mention in my reply to u/jsfour above: embeddings give you similarity to the query, whereas an LLM can identify relevance to answering the query. Specifically, embeddings won't find cross-references (e.g. in "Giraffes are tall. They eat mostly leaves.", the sentence that answers "what do giraffes eat?" never mentions giraffes), and they won't zoom in on answers -- e.g. the President Biden question I mention there.
Here is the comparison for that specific example.
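As a rough sketch of the embedding-only side of that comparison (library and model are my choices, not from the thread), you can score the two giraffe sentences against the question and see that the sentence that actually answers it never mentions giraffes, so its similarity score need not come out on top:

```python
# Sketch: cosine similarity of a question vs. two sentences, where the
# answering sentence ("They eat mostly leaves.") relies on a cross-reference.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary small model

query = "What do giraffes eat?"
sentences = ["Giraffes are tall.", "They eat mostly leaves."]

q_emb = model.encode(query, convert_to_tensor=True)
s_emb = model.encode(sentences, convert_to_tensor=True)

scores = util.cos_sim(q_emb, s_emb)[0]
for sent, score in zip(sentences, scores):
    print(f"{score:.3f}  {sent}")
```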
You mean we don’t need to use llama-cpp-python anymore to serve this at an OpenAI-compatible endpoint?