This can be done with a self-hosted Mattermost server: https://github.com/mattermost/openops
AdamDhahabi@alien.top to LocalLLaMA • Neural-chat-7b-v3-1 GGUF. New Mistral finetune
1 · 2 years ago
Interested to know how it scores for RAG use cases; there is a benchmark for that: https://github.com/vectara/hallucination-leaderboard
So far, Mistral underperforms Llama 2 on it.
AdamDhahabi@alien.top to LocalLLaMA • Anyone Hosting Llama Models in Production? Seeking Insights on Scaling and Resource Optimization
1 · 2 years ago
Llama.cpp has supported batched inference for four weeks now: https://github.com/ggerganov/llama.cpp/issues/2813
-cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)
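The scheduling idea behind that flag can be sketched with a toy simulation. This is not llama.cpp code; token generation is faked with a fixed per-request length, and the point is only that new requests join the in-flight batch the moment a slot frees, instead of waiting for the whole batch to drain:

```python
# Toy illustration of continuous (dynamic) batching, the idea behind
# llama.cpp's -cb / --cont-batching option. NOT llama.cpp's actual
# implementation: "generation" here is just a countdown of fake tokens.
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (request_id, tokens_to_generate).
    Returns {request_id: decode step at which it finished}."""
    pending = deque(requests)
    in_flight = {}           # request_id -> tokens still to generate
    finished = {}
    step = 0
    while pending or in_flight:
        # Admit waiting requests into any free batch slots (the key
        # difference from static batching, which waits for a full drain).
        while pending and len(in_flight) < max_batch:
            rid, n = pending.popleft()
            in_flight[rid] = n
        step += 1
        # One decode step produces one token for every in-flight request.
        for rid in list(in_flight):
            in_flight[rid] -= 1
            if in_flight[rid] == 0:
                del in_flight[rid]
                finished[rid] = step
    return finished

print(continuous_batching([("a", 5), ("b", 1), ("c", 2)], max_batch=2))
# → {'b': 1, 'c': 3, 'a': 5}
```

With static batching, "c" could not start until the batch holding "a" and "b" fully finished at step 5, so it would finish at step 7 instead of step 3; that latency win is why the option matters for multi-user serving.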
AdamDhahabi@alien.top to LocalLLaMA • Chunking and storing structured data and vectors for RAG
1 · 2 years ago
Yesterday I tried GPT4All; it references context by outputting 3 passages from my local documents, and I could click on each of them and read the passage. Their current implementation only uses a simple retrieval algorithm, though; embedding-based semantic search is still on their roadmap.
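The embedding-based semantic search described as being on GPT4All's roadmap can be sketched roughly as follows. This is a toy, not GPT4All code: a bag-of-words vector stands in for a real neural embedding model, and the ranking is plain cosine similarity over the stored passages:

```python
# Toy sketch of embedding-based passage retrieval for RAG. A real system
# would use a neural embedding model; the bag-of-words "embedding" below
# is an assumption for illustration only.
import math
from collections import Counter

def embed(text):
    """Crude stand-in embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_passages(query, passages, k=3):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

docs = [
    "llama.cpp supports continuous batching for inference",
    "the cat sat on the mat",
    "batched inference helps serve many users at once",
]
print(top_passages("batched inference with llama.cpp", docs, k=2))
```

Swapping `embed` for a real embedding model (and the linear scan for a vector index) turns this into the semantic retrieval step of a RAG pipeline, with the top-k passages shown to the user as clickable sources.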
I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users; it's a use case many are busy with at the moment. I saw llama.cpp now supports batched inference, but only for the past two weeks, so I don't have hands-on experience with it yet.
Unethical practices: one-man shops artificially pumping up an account's value, aiming to sell it later on.