This can be done with a self-hosted Mattermost server: https://github.com/mattermost/openops
AdamDhahabi@alien.top to LocalLLaMA • Neural-chat-7b-v3-1 GGUF. New Mistral finetune
1 · 2 years ago
Interested to know how it scores for RAG use cases; there is a benchmark for that: https://github.com/vectara/hallucination-leaderboard
So far, Mistral underperforms Llama 2 on it.
AdamDhahabi@alien.top to LocalLLaMA • Anyone Hosting Llama Models in Production? Seeking Insights on Scaling and Resource Optimization
1 · 2 years ago
Llama.cpp has supported batched inference for four weeks now: https://github.com/ggerganov/llama.cpp/issues/2813
-cb, --cont-batching enable continuous batching (a.k.a dynamic batching) (default: disabled)
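The scheduling idea behind that flag can be sketched with a toy simulation. This is not llama.cpp code; token generation is faked with a fixed per-request length, and the point is only that new requests join the in-flight batch the moment a slot frees, instead of waiting for the whole batch to drain:

```python
# Toy illustration of continuous (dynamic) batching, the idea behind
# llama.cpp's -cb / --cont-batching option. NOT llama.cpp's actual
# implementation: "generation" here is just a countdown of fake tokens.
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (request_id, tokens_to_generate).
    Returns {request_id: decode step at which it finished}."""
    pending = deque(requests)
    in_flight = {}           # request_id -> tokens still to generate
    finished = {}
    step = 0
    while pending or in_flight:
        # Admit waiting requests into any free batch slots (the key
        # difference from static batching, which waits for a full drain).
        while pending and len(in_flight) < max_batch:
            rid, n = pending.popleft()
            in_flight[rid] = n
        step += 1
        # One decode step produces one token for every in-flight request.
        for rid in list(in_flight):
            in_flight[rid] -= 1
            if in_flight[rid] == 0:
                del in_flight[rid]
                finished[rid] = step
    return finished

print(continuous_batching([("a", 5), ("b", 1), ("c", 2)], max_batch=2))
# → {'b': 1, 'c': 3, 'a': 5}
```

With static batching, "c" could not start until the batch holding "a" and "b" fully finished at step 5, so it would finish at step 7 instead of step 3; that latency win is why the option matters for multi-user serving.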
AdamDhahabi@alien.top to LocalLLaMA • Chunking and storing structured data and vectors for RAG
1 · 2 years ago
Yesterday I tried GPT4All; it references context by outputting 3 passages from my local documents, and I could click on each of them and read the passage. Their current implementation only uses a simple retrieval algorithm, though; embedding-based semantic search is still on their roadmap.
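The embedding-based semantic search described as being on GPT4All's roadmap can be sketched roughly as follows. This is a toy, not GPT4All code: a bag-of-words vector stands in for a real neural embedding model, and the ranking is plain cosine similarity over the stored passages:

```python
# Toy sketch of embedding-based passage retrieval for RAG. A real system
# would use a neural embedding model; the bag-of-words "embedding" below
# is an assumption for illustration only.
import math
from collections import Counter

def embed(text):
    """Crude stand-in embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_passages(query, passages, k=3):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

docs = [
    "llama.cpp supports continuous batching for inference",
    "the cat sat on the mat",
    "batched inference helps serve many users at once",
]
print(top_passages("batched inference with llama.cpp", docs, k=2))
```

Swapping `embed` for a real embedding model (and the linear scan for a vector index) turns this into the semantic retrieval step of a RAG pipeline, with the top-k passages shown to the user as clickable sources.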
I think batched inference is a must for companies that want to put an on-premise chatbot in front of their users; it's a use case many are busy with at the moment. I saw llama.cpp now supports batched inference, but only for the past two weeks, so I don't have hands-on experience with it yet.
Unethical practices: one-man shops artificially pumping up an account's value, aiming to sell it later on.