RAG beginner questions

PersonSuitLevitating@alien.top · 2 years ago

MLRS99@alien.top · 2 years ago

Any ideas on how to get better-quality responses with document search/retrieval?

I’ve done some experiments with langchain and embedchain, and I must say that the quality of answers when querying embedded documents is lacking.

Do I need better prompting? If so, what kind of technique works best?

Often I get responses like ‘found nothing’, etc.

__SlimeQ__@alien.top · 2 years ago

Let me start off by saying I haven’t gotten this right yet. But still…

When AutoGPT went viral, the thing everyone started talking about was vector DBs and how they can magically extend the context window. This was not a very well-informed idea, and implementations have been lacking.

It turns out that merely finding similar messages in the history and dumping them into the context is not enough. While this may sometimes give you a valuable nugget, most of the time it will just fill the context with repetitive garbage.

What you really need for this to work, imo, is a structured narrative around finding the data, reading it, and reporting it. LLMs respond extremely poorly to random, disconnected dialogue. They don’t know what to do with it. So for one thing, you’ll need a reasonable amount of pre-context for each data point so that the bot can even understand what’s being talked about. But now this is prohibitively long: 4 or 5 matches on your search and your context is probably full. So you’ll need to do some summarizing before squeezing it into the live conversation, which means your request takes at least 2x longer, and then you need to weave that into your chat context in as natural a way as possible.
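
A rough sketch of that flow (find matches, pull in their neighbouring messages as pre-context, summarize, then weave the result into the prompt) might look like the following. Everything here is an assumption for illustration: the function names are made up, `embed` is a toy bag-of-words stand-in for a real embedding model, and `summarize` just truncates where a real pipeline would make a second LLM call (which is where the extra latency comes from).

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (assumption); swap in a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve_with_context(query, messages, k=2, window=1):
    """Top-k matches, each expanded with `window` neighbouring messages.

    Messages already pulled in by a better match are skipped, so the
    same text is not repeated in the context.
    """
    q = embed(query)
    ranked = sorted(range(len(messages)),
                    key=lambda i: cosine(q, embed(messages[i])),
                    reverse=True)
    spans, seen = [], set()
    for i in ranked[:k]:
        lo, hi = max(0, i - window), min(len(messages), i + window + 1)
        span = [j for j in range(lo, hi) if j not in seen]
        seen.update(span)
        if span:
            spans.append([messages[j] for j in span])
    return spans

def summarize(span, max_words=25):
    """Placeholder (assumption): truncates. A real pipeline would call an LLM here."""
    words = " ".join(span).split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")

def build_prompt(query, messages, k=2, window=1):
    """Weave summarized recalls into the prompt as labelled notes, not raw dumps."""
    notes = [summarize(s) for s in retrieve_with_context(query, messages, k, window)]
    lines = ["Notes recalled from earlier in this conversation:"]
    lines += [f"- {n}" for n in notes]
    lines += ["", f"User: {query}", "Assistant:"]
    return "\n".join(lines)

history = [
    "User: My cat is named Miso.",
    "Assistant: Nice, Miso is a great name.",
    "User: What is a good pasta recipe?",
    "Assistant: Try aglio e olio with garlic and chili.",
    "User: Thanks, I will cook it tonight.",
]
print(build_prompt("What is my cat named?", history, k=1))
```

The point of the `window` parameter is the pre-context: each hit arrives with the messages around it rather than as an isolated line. The `max_words` cap is a crude stand-in for the summarizing step that keeps 4 or 5 hits from filling the context.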

Honestly, RAG as a task is so weird that I no longer expect any general models to be capable of it, especially not 7B/13B. Even GPT-4 can just barely do it. I think with a very clever dataset somebody could make an effective RAG LoRA, but I’ve yet to see it.

soba-yokai@alien.top · 2 years ago

Have you seen the Self-RAG architecture that came out recently? I’m curious what you’d think of it.