It might help to think of RAG as three separate steps (Retrieve, Augment, Generate), each of which you can inspect on its own to see where things are going wrong.
What I would do is look at the retrieval stage first. This is where you run a vector (or hybrid, or whatever) search against your vector store and get back a set of documents that match your query. Keep in mind that in Retrieve you are most likely searching with the question the user is asking, not with the full vectorized prompt. Take a look at what is coming back and make sure the results actually look relevant to the question. If they don't, the problem is probably here. BTW, I personally prefer to start with chunks of about 500 tokens with around 50 tokens of overlap between them, but that can vary greatly depending on the model, the content, etc.
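To make the retrieval check concrete, here is a minimal sketch of the two things mentioned above: chunking with overlap and listing what a search returns. It is not tied to any particular vector store. `chunk_tokens` and `inspect_retrieval` are hypothetical helper names, the `search` callable stands in for whatever query function your store exposes, and "tokens" here are just list items (for real counts you would use your model's tokenizer).

```python
from typing import Callable, List, Tuple


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def inspect_retrieval(
    question: str,
    search: Callable[[str, int], List[Tuple[float, str]]],
    k: int = 5,
) -> List[str]:
    """Run the (assumed) search callable and format each hit for eyeballing."""
    hits = search(question, k)  # assumed to return (score, text) pairs, best first
    return [
        f"{rank}. score={score:.3f} | {text[:80]}"
        for rank, (score, text) in enumerate(hits, start=1)
    ]
```

Printing the lines from `inspect_retrieval` for a handful of real user questions is usually enough to tell whether the right chunks are coming back at all.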
If that works, I would then look at the “Augment” part, which is where you inject the results from the Retrieval stage into your prompt. Does the final prompt look correct? I doubt this is where the issue is, but it is worth a look.
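A quick way to check the Augment step is to build the prompt in one place and dump it before it goes to the model. The sketch below assumes a simple context-then-question layout. `PROMPT_TEMPLATE` and `build_augmented_prompt` are made-up names for illustration, and your framework's real template will likely differ.

```python
from typing import List

# Hypothetical template; substitute whatever your pipeline actually uses.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def build_augmented_prompt(question: str, chunks: List[str]) -> str:
    """Number each retrieved chunk and place it ahead of the user's question."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Printing the returned string lets you confirm that the chunks you saw in the retrieval step really made it into the prompt, in the order you expected, and were not truncated or dropped.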
Finally, take a look at what comes out of the “Generate” stage when you pass in this augmented prompt. Does it look different from what you saw previously?
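One way to make "different" concrete, assuming the comparison you care about is the answer without retrieved context versus the answer with it, is to run both through the same model call and put them side by side. `compare_generations` is a hypothetical helper, and `generate` is a placeholder for whatever function sends a prompt to your model and returns its text.

```python
from typing import Callable, Dict, Union


def compare_generations(
    question: str,
    augmented_prompt: str,
    generate: Callable[[str], str],
) -> Dict[str, Union[str, bool]]:
    """Call the (assumed) generate function with and without retrieved context."""
    baseline = generate(question)          # bare question, no context injected
    grounded = generate(augmented_prompt)  # question plus retrieved chunks
    return {
        "baseline": baseline,
        "grounded": grounded,
        # Crude check: did the context change the answer at all?
        "changed": baseline.strip() != grounded.strip(),
    }
```

If `changed` comes back false, or the grounded answer ignores facts that are plainly in the context, that points at the prompt or the model rather than at retrieval.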