I am confused about these two. Sometimes people use them interchangeably. Is it because RAG is a method, and where you store the data should be a vector DB? I remember before LLMs there was word2vec, in the beginning, before all of this. But isn’t the hard part creating such a meaningful word2vec? By the way, word2vec is now called “embeddings”, right?
RAG, which stands for Retrieval-Augmented Generation, is a method that combines retrieval with generation models to improve the quality of natural language processing tasks such as text generation and question answering. A vector DB is a database specialized for storing vectors, which are numerical representations of words or documents. These vectors are often used in conjunction with RAG models to enable efficient retrieval and generation of text.
It’s not entirely accurate to use RAG and vector DB interchangeably, because RAG refers to the method or model, while a vector DB refers to the specific database used to store vectors. RAG can be implemented using various databases, not just a vector DB.
Word2vec is indeed an earlier method for generating word embeddings, which are numerical representations of words. Word embeddings, including those generated by Word2vec, are often used as a foundation for various natural language processing tasks. The term “embeddings” is used more broadly to refer to any type of numerical representation of words or documents, not just limited to Word2vec.
Creating meaningful word embeddings is indeed a challenging task, and it’s an active area of research in natural language processing. The quality of word embeddings can significantly impact the performance of downstream NLP tasks, so there is ongoing effort to improve the methods for generating and using embeddings in models.
Source: gpt-3.5-turbo-1106 :-)
Word2vec is a method (with library implementations) used to create embeddings.
Embeddings are vector representations (a list of numbers) that represent the meaning of some text.
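A toy illustration of the idea. This just counts words over a fixed vocabulary, so the vectors are not real embeddings (word2vec and modern embedding models learn theirs from data), but it shows that similar texts end up with similar vectors:

```python
from collections import Counter
import math

def toy_embed(text, vocab):
    """Toy 'embedding': a vector of word counts over a fixed vocabulary.
    Real embeddings (word2vec, OpenAI models) are learned, not counted."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

vocab = ["cat", "dog", "pet", "stock", "market"]
v1 = toy_embed("the cat is a pet", vocab)
v2 = toy_embed("a dog is a pet too", vocab)
v3 = toy_embed("the stock market fell", vocab)

# The two pet sentences are closer to each other than to the finance one.
print(cosine(v1, v2) > cosine(v1, v3))  # True
```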
RAG is retrieval augmented generation. It is a method to get better answers from GPT by giving it relevant pieces of information along with the question.
RAG is done by:
- taking a long text and splitting it into pieces
- creating embeddings for each piece and storing those
- when someone asks a question, creating an embedding for the question
- finding the most similar section embeddings from the long text, say the top 3
- sending those 3 pieces along with the question to GPT for the answer
A vector DB is a way to store those section embeddings and also to search for the most relevant sections. You do not need to use a vector DB to perform RAG, but it can help, particularly if you want to store and use RAG with a very large amount of information.
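The steps above can be sketched end to end. Everything here is a toy stand-in: `embed()` fakes a real embedding model with word counts, and a plain list plays the role of the vector DB:

```python
import math
import re
from collections import Counter

VOCAB = ["paris", "france", "capital", "python", "language", "snake"]

def embed(text):
    # Stand-in for a real embedding model; just counts vocabulary words.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# 1. Split a long text into pieces (here: sentences).
long_text = ("Paris is the capital of France. "
             "Python is a programming language. "
             "A python is also a snake.")
chunks = [s.strip() for s in long_text.split(". ") if s]

# 2. Create and store an embedding for each piece (the list is our 'vector DB').
store = [(embed(c), c) for c in chunks]

# 3. Embed the question; 4. find the most similar pieces; 5. send them to the LLM.
question = "What is the capital of France?"
q_vec = embed(question)
top = sorted(store, key=lambda item: cosine(q_vec, item[0]), reverse=True)[:3]
context = "\n".join(text for _, text in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(top[0][1])  # the France sentence ranks first
```

In a real system the `prompt` string would then be sent to GPT, and the vector DB would replace the linear scan with an approximate nearest-neighbor index.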
This is a very good answer, but I’ll try to elaborate to make things clearer:
RAG is done by:
1. Taking a long text and splitting it into chunks of a certain size/length.
2. Taking each chunk of text and running it through a function which turns the text into a vector representation. This vector representation is called an embedding, and the function used is an embedding function/model, e.g. OpenAIEmbeddings(). You then generally store these vectors in a vector database (Qdrant, Weaviate, etc.).
3. When someone asks a question, creating an embedding for the question.
4. Since your question is a vector (embedding) and your data is represented as vectors (embeddings) in your vector DB (from 2), comparing your question vector with your data vectors. Technically, you measure the distance between your question vector and the vectors in your vector DB. Vectors closer to your question are likely to contain data relevant to your question.
5. Grabbing the text corresponding to the (e.g.) 3 closest vectors from your vector DB. The text is often stored along with the vector for retrieval purposes. You send that text + question to your LLM (e.g. GPT-4) and implicitly say: “Answer this question based only on these 3 chunks of text.” That way you sort of limit the language model’s knowledge to what you explicitly give it.
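Steps 4 and 5 are the core of the search. A minimal sketch of that part, with an in-memory list standing in for the vector DB (the vectors are made up; a real embedding model would produce them, and a real DB would store the text as a payload alongside each vector):

```python
import math

def cosine_distance(a, b):
    """Smaller distance = more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

# Each record stores the vector AND the original text, as described in step 5.
records = [
    {"vector": [0.9, 0.1, 0.0], "text": "Chunk about billing"},
    {"vector": [0.1, 0.9, 0.1], "text": "Chunk about shipping"},
    {"vector": [0.8, 0.2, 0.1], "text": "Chunk about refunds"},
]

def search(query_vector, k=2):
    """Return the text of the k records whose vectors are closest to the query."""
    ranked = sorted(records, key=lambda r: cosine_distance(query_vector, r["vector"]))
    return [r["text"] for r in ranked[:k]]

question_vector = [0.85, 0.15, 0.05]  # pretend this came from embedding the question
context_chunks = search(question_vector, k=2)
prompt = ("Answer this question based only on these chunks:\n"
          + "\n".join(context_chunks))
```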
Oh thanks, point 5 also answered one question in my mind: how to get back to words from floating-point numbers. Now I understand they are created by specific embedding models. And I guess every model’s result is different from other models’ results. So isn’t the choice important, like the best embedding model and query model? Which one is the most successful right now? And if I create an embedding with one model, I can’t create an embedding query with a different model to query my embeddings?
- You could do RAG that uses vector embeddings, but you could also just ask the LLM for a search query and use that to search a database, and that would still be RAG.
This is interesting. You’re saying that you have embeddings in a vector DB, and you ask the LLM to give you some kind of SQL-like query to search the vector DB?
Most often you search the vector DB with natural language; there is no special schema to use, but you do need to consider how the embedding model is capturing the vectors so it is matched with the embedded query. RAG also describes the case when the LLM is driving the searches, and that is the only way I have coded it: the user may ask for something, but the LLM creates the search query based on that and the conversation history.
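A sketch of that LLM-driven flow. `call_llm` here is a hypothetical stub, not a real API, and its canned answer stands in for what a real model would return given the conversation:

```python
def call_llm(prompt):
    """Hypothetical stand-in for a real LLM API call.
    A real model would read the conversation and distill a search query."""
    return "return policy for damaged items"

conversation = [
    "user: I bought a lamp and it arrived broken.",
    "user: Can I send it back?",
]

# Ask the LLM to turn the conversation into a search query.
query_prompt = (
    "Given this conversation, write a short search query for our knowledge base:\n"
    + "\n".join(conversation)
)
search_query = call_llm(query_prompt)

# search_query is then embedded (or used as a keyword search) to retrieve chunks,
# which are fed back to the LLM together with the user's actual question.
print(search_query)
```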
Word2vec is a method of transforming words into vectors. An embedding, in its most general sense, just means a transformed representation. In the case of AI, it’s something that is (again, at its most basic level) math-friendly. What’s the most math-friendly format AI people love? Vectors. Note some embeddings are more complicated (e.g. query/key/value), but fundamentally it’s still vector math, just shifted around a bit to add extra capabilities.
RAG is like the entire system for shoving the data you need into a prompt from a pool of data; the vector DB is the place the data gets stored for RAG.
Note a vector DB is kind of like storing data in a word2vec-style format so you can perform vector math to find similar things.
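That “vector math to find similar things” is the same trick word2vec popularized at the word level. A toy version with made-up 3-d vectors (real word2vec vectors have hundreds of dimensions and are learned from text):

```python
import math

# Made-up 3-d word vectors for illustration only.
word_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.9, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def most_similar(word):
    """Find the stored word whose vector points in the closest direction."""
    target = word_vectors[word]
    others = [(w, cosine(target, v)) for w, v in word_vectors.items() if w != word]
    return max(others, key=lambda pair: pair[1])[0]

print(most_similar("king"))  # queen
```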