Let’s say you spend an unholy amount of processing time training a 70b. You like history. You want a good LLM for historical info.
By the time you upload it, the LLM is already outdated. Now what?
If you want it to speak accurately about modern events, you'd have to retrain it again, repeating the process over and over, because time keeps moving on while your LLM does not.
This could clearly be made more efficient. Ideally, each subject would be treated as its own separate file, while the central “brain” of the LLM becomes its own structure.
As it stands, updating the entire LLM is cost-prohibitive and makes no sense if you’re only trying to fix specific data points. Why, for example, would you want to update the entire Cantonese dictionary when you just want to fix the list of Alaskan donut shops?
I understand that the tech currently has to treat both the information and the “thinking” behind an LLM as one and the same. It seems more efficient, more effective, to separate the two.
I get the idea, but I think your example is a bad one. History changes really slowly; in fact it doesn’t really change at all, it just keeps growing.
Do you know about LoRA?
I keep hearing about her.
I think there’s a solution to that: LLM internet access. The AI just needs to be smart enough to use tools like a human. Then it can formulate the right search query and extract the right answer from the internet.
Frustratingly, we don’t have any good plugins for this in oobabooga. The authors seem to have been scared off by the rapid API changes.
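For what it’s worth, the loop itself is simple enough to hack together outside of a plugin. Here’s a rough sketch of the idea; `generate` and `web_search` are just placeholders for whatever local backend and search API you wire up, not real library calls:

```python
def generate(prompt: str) -> str:
    """Placeholder for a call to your local LLM backend."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Placeholder for a call to a search API; returns result snippets."""
    raise NotImplementedError

def answer_with_search(question: str) -> str:
    # Step 1: let the model write its own search query.
    query = generate(f"Write a short web search query for: {question}")
    # Step 2: fetch results and hand them back as context.
    snippets = "\n".join(web_search(query)[:5])
    # Step 3: answer from the retrieved snippets rather than stale training data.
    return generate(
        f"Search results:\n{snippets}\n\nUsing only the results above, answer: {question}"
    )
```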
While I’m not an expert, I understand that, in theory, large language models can process an unlimited amount of context. In practice, though, there are limitations. If we start by training a base model, say one with 70 billion parameters, to excel at reasoning and insight extraction, we could then use bigger models for fine-tuning. Those bigger models could teach our 70B how to handle context windows ranging from 2 to 10 million tokens, essentially allowing us to keep up-to-date information in a document. RAG can come in handy here as well.
You’ve basically described the entire purpose behind Retrieval Augmented Generation.
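For anyone who hasn’t tried it, here’s a minimal sketch of what that looks like in practice: keep the fresh facts in plain documents, retrieve the relevant ones at question time, and stuff them into the prompt instead of retraining anything. The embedding model name and the documents are just placeholders; you’d hand the final prompt to whatever frozen model you’re already running:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any small embedding model works

# The "knowledge base" -- the only thing you ever need to update.
documents = [
    "2024-03-01: Shop A in Anchorage closed; Shop B opened on Main St.",
    "2023 population estimate for Juneau: roughly 31,000.",
    "The Cantonese dictionary entries have not changed.",
]
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the question (cosine similarity)."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "Which donut shops are currently open in Anchorage?"
context = "\n".join(retrieve(question))
prompt = f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
# `prompt` goes to the frozen base model; only `documents` has to stay current.
print(prompt)
```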
It’s really about the data curation and normalization. Think Yi-34B: they’re getting results that match or beat 70B LLMs due to the quality of their data. Work on the data and you will more than likely be happy with the results. The training is really the fast part when you consider what’s required to really nail down quality input.
So has anyone here tried this: train a LoRA adapter against, say, the base Llama 2 model and then merge that adapter with, say, the Wizard model. Since Wizard is a Llama fine-tuned model, will the LoRA weights merge? I might try it later as well :) If this works, then it’s a way to solve your problem. As long as the model architecture doesn’t change, your subject-specific adapter should still be applicable even if the base model gets “outdated”.
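Rough sketch of what I’d try, using the PEFT library. The model IDs and adapter path below are placeholders, and the key assumption is that the adapter and the target fine-tune share the same base architecture and dimensions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# A Llama-2-based fine-tune (e.g. a Wizard-style model) -- must match the
# architecture and hidden sizes the adapter was trained against.
target = AutoModelForCausalLM.from_pretrained(
    "WizardLM/WizardLM-13B-V1.2", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("WizardLM/WizardLM-13B-V1.2")

# Adapter originally trained on the base Llama 2 model (path is a placeholder).
model = PeftModel.from_pretrained(target, "path/to/my-history-lora")

# Optionally bake the adapter weights into the model for faster inference.
model = model.merge_and_unload()
```

Whether the result is any good is an empirical question, but nothing in the tooling stops you from attaching the adapter to a different fine-tune of the same base.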