GodEmperor23@alien.top · 2 years ago

In my opinion, open-source projects should focus on a very narrow thing instead of trying to be a "GPT" that can do everything.

nerdyvaroo@alien.top · 2 years ago

I was just thinking of doing something like that. I still have to find the compute power to do what I want, though.

KeyAdvanced1032@alien.top · 2 years ago

Small specialized LLMs are going to be a thing the same way using frameworks is now.

a_beautiful_rhind@alien.top · 2 years ago

The big company gives you a base model, and then it’s up to you to specialize it.

I’ve seen some agent and medical tunes, plus smaller models for image, vision, TTS, etc. Anyone doing it for a specific business case is probably not posting the model or advertising it.

Are you asking for people to make specialized 1.3b models from scratch? Because I think even that takes a long time on a “few” A100s.
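For a rough sense of the compute involved, here is a back-of-the-envelope sketch using the common approximation that training costs about 6 × parameters × tokens FLOPs. The GPU count (8, as one reading of "a few"), the A100 BF16 dense peak of 312 TFLOPS, and the 40% utilization are assumptions, and the 6ND rule itself is only an approximation.

```python
# Rough training-time estimate using the common "6 * N * D" FLOPs rule of thumb.
# Hardware numbers are assumptions for illustration, not measurements.

def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  peak_flops_per_gpu: float = 312e12,  # assumed A100 BF16 dense peak
                  utilization: float = 0.4) -> float:  # assumed fraction of peak achieved
    """Return estimated wall-clock days to train n_params on n_tokens."""
    total_flops = 6 * n_params * n_tokens                        # forward + backward approximation
    effective_rate = n_gpus * peak_flops_per_gpu * utilization   # FLOPs per second actually achieved
    seconds = total_flops / effective_rate
    return seconds / 86_400

# 1.3B parameters at ~20 tokens per parameter (26B tokens) on 8 GPUs
print(f"{training_days(1.3e9, 26e9, 8):.1f} days")
# The same model on an arbitrary longer run of 150B tokens
print(f"{training_days(1.3e9, 150e9, 8):.1f} days")
```

Under those assumptions this comes out to a couple of days of pure GPU time for 26B tokens and roughly two weeks for 150B tokens, before counting data preparation, failed runs, or hyperparameter experiments, which can add substantially to the total.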

involviert@alien.top · 2 years ago

Say, these finetunes are all merged LoRA stuff, aren’t they? Is nobody doing stuff where you just continue regular training with your own dataset?
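For anyone unsure what "merged LoRA" means: a LoRA adapter learns two small matrices B and A, and merging folds their scaled product back into the frozen base weight, W' = W + (alpha / r) · B A. The sketch below is a toy illustration of that formula in pure Python with made-up matrices; libraries typically do this per layer on tensors.

```python
# Toy illustration of merging a LoRA adapter into a base weight, in pure Python.
# W is a frozen base weight (d_out x d_in); B (d_out x r) and A (r x d_in) are the
# low-rank adapter matrices; alpha / r is the usual LoRA scaling factor.

Matrix = list[list[float]]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def merge_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, i.e. the base weight with the adapter folded in."""
    r = len(a)                  # rank = number of rows in A
    scale = alpha / r
    delta = matmul(b, a)        # low-rank update, same shape as W
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Made-up 2x3 base weight and a rank-1 adapter
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0],
     [2.0]]
A = [[0.5, 0.0, -0.5]]
print(merge_lora(W, B, A, alpha=2.0))  # [[2.0, 0.0, -1.0], [2.0, 1.0, -2.0]]
```

Continued regular training would instead update every entry of W directly, rather than going through a low-rank B and A.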

Defektivex@alien.top · 2 years ago

This is the right answer. I’ve been building enterprise LLM solutions for the last nine months, and there are a ton of use cases in healthcare and finance-related BI (business intelligence).

Lots and lots of classification and labeling work that requires domain-specific context.

I’m finding less work in generating ‘content’ and a ton in what are effectively workflow solutions or business process automation.

At this point I’m advising my company to avoid chatbot jobs altogether, as they seem to be low value.
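A hypothetical sketch of the classification/labeling pattern described above: restrict the prompt to a closed label set, then map the model’s free-text reply back onto that set and send anything ambiguous to human review. The label names and helper functions are illustrative, and the actual model call is left out rather than assuming any particular API.

```python
# Hypothetical sketch: closed-set labeling with an LLM.
# LABELS, build_prompt and parse_label are illustrative names, not from any library.
from typing import Optional

LABELS = ["billing", "claims", "fraud", "other"]  # assumed domain label set

def build_prompt(text: str, labels: list[str]) -> str:
    """Ask for exactly one label from a closed set."""
    options = ", ".join(labels)
    return (f"Classify the following record into exactly one of: {options}.\n"
            "Answer with the label only.\n\n"
            f"Record: {text}\nLabel:")

def parse_label(reply: str, labels: list[str]) -> Optional[str]:
    """Return the chosen label, or None if the reply is ambiguous or off-list."""
    cleaned = reply.strip().lower().rstrip(".")
    if cleaned in labels:
        return cleaned
    # Crude substring fallback for replies like "Label: fraud"; accept only a single match.
    # (Substring matching is rough: "another" also contains "other".)
    hits = [lab for lab in labels if lab in cleaned]
    return hits[0] if len(hits) == 1 else None  # None -> route to human review

print(parse_label(" Fraud.\n", LABELS))          # fraud
print(parse_label("Label: claims", LABELS))      # claims
print(parse_label("billing or claims", LABELS))  # None
```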

Drited@alien.top · 2 years ago

So do those enterprises generally build their own small but targeted model? Or is it more fine-tuning an existing LLM as a base?

MaxwellsMilkies@alien.top · 2 years ago

Wait until you realize that they can generate amino acid sequences too.

Tridente@alien.top · 2 years ago

I think you are 100% correct!

blueeyedlion@alien.top · 2 years ago

And here we are again: https://en.wikipedia.org/wiki/Single-responsibility_principle

100% correct tho. Individual devs only have so much time and money.

Dangerous_Injury_101@alien.top · 2 years ago

Perhaps we should have something like a hundred different 7B models for categories like history, arts, science, etc., and above them a new layer with a generic LLM that routes the question to the correct category, so that the right 7B model then loads into your VRAM? :D If you had the fastest NVMe drive (not sure whether DirectStorage would help; probably not?), the waiting might not be too terrible, unless every one of your questions is in a different category.
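The routing idea can be sketched as a router plus a small cache of resident models. This is a hypothetical illustration: the keyword lookup stands in for the generic router LLM, `load_model` stands in for copying weights from NVMe into VRAM, and `capacity=1` models a GPU that holds only one 7B model at a time.

```python
# Hypothetical sketch of "router picks a category, then the matching 7B model loads".
# Nothing here is a real model-loading API; loads are only counted.
from collections import OrderedDict

CATEGORY_KEYWORDS = {
    "history": ["war", "empire", "century"],
    "science": ["atom", "cell", "gravity"],
    "arts": ["painting", "poem", "sculpture"],
}

def route(question: str) -> str:
    """Pick a category; a real system would ask a small general LLM instead."""
    q = question.lower()
    for category, words in CATEGORY_KEYWORDS.items():
        if any(w in q for w in words):
            return category
    return "general"

class ExpertCache:
    """Keep at most `capacity` expert models resident, evicting the least recently used."""

    def __init__(self, capacity: int = 1):
        self.capacity = capacity
        self.resident = OrderedDict()  # category -> loaded model handle
        self.loads = 0                 # number of (slow) loads from disk

    def load_model(self, category: str) -> str:
        self.loads += 1                # placeholder for the expensive NVMe -> VRAM copy
        return f"{category}-7b"

    def get(self, category: str) -> str:
        if category in self.resident:
            self.resident.move_to_end(category)    # mark as most recently used
        else:
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict least recently used
            self.resident[category] = self.load_model(category)
        return self.resident[category]

cache = ExpertCache(capacity=1)
for q in ["Why did the empire fall?", "Which war ended it?", "How does gravity work?"]:
    print(q, "->", cache.get(route(q)))
print("loads:", cache.loads)  # loads: 2
```

The load counter reflects the caveat in the comment: consecutive questions in the same category reuse the resident model, while every category switch costs a full load.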

ithkuil@alien.top · 2 years ago

https://github.com/XueFuzhao/OpenMoE

Conflictx@alien.top · 2 years ago

I wouldn’t even mind: loading a 23B model takes 8 seconds, and it then generates at 28 t/s on a 3090 Ti. Lower parameter counts go even faster.

That’s heaps faster than loading a 70B model and having it take 6 minutes to generate a single reply at 1.1 t/s, for something that might not even be correct because you didn’t prompt it right.
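Putting the quoted numbers side by side in a quick calculation. The reply length is inferred from the 70B figures (6 minutes at 1.1 t/s) rather than measured, and the 70B load time is left out because it isn’t given.

```python
# Back-of-the-envelope comparison using the figures quoted above.
# 23B: 8 s to load, 28 tokens/s; 70B: 1.1 tokens/s, ~6 minutes per reply.

def reply_seconds(tokens: float, tokens_per_sec: float, load_seconds: float = 0.0) -> float:
    """Total wall-clock time = one-off load time + generation time."""
    return load_seconds + tokens / tokens_per_sec

reply_tokens = 6 * 60 * 1.1  # 6 minutes at 1.1 t/s, roughly 396 tokens (inferred)
t70 = reply_seconds(reply_tokens, 1.1)                 # generation only; load time unknown
t23 = reply_seconds(reply_tokens, 28, load_seconds=8)  # includes the 8 s load
print(f"{reply_tokens:.0f} tokens: 70B {t70:.0f} s, 23B incl. load {t23:.1f} s")
```

So under these assumptions the 23B reply, load included, arrives in roughly 22 seconds versus about 6 minutes.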

iChrist@alien.top · 2 years ago

What 23B model are you running?

slifeleaf@alien.top · 2 years ago

I think the main problem is the GPU resources needed to train a model from scratch. Fine-tuning requires a fraction of the time that full training does, which is why there are a lot of GPT-like models and almost no specialised models.

Ansible32@alien.top · 2 years ago

I think most targeted models are going to be too targeted for anyone other than the organization that trained them to use. I can’t really see a 7B translation model being worth anything vs. GPT or Google Translate. (And for real translation, you really want something that can answer questions about connotations/puns/whatever, which requires general understanding.)

ithkuil@alien.top · 2 years ago

https://github.com/XueFuzhao/OpenMoE

Mixture of Experts may help with getting more performance from smaller models.
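For context on how Mixture of Experts can get more out of a given compute budget: a router scores all experts for each token, but only the top-k actually run, so the total parameter count can grow while per-token compute stays roughly that of k experts. A toy sketch of top-k gating in pure Python, with scalar "experts" and made-up router logits; this illustrates the general technique, not OpenMoE’s actual code.

```python
# Toy top-k Mixture-of-Experts gating in pure Python.
# Only the k highest-scoring experts run; their outputs are mixed by
# softmax weights renormalised over the chosen experts.
import math
from typing import Callable

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: float, router_logits: list[float],
                experts: list[Callable[[float], float]], k: int = 2) -> float:
    """Run only the k highest-scoring experts and combine their outputs."""
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])  # renormalise over chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy experts; with these logits experts 1 and 3 are selected, 0 and 2 never run.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
print(moe_forward(3.0, router_logits=[0.1, 2.0, -1.0, 1.0], experts=experts, k=2))
```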

amemingfullife@alien.top · 2 years ago

I disagree. I think narrow models are the previous ML generation. This generation is defined by its generalisability; AGI is, after all, just a machine that can solve any problem that computers can solve. That is what we are competing with. So if you want to compete, and I think it’s a good thing for the human race if we do, then you need to compete at the same level of abstraction.

If you want narrow AI, we already have all the state-of-the-art tools for it; just be prepared to know linear algebra inside out.

I do agree in the sense that if you want to bring real value you need to be practical, but you need to keep your eyes on the prize in the long run.

Similar-Repair9948@alien.top · 2 years ago

I think the reason for this is benchmarks: there are no benchmarks for verifying most specific tasks or domain knowledge in AI models. Most fine-tuning companies are trying to show they know how to fine-tune models against the current benchmarks so they can get more funding from private equity. Unless more benchmarks are developed, this will not change.
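A hypothetical sketch of what a small domain-specific benchmark could look like: a held-out set of (input, expected answer) pairs from your own task, scored by exact match. The cases and the always-"approve" baseline are made up for illustration, and `model` is any function from text to text, so no particular model API is assumed.

```python
# Hypothetical tiny task-specific benchmark scored by exact match.
from typing import Callable

def exact_match_accuracy(model: Callable[[str], str],
                         cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the normalised answer equals the expected label."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(
        model(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in cases
    )
    return correct / len(cases)

# Made-up domain cases and a trivial baseline that always answers "approve".
cases = [
    ("Invoice matches PO, amount under limit", "approve"),
    ("Invoice total differs from PO by 40%", "escalate"),
    ("Duplicate invoice number", "reject"),
    ("Invoice matches PO, vendor verified", "approve"),
]
baseline = lambda prompt: "approve"
print(exact_match_accuracy(baseline, cases))  # 0.5
```

Scoring a fine-tuned model against a trivial baseline like this is one way to show task-specific gains that general benchmarks would not capture.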

Laurdaya@alien.top · 2 years ago

Maybe the best way is to have multiple models combined by a “router model” like Medusa.

FPham@alien.top · 2 years ago

The big company gives you a 13b base to play with, and you can fine-tune it to fit your specifications.

I agree that people should not focus on building an OpenAI GPT killer, but mostly because it is a losing proposition, so past a certain point they are basically wasting time. You can finetune 13b until it turns blue; it will still not be OpenAI’s GPT.

But then, back to the top: YOU can finetune it into whatever you want. It just happens that other people want to make it a general GPT toddler. I don’t. I finetune it into whatever I want.

Still, 13b is playing with a toy car, and 33b is playing with a toy truck. From 70b it starts to be more interesting (a toy airplane?), but it also requires a somewhat different setup to play with. So it’s out of my toy box.
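Some rough arithmetic on why 70b needs a different setup: just holding the weights takes about parameters × bits per parameter ÷ 8 bytes. This sketch ignores the KV cache, activations, and quantization overhead (scales, zero points), so real usage is higher; the 16-bit and 4-bit widths are illustrative choices.

```python
# Rough weight-memory estimate: parameters x bits per parameter / 8.
# Ignores KV cache, activations and quantization overhead, so real usage is higher.

def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Gigabytes (1e9 bytes) needed just to hold the weights."""
    return params_billion * bits_per_param / 8

for size in (13, 33, 70):
    print(f"{size}b: fp16 {weight_gb(size, 16):.1f} GB, 4-bit {weight_gb(size, 4):.1f} GB")
```

By this estimate a 4-bit 33b model (about 16.5 GB) fits on a single 24 GB card such as a 3090, while a 4-bit 70b (about 35 GB) does not, which typically means two GPUs or offloading to CPU RAM.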

To think we are even nipping at the heels of a company that got 10B from Microsoft and can hire the brightest minds is unrealistic. We are not even playing on the same field. We are in a sandbox somewhere near our mama’s home; they are in a big arena sponsored by big money, with a marching band and cheerleaders and beer and everything.

Feztopia@alien.top · 2 years ago

I don’t know; I think OpenHermes 2.5 comes close to ChatGPT 3.5 Turbo, and in some tests I preferred OpenHermes’s output. So to me, reaching GPT seems possible. And that’s what I want: an offline ChatGPT 3.5-like AI that can run on my phone (Mistral support for MLC is on the way, which means OpenHermes on my phone is on the way). 7b models are in a sweet spot: they run on weaker hardware and still give useful output. Over time I expect them both to run a bit more efficiently and to get a bit better. I don’t need the best AI if it doesn’t run on the phone I carry everywhere.