Hmm, will have to check this stuff with the people on the RWKV Discord server.
V5 is stable at long context usage, and V6 is aiming to get better at actually using the context, so we might see improvement on this
The dataset is open source; it's all public HF datasets
That's the point of RWKV: you could have a 10 million context length and it would cost the same per token as a 100-token context length
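To make the "10M ctx costs the same as 100 ctx" point concrete, here's a toy sketch (not real RWKV code; layer counts and dimensions are made-up illustrative numbers): a transformer's KV cache grows with every past token, while an RWKV-style recurrent state stays a fixed size no matter how long the history is.

```python
# Toy memory-scaling sketch: transformer KV cache vs RWKV recurrent state.
# n_layers / d_model are illustrative placeholders, not real model configs.

def transformer_cache_entries(ctx_len: int, n_layers: int = 24, d_model: int = 2048) -> int:
    # A transformer stores keys AND values for every past token at every layer.
    return ctx_len * n_layers * 2 * d_model

def rwkv_state_entries(ctx_len: int, n_layers: int = 24, d_model: int = 2048) -> int:
    # An RWKV-style model carries a fixed-size recurrent state;
    # ctx_len never appears in the memory cost.
    return n_layers * d_model

print(transformer_cache_entries(100))         # grows linearly with context
print(transformer_cache_entries(10_000_000))
print(rwkv_state_entries(100))                # identical at 100 tokens...
print(rwkv_state_entries(10_000_000))         # ...and at 10M tokens
```

The per-token compute story is the same shape: recurrent state update is constant-time per token, while full attention has to look back over the whole cache.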
It's trained on 100+ languages; the focus is multilingual
Also, AWQ has entire engines built around it for efficiency; look into the Aphrodite engine, supposedly the fastest for AWQ
“Do I need to learn llama.cpp or C++ to deploy models using the llama-cpp-python library?” No, it's pure Python bindings
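For example, a deployment with llama-cpp-python is just ordinary Python. The model filename and sampling settings below are illustrative placeholders (you'd point it at whatever GGUF you downloaded):

```python
# Sketch: serving a GGUF model with llama-cpp-python -- pure Python,
# no C++ or llama.cpp internals required.

def build_qa_prompt(question: str) -> str:
    # Plain Q/A template; real chat models usually want their own prompt format.
    return f"Q: {question}\nA:"

def run(model_path: str, question: str) -> str:
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=2048)
    out = llm(build_qa_prompt(question), max_tokens=64, stop=["Q:"])
    return out["choices"][0]["text"]

# Example (needs a local GGUF file; path is hypothetical):
# run("./openhermes-2.5-mistral-7b.Q4_K_M.gguf", "What is RWKV?")
```

The actual model call is commented out since it needs a multi-GB model file on disk, but that's the whole API surface for basic use.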
it outputs the call
OpenHermes 2.5 is amazing from what I've seen: it can call functions, summarize text, and it's extremely competitive, the works
There are plenty of datasets. Just take the ones meant for Stable Diffusion training, rip out the prompt text, profit
Here's some high-quality captions used for DALL-E 3, etc.:
https://huggingface.co/datasets/laion/dalle-3-dataset
https://huggingface.co/datasets/laion/gpt4v-dataset
https://huggingface.co/datasets/laion/wuerstchen-dataset
https://huggingface.co/datasets/laion/220k-GPT4Vision-captions-from-LIVIS
https://huggingface.co/datasets/laion/gpt4v-emotion-dataset
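The "rip out the prompt text" step is basically: keep the caption column, drop the images. Caption column names vary per dataset ("caption", "text", etc.), so check each one; the rows and key names below are fabricated for illustration (in a real pipeline you'd load rows with `datasets.load_dataset`).

```python
# Sketch: extract only the caption text from image-caption dataset rows.

def extract_captions(rows, caption_keys=("caption", "text", "prompt")):
    # Try a few common column names per row; skip rows with no caption.
    captions = []
    for row in rows:
        for key in caption_keys:
            if key in row and row[key]:
                captions.append(row[key])
                break
    return captions

rows = [
    {"image": "<bytes>", "caption": "a watercolor fox in a forest"},
    {"image": "<bytes>", "text": "macro shot of a dew-covered leaf"},
]
print(extract_captions(rows))
```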
RWKV v5 7B. It's only half-trained right now, but the model already surpasses Mistral on all multilingual benchmarks, because it's meant to be multilingual.
OpenHermes 2.5 is the latest version, but the OpenHermes series has a history of being good, and I used it for some function calling; it's really good
IMPORTANT!
this isn't trained from scratch, it's another Mistral fine-tune with DPO, but on SlimOrca, not UltraChat.
I would use OpenHermes; it's much more battle-tested, and it's proven solid
Sad, and we thought HF was the harbor of unaligned models, but maybe I'm missing the whole story. Hopefully they don't kill models for saying "Taiwan good" or something
Open source -> Mistral Instruct worked great for me; Zephyr alpha was crazy aligned, while beta was better
Closed source -> Inflection's Pi is smooth! Pray for API access
“I want to chat with a PDF, I don’t care for my LLM to speak French, be able to write Python or know that Benjamin Franklin wrote a paper on flatulence (all things RWKV v5 World 1.5B knows).”
This is prime RAG: bring snippets in, make the model use them. The more knowledge the model has, the better it gets for your use case as well, since it knows more stuff.
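The "bring snippets in, make the model use them" loop can be sketched in a few lines. This is deliberately naive, keyword-overlap retrieval with a fabricated document list; real setups use embeddings and a vector store, but the prompt-assembly shape is the same.

```python
# Minimal RAG sketch: retrieve the most relevant snippets, stuff them into
# the prompt, and instruct the model to answer only from that context.

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    # Naive relevance score: count of shared lowercase words.
    q_words = set(question.lower().split())
    return sorted(
        snippets,
        key=lambda s: len(q_words & set(s.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(question: str, snippets: list[str]) -> str:
    context = "\n".join(f"- {s}" for s in retrieve(question, snippets))
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "The PDF covers quarterly revenue figures.",
    "Benjamin Franklin wrote an essay on flatulence.",
    "RWKV v5 World 1.5B is a multilingual model.",
]
print(build_prompt("What did Benjamin Franklin write?", docs))
```

The finished prompt then goes to whatever model you're running; the model's own world knowledge just makes it better at stitching the retrieved snippets together.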
Also, nice using RWKV v5; how's it working for you?
there are GGUFs, check TheBloke or greensky
RWKV 1.5B is SoTA for its size: it outperforms TinyLlama, and uses no extra VRAM to fit its whole context length in the browser.
Noice man
Well the 5 million was just an example of the OP stuff out there
No, it's Victorian-era Frankenstein, obvs