you can try changing the attention to something like flash attention
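For reference, on a recent transformers version this is roughly all it takes — minimal sketch only, assuming flash-attn is installed and your GPU supports it; the model name is just a placeholder:
```python
# Minimal sketch: load a causal LM with FlashAttention-2 instead of the default attention.
# Assumes a recent transformers release, the `flash-attn` package installed, and an
# Ampere-or-newer GPU. Model name is a placeholder, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; swap in whatever you're finetuning

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,              # FlashAttention needs fp16/bf16
    attn_implementation="flash_attention_2",  # falls back to an error if flash-attn isn't available
    device_map="auto",
)
```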
so it sounds like for the 600B they just finetuned Llama 2 again with the same stuff Llama 2 was originally trained with, just more of it…
RefinedWeb
Open-source code from GitHub
Common Crawl

We fine-tuned the model on a huge dataset (generated manually and with automation) for logical understanding and reasoning. We also trained the model for function calling capabilities.
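Not their actual schema (that hasn't been published), just an illustration of what a single function-calling training record could look like — every field and tool name below is an assumption:
```python
# Hypothetical function-calling training record; the real format used for the
# finetune above isn't public, so the schema here is an assumption for illustration.
import json

sample = {
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin tomorrow?"},
        {
            "role": "assistant",
            "content": None,
            "function_call": {
                "name": "get_weather",  # hypothetical tool name
                "arguments": json.dumps({"city": "Berlin", "day": "tomorrow"}),
            },
        },
    ],
    "functions": [
        {
            "name": "get_weather",
            "description": "Look up a weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "day": {"type": "string"},
                },
                "required": ["city"],
            },
        }
    ],
}

print(json.dumps(sample, indent=2))
```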
I hate that people are training on that garbage… but they’re ummmm… you know…
PRM8k made the rounds maybe 6+ months ago, but they never publicly released the model.
yea, that seems to be what a few news articles have referenced.
is there something other than the letter Q making you think it’s Q-learning?
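For what it's worth, plain Q-learning is just the tabular update below — a minimal sketch of the textbook algorithm, not a claim about what they're actually doing:
```python
# Tabular Q-learning update, just to show what "Q-learning" normally refers to:
# Q[s, a] <- Q[s, a] + lr * (reward + gamma * max_a' Q[s', a'] - Q[s, a])
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
lr, gamma = 0.1, 0.99

def q_update(s, a, reward, s_next, done):
    # Bootstrap from the best next-state action value unless the episode ended.
    target = reward + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += lr * (target - Q[s, a])
```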
What would happen if you replaced the decoder during finetuning? Would you also see a speed-up, but at the expense of VRAM?
This is a giant cluster fuck.
Only if Sam signed a contract with MS that explicitly prevents that.
I will use this now for some tests.
I believe they said they’re going to release training data. We’ll see. That’s about the only way to easily verify what made it in.
Or 2 A6000s. But yea, $$$ matters.