Reuters is reporting that OpenAI achieved an advance with a technique called Q* (pronounced Q-Star).
So what is Q*?
I asked around the AI researcher campfire and…
It’s probably Q-learning combined with MCTS, i.e. a Monte Carlo tree search reinforcement learning algorithm.
Which is right in line with the strategy DeepMind (vaguely) said they’re taking with Gemini.
Another corroborating data-point: an early GPT-4 tester mentioned on a podcast that they are working on ways to trade inference compute for smarter output. MCTS is probably the most promising method in the literature for doing that.
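To make "trade inference compute for smarter output" concrete, here is a minimal MCTS (UCT) sketch on a toy problem. To be clear, this is not OpenAI's method and not Weave's code. The task (pick 4 binary "tokens"), the `reward` function standing in for a reward model, and names like `Node`, `uct_child` and `search` are all made up for illustration. The only knob that matters here is `n_simulations`: more simulations means more compute spent before committing to the next action.

```python
import math
import random

DEPTH = 4          # hypothetical toy task: choose 4 binary "tokens"
ACTIONS = (0, 1)


def reward(seq):
    # Toy scorer standing in for a reward model (assumption): fraction of 1s.
    return sum(seq) / len(seq)


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.q = 0.0         # running mean of rollout rewards (a Q-value estimate)


def uct_child(node, c=1.4):
    # Pick the child with the best mean value plus exploration bonus (UCB1).
    return max(
        node.children.values(),
        key=lambda ch: ch.q + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(n_simulations, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while len(node.seq) < DEPTH and len(node.children) == len(ACTIONS):
            node = uct_child(node)
        # 2. Expansion: add one untried child if the node is not terminal.
        if len(node.seq) < DEPTH:
            a = rng.choice([a for a in ACTIONS if a not in node.children])
            node.children[a] = Node(node.seq + (a,), parent=node)
            node = node.children[a]
        # 3. Rollout: finish the sequence with random actions and score it.
        seq = node.seq
        while len(seq) < DEPTH:
            seq += (rng.choice(ACTIONS),)
        r = reward(seq)
        # 4. Backup: update visit counts and mean values up to the root.
        while node is not None:
            node.visits += 1
            node.q += (r - node.q) / node.visits
            node = node.parent
    # Commit to the most-visited first action.
    best_action = max(root.children.items(), key=lambda kv: kv[1].visits)[0]
    return best_action, root
```

Calling `search(500)` returns the chosen first action plus the root node, so you can inspect `root.children[a].visits` and `root.children[a].q` to see where the search spent its budget. In an LLM setting the actions would be candidate continuations and `reward` would be some learned evaluator, but that mapping is my assumption, not something from the Reuters report.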
So how do we do it? Well, the closest thing I know of that is presently available is Weave, part of a concise, readable, Apache-licensed MCTS RL fine-tuning package called minihf.
https://github.com/JD-P/minihf/blob/main/weave.py
I’ll update the post when I have more info about Q-learning in particular, and about what the deltas are from Weave.
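In the meantime, for anyone who hasn't seen plain Q-learning, here is a minimal tabular sketch. It only shows the textbook update rule on a made-up 5-state chain (walk right to reach the goal). The environment, `step`, `q_learning` and the hyperparameter values are all assumptions chosen for illustration, and nothing here is claimed to be what Q* actually does.

```python
import random

N_STATES = 5            # hypothetical toy chain: states 0..4, state 4 is the goal
LEFT, RIGHT = 0, 1


def step(state, action):
    # Deterministic toy dynamics: LEFT is clamped at state 0, RIGHT moves toward the goal.
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon or q[s][LEFT] == q[s][RIGHT]:
                a = rng.choice((LEFT, RIGHT))
            else:
                a = LEFT if q[s][LEFT] > q[s][RIGHT] else RIGHT
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted best next-state value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q
```

After training, `q_learning()` returns a table where the RIGHT entry is larger than the LEFT entry in every non-goal state, which is the learned "always walk right" policy. The `max(q[s2])` term is the piece that makes this Q-learning rather than an on-policy method like SARSA.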
I think there are a few LLMs that incorporate MCTS on GitHub
so just GPTQ?
We can infer that any such advance by OpenAI that follows the naming convention of “Q*” would likely be a significant development in the field of reinforcement learning, possibly expanding upon or enhancing traditional Q-Learning methodologies.
Thanks, ChatGPT
I’m wondering if Q-Star is a recursive self-improvement mechanism? Perhaps the in-house model they have can innovate and consistently learn on top of what it’s been trained on?
There is too much hype about AGI and Singularity. We’ll get smaller models that give better answers - but AGI this is not.
If it teaches itself to learn it’s just a matter of time until it teaches itself to code
Calling it Q was a terrible idea. The cookers are going to go crazier
I’m not a fan of tech companies in general but Amazon is definitely one of the most disliked.
It’s a silicon-based version of Qanon. I will be terminated for telling you, but wait till they launch MAGA (Machine Augmented General AI) !!!
FREEDUM
gguf when
I heard they have an even bigger breakthrough up their sleeve… Rumor is that it’s called GPT2, and it’s too dangerous to even release to the public 👀
HOLY SHIT
The letter that triggered it all is here. Nothing named Q* was mentioned.
is there something other than the letter Q making you think it’s Q-learning?