Please enlighten me, why are people building LLM Twitter bots?

FPham@alien.top · 2 years ago

I could write a book about this.

First, no model will be able to write well in style x and then in style y. Not at the size we can work with. Not at the size of ChatGPT either, although it would generally work better (it usually approximates the style well, and falls into a Hollywood imagination of that style).

With finetuning you can make the model do one of those things.

## Instruction:

Rewrite the following text as a poem: Lister sat near the fireplace, looking at his arms which were covered in ointment. I find it hard to believe that pain can form words. Is this something internal or external? On his left arm, the message read “Dying” while on his right arm, it said “Living”. He did not know what to think. A man who had been through a lot.

## Response:

Lister sat by the hearth and gazed

Upon the ointment-soaked clay

That coated his sinewy bare arms.

How could such agony frame

Such mystic characters? On one arm

The word “Dying” was graven clear;

On the other, “Living” appeared.

What could it mean? Ah! there,

A man whose life had been one tale

Of woes, of tortures, of crosses, of shame,

Who knew not what to hail as his own name.
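A pair like the one above can be turned into a prompt string with a few lines of Python. This is only a sketch: the `## Instruction:` and `## Response:` headers come from the example, while the function name `build_prompt` and the exact blank-line spacing are assumptions, not a documented template.

```python
def build_prompt(instruction: str, response: str = "") -> str:
    """Format one sample in the '## Instruction:' / '## Response:' layout.

    Leave `response` empty at inference time so the model writes the answer.
    The blank-line spacing here is an assumption, not a documented template.
    """
    return (
        f"## Instruction:\n\n{instruction.strip()}\n\n"
        f"## Response:\n\n{response.strip()}"
    )


prompt = build_prompt(
    "Rewrite the following text as a poem: Lister sat near the fireplace."
)
print(prompt)
```

For training data, pass the target text as `response`; for inference, leave it empty and let the model continue after the response header.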

FPham@alien.top · 2 years ago

The riddle JSON + Sydney actually makes the model far more lucid than normal. I applied it to MythoMax and the answers are really good.

https://preview.redd.it/gmisu6ac0l3c1.png?width=925&format=png&auto=webp&s=4d6fcf5bd5b89a6e6bd8ec86def35b19514473b1

FPham@alien.top · 2 years ago

I find it very strange to attach the GGUF file to an exe - it’s a very bad idea for security (your antivirus needs to hash a 10 GB file), and on Windows you still need to split it into exe and data, because the exe size limit is 4 GB. So basically, instead of llama.cpp you are now using llamafile, which is llama.cpp. Or am I missing something?

FPham@alien.top · 2 years ago

Geez, Alexa is in the Middle Ages… I want fun Alexa!

FPham@alien.top · 2 years ago

Hahaha, this is totally Llama 2 with Alexa’s voice.

FPham@alien.top · 2 years ago

Call it voodoo, not art.

FPham@alien.top · 2 years ago

And also made an LLM for the users to abuse… Wait until Twitter replies start sounding like The Hitchhiker’s Guide to the Galaxy and citing Vogon poetry, coz that’s the “style” of Grok.

FPham@alien.top · 2 years ago

I use Twitter while drinking morning coffee - it makes it 2x stronger.

FPham@alien.top · 2 years ago

I’m thinking that’s probably it.

FPham@alien.top · 2 years ago

Commu - what? Blasphemy…

FPham@alien.top · 2 years ago

Please enlighten me, why are people building LLM Twitter bots?

FPham@alien.top · 2 years ago

That looks very clean for sure.

FPham@alien.top · 2 years ago

My private finetunes are about text rewriting - take an input paragraph and rewrite it in a certain style.

No finetuned 7b model can grasp the idea of the submitted text in its entirety; I tried maybe 100 different runs. It makes the common mistake of “someone” who just scans the text quickly while also watching YouTube on a phone, failing to comprehend who is who or what the paragraph is about.

13b with the same finetuning does much better - it comprehends the relations. For example, if two people are speaking, it can keep track of who is who, even when the text doesn’t say so explicitly.

33b gets even further - it sometimes surprises with the way it understands the text. And so the rewritten text is a mirror image of the input, just in a different style.

7b models are impressive if you want a small local LLM to answer questions, but that’s probably the limit. If you want an assistant that can also do other things, then it falls short, because your instructions are not necessarily fully understood.

FPham@alien.top · 2 years ago

Well, no Llama model would come even close to GPT-4 in translation…

FPham@alien.top · 2 years ago

On a somewhat similar note of adding noise during finetuning to help with generalization: if you are using oobabooga, you can look at Training PRO

https://github.com/FartyPants/Training_PRO

And then experiment with the NEFTune noise scale.

It is a somewhat similar idea, but on the other end (training); I assume you are talking about adding noise at inference, in the sampler. Worth pursuing for sure - the results, however, are unpredictable before trying it…
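For context, NEFTune perturbs the token embeddings during training with uniform noise scaled by `alpha / sqrt(seq_len * dim)`. The snippet below is a minimal pure-Python sketch of that scaling on a toy input; the function name `add_neftune_noise` is made up for illustration, and treating the extension's noise scale setting as this `alpha` is an assumption.

```python
import math
import random


def add_neftune_noise(embeddings, noise_alpha=5.0, rng=random):
    """Return a noisy copy of a [seq_len][dim] list of embedding vectors.

    Each value gets uniform noise in [-1, 1] scaled by
    noise_alpha / sqrt(seq_len * dim). The input is not modified.
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = noise_alpha / math.sqrt(seq_len * dim)
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in row] for row in embeddings]


# Toy input: 4 tokens with 8-dimensional embeddings, all zeros.
toy = [[0.0] * 8 for _ in range(4)]
noisy = add_neftune_noise(toy, noise_alpha=5.0, rng=random.Random(0))
max_shift = max(abs(x) for row in noisy for x in row)
print(f"largest perturbation: {max_shift:.4f}")
```

Because the scale shrinks with sequence length and embedding width, the same `noise_alpha` gives smaller per-value noise on longer inputs.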

FPham@alien.top · 2 years ago

Get a base model of your choice, finetune it on the plain text of a book in the style you want it to talk in. Done.

FPham@alien.top · 2 years ago

There is very little info.

It seems to be instruction finetuned, but what template? ChatML? There is no mention of anything. Posting it this way is pretty bad.

FPham@alien.top · 2 years ago

Are you saying you want a model that will spit out LoRAs? Like “Please generate me a LoRA that will make yourself totally amazing”?

If so, this is more in the realm of the Star Trek food replicator. AKA it works amazingly on a TV screen.

If not, then sorry.

The closest thing to this would be a model that picks the correct LoRA needed to reply. Adapters can easily be switched on the fly, so a model could be made that calls a function to select the correct adapter. Maybe this is how ChatGPT works. Maybe not.
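The adapter-selection idea can be sketched as a small routing function. Everything here is hypothetical: the adapter names, the keyword lists, and the use of keyword counting in place of a model deciding which function to call. It only shows the shape of the selection step, not how any real system does it.

```python
# Hypothetical adapter names and keywords, for illustration only.
ADAPTER_KEYWORDS = {
    "poetry-lora": ("poem", "verse", "rhyme"),
    "code-lora": ("python", "function", "bug"),
}
DEFAULT_ADAPTER = "base"


def select_adapter(prompt: str) -> str:
    """Pick the adapter whose keywords appear most often in the prompt."""
    text = prompt.lower()
    scores = {
        name: sum(text.count(word) for word in words)
        for name, words in ADAPTER_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the plain base model when nothing matches.
    return best if scores[best] > 0 else DEFAULT_ADAPTER


print(select_adapter("Rewrite this as a poem with a strict rhyme scheme"))
print(select_adapter("What is the capital of France?"))
```

The returned name would then be handed to whatever adapter-switching call your training or inference framework provides.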

FPham@alien.top · 2 years ago

The less it knows about something, the more it will produce perfect nonsense with a straight face. It’s like replies on Twitter.

FPham@alien.top · 2 years ago

Don’t start by writing an entire GUI. First write simple code that loads the model and does inference. That is probably 10 lines in total. You can just grab the code people post with their models (TheBloke always posts a code snippet).

Now create a GUI and, instead of the preset text, add a text box and a “Go” button so it can do inference on your text.

Boom, a GUI.

Now go from there and keep adding - a dropdown list to select the model, a dropdown list to select the instruction template… etc…
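The steps above can be sketched roughly as follows, using tkinter from the Python standard library. The inference call is replaced by a placeholder (`fake_generate`), and all function names are made up for this example; swap in the loading and generation snippet that ships with your model.

```python
def fake_generate(prompt: str) -> str:
    """Placeholder for real inference; swap in your model's generate call."""
    return f"(model output for: {prompt})"


def make_go_handler(generate, read_input, write_output):
    """Build the callback for the "Go" button from three plain functions."""
    def on_go():
        write_output(generate(read_input()))
    return on_go


def build_gui(generate):
    """Create a window with a text box, a "Go" button, and an output label."""
    import tkinter as tk  # imported lazily so the rest runs without a display

    root = tk.Tk()
    root.title("Tiny LLM GUI")
    entry = tk.Entry(root, width=60)
    output = tk.Label(root, text="", wraplength=400, justify="left")
    handler = make_go_handler(
        generate, entry.get, lambda text: output.config(text=text)
    )
    entry.pack(padx=8, pady=4)
    tk.Button(root, text="Go", command=handler).pack(pady=4)
    output.pack(padx=8, pady=4)
    return root


# To open the window: build_gui(fake_generate).mainloop()
```

Keeping the button callback separate from the widgets means the model and template dropdowns can be added later without touching the inference code.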

FPham@alien.top · 2 years ago

If your epoch is 50 steps, then you are not going to use 100 warmup steps.

In the Training PRO extension I use 0.1 of the total steps for warmup, but max 100 (there is no point in going higher; after 100 steps you should have primed most of the weights).

So if you have 3800 samples, which is a ton, 100 warmup steps is as good as any.

val_set_size seems to be the size of the evaluation data. Now it depends on whether you even want to use evaluation data (some types of training have no reason to use it, as it will not evaluate anything useful). Again, with a big dataset 0.04 is fine. With a small dataset 0.04 will create 1 evaluation sample - you are far better off not having ANY evaluation dataset.
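The arithmetic behind both rules of thumb is short enough to write out. This is a sketch, not the extension's actual code: the function names are made up, and it assumes `val_set_size` is a fraction of the dataset and that the step counts shown are total optimizer steps.

```python
def warmup_steps(total_steps: int, ratio: float = 0.1, cap: int = 100) -> int:
    """Warm up for `ratio` of the total steps, but never more than `cap`."""
    return min(int(total_steps * ratio), cap)


def eval_samples(num_samples: int, val_set_size: float) -> int:
    """Samples held out when val_set_size is a fraction of the data (assumed)."""
    return int(num_samples * val_set_size)


print(warmup_steps(50))          # short run of 50 steps: 5 warmup steps
print(warmup_steps(3800))        # long run of 3800 steps: capped at 100
print(eval_samples(3800, 0.04))  # big dataset: 152 evaluation samples
print(eval_samples(25, 0.04))    # small dataset: 1 evaluation sample
```

The last line shows the small-dataset problem: a single held-out sample tells you nothing useful, so it is simpler to train without an evaluation split.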

FPham@alien.top · 2 years ago

When OpenAI takes neutering to the extreme...

FPham@alien.top · 2 years ago

Writing Partner Mistral 7B

FPham@alien.top · 2 years ago

Down memory lane, 2022 - "Google's LaMDA AI is sentient, I swear"

FPham@alien.top · 2 years ago

Reverso Expanded 13b - Now Even Fancier Question Asking Model

FPham@alien.top · 2 years ago

The smartest model under the sun: Prof. McSmarty BS 13b GPTQ