Do a test with the llama.cpp executable directly and with oobabooga (which uses llama-cpp-python) and see if there's a consistent difference. I'm guessing even the glue layer can be a bottleneck.
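For what it's worth, here's a rough sketch of how you could time the llama-cpp-python side of that comparison; the model path and prompt are just placeholders, and the llama.cpp executable already prints its own tokens/sec at the end of a run, so the two numbers should be directly comparable:

```python
# Rough throughput check for llama-cpp-python, to compare against the
# tokens/sec the llama.cpp executable reports after a run.
# Assumes: pip install llama-cpp-python, and a local GGUF model file (placeholder path).
import time

from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", n_ctx=2048, verbose=False)

prompt = "Write a short Python function that reverses a string."
start = time.perf_counter()
out = llm(prompt, max_tokens=256, temperature=0.2)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.2f} tok/s")
print(out["choices"][0]["text"])
```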
If nobody has any other suggestions, try Phind.com with GPT-4 selected in the drop-down. You get 10 free tries a day; the downside is that they use your data for training their in-house model. That's as good as you're gonna get for lesser-known languages, for free.
Since all local models suck for Rust, I'm gonna assume all general-purpose coding models suck for anything but the most common languages (Python, JS/TypeScript, C/C++, Java). Although there is an SQL-only model that is really good with SQL. Maybe someone did one for PS…
Great! Have fun!
I don’t know much about Rust, but Easy Rust is a good source for learning: https://github.com/Dhghomon/easy_rust
But in a useful format for fine-tuning… no idea where to get that. And I'm not qualified to make it either. But I don't want to burden you with extra work, so I guess C++ will have to do for now :) Thank you for the model, from me and everyone else with a potato PC m(_ _)m
Also, set the temperature to 0.1 or 0.2. Those two things helped me get it to work nicely.
Btw, does your dataset include coding examples? If so, do you include Rust? I find current models really suck at Rust, but can make a pretty good Snake game in Python 😂
Try using the Alpaca template, turn temperature down to 0.1 or 0.2, and set repetition penalty to 1. I haven't tested this yet, but those settings work for Deepseek-coder. If you're using oobabooga, the StarChat preset works for me.
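If you're scripting it rather than clicking through the oobabooga UI, those settings map to something like this in llama-cpp-python; the model filename and the instruction text are placeholders, and repeat_penalty=1.0 just means the penalty is effectively off:

```python
# Minimal sketch: Alpaca-style prompt with low temperature and repetition
# penalty set to 1 (i.e. no penalty). Assumes llama-cpp-python and a local
# GGUF file; the path and instruction below are placeholders.
from llama_cpp import Llama

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

llm = Llama(model_path="./deepseek-coder-1.3b-instruct.gguf", n_ctx=2048, verbose=False)

prompt = ALPACA_TEMPLATE.format(
    instruction="Write a function that parses an ISO 8601 date string."
)
out = llm(
    prompt,
    max_tokens=512,
    temperature=0.1,     # 0.1-0.2 keeps the output close to deterministic
    repeat_penalty=1.0,  # 1.0 = no repetition penalty
    stop=["### Instruction:"],
)
print(out["choices"][0]["text"])
```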
Wow, that’s amazing. On the EvalPlus leaderboard, Deepseek-coder-1.3B-instruct gets 64.6, so that’s a ~4% increase. It’s only about 3% less than Phind-v2’s result, which is amazing.
Oh nice! I’ll have to try those settings and compare with the StarChat preset in Oobabooga. I hear ya, I get 1t/s too… it’s unbearable.
Nice, I didn’t know that.
Take a look at Phind.com. They use the web to enhance their model’s answers, which means you can get up-to-date information on APIs instead of relying on data with a cutoff of 2021 or 2022. You can use their in-house Phind V8 model for free, but if you want to use GPT-4, you get 10 tries a day. If you want more, they have paid plans. They recently announced that their free V8 model was as good as GPT-4, but other people here have disagreed with them. I have never used GPT-4, but their free Phind model was better than anything local we have.
Thank you for that leaderboard, it’s definitely a space to keep track of. I keep hearing that China is the leader in AI, so I hope they’ll give Silicon Valley some good competition.
I’ve only heard of Qwen-7B, but haven’t tested it yet. It seems the 14B version performs well against ChatGPT. I’ll put it on my list of models to test.
Any model that won’t give me instructions on how to make napalm is censored in my book. ¯\_(ツ)_/¯