Hello after a long time :)
I am TokenBender.
Some of you may remember my previous model - codeCherryPop.
It was very kindly received, so I am hoping I won't be killed this time either.
Releasing EvolvedSeeker-1.3B v0.0.1
A 1.3B model with 68.29% on HumanEval.
The base model is quite cracked, I just did with it what I usually try to do with every coding model.
Here is the model - https://huggingface.co/TokenBender/evolvedSeeker_1_3
I will post this in TheBloke's server for GGUF, but I find that Deepseek Coder's GGUF quants are off for some reason, so let's see.
EvolvedSeeker v0.0.1 (First phase)
This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on 50k instructions for 3 epochs.
I mostly curated instructions from evolInstruct datasets, plus some portions of Glaive Coder.
Around 3k answers were modified via self-instruct.
Recommended prompt format is ChatML; Alpaca will also work, but take care with the EOT token.
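For anyone unfamiliar with ChatML, here is a minimal sketch of assembling a single-turn prompt by hand. The `<|im_start|>`/`<|im_end|>` tokens follow the standard ChatML convention; check the model's tokenizer config for the exact special tokens it expects, as this is an illustration rather than the model's verified template.

```python
def chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-formatted prompt for a single-turn request.

    Leaves the assistant turn open so the model completes it.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful coding assistant.",
    "Write a Python function that reverses a string.",
)
print(prompt)
```

If you load the model with Hugging Face transformers, `tokenizer.apply_chat_template` can produce the equivalent string from a list of role/content messages, provided the repo ships a chat template.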
This is a very early version of the 1.3B-sized model in my larger project, PIC (Partner-in-Crime).
Going to teach this model JSON/Markdown adherence next.
I will just focus on simple things that I can do for now, but anything you guys say will be taken into consideration for fixes.
Do you plan to release the dataset? Have you checked for data contamination with the benchmarks? I am overall pretty confused by this model's HumanEval scores, not just your finetune. DeepSeek AI got very weird scaling in benchmarks, since their 6.7B model scores really close to the 33B one, which usually doesn't work this way: 6.7B instruct scores 78.6% while 33B instruct scores 79.3%. I am now using the 33B model daily at work and it's really good. I have no evidence to support my claim, but I totally wouldn't be surprised if they were pre-training on a contaminated dataset.