rhinohoof@alien.topB to

LocalLLaMAEnglish · 1 year ago

Resources for creating datasets for code generation?

I tried some code generation models on Hugging Face, but the responses were really poor even though I clearly explained what I needed in the prompt. My assumption is that this is because my question relates to a niche framework: the models were trained on a large dataset covering a wide variety of languages and may never have come across the framework I’m working with. I’m not looking for a general model but one that is specific to the not-so-popular framework I work with, so I’m guessing I’ll have to create a custom dataset.

I also don’t need the model to know so many languages. If I can get it to generate just Python, JavaScript, Golang, and C, that alone would be great, but I can make do with fewer languages as well. Does this mean I’ll end up with a smaller model suitable for inference on an RTX 4090?

How will it understand what I am asking it? Do I also need to scrape Stack Overflow and some forums for the specific language tags I am interested in?

How do I go about creating such a dataset? I can scrape from multiple sources but in what format am I supposed to put it all together for training?

I am doing this for the first time.

tylerjdunn@alien.topB
1 year ago
I might be able to help. A couple of questions:

What coding tasks do you want to use an LLM to help with?

Which models have you tried so far?
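
On the format question, here is a minimal sketch, assuming your scraping leaves you with question/answer pairs. One common convention is JSON Lines: one training example per line, each a small JSON object. The field names (`prompt`, `completion`, `language`), the file name, and the sample pair below are all placeholders made up for illustration, so rename them to whatever your training script expects.

```python
import json


def to_record(question, answer, language):
    """Build one training example from a scraped question/answer pair.

    The keys used here are hypothetical, not a required schema.
    """
    return {
        "prompt": question.strip(),
        "completion": answer.strip(),
        "language": language,
    }


def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder pair; real data would come from your own sources.
    pairs = [
        (
            "How do I print a greeting in Python?",
            "print('hello')",
            "python",
        ),
    ]
    records = [to_record(q, a, lang) for q, a, lang in pairs]
    write_jsonl(records, "train.jsonl")  # hypothetical output file name
    print(len(read_jsonl("train.jsonl")), "examples written")
```

Keeping a `language` field around makes it easy to filter or rebalance the data later if you decide to drop a language.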