In the case of Petals, where any client can drop off at any time, each client would need to hold extra layers for redundancy. Maybe not the full weights, but at least 20-30%, so that if someone drops off, another client can take over instantly.
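As a rough illustration of the overlap idea (this is not Petals' actual scheduling logic, just a sketch of assigning layers with ~25% redundancy):

```python
# Sketch: split transformer layers across clients, with each client
# also caching the start of the next client's block. If a neighbor
# drops off, another client already holds its first layers.

def assign_layers(num_layers: int, num_clients: int, overlap: float = 0.25):
    per_client = num_layers // num_clients
    extra = max(1, int(per_client * overlap))
    assignments = []
    for c in range(num_clients):
        start = c * per_client
        # hold own block plus `extra` layers of the next block (wrapping)
        layers = [l % num_layers for l in range(start, start + per_client + extra)]
        assignments.append(layers)
    return assignments

print(assign_layers(num_layers=32, num_clients=4))
# client 0 gets layers 0-9, client 1 gets 8-17, etc.
```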
Yes, you are right. Although I guess it could work in Petals as well: if each person has the full model downloaded, the GPU can be instructed to load the next set of weights locally when it is done with the current one?
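Something along those lines is doable with plain PyTorch. A minimal sketch of streaming layers through one GPU (the per-layer weight files and the `build_layer()` helper are hypothetical, not a real API):

```python
import torch

# Assumes each layer's state dict was saved to its own file
# (layer_0.pt, layer_1.pt, ...) and build_layer(i) reconstructs
# an empty layer module on the CPU -- both are assumptions.

def run_sequential(x, num_layers, build_layer, device="cuda"):
    x = x.to(device)
    for i in range(num_layers):
        layer = build_layer(i)
        layer.load_state_dict(torch.load(f"layer_{i}.pt"))
        layer.to(device)                 # stream this layer's weights in
        with torch.no_grad():
            x = layer(x)
        del layer                        # free VRAM for the next layer
        torch.cuda.empty_cache()
    return x
```

The tradeoff is that you pay the disk-to-GPU transfer time on every layer for every forward pass, which is why this is slow compared to keeping the model resident.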
Isn’t that how things like petals.dev work?
https://continue.dev. It supports many LLMs.
You might want to look into LlamaIndex’s SEC Insights repo: https://github.com/run-llama/sec-insights. They do a lot of parsing of financial documents.
Cost is really the main issue. You can train a local LLM, or fine-tune ChatGPT as well. I wouldn’t be surprised if someone is already making a custom GPT to help with Unity or Unreal Engine projects.
For privacy, companies with money will use a private instance on Azure. It is like 2-3 times the cost, but your data is safe: you have a contract with Microsoft to keep it safe and private, with large financial penalties if it isn’t.
Also, running an LLM locally isn’t zero cost, depending on the electricity price in your area. GPUs consume a LOT of power; the 4090 is rated at 450 watts.
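For a back-of-the-envelope estimate (hours per day and electricity price are assumptions, plug in your own numbers):

```python
# Rough electricity cost of local inference on a 450 W GPU.
watts = 450
hours_per_day = 8
price_per_kwh = 0.15  # USD; varies a lot by region

kwh_per_day = watts / 1000 * hours_per_day         # 3.6 kWh
cost_per_month = kwh_per_day * price_per_kwh * 30  # ~$16/month
print(f"{kwh_per_day:.1f} kWh/day, ~${cost_per_month:.2f}/month")
```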
Please look up fine-tuning and LoRA; those are the methods to “evolve” a model after it is born.
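For a taste of what that looks like in code, here is a minimal LoRA setup with Hugging Face’s peft library (the model name and hyperparameters are placeholders, not a recipe):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model -- substitute whatever you are adapting.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the full model
```

The appeal of LoRA is that only the small adapter matrices are trained, so the VRAM and storage cost is a fraction of full fine-tuning.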
It does exist, but it really only works when you have very high-speed, low-latency connections between the machines, like InfiniBand.
If you just want to try it out, install privateGPT on your local PC/Mac, no GPU required.
You can try something like Claude.ai, which has a long context window and is free to use.
You can use a Python script to load the model, split the text into chunks, and ask the model to translate chunk by chunk. Then you don’t need a model with a 64K context window (which would take up a lot of memory, and such models are not that common).
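Something like this, where the model name, chunk size, and prompt format are all assumptions to adapt to your setup:

```python
from transformers import pipeline

# Placeholder model -- point this at whatever local model you run.
pipe = pipeline("text-generation", model="some-local-model", device=0)

def translate(text: str, chunk_chars: int = 2000) -> str:
    # Naive fixed-size chunking; splitting on paragraph boundaries
    # instead would avoid cutting sentences in half.
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    out = []
    for chunk in chunks:
        prompt = f"Translate the following text to English:\n\n{chunk}\n\nTranslation:"
        result = pipe(prompt, max_new_tokens=1024)[0]["generated_text"]
        out.append(result[len(prompt):].strip())  # keep only the completion
    return "\n".join(out)
```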
It also depends on the language you are trying to translate. It would be best to find models that have been trained on the original language. Most models have a large English corpus, and many are fine-tuned with Chinese data, but there are specialty models for German/Arabic/Japanese. Try a Google search or look on Hugging Face.
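You can also search the Hub programmatically with the huggingface_hub library (the search term here is just an example):

```python
from huggingface_hub import list_models

# Most-downloaded models matching "japanese" -- swap in your language.
for m in list_models(search="japanese", sort="downloads", direction=-1, limit=5):
    print(m.id)
```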