Title says it all. Why spend so much effort finetuning and serving models locally when any closed-source model will do the same for cheaper in the long run? Is it a philosophical argument (as in freedom vs. free beer)? Or are there practical cases where a local model does better?
Where I’m coming from: I need a copilot, primarily for code but maybe for automating personal tasks as well, and I’m wondering whether to put down the $20/mo for GPT-4 or roll my own personal assistant and run it locally (I have an M2 Max, so compute wouldn’t be a huge issue).
Are there any good tutorials on where to start? I'm a FW engineer with an M1 MacBook, and I don't know much about AI or LLMs.
Look up ollama.ai as a starting point…
https://github.com/oobabooga/text-generation-webui
How much RAM do you have? It matters a lot.
As a BIG simplification, the largest model you can run has a size in billions of parameters (13B means 13 billion, for example) of roughly 50-60% of your RAM in GB.
If you have 16 GB, you can run a 7B model, for example.
If you have 128 GB, you can run a 70B model.
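To make that rule of thumb concrete, here is a rough Python sketch. The 0.55 factor is just the midpoint of the 50-60% range above, and the list of "common" model sizes is an illustrative assumption, not an exhaustive catalog. Treat the result as a ballpark only: heavily quantized models can fit in less memory than this suggests.

```python
# Rule-of-thumb estimate: model size (billions of parameters) ~ 50-60% of RAM in GB.
# COMMON_SIZES_B and the 0.55 default are assumptions for illustration only.
COMMON_SIZES_B = [7, 13, 34, 70]

def max_model_size_b(ram_gb, fraction=0.55):
    """Rough upper bound on model size, in billions of parameters."""
    return ram_gb * fraction

def largest_common_model(ram_gb, fraction=0.55):
    """Largest size from COMMON_SIZES_B under the estimate, or None if none fit."""
    limit = max_model_size_b(ram_gb, fraction)
    fitting = [size for size in COMMON_SIZES_B if size <= limit]
    return max(fitting) if fitting else None

for ram in (16, 32, 64, 128):
    print(f"{ram} GB RAM: try a {largest_common_model(ram)}B model")
```

With these assumptions, 16 GB maps to a 7B model and 128 GB to a 70B model, matching the two examples above.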
If you are cool just using the command line, ollama is great and easy to use.
Otherwise, you could download the LM Studio app on your Mac, download a model using its search feature, and start chatting. Models from TheBloke are good. You will probably need to try a few models (GGML format most likely). Mistral 7B or Llama 2 7B is a good starting place IMO.
GPT4All may be the easiest on-ramp for your Mac. 7B models run fine on an 8 GB system, although they take up much of the memory.
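If you go the ollama route and later want to call the model from your own scripts (for the "automating personal tasks" part), here is a minimal sketch. It assumes ollama is already running locally on its default port (11434) and that you have pulled a model named "mistral"; the helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="mistral"):
    """Build a POST request for ollama's generate endpoint (non-streaming)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt, model="mistral"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]

# Example usage (needs the ollama server running):
# print(ask("Explain what a volatile variable is in C, in two sentences."))
```

Only the standard library is used here, so there is nothing extra to install beyond ollama itself.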