I know the typical answer is “no, because all the libs are in Python”… but I am kind of baffled why more porting isn’t going on, especially to Go, given how Go, like Python, is stupid easy to learn and yet much faster to run. Truly not trying to start a flame war or anything. I am just a bigger fan of Go than Python, and was thinking that coming into 2024, especially with all the huge money in AI now, we’d see a LOT more movement toward the much faster runtime of Go, which is largely as easy if not easier to write and maintain. Not sure about Rust… it may run a little faster than Go, but the language is much more difficult to learn/use. Still, it has been growing in popularity, so I was curious if that is a potential option.
There are some Go libs I’ve found, but the few I have found seem to be 3, 4 or more years old. I was hoping there would be things like PyTorch and the likes converted to Go.
I was even curious, with the power of GPT-4 or DeepSeek Coder or similar, how hard it would be to run conversions from Python libraries to Go. Is anyone working on that, or is it pretty much impossible to do?
Because the ML people already learned Python, are comfortable with it, and are not interested in putting in the effort to learn a new language for basically no benefit.
Do a test with llama-cpp.exe directly versus oobabooga (which uses llama-cpp-python) and see if there’s a consistent difference. I’m guessing even glue can be a bottleneck.
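A rough sketch of that test in Python, assuming a local llama.cpp build and a GGUF model (the binary name, model path, and flags are placeholders for whatever your setup uses):

```python
# Hedged sketch: time the raw llama.cpp CLI against llama-cpp-python on
# the same prompt. Note the CLI timing includes model load, so treat the
# numbers as rough; paths and flags are assumptions about a local setup.
import subprocess
import time

PROMPT = "Explain what glue code is in one sentence."
MODEL = "models/7b.Q4_K_M.gguf"  # hypothetical local model path

# 1) The raw CLI ("main" in older llama.cpp builds, "llama-cli" in newer ones)
start = time.perf_counter()
subprocess.run(["./main", "-m", MODEL, "-p", PROMPT, "-n", "128"],
               capture_output=True, check=True)
print(f"CLI (incl. model load): {time.perf_counter() - start:.2f}s")

# 2) The same generation through the Python binding
from llama_cpp import Llama

llm = Llama(model_path=MODEL, n_ctx=2048, verbose=False)  # load once
start = time.perf_counter()
llm(PROMPT, max_tokens=128)
print(f"llama-cpp-python:       {time.perf_counter() - start:.2f}s")
```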
I’m mostly using Bash and Perl.
Could you elaborate on the Perl part, if possible? I don’t mind learning as much Python as necessary as I go along, but I’d much rather be doing all that is convenient in Perl.
Sure! I’ve been doing a few LLM’ing things in Perl:
- A previous project, implemented in Perl, indexes a local Wikipedia dump in Lucy and allows searching for pages. I’ve been reusing that project for RAG inference.
- My “infer” utility is written in Perl. It wraps llama.cpp’s “main” utility with IPC::Open3, and I’m using it for inference, for RAG, for stop-words, for matching prompt templates to models, and for summarization. It’s gloriously broken at the moment and in dire need of a refactor.
- I recently started writing a “copilot” utility in Perl, to better streamline using inference for research and code-writing copilots. It also wraps llama.cpp’s “main”, but in a much simpler way than “infer” (blocking I/O, no stop words, not trying to detect when the LLM infers the prompt text, etc.).
If you’re more interested in using the existing Python libraries and not wrapping llama.cpp, you should take a look at the Inline::Python module. I’ve only dabbled with LangChain, but if/when I get back to it, I will probably implement Perl bindings with a simple Inline::Python wrapper. It makes it pretty easy.
If you do decide to wrap llama.cpp, you might be more comfortable with IPC::Run rather than IPC::Open3. It’s considered the more modern module. I’m just using IPC::Open3 out of familiarity.
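If it helps to see the shape of that wrapping approach, here’s a minimal sketch in Python’s subprocess (the thread’s lingua franca); the Perl equivalents with IPC::Open3 or IPC::Run are structurally similar. The binary path, model path, and flags are assumptions about a local llama.cpp build:

```python
# Minimal blocking wrapper around llama.cpp's CLI, analogous to the
# "copilot" approach described above: one process per request, blocking
# I/O, no stop-word handling. Paths and flags are placeholders.
import subprocess

def infer(prompt: str,
          model: str = "models/7b.Q4_K_M.gguf",
          n_predict: int = 256) -> str:
    """Run one blocking inference call and return the completion text."""
    result = subprocess.run(
        ["./main", "-m", model, "-p", prompt, "-n", str(n_predict)],
        capture_output=True, text=True, check=True,
    )
    # llama.cpp typically echoes the prompt before the completion.
    return result.stdout.removeprefix(prompt).strip()

if __name__ == "__main__":
    print(infer("Write a one-line summary of what RAG is."))
```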
The runtime of your code basically doesn’t matter. You hand it off to a GPU for all the hard calculations, and even in a fast environment, your code is going to spend 99% of its execution time waiting to get stuff back from the GPU.
I’m sure Python’s I/O polling sleeps are just as efficient as Go’s.
Interesting. I was thinking more of the code that is used to train models. It seems the ability to run a model is pretty well covered with the likes of llama.cpp and such, so I'm not sure it makes much sense in that area. But I assume the team at OpenAI spent a lot of time writing code that is used for the training aspect? It can’t just be a couple lines of code that read in some vector data, train the model, then write it out. There must be a ton of logic/etc. for that as well?
But then again, maybe that doesn’t need to be fast either. I don’t know.
The model training is also GPU-intensive. If you read about model training costs, they talk about things like “millions of GPU-hours.”
As I understand the process (note, I am a software developer, but do not professionally work on AI), you’re feeding lots of example text into the GPU during the training process. The hard work is curating and creating those examples, but that’s largely either human-intensive or you’re using a LLM to help you with it, which is…GPU-intensive.
Optimizing the non-GPU code just isn’t much of a win. I know all the cool kids hate Python because it’s dynamically typed and not compiled, but it’s just not much of a bottleneck in this space.
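One way to see this, as a rough sketch (requires a CUDA build of PyTorch; the sizes are arbitrary): time how much of one big matrix multiply is Python-side call overhead versus waiting on the GPU kernel.

```python
# Rough sketch: Python's share of one large GPU matmul. The call returns
# as soon as the kernel is queued; synchronize() waits for the GPU.
import time
import torch

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")
_ = a @ b
torch.cuda.synchronize()  # warm-up (context init, kernel caching)

start = time.perf_counter()
c = a @ b                 # Python-side launch only
launch = time.perf_counter() - start
torch.cuda.synchronize()  # wait for the kernel to actually finish
total = time.perf_counter() - start

print(f"Python launch: {launch * 1e3:.2f} ms of {total * 1e3:.2f} ms total")
```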
Yah… the thing is… do I have to learn very fancy advanced Python to do this, or can I use simpler Python that then makes use of, as you said, the more optimized libraries? I am wondering how much time it’s going to take to figure out Python well enough to be of use, and hence was thinking Go and Rust might work well, as I know those well enough.
If it’s just calling APIs, even to a locally running model, I can do that in Go just fine. If it’s writing advanced AI code in Python, then I have to spend likely months or longer learning the language to use it well enough for that sort of work. In which case I am not sure I am up to that task. I am terrible with math/algos, so not sure how much of all that I am going to have to somehow “master” to be a decent-to-good AI engineer.
I think the idea is… am I a developer using AI (via APIs or CLI tools) in some capacity, or am I writing AI code that will be used for training AI, etc.? I don’t know which path to go down, but I’d lean towards using models and API calls over having to become a deep AI expert.
My sense of it is that most training is still just using the APIs to talk to the GPU, and the art is more in the assembly of the training set than it is optimizing the APIs. There are serious researchers working on improving AI, but they’re figuring out how to make the data get in and out of the GPU faster in a way that doesn’t hurt later quality. But that’s not a code optimization problem, it’s much more of an “understanding the math” kind of problem and whether you’re then using Python or Go to tell the GPU to execute this slightly different math isn’t much of a concern.
I think it’s a lot like data science. Getting the good clean data to work with is actually the hard part. For training, getting great training sets is the hard part.
If you just wish to write code that uses AI, or train models for that purpose, the current Python toolkit is more than sufficient, especially given how quickly everything is moving right now; we might have totally different architectures in three years, and Python will be quicker for that R&D iteration than Go is.
Finally, on your personal thing - I’ve been coding for 39 years. I’ve worked in BASIC, Assembly, C, C++, Perl, Python, and Go (and 37 other languages here and there). Go to Python isn’t going to be a difficult jump. Especially now that you can…use an AI to help you if you’re at all confused how to turn a Go concept into a Python one.
For Rust there is Candle. There are example binaries in the crate that you can just build to run e.g. a Llama model. TheBloke also links to Candle in the documentation of models that it can run.
Another Rust lib is Burn, which has promising support for different backends, but can’t run too many models yet.
Burn looks interesting. So I think I am just lacking the understanding of how this all works. I assumed there is code that is used in AI that handles the models… some way to use the models as “data,” while the NLP, the AI “logic” brain, etc. would be done in code. I assumed that that is largely the Python code. I assumed that models were more or less data that a runtime AI engine uses to find answers to the questions asked, and thus thought the model runners handled the NLP work and turned incoming queries into some model-specific format that allows the algorithms of the model to do what they do… e.g. return responses and answers as if a human replied. I assumed ALL that was tons and tons of code done in Python, and thus was thinking: if that is the runtime “brain” that AI uses, then wouldn’t it run even faster in Go or Rust or something?
I am sadly not sure if I am explaining this right. I just assumed there were likely millions of lines of code behind the AI “brain” and that the model was basically gobs of data in some sort of… for lack of a better word, compressed database format. So when “training” occurs… I am not entirely clear what is going on, other than that it takes a ton of compute and results in a single .gguf or similar file that is the model, which can then be loaded by the likes of ollama, etc. and queried against by users using plain English. The code behind the training, the code behind running a model… that is what I am foggy on. Is there code IN the model… in binary format or something, along with ALL the data it draws from?
I originally thought AI would use the internet in real time… but that would clearly take a LOT longer for AI to search the web for answers and then formulate some sort of intelligent response rather than just some sort of paste of snippets it finds.
There’s some movement in the rust space, the main advantage being that you can compile to wasm and serve models in any browser. There are several efforts in this direction. This can also be linked to edge-computing, with more services starting to use wasm/wasi etc. There’s a world where you have your entire codebase in rust, and you get to deliver models either to browsers or wasm “VMs” in an edge provider.
That sounds pretty slick.
Most AI training code is just a big for loop, with each line calling highly performant C/C++ libraries underneath. There is no value that Go or Rust can add here (most libs used in Python are not in Python; only their stubs are).
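To make that concrete, here’s a minimal, hypothetical PyTorch training loop on toy data; every heavy line (forward pass, backward pass, optimizer step) dispatches into compiled C++/CUDA kernels, and the Python is just orchestration:

```python
# Minimal sketch of a typical training loop; the toy dataset just makes
# it runnable. The Python lines orchestrate; the kernels do the math.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)  # forward -> C++/CUDA kernels
        loss.backward()                         # autograd -> C++/CUDA kernels
        optimizer.step()                        # weight update -> C++/CUDA
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```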
Some people want to train on procedural generators (e.g. game engines), which would be in C++. Being able to have the whole codebase in one language would smooth this out. (In my case, I have a Rust 3D engine codebase that I’d like to use to drive AI.)
ggml is a great idea IMO.
Python is just the glue.
But the actual training code… isn’t there a crap ton of code that trains the model so that the model can respond with NLP and other capabilities? There has to be code behind all that somewhere? The ability for the “logic” of the AI to do what it does… that code is Python as well, yeah? I would assume that in Go or Rust or C it would execute much faster, and this AI could be much faster (and less memory, no Python runtime, etc.)? Or is there already some back-end C/C++ code that does ALL that AI logic/guts, and Python, even for training models, is still just glue that calls into the C/C++ layer?
Correct. Even for training the models, all the Python code you see is really just a friendly interface over highly optimized C/CUDA code.
There are no “loops” or matrix multiplication being done in Python. All the heavy lifting is done in lower-level, highly optimized code.
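A quick way to feel this for yourself: multiply two matrices with interpreted Python loops, then with NumPy, which hands the work to an optimized BLAS written in lower-level code. Exact timings will vary, but the gap is typically orders of magnitude:

```python
# The same matrix multiply two ways: interpreted Python loops versus
# NumPy, which dispatches to an optimized BLAS under the hood.
import time
import numpy as np

n = 256
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c_slow = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
print(f"pure Python loops: {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
c_fast = a @ b
print(f"NumPy (BLAS):      {time.perf_counter() - start:.4f} s")
```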
So most of the Python AI coding folks… aren’t writing CUDA/high-end math/algo-style code… they’re just using the libraries, similar to any other SDK?
Yes. Even the authors of AI frameworks like PyTorch aren’t usually writing the low-level CUDA code for NNs. They are wrapping the cuDNN library from NVIDIA, which has highly optimized CUDA code for NN operations.
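For example, a convolution in PyTorch is a single Python-level call; on a CUDA device it routes to NVIDIA’s cuDNN kernels (assuming a CUDA build of PyTorch):

```python
# One Python call; on a CUDA device the convolution itself runs inside
# NVIDIA's cuDNN kernels, not in Python (assumes a CUDA build of PyTorch).
import torch
import torch.nn as nn

torch.backends.cudnn.benchmark = True  # let cuDNN pick its fastest algorithm

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()
images = torch.randn(8, 3, 224, 224, device="cuda")

out = conv(images)
print(out.shape)  # torch.Size([8, 64, 224, 224])
```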
The deeper you go the less abstraction you get and the more stupid you feel
Yes, of course! Don’t make the mistake of thinking any particular area of CS is fundamentally different from the classic case: we stand on the shoulders of great libraries :)
Also - it’s important to note that the “logic” that you are trying to find is not written in Python or any other programming language, it is held in the weights and architecture of the model you use (which hint: is also something you will be borrowing from an academic or research arm of a tech firm rather than writing yourself!)
I use raw php /s
Yep, and the future is optimizers, custom compiled CUDA kernels, and more ASIC chips (eventually). Python is just the glue that is commonly used. It’s good glue, though, but there are other glues…
I disagree that Go is as simple as Python. Don’t stop at the syntax; also look at the stdlib, built-ins, etc. Simply writing a Fibonacci function with memoization in Go is much more complicated and requires you to think about more low-level, unnecessary stuff than in Python. In Python you have it in literally four lines of code.
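For reference, the four-line Python version being alluded to, with the stdlib cache doing the memoization:

```python
# Memoized Fibonacci in four lines; functools handles the caching.
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(90))  # 2880067194370816120 (instant, thanks to memoization)
```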
Go is neither as simple as Python, nor as powerful. In fact, I don’t know of any modern general-purpose language that’s more limited. It’s faster, produces native code, and it’s type-safe to an extent, but that’s about it. In almost every way, it’s a bad excuse for a modern language.
Go doesn’t have a GIL, and this reason alone is why I left Python years ago ;)
What is GIL?
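For anyone else wondering: the GIL is CPython’s Global Interpreter Lock, which lets only one thread execute Python bytecode at a time, so CPU-bound threads don’t actually run in parallel. A small sketch of the effect:

```python
# CPython's GIL in action: two CPU-bound threads take about as long as
# running the same work sequentially, since only one thread executes
# Python bytecode at a time.
import time
from threading import Thread

def burn():
    sum(i * i for i in range(10_000_000))

start = time.perf_counter()
burn()
burn()
print(f"sequential: {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
threads = [Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"threaded:   {time.perf_counter() - start:.2f} s (about the same)")
```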
I don’t know why you say “nor as powerful”. In which way? For a Go developer who knows Go, performance/memory-wise it’s much more powerful. Maintaining code, it’s on par: a Go dev and a Python dev are very likely going to be “equal” in their ability to read/maintain code in their language of choice. Go lacks some things, sure, and is more verbose in some areas, sure, but I don’t see how that makes it a bad excuse for a modern language. On the contrary, it is very fast/easy to learn, produces small, fast, (largely) memory-efficient binaries that can be built on all platforms for all platforms, and has probably some of the best threading capabilities of any language.
I would argue that, for the use case, perhaps Python IS better suited for AI… and I am just not at all knowledgeable in how that may be. So I’ll give you that: if the runtime and training bits of AI do NOT need the much greater performance of a language like Go or Rust, then so be it. But if it’s the usual “there are just good libraries in Python that have been written over many years and would have to be rewritten in Go/Rust/etc.” excuse… then that doesn’t tell me that Python is better for AI, just that it would require work that nobody wants to do to convert the existing Python libs, which are so good, to Go/Rust/etc.
Check out this rust repository by hugging face: https://github.com/huggingface/candle
I saw that in another response. Very cool. Burn also looks pretty good.
Learning it because I believe it will be valuable in the future.
Sure, Python is just the glue, and we won’t actually see much difference in terms of speed. But executing models as simple binaries without dependencies is more valuable than people think for scalability.
That’s an interesting response. I responded elsewhere that I am just missing how all this comes together. I assumed the runtime (e.g. the Python glue in this case) handles the NLP query you type in, turns it into some meaningful structure that is then applied to the model to find responses. I am unclear if the model is the binary code/logic/brain of the AI… or is it just a ton of data in a compressed format that the model runner uses to find stuff? I assume the former, since Python is glue code, apparently.
But also… the training stuff… I am unclear if that is gobs and gobs of Python code. If so, converted to Go or Rust, wouldn’t it train MUCH faster, given the big runtime speedup of something like Go or Rust? Or does that ALSO not matter, since most of the training is done on GPUs/ASICs and, again, Python is just glue code using those GPU/ASIC libraries (which are likely in C/C++)? E.g. TensorFlow and the likes, which I assume are used for training.
But the code that actually does the training… that is what I am really trying to understand. Code somehow results in an AI that can “think” (though not AGI or sentient… but it seems like it can think) like a human… and respond with often much better details and data than any human. ALL of the data it is trained on… is basically rows and rows of structures that are like key/values (or something like that), I assume, and somehow that results in a single file (GGUF or whatever format it is) that a very little bit of Python code can then execute…
I am just baffled how all the different parts work… and the code behind those. I always assumed Python was used in the early days due to ease of learning, and that nobody gave a shit about the slow runtime speed because back then it was just starting, but now that it’s big, it’s too late to “rewrite” in Go or Rust or whatnot. But it sounds like a lot of the training stuff uses native NVIDIA/ASIC/etc. binary libraries (likely done in C/C++), and the majority of the Python code doesn’t need the speed of Go/Rust/C to run. It is, again, glue code that just uses the underlying C/C++ libraries provided for the hardware that is used for training?
This is quite simple for me… I only know Python and very small amounts of JavaScript/HTML/CSS. More important than efficiency gains is just me getting the job done, which really is an efficiency gain in itself.
OK… so that’s fair, but I would counter with: if Go/Rust were going to increase the runtime performance of training/using the AI models by a factor of 2, 3 or more, and the time to learn Go is a couple weeks and Rust a couple years (kidding… sort of), and your job for years to come is doing this sort of work, and the end result of, say, Go is training much faster or doing data prep much faster… wouldn’t the benefits of learning Go or even Rust be worth it for the time savings in training/running, as well as memory efficiency, fewer resources needed, etc.?
Not saying you should, because I don’t even know if Go/Rust/Zig would result in much faster training/etc. I would assume if that were the case, then companies like OpenAI would have been using these languages already, since they have the money and time to do so.
I’m not exactly a top tier programmer so anything I make is lucky to work. I would always consider using the best language for the job though given the resources so ya.
Actually using Rust with the Candle library for inference and production code for LLMs. Still using Python for training, as the majority of the ecosystem is in Python. We chose Rust for the intrinsic benefits of the language, and also because we build desktop apps with Tauri. We also use Rust for data preparation and other machine learning work besides LLMs. If you’re just starting out with Rust, I would recommend gaining more experience before using it as your main language for ML.
So are you (your company) building models very fine-tuned to the specific needs of your company (products)? That is what I am trying to learn… but knowing Go and not being a big fan of Python (don’t know it well), I was hoping I could use my knowledge of Go plus its runtime speed/memory/etc. to train my own custom models. I am NOT sure how all that works, though. I feel like it’s just some loop that reads in the prepared data and puts it in a new format, and that’s it. lol. I don’t quite understand what “training” does. How it works. Or the code behind training. Is it some iterative process… like it keeps repeating the same thing so the AI “learns”… so like… you ask it “What is 2+2” and it says 8, 7, 9, 13, 6, 5, 4, 2, 3, 4, 5, 4, 4, 5, 4, 4, 4, 4, 4, 4, … ?? So eventually on some iteration it gets the right answer consistently… and at that point you say “trained”, next question?