Google released T5X checkpoints for MADLAD-400 a couple of months ago, but nobody could figure out how to run them. Turns out the vocabulary was wrong, but they uploaded the correct one last week.
I’ve converted the models to the safetensors format, and I created this space if you want to try the smaller model.
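In case it helps anyone getting started: translation with these checkpoints is driven by a target-language token like `<2de>` prepended to the input. A minimal sketch with transformers, assuming the converted `jbochi/madlad400-3b-mt` repo on the Hub (the `make_prompt` helper is mine, not part of the release):

```python
def make_prompt(target_lang: str, text: str) -> str:
    """Prepend the MADLAD-style target-language token, e.g. <2de> for German."""
    return f"<2{target_lang}> {text}"

if __name__ == "__main__":
    # Imported here so the helper above is usable even without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model_name = "jbochi/madlad400-3b-mt"  # the converted safetensors repo
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    inputs = tokenizer(make_prompt("de", "How are you, my friend?"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```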
I also published quantized GGUF weights you can use with candle. It decodes at ~15 tokens/s on an M2 Mac.
It seems that NLLB is the most popular machine translation model right now, but its license only allows non-commercial use. MADLAD-400 is CC BY 4.0.
If anything needed some minimalist app, this would be it.
Nice, I will check MADLAD later. I thought SeamlessM4T was the best translation model from Meta; I didn't even know NLLB existed. Has anyone used both who can point out the differences? SeamlessM4T seemed amazingly good in my experience, but it supports fewer languages perhaps, idk.
Nice, thank you!! Tried it in the space. Works well for me. Noob question: since it's GGUF, can I run this with llama.cpp? Can I download it and run it locally?
I tested the 3B model on Romanian, Russian, French, and German translations of "The sun rises in the East and sets in the West." and it works 100%: it gets 10/10 from ChatGPT.
@jbochi, is it possible to run the cargo example with batch inputs?
```shell
cargo run --example t5 --release --features cuda -- \
  --model-id "jbochi/madlad400-3b-mt" \
  --prompt "<2de> How are you, my friend?" \
  --temperature 0
```
Thanks
Yes, I would be interested to know if this is possible
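I'm not sure whether the candle `t5` example exposes batching, but with the transformers checkpoints you can batch by padding the tokenized prompts. A sketch under that assumption (model id taken from the command above; the `make_batch` helper is mine):

```python
def make_batch(target_lang: str, texts: list[str]) -> list[str]:
    """Prefix each input with the <2xx> target-language token."""
    return [f"<2{target_lang}> {t}" for t in texts]

if __name__ == "__main__":
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model_name = "jbochi/madlad400-3b-mt"
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    prompts = make_batch("de", ["How are you, my friend?", "The sun rises in the East."])
    # padding=True pads the shorter prompts so the batch fits in one tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    out = model.generate(**inputs, max_new_tokens=64)
    for line in tokenizer.batch_decode(out, skip_special_tokens=True):
        print(line)
```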
Could you please convert the other versions as well or release the code you used ?
I tested two sentences: one from Hindi to English, which it translated fine. The other was romanized Hindi, which it couldn't handle. Input: "Sir mera dhaan ka fasal hai". The output was the same as the input. Both ChatGPT and Google Translate can handle this.
Not sure if this has been asked yet, but how good are the translations from this model compared to normal GPT-3.5 and Claude?
Thanks.
Good question. ALMA compares itself against NLLB and GPT-3.5, and the 13B barely surpasses GPT-3.5. MADLAD-400 probably beats GPT-3.5 on lower-resource languages only.
Hi, I get the following error when trying to run it with transformers, copying the code provided on Hugging Face:
```
Traceback (most recent call last):
  File "/home/XXX/project/translation/translateMADLAD.py", line 10, in <module>
    tokenizer = T5Tokenizer.from_pretrained('jbochi/madlad400-3b-mt')
  File "/home/lXXX/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
    return cls._from_pretrained(
  File "/home/lXXX/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2060, in _from_pretrained
    raise ValueError(
ValueError: Non-consecutive added token '' found. Should have index 256100 but has index 256000 in saved vocabulary.
```
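Per the first comment in this thread, the originally uploaded vocabulary was wrong and was later fixed, so one guess (an assumption on my part, not a confirmed fix) is that a stale cached vocab file is being picked up. Forcing a fresh download is cheap to try:

```python
def fresh_download_kwargs() -> dict:
    """kwargs that make from_pretrained bypass any stale local cache."""
    return {"force_download": True}

if __name__ == "__main__":
    from transformers import T5Tokenizer

    # Re-fetch the (fixed) vocabulary instead of reusing a cached copy.
    tokenizer = T5Tokenizer.from_pretrained("jbochi/madlad400-3b-mt", **fresh_download_kwargs())
    print(tokenizer.vocab_size)
```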
What would be the equivalent models based on open source and free for commercial use?
Meta’s NLLB is supposed to be the best translator model, right? But it’s for non-commercial use only. How does MADLAD compare to NLLB?
The MADLAD-400 paper has a bunch of comparisons with NLLB. MADLAD beats NLLB on some benchmarks, is quite close on others, and loses some. But the largest MADLAD is 5x smaller than the original NLLB, and it supports over 2x as many languages.
NLLB has horrible performance. I've done extensive testing with it and wouldn't even translate a children's book with it. Google Translate does a much better job, and that's saying something. lol
Most people only need a few languages, such as en, cn, and jp. If there were versions for specific language combinations, I would use them to develop my own translation application.
Check the OPUS models by Helsinki-NLP: https://huggingface.co/Helsinki-NLP?sort_models=downloads#models
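Those OPUS-MT models are one small model per language pair, with Hub ids following an `opus-mt-<src>-<tgt>` pattern. Not every pair exists, so treat the exact repo name in this sketch as an assumption to verify on the Hub first:

```python
def opus_mt_repo(src: str, tgt: str) -> str:
    """Hub repo id for a Helsinki-NLP OPUS-MT language pair."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"

if __name__ == "__main__":
    from transformers import pipeline

    # Small per-pair model; check that the pair exists on the Hub before relying on it.
    translator = pipeline("translation", model=opus_mt_repo("en", "zh"))
    print(translator("How are you, my friend?")[0]["translation_text"])
```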
n00b here. can it run in oobabooga?
It should. Support for T5 based models was added in https://github.com/oobabooga/text-generation-webui/pull/1535
Yes, it indeed works. I managed to run the 10B model on CPU; it uses 40 GB of RAM, but somehow I felt like your 3B space gave me a better translation.
How do you load the model? I pasted jbochi/madlad400-3b-mt in the download-model field and used the "transformers" model loader, but it can't handle it: OSError: It looks like the config file at 'models/model.safetensors' is not a valid JSON file.
I think I did exactly like you say, so I have no idea why you got an error.
Why the shitty name?
Gibberish names have been a thing since the '90s. It's hard coming up with a name when everyone is racing to create the next Big Thing. Also, I think techies are more tolerant of cumbersome names/domains.
koboldcpp 1.46.1 (from October) says "ERROR: Detected unimplemented GGUF Arch". It's best to get the newest version of the backend.