Google released T5X checkpoints for MADLAD-400 a couple of months ago, but nobody could figure out how to run them. It turned out the published vocabulary was wrong; they uploaded the correct one last week.

I’ve converted the models to the safetensors format, and I created this space where you can try the smaller model.

I also published quantized GGUF weights you can use with candle. They decode at ~15 tokens/s on an M2 Mac.

It seems that NLLB is the most popular machine translation model right now, but its license only allows non-commercial usage. MADLAD-400 is CC BY 4.0.
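
For reference, here is a minimal sketch of loading the converted checkpoint with transformers. It assumes the standard T5 seq2seq interface and the <2xx> target-language prefix; the generation settings are only illustrative.

    # Minimal sketch: load the converted safetensors checkpoint with transformers.
    # Assumes the standard T5 interface; generation settings are illustrative.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model_name = "jbochi/madlad400-3b-mt"
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # The target language is selected with a <2xx> prefix token, e.g. <2de> for German.
    inputs = tokenizer("<2de> How are you, my friend?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))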

  • redditmias@alien.topB · 1 year ago

    Nice, I will check MADLAD later. I thought SeamlessM4T was the best translation model from Meta; I didn’t even know NLLB existed. Has anyone used both and can point out the differences? SeamlessM4T seemed amazingly good in my experience, but it perhaps covers fewer languages, idk.

  • phoneixAdi@alien.topB · 1 year ago

    Nice, thank you!! I tried it in the Space and it works well for me. Noob question: can I run this with llama.cpp, since it’s GGUF? Can I download it and run it locally?

  • vasileer@alien.topB · 1 year ago

    I tested the 3B model on Romanian, Russian, French, and German translations of “The sun rises in the East and sets in the West.” and it works 100%: it gets 10/10 from ChatGPT.

  • yugaljain1999@alien.topB · 1 year ago

    @jbochi, is it possible to run the cargo example with batch inputs?

    cargo run --example t5 --release --features cuda -- \
      --model-id "jbochi/madlad400-3b-mt" \
      --prompt "<2de> How are you, my friend?" \
      --temperature 0

    Thanks
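
    Not the candle example itself, but a hedged sketch of how batched inputs could look with the transformers weights instead (standard T5 interface assumed; the prompts and settings are illustrative):

        # Hedged sketch: batched translation via the transformers port, not the candle CLI.
        from transformers import T5ForConditionalGeneration, T5Tokenizer

        model_name = "jbochi/madlad400-3b-mt"
        tokenizer = T5Tokenizer.from_pretrained(model_name)
        model = T5ForConditionalGeneration.from_pretrained(model_name)

        # Each prompt carries its own <2xx> target-language prefix.
        prompts = ["<2de> How are you, my friend?", "<2fr> How are you, my friend?"]
        batch = tokenizer(prompts, return_tensors="pt", padding=True)
        outputs = model.generate(**batch, max_new_tokens=64)
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))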

  • cygn@alien.topB · 1 year ago

    Could you please convert the other versions as well, or release the code you used?

  • cygn@alien.topB · 1 year ago

    I tested two sentences. One was Hindi to English, which it translated fine. The other was romanized Hindi, which it couldn’t handle: for the input “Sir mera dhaan ka fasal hai”, the output was the same as the input. Both ChatGPT and Google Translate can handle this.

  • Townsiti5689@alien.topB · 1 year ago

    Not sure if this has been asked yet, but how good are the translations from this model compared to normal GPT-3.5 and Claude?

    Thanks.

    • jbochi@alien.topOPB · 1 year ago

      Good question. ALMA compares itself against NLLB and GPT-3.5, and the 13B model barely surpasses GPT-3.5. MADLAD-400 probably beats GPT-3.5 only on lower-resource languages.

  • Inevitable_Emu2722@alien.topB · 1 year ago

    Hi, I get the following error when trying to run it with transformers, copying the code provided on Hugging Face:

    Traceback (most recent call last):
      File "/home/XXX/project/translation/translateMADLAD.py", line 10, in
        tokenizer = T5Tokenizer.from_pretrained("jbochi/madlad400-3b-mt")
      File "/home/lXXX/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
        return cls._from_pretrained(
      File "/home/lXXX/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2060, in _from_pretrained
        raise ValueError(
    ValueError: Non-consecutive added token '' found. Should have index 256100 but has index 256000 in saved vocabulary.
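
    One possible workaround, assuming the mismatch comes from tokenizer files cached before the corrected vocabulary was uploaded (a guess, not a confirmed fix):

        # Hedged guess: force a fresh download of the tokenizer files, in case an
        # older cached vocabulary (from before the corrected upload) is the culprit.
        from transformers import T5Tokenizer

        tokenizer = T5Tokenizer.from_pretrained("jbochi/madlad400-3b-mt", force_download=True)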

  • lowkeyintensity@alien.topB · 1 year ago

    Meta’s NLLB is supposed to be the best translator model, right? But it’s for non-commercial use only. How does MADLAD compare to NLLB?

    • jbochi@alien.topOPB · 1 year ago

      The MADLAD-400 paper has a bunch of comparisons with NLLB. MADLAD beats NLLB on some benchmarks, is quite close on others, and loses on some. But the largest MADLAD model is 5x smaller than the original NLLB, and it supports more than 2x as many languages.

    • HaruSosake@alien.topB · 1 year ago

      NLLB has horrible performance. I’ve done extensive testing with it and wouldn’t even translate a children’s book with it. Google Translate does a much better job, and that’s saying something. lol

  • Serious-Commercial10@alien.topB · 1 year ago

    Most people only need a few languages, such as en/cn/jp. If there were versions for specific language combinations, I would use them to develop my own translation application.

      • Igoory@alien.topB · 1 year ago

        Yes, it indeed works. I managed to run the 10B model on CPU; it uses 40 GB of RAM, but somehow I felt like your 3B Space gave me a better translation.

        • cygn@alien.topB · 1 year ago

          How do you load the model? I pasted jbochi/madlad400-3b-mt in the download model field and used the “transformers” model loader, but it can’t handle it: OSError: It looks like the config file at 'models/model.safetensors' is not a valid JSON file.

          • Igoory@alien.topB · 1 year ago

            I think I did exactly what you described, so I have no idea why you got an error.

    • lowkeyintensity@alien.topB · 1 year ago

      Gibberish names have been a thing since the 90s. It’s hard coming up with a name when everyone is racing to create the next Big Thing. Also, I think techies are more tolerant of cumbersome names/domains.

  • justynasty@alien.topB · 1 year ago

    koboldcpp 1.46.1 (from October) says ERROR: Detected unimplemented GGUF Arch. It’s best to get the newest version of the backend.