I’m currently trying to figure out where it’s cheapest to host and use these models.

I realized that a lot of the finetunes aren’t available on common LLM API sites. I want to use Nous Capybara 34B, for example, but the only provider that offered it charged $20/million tokens, which seemed quite high considering I see Llama 70B for around $0.70/million tokens.

So are there any sites where I could host custom finetunes and get rates similar to those?

  • andrewlapp@alien.top · 1 year ago

    You could rent a GPU from RunPod or another cloud provider and serve the model yourself.
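    Whether renting beats a hosted API comes down to simple arithmetic: GPU hourly rate divided by sustained throughput. A minimal sketch — the $2/hr rate and 40 tokens/s throughput below are illustrative assumptions, not quotes:

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    """Dollars per million generated tokens on a rented GPU."""
    # Time to generate one million tokens, in hours
    hours_for_1m_tokens = 1e6 / tokens_per_sec / 3600
    return gpu_hourly_usd * hours_for_1m_tokens

# Example with assumed numbers: a $2/hr GPU sustaining 40 tokens/s
print(round(cost_per_million_tokens(2.0, 40), 2))  # → 13.89
```

    With batching, effective throughput across concurrent requests can be much higher, which is how API providers get the per-token price down.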

    Memory requirements for a 34B model at inference time, by sequence length (SL, rows) and bit precision (BP, in bits, columns):

    SL / BP |     4      |     6      |     8      |     16    
    -----------------------------------------------------------
        512 |     15.9GB |     23.8GB |     31.8GB |     63.6GB
       1024 |     16.0GB |     23.9GB |     31.9GB |     63.8GB
       2048 |     16.1GB |     24.1GB |     32.2GB |     64.3GB
       4096 |     16.3GB |     24.5GB |     32.7GB |     65.3GB
       8192 |     16.8GB |     25.2GB |     33.7GB |     67.3GB
      16384 |     17.8GB |     26.7GB |     35.7GB |     71.3GB
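
    For intuition, figures like these can be approximated as quantized weights plus KV cache. A rough sketch — the layer/head defaults are assumptions based on a Yi-34B-style config, and real totals also include activations and runtime overhead, so it won’t match the table exactly:

```python
def infer_memory_gb(params_b: float, bits: int, seq_len: int,
                    n_layers: int = 60, n_kv_heads: int = 8,
                    head_dim: int = 128) -> float:
    """Rough inference memory estimate: quantized weights + KV cache.

    Architecture defaults (layers, KV heads, head dim) are assumptions,
    not exact values for any specific 34B model.
    """
    # Weights: one value per parameter at the given bit width
    weights_gb = params_b * 1e9 * (bits / 8) / 1e9
    # KV cache: 2 tensors (K and V) per layer, one entry per position,
    # assumed here to be stored at the same precision as the weights
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * seq_len * (bits / 8) / 1e9
    return weights_gb + kv_gb
```

    The takeaway matches the table: the weights dominate, so a 4-bit quant of a 34B model fits comfortably on a single 24GB card at moderate context lengths.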