I’m currently trying to figure out the cheapest place to host these models and use them.

I realized that a lot of fine-tunes are not available on the common LLM API sites. I want to use Nous Capybara 34B, for example, but the only provider that offered it charged $20/million tokens, which seemed quite high considering that I see Llama 70B for around $0.70/million tokens.

So are there any sites where I could host custom fine-tunes and get rates similar to those?

  • Kimononono@alien.topB
    1 year ago

    Would a service like RunPod work for you? It sells you GPU power by the hour instead of by the token.
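
    To compare an hourly GPU rate against per-token API pricing, you can convert throughput into a per-million-token cost. A minimal sketch with hypothetical numbers (the $0.80/hour rate and 30 tokens/s throughput are illustrative assumptions, not real quotes):

    ```python
    def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
        """Effective $/1M tokens if the rented GPU is kept busy at a steady throughput."""
        tokens_per_hour = tokens_per_second * 3600
        return hourly_rate_usd / tokens_per_hour * 1_000_000

    # e.g. a hypothetical $0.80/hour GPU sustaining 30 tokens/s:
    print(round(cost_per_million_tokens(0.80, 30), 2))  # -> 7.41
    ```

    The break-even depends entirely on utilization: if the GPU sits idle between requests, the effective per-token cost climbs accordingly.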

  • andrewlapp@alien.topB
    1 year ago

    You might rent a GPU from RunPod or another cloud provider.

    Memory requirements:

    34B Model Memory Requirements (inference)
    
    Rows: sequence length (SL); columns: bit precision (BP); values in GB
    SL / BP |     4      |     6      |     8      |     16    
    -----------------------------------------------------------
        512 |     15.9GB |     23.8GB |     31.8GB |     63.6GB
       1024 |     16.0GB |     23.9GB |     31.9GB |     63.8GB
       2048 |     16.1GB |     24.1GB |     32.2GB |     64.3GB
       4096 |     16.3GB |     24.5GB |     32.7GB |     65.3GB
       8192 |     16.8GB |     25.2GB |     33.7GB |     67.3GB
      16384 |     17.8GB |     26.7GB |     35.7GB |     71.3GB
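
    The dominant term in that table is just the quantized weight size. A rough back-of-the-envelope sketch (this ignores the KV cache, which is why it only lands near the short-sequence rows):

    ```python
    def weights_gib(n_params: float, bits: int) -> float:
        """Memory for the quantized weights alone, in GiB."""
        return n_params * bits / 8 / 1024**3

    # a 34B-parameter model at 4-bit quantization:
    print(round(weights_gib(34e9, 4), 1))  # -> 15.8, close to the 512-token row
    ```

    The remaining gap, growing with sequence length, is the KV cache and activation overhead.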
    
    • yahma@alien.topB
      1 year ago

      The startup time makes Replicate nearly unusable for me. Only popular models stay in memory; less-used models shut down, and you have to wait for a cold start before the first inference.