I’m currently trying to figure out where it’s cheapest to host and use these models.
I’ve realized that a lot of finetunes aren’t available on the common LLM API sites. I want to use Nous Capybara 34B, for example, but the only provider that offered it charged $20/million tokens, which seemed quite high considering I see Llama 70B for around $0.70/million tokens.
So are there any sites where I could host custom finetunes and get rates similar to the latter?
Would a service like RunPod work for you? It sells GPU time by the hour instead of by the token.
HF?
Huggingface
You might rent a GPU from RunPod or another cloud provider.
Memory requirements:
34B model memory requirements (inference), sequence length vs. bit precision:

| Seq Len | 4-bit | 6-bit | 8-bit | 16-bit |
|--------:|--------:|--------:|--------:|--------:|
| 512 | 15.9 GB | 23.8 GB | 31.8 GB | 63.6 GB |
| 1024 | 16.0 GB | 23.9 GB | 31.9 GB | 63.8 GB |
| 2048 | 16.1 GB | 24.1 GB | 32.2 GB | 64.3 GB |
| 4096 | 16.3 GB | 24.5 GB | 32.7 GB | 65.3 GB |
| 8192 | 16.8 GB | 25.2 GB | 33.7 GB | 67.3 GB |
| 16384 | 17.8 GB | 26.7 GB | 35.7 GB | 71.3 GB |
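If you want to sanity-check numbers like these yourself, here's a rough back-of-the-envelope estimator: quantized weights plus KV cache. The layer/head counts below are assumptions (roughly what a Yi-34B-style architecture with grouped-query attention uses), and real usage adds activations and framework overhead, so treat the output as a ballpark, not a guarantee.

```python
# Rough VRAM estimate for serving a 34B model: weights + KV cache.
# Architecture constants are ASSUMPTIONS (Yi-34B-like, GQA); adjust for
# your actual model. Activations/overhead are not included.

GIB = 1024 ** 3

def vram_gib(params_b=34, bits=4, seq_len=2048,
             n_layers=60, n_kv_heads=8, head_dim=128):
    # Quantized weight storage in bytes
    weights = params_b * 1e9 * bits / 8
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes
    kv = 2 * n_layers * n_kv_heads * head_dim * seq_len * (bits / 8)
    return (weights + kv) / GIB

for bits in (4, 8, 16):
    print(f"{bits:>2}-bit, 4096 ctx: ~{vram_gib(bits=bits, seq_len=4096):.1f} GiB")
```

At 4-bit this lands around 16 GiB for a 34B model, which matches the table above and explains why a 24 GB card is comfortable for quantized inference.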
Replicate: $0.000575/sec for an NVIDIA A40 (48 GB VRAM)
The startup time makes Replicate nearly unusable for me. Only popular models stay in memory; less-used models shut down, and you need to wait for a cold start before the first inference.
> $0.000575/sec
That's about $2.07 per hour. On https://runpod.io you can get an A40 for $0.79/hr, and for a 34B model quantized to 4-bit, 24 GB of VRAM is more than enough, so you could get an A5000 for around $0.44/hr.
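The per-second vs. per-hour comparison is easy to get wrong at a glance, so here's the arithmetic spelled out (the prices are the ones quoted in this thread; they change often, so check current listings):

```python
# Convert a per-second GPU rate to per-hour for comparison.
def per_hour(price_per_sec: float) -> float:
    return price_per_sec * 3600

replicate_a40 = per_hour(0.000575)   # Replicate's quoted A40 rate
runpod_a40 = 0.79                    # RunPod's listed hourly A40 rate
runpod_a5000 = 0.44                  # RunPod's listed hourly A5000 rate

print(f"Replicate A40: ${replicate_a40:.2f}/hr")
print(f"RunPod A40:    ${runpod_a40:.2f}/hr")
print(f"RunPod A5000:  ${runpod_a5000:.2f}/hr")
```

So at these rates the per-second Replicate billing works out to roughly 2.6x the RunPod A40 price for the same card, though Replicate only bills while the model is actually running.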