Hi all, I’m wondering whether it’s possible to spread the load of a local LLM across multiple hosts instead of adding GPUs to speed up responses. My hosts don’t have GPUs since I want to be power efficient, but they have a decent amount of RAM (128 GB). Thanks for any ideas.

  • Feeling-Currency-360@alien.topB
    1 year ago

    On another note, these GPU manufacturers need to get their heads out of their asses and start cranking out cards with much higher memory capacities. The first one to do it cost-effectively will gain massive market share and huge profits. Nvidia’s A100 and similar cards don’t qualify, as they’re prohibitively expensive.