Following the release of phones with the Dimensity 9300 and Snapdragon 8 Gen 3, I expect LLMs running on mobile phones to grow in popularity, since quantized 3B or 7B models can already run on high-end phones released within the last five years. But even though it is possible, there are a few concerns, including power consumption and storage size. I’ve seen posts about successfully running LLMs on mobile devices, but I seldom see people discussing future trends. What are your thoughts?

  • NDBellisario@alien.top · 1 year ago

    Latency is one advantage over the internet.

    Any model that can run locally doesn’t need a round trip to a datacenter. This does, of course, depend on available compute power.

    • Maykey@alien.top · 1 year ago

      At current capabilities it’s faster to query a server on the opposite hemisphere than to generate locally.
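
      A rough back-of-envelope comparison in Python makes the point. Every figure here is an illustrative assumption, not a measurement: a ~300 ms round trip to the far hemisphere, a datacenter GPU at ~50 tokens/s, and a phone running a quantized 7B model at ~5 tokens/s.

        # Remote vs. local generation time for one reply.
        # All numbers are assumptions for illustration, not benchmarks.
        RTT_S = 0.300         # assumed round trip to a server on the opposite hemisphere
        SERVER_TOK_S = 50.0   # assumed datacenter GPU throughput, tokens/second
        PHONE_TOK_S = 5.0     # assumed phone throughput for a quantized 7B model
        N_TOKENS = 200        # assumed reply length

        remote_s = RTT_S + N_TOKENS / SERVER_TOK_S   # 0.3 + 4.0 = 4.3 s
        local_s = N_TOKENS / PHONE_TOK_S             # 40.0 s
        print(f"remote: {remote_s:.1f} s, local: {local_s:.1f} s")

      Under these assumptions the network round trip is a rounding error next to the throughput gap; the conclusion only flips once phone throughput approaches server throughput.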

    • CocksuckerDynamo@alien.top · 1 year ago

      Round-trip latency of an HTTP request (or gRPC, or whatever, pick your poison) is utterly insignificant compared to the time it takes to run the inference process, even for the smallest models with the fastest inference.
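
      A minimal sketch of how one could sanity-check this, using only the Python standard library; the URL is a placeholder (any reachable server works) and the 10 tokens/s local throughput is an assumed figure, not a benchmark.

        import time
        import urllib.request

        # Measure one HTTP round trip (placeholder URL).
        t0 = time.perf_counter()
        urllib.request.urlopen("https://example.com", timeout=5).read()
        rtt_s = time.perf_counter() - t0

        # Compare against the time an assumed small local model spends per token.
        local_tok_s = 10.0              # assumed local throughput, tokens/second
        per_token_s = 1.0 / local_tok_s

        print(f"HTTP round trip: {rtt_s * 1000:.0f} ms")
        print(f"one local token: {per_token_s * 1000:.0f} ms")
        print(f"round trip costs about {rtt_s / per_token_s:.1f} tokens of local generation")

      With these placeholder numbers, even a few hundred milliseconds of round trip amounts to only a handful of tokens’ worth of local generation time.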