I’m fascinated by the whole ecosystem popping up around llama and local LLMs. I’m also curious what everyone here is up to with the models they are running.

Why are you interested in running local models? What are you doing with them?

Secondarily, how are you running your models? Are you truly running them on local hardware, or on a cloud service?

  • SomeOddCodeGuy@alien.top · 1 year ago

    Trying to get a better understanding of how prompts work in relation to fine-tunes, and trying to see if any of them are actually reliable enough to be used in a “production” type environment.

    My end goals are basically

    • A reliable AI assistant that I know is safe, secure and private. Any information about myself, my household or my proprietary ideas won’t be saved on some company’s server to be reviewed and trained upon. I don’t want to ask sensitive questions about stuff like taxes or healthcare or whatnot, just to have some person review it and it end up in a model
    • Eventually create a fine-tuned coding model for the languages I care about. Right now that's all Python, and ChatGPT is OK, but they keep accidentally breaking it while trying to put up more guardrails against people doing crazy stuff. One day it's great at JavaScript, the next it's terribad. I need consistency, and I've realized that with proprietary models I don't get that. A model in my home? I do.
    • Eventually create an IoT service across my home that is managed (with tight constraints) by an AI. Lots of guardrails. I don’t trust generative AI to not set my thermostat to 150 degrees lol.
    • Tinker with these things while they’re still new so that I can know how it works under the hood, so that when AI becomes more mainstream I’ll have a leg up, since my field (development) feels like it’s right there with artists on the chopping block when AI gets better lol
    • I’m putting together some guides and tutorials to help others get into open source AI too. The more folks who can tinker with it, the better.
    • Finally, I'm building an AI assistant prompt card that produces an assistant that won't lie to me or hallucinate as much, and that speaks in more natural language while still having the knowledge it needs to answer my questions well. I'm trying model after model looking for the right one to accomplish this. So far, XWin 70b using Vicuna instruction templates has been fantastic for this.
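
    The thermostat guardrail idea above can be sketched as a hard clamp that sits between the model and the device. The function and limits below are hypothetical, not from any real smart-home API; the point is just that the model's output never reaches hardware unvalidated:

```python
# Hypothetical guardrail: clamp any LLM-proposed thermostat setting
# to a hard allow-range before it ever reaches the device.

MIN_TEMP_F = 55
MAX_TEMP_F = 80

def safe_set_thermostat(proposed_temp_f: float) -> float:
    """Clamp the model's proposed temperature into the allowed range."""
    return max(MIN_TEMP_F, min(MAX_TEMP_F, proposed_temp_f))

# A model hallucinating 150 degrees gets clamped to the ceiling:
print(safe_set_thermostat(150))  # 80
print(safe_set_thermostat(68))   # 68
```

    The same pattern extends to any device action: the AI proposes, deterministic code disposes.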

    A lot of it comes down to just wanting to learn, but a big piece of it is that I have consistency, stability and privacy when running an LLM at home.

    As for how I run it? Ho ho ho… a bit overkill, since as a developer I have a lot of hardware available to me.

    • M2 Ultra Mac Studio (192GB): main inference machine, with 147GB usable as VRAM. It acts as a headless server that I connect to from any device in my house. My main AI assistant runs off of this.
    • My main desktop is an RTX 4090 Windows box, so I run phind-codellama on it most of the time. If I need to extend the context window I swap the M2 Ultra to phind so I can do a 100,000-token context… but otherwise it's so darn fast on the 4090 running q4 that I use that mostly.
    • A MacBook Pro that runs a little Mistral 7b. It also acts as a server when I'm not on it, letting my Windows machine have all 3 models running at once.

    I usually connect the Mistral to Continue.Dev in Visual Studio Code.
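
    As a rough sketch of what talking to a headless inference box like the ones above looks like: most local servers (llama.cpp's server, koboldcpp, text-gen-webui) expose an OpenAI-compatible HTTP endpoint, so a request can be built and sent with nothing but the standard library. The host, port, and model name below are assumptions; substitute whatever your server actually listens on:

```python
# Minimal sketch of querying a headless local inference server over its
# OpenAI-compatible endpoint. Host, port, and model name are assumptions.
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat-completion payload. Many local servers
    ignore the model field, since they load one model at a time."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_local_llm(prompt: str, base_url: str = "http://192.168.1.50:5001/v1") -> str:
    """Send the prompt to the local server and return the assistant's reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

    Because the endpoint shape is the same as OpenAI's, tools like Continue.Dev can usually be pointed at a local server just by changing the base URL.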

      • thetaFAANG@alien.top · 1 year ago

        tax deductible if you use your imagination

        and you get to play with gear you already wanted

        and you get experience for super high paying jobs

        just comes down to fitting it within your budget to begin with

        • Infamous_Charge2666@alien.top · 1 year ago

          To deduct anything on your taxes you have to earn, and most users here are students (undergrads/master's/PhDs) who don't earn enough to deduct $10k in PC hardware.

          The best way is to ask your program (if PhD) for sponsorship, or if an undergrad, to apply for scholarships.

    • simcop2387@alien.top · 1 year ago

      A reliable AI assistant that I know is safe, secure and private. Any information about myself, my household or my proprietary ideas won’t be saved on some company’s server to be reviewed and trained upon. I don’t want to ask sensitive questions about stuff like taxes or healthcare or whatnot, just to have some person review it and it end up in a model

      I’m slowly working on a change to Home Assistant (https://www.home-assistant.io/) to take the OpenAI conversation addon that they have and make it support connecting to any base url. Along with that I’m going to make some more addons for other inference servers (particularly koboldcpp, exllamav2, and text-gen-webui) so that with all their new voice work this year I can plug things in and have a conversation with my smart home and other data that I provide it.
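
      The base-url change above boils down to pointing an OpenAI-style client at a local address instead of api.openai.com. A minimal sketch, assuming the common /v1 path convention; the helper below is hypothetical, not part of Home Assistant or the openai package:

```python
# Sketch of the base-url override idea: any OpenAI-style client can talk to a
# local server (koboldcpp, text-gen-webui, etc.) if its base URL is swapped.

def normalize_base_url(url: str) -> str:
    """Ensure the base URL ends with the /v1 prefix that most
    OpenAI-compatible local servers expect."""
    url = url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"

# With the official openai client this would look something like
# (not run here; the port is an assumption):
#   client = OpenAI(base_url=normalize_base_url("http://localhost:5001"),
#                   api_key="none")
print(normalize_base_url("http://localhost:5001/"))  # http://localhost:5001/v1
```

      Once the addon accepts an arbitrary base URL, swapping inference backends becomes a config change rather than a code change.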

    • Aperturebanana@alien.top · 1 year ago

      I just checked out continue.dev and, thank god for you, what a cool thing! Is there any way to connect GPT-4 via API to Visual Studio Code?

    • LostGoatOnHill@alien.top · 1 year ago

      Love your post and ambitions, very inspiring. I'm looking to do something similar, with a family-friendly assistant connecting to home automation and private data. Looking forward to seeing more of what you build; is there anywhere in particular you share, aside from here?

    • hugganao@alien.top · 1 year ago

      My main desktop is an RTX 4090 Windows box, so I run phind-codellama on it most of the time. If I need to extend the context window then I swap the M2 Ultra to phind so I can do 100,000 token context… but otherwise it's so darn fast on the 4090 running q4 that I use that mostly.

      Are you running exllama for phind on the 4090? Was there a reason you'd need to run it on the M2 Ultra when switching to 100k context?

      Also, I didn't know Mistral could do coding tasks; how is it?