• So, I’ve been doing all my LLM tinkering on an M1, using llama.cpp/whisper.cpp to run a basic voice-powered assistant. Nothing new at this point.
  • Currently adding a visual component: ShareGPT4V-7B, assuming I manage to convert it to GGUF. Once that’s done I should be able to integrate it with llama.cpp and wire it to a live camera feed, giving it eyes.
  • Might even get crazy and throw in a low-level component to handle basic object detection, letting the model know when something is being “shown” to it. Other than that, it will activate when prompted to do so (text or voice).
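A minimal sketch of the activation logic described above. All names here (`Event`, `should_activate`, the confidence threshold) are hypothetical, not from llama.cpp or any specific detector:

```python
# Hypothetical wake logic: explicit prompts always activate the assistant;
# object-detection events only activate it above a confidence threshold,
# so camera noise doesn't constantly trigger the vision model.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str               # "text", "voice", or "detection"
    confidence: float = 1.0

def should_activate(event: Event, detection_threshold: float = 0.6) -> bool:
    # Explicit user prompts (typed or spoken) always wake the assistant.
    if event.kind in ("text", "voice"):
        return True
    # Detector events wake it only when the detector is confident enough.
    if event.kind == "detection":
        return event.confidence >= detection_threshold
    return False
```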

The one thing I’m not sure about is how to run a TTS engine like StyleTTS2-LJSpeech locally. Are there libraries that support TTS models?

  • LyPreto@alien.top (OP) · 1 year ago

    Tried Coqui and had issues with performance. From what I read online, it doesn’t seem to fully support MPS.

    For now I’m using edge-tts, which is doing the trick and is pretty decent/free.
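    For reference, a hedged sketch of how I call it. `edge_tts.Communicate(...).save(...)` is the package’s documented entry point, but the guarded import and the `speak` helper are my own wrapper, and saving actually hits Microsoft’s servers:

```python
# Hypothetical helper around the third-party edge-tts package
# (pip install edge-tts). Falls back to None if it isn't installed.
# Note: .save() performs a network request to Microsoft's TTS service.
import asyncio

async def speak(text: str, voice: str = "en-US-AriaNeural",
                out_path: str = "reply.mp3"):
    try:
        import edge_tts
    except ImportError:
        return None  # degrade silently when the package is missing
    await edge_tts.Communicate(text, voice).save(out_path)
    return out_path
```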

    Is XTTS supported on Macs?

    • a_beautiful_rhind@alien.top · 1 year ago

      It’s Tortoise-based, so who knows. There is Mac PyTorch now; you would have to figure it out from scratch. I’m not sure why nobody is trying it.
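      If you do try it, the first thing to check is whether PyTorch’s MPS backend is actually usable at runtime. A small sketch (my own helper, degrades to CPU if torch is missing or too old; `torch.backends.mps.is_available()` is the real PyTorch API):

```python
def pick_device() -> str:
    """Return "mps" when PyTorch's Metal backend is usable, else "cpu"."""
    try:
        import torch  # optional dependency; sketch degrades to CPU without it
        # getattr guard: older PyTorch builds have no mps attribute at all.
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"
```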

      When I tried edge-tts it was very mediocre, like Silero.