Sorry if this is off-topic, but I think it’s adjacent since many LLaMA users are also using it.

I’m trying to use the Coqui TTS library with a view to plugging it into LLaMA.cpp, but no matter which model I try, my attempts at using British English source speech just end up with an American-sounding voice with various distortions. I’m running the Python module as instructed in the docs under macOS on the M1 platform, and I’ve tried various models, all with similar results.

Nothing at all against American accents, but they’re not what I require at the moment, so any help in making Coqui sound a little more RP would be much appreciated!

  • a_beautiful_rhind@alien.topB
    1 year ago

    I have the opposite problem with other TTS where there are only UK accents and those don’t work for clearly US characters.

  • Material1276@alien.topB
    1 year ago

    I was struggling at first and had that American twang coming through…

    But I managed to get a very clear, short clip of an English actor from an interview. There was no background noise and the speech was very clear. I made sure to clip out any non-speech from the start and end of the audio, then saved it as a 22050 Hz, mono, 16-bit WAV.

    That seems to have done it! I get a pretty good representation of the voice, and it stays in character about 99% of the time, with the occasional slight slip.
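For anyone who wants to script the clip preparation described above (trim the non-speech at both ends, then save as 22050 Hz mono 16-bit WAV), here is a rough sketch using only the Python standard library. It is not necessarily how the clip above was made. The function names, the silence threshold of 500, and the file names are all made up for illustration, and the linear-interpolation resampler is a crude stand-in; a proper audio tool or resampling library would likely sound cleaner. It also assumes the input is already a 16-bit PCM WAV.

```python
import array
import sys
import wave

TARGET_RATE = 22050  # sample rate mentioned in the post


def read_mono_16bit(path):
    """Read a 16-bit PCM WAV; return (mono_samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM input")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    data = array.array("h")
    data.frombytes(raw)
    if sys.byteorder == "big":  # WAV data is little-endian
        data.byteswap()
    # Downmix by averaging the channels of each frame.
    mono = [sum(data[i:i + channels]) // channels
            for i in range(0, len(data), channels)]
    return mono, rate


def trim_silence(samples, threshold=500):
    """Drop leading/trailing samples quieter than threshold (an assumed value)."""
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def resample_linear(samples, src_rate, dst_rate):
    """Crude linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(int(round(samples[lo] * (1 - frac) + samples[hi] * frac)))
    return out


def write_mono_16bit(path, samples, rate):
    """Write samples as a mono 16-bit PCM WAV."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())


def prepare_reference_clip(src_path, dst_path, threshold=500):
    """Trim silence, then save as 22050 Hz mono 16-bit WAV."""
    samples, rate = read_mono_16bit(src_path)
    samples = trim_silence(samples, threshold)
    samples = resample_linear(samples, rate, TARGET_RATE)
    write_mono_16bit(dst_path, samples, TARGET_RATE)
```

Usage would be something like `prepare_reference_clip("interview_raw.wav", "reference.wav")`, with both file names being placeholders. Background noise still has to be avoided at the source; a simple amplitude threshold only removes quiet stretches at the ends.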

    I also occasionally get a little gibberish, which seems to happen when my model is trying to say something like " ’ " (which occasionally slips through when it’s generating text; I see it when I look at the backend at what’s being sent for audio processing). I’m guessing it’s possible to filter this out with a regex or something.
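A regex filter along those lines might look roughly like the sketch below. It is untested against real model output. The function name `clean_for_tts` and the exact set of symbols treated as "stray" are assumptions; adjust them to whatever actually shows up in the text being sent for audio processing. The idea is to normalise curly quotes to straight ones, then drop any quote or markup symbol that stands alone between spaces, while leaving apostrophes inside words untouched.

```python
import re

# Curly quotes are mapped to straight ones first so that one pattern covers both.
_CURLY = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}

# A run of quote/markup symbols with whitespace (or the string edge) on both
# sides. The symbol list is an assumption, not taken from any library.
_STRAY = re.compile(r"""(?<!\S)['"`*_~#|\\]+(?!\S)""")


def clean_for_tts(text):
    """Remove free-standing quote/markup symbols before sending text to TTS."""
    for curly, straight in _CURLY.items():
        text = text.replace(curly, straight)
    text = _STRAY.sub(" ", text)
    # Collapse the extra spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_for_tts("Hello there \u2019 how are you"))
    print(clean_for_tts("It\u2019s fine"))
```

Because the pattern only matches symbols that are surrounded by whitespace, contractions such as "It's" pass through unchanged.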