The_One_Who_Slays@alien.top (OP) to LocalLLaMA • Any way to decrease inference time during long chats? (+ decrease repetition without breaking things)
1 year ago: That would be amazing. I think something like that could even be included in ooba’s official extension repo.
I’m still trying to figure out the correct settings for under-200k contexts. Ooba loads compress_emb (or whatever it’s called) at 5 million, and I don’t know whether you should leave it alone or change it if you set the context size to, say, 64k.
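If the parameter in question is linear RoPE scaling (text-generation-webui exposes one called `compress_pos_emb`), a common rule of thumb is that the factor should be roughly the target context divided by the model's native training context; a value in the millions usually corresponds to a different knob (a RoPE frequency base like `rope_freq_base`), not the compression factor. A minimal sketch under that assumption, with the context sizes below chosen purely for illustration:

```python
# Hedged sketch: linear RoPE scaling stretches position indices, so the
# compression factor is approximately target_ctx / native_ctx.
# The function name and the example context sizes are assumptions,
# not values taken from ooba's code.

def linear_rope_factor(target_ctx: int, native_ctx: int) -> float:
    """Compression factor needed to reach target_ctx via linear scaling."""
    return target_ctx / native_ctx

# A model trained natively at 4096 context, stretched to 64k:
print(linear_rope_factor(65536, 4096))  # 16.0
```

By this logic, a model already trained for 200k context should not need extra compression at 64k, so leaving the scaling factor at 1 and only lowering the context size may be the safer starting point, but that is a guess worth verifying against the model card.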