The_One_Who_Slays@alien.top (OP) to LocalLLaMA • Any way to decrease inference time during long chats? (+ decrease repetition without breaking things)
1 year ago: That would be amazing. I think something like that could even be included in ooba’s official extension repo.
I’m still trying to figure out the correct settings for under-200k contexts. Ooba loads compress_emb (or whatever it’s called) at 5 million, and I don’t know whether you should leave it alone or change it if you set the context size to, say, 64k.
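If the parameter in question is linear RoPE scaling (text-generation-webui exposes one called `compress_pos_emb`), a common rule of thumb is that the factor should be roughly the target context divided by the model's native training context; a value in the millions usually corresponds to a different knob (a RoPE frequency base like `rope_freq_base`), not the compression factor. A minimal sketch under that assumption, with the context sizes below chosen purely for illustration:

```python
# Hedged sketch: linear RoPE scaling stretches position indices, so the
# compression factor is approximately target_ctx / native_ctx.
# The function name and the example context sizes are assumptions,
# not values taken from ooba's code.

def linear_rope_factor(target_ctx: int, native_ctx: int) -> float:
    """Compression factor needed to reach target_ctx via linear scaling."""
    return target_ctx / native_ctx

# A model trained natively at 4096 context, stretched to 64k:
print(linear_rope_factor(65536, 4096))  # 16.0
```

By this logic, a model already trained for 200k context should not need extra compression at 64k, so leaving the scaling factor at 1 and only lowering the context size may be the safer starting point, but that is a guess worth verifying against the model card.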