qrios@alien.topBtoLocalLLaMA•I don't understand Mistral and context size, honestly.English
1·
1 year agoThat 32k is theoretical. Base mistral hasn’t actually been trained to work with contexts that large. You might want to look at amazon’s finetune of it for long context. https://huggingface.co/amazon/MistralLite
I suspect it’ll amount to trading off quality for size though.
I don’t think it’s so simple as “the nature of the beast.”
From my own experiments, you can maintain coherence by having stuff scale more the further back it is, but at some cost to accuracy. So stuff further back is more confused, but still accessible, and stuff more recent is still grounding the generation.
I haven’t tested super thoroughly though.