In no particular order! Don’t forget to use each model’s specific prompt format for the best generations!
AWQ and GGUF quantizations are also available.
https://huggingface.co/NurtureAI/zephyr-7b-beta-16k
https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k
https://huggingface.co/NurtureAI/neural-chat-7b-v3-1-16k
https://huggingface.co/NurtureAI/SynthIA-7B-v2.0-16k
Have fun LocalLLaMA fam <3 ! Let us know what you find! <3
I’m not sure who told who that Mistral models are only 8k or 4k. The sliding window is not the context size; the context size is set by the position embeddings, which is 32k.
It’s in the official Mistral product information.
Do Mistral themselves actually mention 32k anywhere?
It has 32k; they mention it in their config: “max_position_embeddings”: 32768. That is the sequence length.
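For reference, both fields sit side by side in the model’s config.json. A minimal sketch (the JSON excerpt below uses the values published in mistralai/Mistral-7B-v0.1’s config; the rest of the file is omitted):

```python
import json

# Excerpt of Mistral-7B-v0.1's published config.json (other keys omitted).
config_json = """
{
  "max_position_embeddings": 32768,
  "sliding_window": 4096
}
"""

config = json.loads(config_json)

# max_position_embeddings is the trained sequence length, i.e. the actual
# context size; sliding_window only bounds how far back each attention
# layer looks at any single step.
print(config["max_position_embeddings"])  # 32768
print(config["sliding_window"])           # 4096
```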
https://preview.redd.it/5r2c9592vr0c1.png?width=256&format=png&auto=webp&s=be88f25168e3cec16cbe7f9aad15f678edf97e99
But “true” 16K-32K models like MistralLite seem to perform much better at long context than the default Mistral config.
There is nothing “true” about MistralLite’s context length. It essentially removes the sliding window, which is the same thing Amazon (MistralLite) and YaRN are doing.
https://preview.redd.it/rqe1hwc1vr0c1.png?width=256&format=png&auto=webp&s=79f14a98c097d2e8fb5718ffa4d524353b059a10