I found a post from several months ago asking about this, and the recommendation there was lmsys/longchat-13b-16k (on Hugging Face),
but I wanted to check whether there are any other recommendations. I want an LLM I can run locally that can go through long transcriptions of interviews, brainstorming sessions, etc. and organize them into outlines without leaving out important info.
I have an RTX 4090 (24 GB VRAM) and 128 GB of DDR5 RAM.
Check out the Yi-34B-200K fine-tunes. You can load up to about 43K tokens of context on an RTX 4090 if you use a quantized version (4.0 bpw ExLlamaV2, I believe).
Yi-34B-200K is trained for summarization and does it really well.
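
For anyone who wants to sanity-check the roughly 43K-token figure, here is a back-of-the-envelope sketch of the VRAM arithmetic. The architecture numbers (60 layers, 8 KV heads, head dimension 128, about 34.4B parameters) are my assumptions about Yi-34B, so check them against the model's config.json, and the 1.5 GiB overhead is a guess for activations and framework buffers. The function names are made up for illustration and are not part of any library.

```python
def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value):
    """One key and one value vector per layer per KV head, for each token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_context_tokens(vram_bytes, overhead_bytes, n_params, bits_per_weight,
                       n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Tokens of KV cache that fit in whatever VRAM is left after the weights."""
    free = vram_bytes - overhead_bytes - weight_bytes(n_params, bits_per_weight)
    per_token = kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim,
                                         bytes_per_value)
    return max(0, int(free // per_token))


GIB = 1024 ** 3

# Assumed values, not measured: verify against the model card and your setup.
common = dict(
    vram_bytes=24 * GIB,        # RTX 4090
    overhead_bytes=1.5 * GIB,   # guess for activations, buffers, desktop use
    n_params=34.4e9,            # assumed Yi-34B parameter count
    bits_per_weight=4.0,        # 4.0 bpw quantization
    n_layers=60,                # assumed
    n_kv_heads=8,               # assumed (grouped-query attention)
    head_dim=128,               # assumed
)

fp16_tokens = max_context_tokens(bytes_per_value=2, **common)  # FP16 cache
q8_tokens = max_context_tokens(bytes_per_value=1, **common)    # 8-bit cache

print(f"FP16 KV cache:  about {fp16_tokens:,} tokens")
print(f"8-bit KV cache: about {q8_tokens:,} tokens")
```

With these assumed numbers the FP16-cache estimate comes out below 43K and the 8-bit-cache estimate above it, so the 43K figure is plausible if the cache is stored at reduced precision or the real overhead is smaller than my guess. Treat it as a rough estimate, not a guarantee of what will actually load.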