• 1 Post
  • 5 Comments
Joined 1 year ago
Cake day: November 17th, 2023

  • Desm0nt@alien.top to LocalLLaMA · 55B Yi model merges
    1 year ago

    Hm. I just load the GGUF yi-34b-chat Q4_K_M in oobabooga via llama.cpp with default params and an 8k context, and it just works like a charm. Better (more lively language) than any 70B from OpenRouter (my local machine can’t handle a 70B). Roughly what that setup looks like outside the UI is sketched below.
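
    For reference, oobabooga’s llama.cpp loader is essentially a thin wrapper around llama-cpp-python, so a minimal stand-alone sketch of the same setup could look like the following (the file name, offloaded layer count, and thread count here are assumptions, not the exact settings from the comment above):

    # Minimal sketch, assuming llama-cpp-python is installed with GPU support.
    from llama_cpp import Llama

    llm = Llama(
        model_path="yi-34b-chat.Q4_K_M.gguf",  # hypothetical local file name
        n_ctx=8192,        # 8k context, as in the comment above
        n_gpu_layers=20,   # layers to offload to VRAM (assumption, tune to your card)
        n_threads=12,      # CPU threads for the non-offloaded layers (assumption)
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])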


  • By loading a 20B Q4_K_M model (50/65 layers offloaded seems to be the fastest from my tests) I currently get around 0.65 t/s with a low context size of 500 or less, and about 0.45 t/s nearing the max 4096 context.

    Sounds suspicious. I use Yi-34B-Chat Q4_K_M on an old 1080 Ti (11 GB VRAM) with 20 layers offloaded and get around 2.5 t/s. But that is on a Threadripper 2920 with 4-channel RAM (also 3200). However, I don’t think it would make that much difference. Of course, with 4 channels I have twice your RAM bandwidth, but I run a 34B and load only 20 layers on the GPU… A rough back-of-envelope comparison follows below.
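
    To make the bandwidth argument concrete, here is a rough back-of-envelope ceiling estimate (the file sizes, total layer counts, and the assumption that the parent poster runs dual-channel DDR4-3200 are guesses, not measurements):

    # Token generation on a partially offloaded GGUF model is roughly bounded by
    # how fast the CPU-resident weights can be streamed from RAM once per token.
    GB = 1e9

    def tps_ceiling(ram_bw_gbs, model_size_gb, layers_total, layers_on_gpu):
        cpu_bytes = model_size_gb * GB * (layers_total - layers_on_gpu) / layers_total
        return ram_bw_gbs * GB / cpu_bytes  # tokens/s ceiling from RAM bandwidth alone

    PER_CHANNEL = 25.6               # GB/s for DDR4-3200
    quad_channel = 4 * PER_CHANNEL   # the Threadripper box in this reply
    dual_channel = 2 * PER_CHANNEL   # assumed for the parent poster's machine

    print(tps_ceiling(quad_channel, 20.7, 61, 20))  # 34B Q4_K_M, 20/61 offloaded -> ~7 t/s
    print(tps_ceiling(dual_channel, 12.0, 65, 50))  # 20B Q4_K_M, 50/65 offloaded -> ~18 t/s

    Real speeds land well below these ceilings, but 2.5 t/s against a ~7 t/s ceiling is plausible, while 0.65 t/s against a ~18 t/s ceiling is what makes the numbers above look off.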



  • Answers like this (“I can do no harm”) to questions like this clearly show how dumb LLMs really are and how far we are from AGI. They basically have no idea what they are being asked or what their own answer says. Just a cool big T9 =)

    In light of this, the drama at OpenAI, with their arguments about the danger of AI capable of destroying humanity, looks especially funny.