Hey All,

I have a few questions about how to calculate the tokens per second of an LLM.

  1. To measure the tokens per second of my fine-tuned models, I put a timer in my Python code and divide the number of output tokens by the elapsed time. So if the output is 20 tokens long and the model took 5 seconds, that is 4 tokens per second. Is this the correct method, or is there a better one?

  2. If my model runs at 4 tokens per second on 8 GB of VRAM, will it run at 8 tokens per second on 16 GB of VRAM?
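
For reference, the timing in point 1 looks roughly like this. It's only a minimal sketch: `generate` is a placeholder for whatever call produces the output tokens, and `fake_generate` is a made-up stand-in so the snippet runs without a model. It assumes the call returns only the newly generated tokens, not the prompt.

```python
import time


def tokens_per_second(generate, prompt):
    """Time one call to generate(prompt); return (n_tokens, seconds, rate)."""
    start = time.perf_counter()
    # Assumption: generate() returns only the new tokens, without the prompt.
    output_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    n_tokens = len(output_tokens)
    return n_tokens, elapsed, n_tokens / elapsed


def fake_generate(prompt):
    # Hypothetical stand-in for a real model call.
    time.sleep(0.05)
    return list(range(20))


if __name__ == "__main__":
    n, seconds, rate = tokens_per_second(fake_generate, "Hello")
    print(f"{n} tokens in {seconds:.2f} s = {rate:.1f} tokens/s")
```

With the numbers from point 1, 20 tokens in 5 seconds gives 20 / 5 = 4 tokens per second.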

  • MINIMAN10001@alien.topB
    1 year ago

    My understanding is that tokens per second typically splits into two parts: the prompt preprocessing time and the actual token generation time.

    At least that's what I remember from oobabooga.
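
    One rough way to see the two parts separately, assuming your backend can stream tokens one at a time: time the wait for the first token (mostly prompt processing) and then the rate of the remaining tokens. This is only a sketch. `measure_stream` and `fake_stream` are made-up names, and `fake_stream` just simulates a model with sleeps.

```python
import time


def measure_stream(stream):
    """Split timing into time-to-first-token and the rate of the remaining tokens."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    decode_seconds = end - first_token_at
    # Rate over tokens after the first one; undefined if only one token arrived.
    if count > 1 and decode_seconds > 0:
        decode_rate = (count - 1) / decode_seconds
    else:
        decode_rate = float("nan")
    return {
        "tokens": count,
        "time_to_first_token": first_token_at - start,
        "decode_tokens_per_second": decode_rate,
    }


def fake_stream(n_tokens=5):
    # Hypothetical stand-in: a pause for prompt processing, then one token at a time.
    time.sleep(0.1)
    for i in range(n_tokens):
        time.sleep(0.01)
        yield i


if __name__ == "__main__":
    print(measure_stream(fake_stream()))
```

    With a long prompt and a short answer, the first number dominates, so a single overall tokens per second figure can look much lower than the actual generation speed.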