@shibe5

shibe5@alien.top · 1 year ago

My own web UI for experimenting.

shibe5@alien.top · 1 year ago

With the abundance of models, most developers and users have to select a small subset of the available models for their own evaluation, and that selection has to be based on data about the models' performance that is already available. At that stage, picking the models with, for example, the highest MMLU score is one way to go about it.

shibe5@alien.top · 1 year ago

I noticed this problem in llama.cpp too. I suspect it may be because something that Mistral models require is not implemented, e.g. sliding window attention. To confirm that, one can compare the outputs from PyTorch with those from other software. I tried to do it, but the PyTorch model runs out of system RAM with a ~15k-token prompt.
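To illustrate why a missing sliding window would only show up on long prompts, here is a minimal sketch in plain Python. The function names are hypothetical, and the boundary convention (each position sees itself plus the previous `window - 1` tokens) is an assumption, not taken from llama.cpp or the Mistral reference code.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """Plain causal mask: position i may attend to every j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal mask limited to the last `window` positions (assumed convention)."""
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def first_divergent_position(seq_len: int, window: int):
    """First query position where the two masks differ, or None if they never do."""
    full = causal_mask(seq_len)
    windowed = sliding_window_mask(seq_len, window)
    for i in range(seq_len):
        if full[i] != windowed[i]:
            return i
    return None


if __name__ == "__main__":
    # Toy sizes; real models use a window of thousands of tokens.
    print(first_divergent_position(seq_len=5, window=3))
    print(first_divergent_position(seq_len=3, window=3))
```

Under these assumptions the two masks are identical until the prompt is longer than the window, so an implementation that ignores the window would look correct on short prompts and could only diverge past that point.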

shibe5@alien.top · 1 year ago

1026

1023

1020

That would include models trained on a calculator.