Is this normal behavior?
I’m still learning, but I noticed that if I load a regular (unquantized) LLM like https://huggingface.co/teknium/OpenHermes-2-Mistral-7B, it takes up all the available VRAM (I have a 3080 10GB).
But when I load a quantized model like https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF, it uses almost no VRAM, maybe 1GB?
Is this normal behaviour?
Update: I just saw that I had the GPU layers set to 0, so it was running entirely on the CPU then?
The slider goes from 0 to 128, how do I know what to pick?

For CPU-only loading, the model's memory usage isn't visible because of mmap loading, which saves time during startup (the file is memory-mapped rather than copied into process RAM). To see the actual usage, load with --no-mmap.
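If it helps, here's roughly how both of those knobs look in llama-cpp-python (assuming a llama.cpp-based backend like most GGUF loaders; the model path and settings below are just illustrative examples, not your exact setup):

```python
from llama_cpp import Llama

# Minimal sketch: path and values are examples, not the exact config your UI uses.
llm = Llama(
    model_path="./openhermes-2.5-mistral-7b.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,   # -1 = offload every layer to the GPU; lower it if you hit
                       # out-of-memory errors (Mistral 7B has 32 transformer layers,
                       # so anything at or above that offloads the whole model)
    use_mmap=False,    # same effect as --no-mmap: load fully into RAM so the
                       # memory usage actually shows up in process stats
)

out = llm("Explain mmap in one sentence:", max_tokens=64)
print(out["choices"][0]["text"])
```

On a 10GB card, a Q4_K_M 7B file (~4.4GB) usually fits entirely in VRAM, so a common approach is to start with everything offloaded and back off the layer count only if you run out of memory.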