Looking at mlc-llm, vllm, nomic, etc., they all seem focused on inference with a Vulkan backend, and over the past few months all of them have said multi-GPU support is on their roadmap or being worked on. But every time one announces multi-GPU support, it turns out they just pulled in llama.cpp's CUDA and HIP backends rather than implementing it on Vulkan. Are there any projects that actually do multi-GPU on Vulkan, and is there some technical reason it doesn't work? I only ask because Vulkan is available on multiple platforms with default installs, which would surely make things easier for end users.
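For context, Vulkan itself has no trouble seeing multiple GPUs; here's a minimal sketch using only the standard Vulkan C API (nothing project-specific) that just enumerates the physical devices. So as far as I can tell, the hard part isn't the API, it's the inference-side work of splitting tensors across devices and shuttling activations between them:

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void) {
    /* Minimal instance; no extensions or layers needed just to list GPUs. */
    VkApplicationInfo app = {
        .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
        .pApplicationName = "gpu-enum",
        .apiVersion = VK_API_VERSION_1_1,
    };
    VkInstanceCreateInfo ci = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pApplicationInfo = &app,
    };
    VkInstance instance;
    if (vkCreateInstance(&ci, NULL, &instance) != VK_SUCCESS) {
        fprintf(stderr, "failed to create Vulkan instance\n");
        return 1;
    }

    /* First call gets the count, second fills the array. */
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, NULL);
    VkPhysicalDevice devices[16];
    if (count > 16) count = 16;
    vkEnumeratePhysicalDevices(instance, &count, devices);

    for (uint32_t i = 0; i < count; i++) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(devices[i], &props);
        printf("GPU %u: %s\n", i, props.deviceName);
    }

    vkDestroyInstance(instance, NULL);
    return 0;
}
```

Compiles with `cc enum.c -lvulkan` and prints every GPU the loader can see, across vendors. Each device then needs its own VkDevice, queues, and memory allocations, plus explicit copies (typically staged through host memory) to move data between them, which I'd guess is where the engineering effort goes.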