programmerChilli@alien.top to LocalLLaMA · English · 1 year ago

GPT-Fast: A fast and hackable implementation of transformer inference in <1000 lines of native PyTorch with support for quantization, speculative decoding, TP, Nvidia/AMD support, and more!