Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-19 22:02:06 +00:00.
Latest commit:

* Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels. Performance and accuracy of these kernels are on par (tested with Llama 70B and 405B). Removes a dependency and resolves some stability issues we have been seeing.
* Update test snapshots
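For context, a w8a8 scaled matmul multiplies int8 activations by int8 weights, accumulates in int32, and rescales the result with the activation and weight quantization scales. The sketch below is a minimal conceptual illustration of that computation in plain PyTorch; it is not the fbgemm-gpu or marlin-kernels API, and the function name and per-tensor scale arguments are assumptions made for illustration only.

```python
import torch

def w8a8_scaled_matmul(a_q, a_scale, w_q, w_scale):
    # Hypothetical illustration, not the fbgemm-gpu or marlin-kernels API.
    # a_q: int8 activations [m, k]; w_q: int8 weights [n, k];
    # a_scale / w_scale: per-tensor float scales from quantization.
    acc = a_q.to(torch.int32) @ w_q.to(torch.int32).t()  # accumulate in int32
    return acc.to(torch.float32) * (a_scale * w_scale)   # rescale back to float

# Example: quantize random float inputs to int8 and compare with a float matmul.
a = torch.randn(4, 16)
w = torch.randn(8, 16)
a_scale = a.abs().max() / 127
w_scale = w.abs().max() / 127
a_q = torch.clamp((a / a_scale).round(), -128, 127).to(torch.int8)
w_q = torch.clamp((w / w_scale).round(), -128, 127).to(torch.int8)
err = (w8a8_scaled_matmul(a_q, a_scale, w_q, w_scale) - a @ w.t()).abs().max()
print(f"max abs error vs. float matmul: {err:.4f}")
```

Optimized kernels such as those in fbgemm-gpu or vLLM/marlin fuse these steps on the GPU; the commit above only swaps which backend provides them.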
Files in this directory:

* client.nix
* crate-overrides.nix
* docker.nix
* impure-shell.nix
* overlay.nix
* server.nix