Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-11-18 23:15:59 +00:00.
* Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels. Performance and accuracy of these kernels are on par (tested with Llama 70B and 405B). This removes a dependency and resolves some stability issues we have been seeing.
* Update test snapshots
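
Both the old fbgemm-gpu path and the new vLLM/marlin-kernels path compute the same operation: an int8-activation, int8-weight (w8a8) matmul whose integer accumulator is rescaled back to floating point by the activation and weight scales. The sketch below is a plain-PyTorch reference of that math under assumed per-row activation and per-column weight scales; the function name and scale layout are illustrative assumptions, not TGI's or marlin-kernels' API.

```python
import torch


def w8a8_scaled_matmul_reference(
    a_q: torch.Tensor,      # int8 activations, shape (m, k)
    a_scale: torch.Tensor,  # activation scales, shape (m, 1) or scalar
    w_q: torch.Tensor,      # int8 weights, shape (k, n)
    w_scale: torch.Tensor,  # weight scales, shape (1, n) or scalar
) -> torch.Tensor:
    # Accumulate the integer product, then rescale to float. Fused w8a8
    # kernels do the same math on tensor cores with an int32 accumulator,
    # without materializing the intermediate in memory; int64 here just
    # keeps this reference exact and runnable on CPU.
    acc = a_q.to(torch.int64) @ w_q.to(torch.int64)
    return acc.to(torch.float32) * a_scale * w_scale


# Illustrative usage with random data (shapes chosen arbitrarily).
m, k, n = 4, 64, 32
a_q = torch.randint(-128, 128, (m, k), dtype=torch.int8)
w_q = torch.randint(-128, 128, (k, n), dtype=torch.int8)
a_scale = torch.rand(m, 1) * 0.01
w_scale = torch.rand(1, n) * 0.01
print(w8a8_scaled_matmul_reference(a_q, a_scale, w_q, w_scale).shape)  # (4, 32)
```
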
| File |
|---|
| client.nix |
| crate-overrides.nix |
| docker.nix |
| impure-shell.nix |
| overlay.nix |
| server.nix |