Daniël de Kok
07c70e7840
nix: improve impure devshell ( #2478 )
...
- Add some test dependencies.
- Install server in venv.
- Install Python client in venv.
2024-09-25 06:13:11 +00:00
Daniël de Kok
622c9c367a
nix: build Torch against MKL and various other improvements ( #2469 )
...
Updates tgi-nix input:
- Move Torch closer to upstream by building against MKL.
- Remove compute capability 8.7 from Torch (Jetson).
- Sync nixpkgs compute capabilities with Torch (avoids
compiling too many capabilities for MAGMA).
- Use nixpkgs configuration passed through by `tgi-nix`.
2024-09-25 06:11:21 +00:00
Daniël de Kok
92ac02e4f2
nix: add default package ( #2453 )
...
The default package wraps the launcher and puts the server/router in the
path.
As a result, TGI can be started using something like:
```
nix run .# -- \
--model-id hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
--port 8080
```
2024-09-25 06:10:59 +00:00
Daniël de Kok
a5af557359
nix: add text-generation-benchmark to pure devshell ( #2431 )
2024-09-25 06:10:59 +00:00
Daniël de Kok
516392d790
nix: add pure server to flake, add both pure and impure devshells ( #2430 )
...
* nix: pure server and support both pure and impure devShells
* nix: remove unused poetry2nix input
It is not wired up and we now have a pure server.
* nix: add ipdb to impure devshell
2024-09-25 06:10:59 +00:00
Nicolas Patry
635dde8af9
Prefix caching ( #2402 )
...
* Prefix caching WIP
* Fixing prefix attention.
* Fixing flashinfer import.
* Fixing black.
* Fixing medusa (still wrong outputs, but functional).
* Just medusa values now.
* Fixing medusa without prefix caching.
* Fixing prefix caching.
* Medusa requires reshaping.
* Removing the logs.
* Remove router.nix
* Fixup:
- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.
* Update flake.lock
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-09-25 06:10:59 +00:00
Daniël de Kok
ddba272a66
nix: update to CUDA 12.4 ( #2429 )
...
* Update to CUDA 12.4
* poetry2nix: follow tgi-nix nixpkgs
2024-09-25 06:10:59 +00:00
Daniël de Kok
20ed7b598e
nix: try to reduce the number of Rust rebuilds ( #2424 )
...
Try to reduce the number of router/launcher rebuilds by filtering
sources. In this way, recompiles should only be triggered by changes
in Cargo or Rust files.
2024-09-25 06:08:38 +00:00
Daniël de Kok
e5c39a5545
nix: build router incrementally ( #2422 )
2024-09-25 06:08:00 +00:00
Nicolas Patry
4baa6ff59f
Upgrading exl2. ( #2415 )
...
* Upgrading exl2.
* Fixing the other pathways.
* Fix idefics.
2024-09-25 06:07:40 +00:00
Daniël de Kok
bae161ab84
nix: partial incremental build of the router ( #2416 )
...
This is less incremental than crate2nix, but it does build all dependencies
separately, so it avoids full rebuilds.
2024-09-25 06:06:17 +00:00
Nicolas Patry
c5e4c1877b
Adding more kernels to flake. ( #2411 )
2024-09-25 06:06:17 +00:00
Daniël de Kok
eb561bb715
nix: incremental build of the launcher ( #2410 )
2024-09-25 06:06:17 +00:00
Nicolas Patry
18d6be6af4
Updating the flake. ( #2404 )
2024-09-25 06:06:17 +00:00
Nicolas Patry
fbe59c6267
Adding launcher to build. ( #2397 )
2024-09-25 06:04:51 +00:00
Daniël de Kok
197dd3af12
nix: add router to the devshell ( #2396 )
2024-09-25 06:04:51 +00:00
Daniël de Kok
df719fd527
flake: use rust-overlay ( #2390 )
2024-09-25 06:04:51 +00:00
Daniël de Kok
e9ba044250
flake: add fmt and clippy ( #2389 )
2024-09-25 06:03:56 +00:00
Daniël de Kok
dc0fa60f55
Add experimental flake ( #2384 )
...
Add flake.nix
2024-09-25 06:01:59 +00:00