Commit Graph

18 Commits

Author SHA1 Message Date
Daniël de Kok
622c9c367a nix: build Torch against MKL and various other improvements (#2469)
Updates tgi-nix input:

- Move Torch closer to upstream by building against MKL.
- Remove compute capability 8.7 from Torch (Jetson).
- Sync nixpkgs cumpute capabilities with Torch (avoids
  compiling too mana capabilities for MAGMA).
- Use nixpkgs configuration passed through by `tgi-nix`.
2024-09-25 06:11:21 +00:00
Daniël de Kok
92ac02e4f2 nix: add default package (#2453)
The default package wraps the launcher and puts the server/router in the
path.

As a result, TGI can be started using something like:

```
nix run .# -- \
  --model-id hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
  --port 8080
```
2024-09-25 06:10:59 +00:00
Daniël de Kok
a5af557359 nix: add text-generation-benchmark to pure devshell (#2431)
nix: add text-generation-benchmark to pure devshell
2024-09-25 06:10:59 +00:00
Daniël de Kok
516392d790 nix: add pure server to flake, add both pure and impure devshells (#2430)
* nix: pure server and support both pure and impure devShells

* nix: remove unused poetry2nix input

It is not wired up and we now have a pure server.

* nix: add ipdb to impure devshell
2024-09-25 06:10:59 +00:00
Nicolas Patry
635dde8af9 Prefix caching (#2402)
* Prefix caching WIP

* Fixing prefix attention.

* Fixing flashinfer import.

* Fixing black.

* Fixing medusa (still wrong outputs, but functional).

* Just medusa values now.

* Fixing medusa without prefix caching.

* Fixing prefix caching.

* Medusa requires reshaping.

* Removing the logs.

* Remove router.nix

* Fixup:

- Remove logs
- Disable VLMs (they do not work)
- Disable prefix caching when user wants prefill logprobs.

* Update flake.lock

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-09-25 06:10:59 +00:00
Daniël de Kok
ddba272a66 nix: update to CUDA 12.4 (#2429)
* Update to CUDA 12.4

* poetry2nix: follow tgi-nix nixpkgs
2024-09-25 06:10:59 +00:00
Daniël de Kok
20ed7b598e nix: try to reduce the number of Rust rebuilds (#2424)
Try to reduce the number of router/launcher rebuilds by filtering
sources. In this way, recompiles should only be triggered by changes
in Cargo or Rust files.
2024-09-25 06:08:38 +00:00
Daniël de Kok
e5c39a5545 nix: build router incrementally (#2422) 2024-09-25 06:08:00 +00:00
Nicolas Patry
4baa6ff59f Upgrading exl2. (#2415)
* Upgrading exl2.

* Fixing the other pathways.

* Fix idefics.
2024-09-25 06:07:40 +00:00
Daniël de Kok
bae161ab84 nix: partial incremental build of the router (#2416)
This is less incremental than crate2nix, but does build all dependencies
separately, so avoids full rebuilds.
2024-09-25 06:06:17 +00:00
Nicolas Patry
c5e4c1877b Adding more kernels to flake. (#2411) 2024-09-25 06:06:17 +00:00
Daniël de Kok
eb561bb715 nix: incremental build of the launcher (#2410) 2024-09-25 06:06:17 +00:00
Nicolas Patry
18d6be6af4 Updating the flake. (#2404) 2024-09-25 06:06:17 +00:00
Nicolas Patry
fbe59c6267 Adding launcher to build. (#2397) 2024-09-25 06:04:51 +00:00
Daniël de Kok
197dd3af12 nix: add router to the devshell (#2396) 2024-09-25 06:04:51 +00:00
Daniël de Kok
df719fd527 flake: use rust-overlay (#2390) 2024-09-25 06:04:51 +00:00
Daniël de Kok
e9ba044250 flake: add fmt and clippy (#2389) 2024-09-25 06:03:56 +00:00
Daniël de Kok
dc0fa60f55 Add experimental flake (#2384)
Add flake.nix
2024-09-25 06:01:59 +00:00