Commit Graph

142 Commits

Author SHA1 Message Date
Nicolas Patry
7a5855ff01
NCCL ? 2024-09-17 09:37:05 +02:00
Nicolas Patry
fb7e8c8970
Add the cache. 2024-09-17 09:20:12 +02:00
Guillaume LEGENDRE
2aa2851e01
use runners with cache 2024-09-17 08:12:19 +02:00
Nicolas Patry
87c85fdc38
Standard setup. 2024-09-16 17:04:11 +02:00
Nicolas Patry
69c20a9d3f
Tmate: let's find it with ldconfig? 2024-09-16 17:03:28 +02:00
Nicolas Patry
c784cb401d
Let's try a compat driver? 2024-09-16 17:03:28 +02:00
Nicolas Patry
fe533dc57b
Back to failing version 2024-09-16 17:03:28 +02:00
Nicolas Patry
2f1f082abe
Tmate. 2024-09-16 17:03:28 +02:00
Nicolas Patry
1a6b9926f6
missing lib. 2024-09-16 17:03:27 +02:00
Nicolas Patry
332e42f59a
Attempt. 2024-09-16 17:03:27 +02:00
Nicolas Patry
ec6fe324c6
Link to nix owned lib 2024-09-16 17:03:27 +02:00
Nicolas Patry
83ee55a617
Try something. 2024-09-16 17:03:27 +02:00
Nicolas Patry
047530216c
No idea where the shared disk is. 2024-09-16 17:03:27 +02:00
Nicolas Patry
9f548fa82a
Change the home location ? 2024-09-16 17:03:27 +02:00
Nicolas Patry
3ff12084b7
Revert "No tmate."
This reverts commit 6b9b6d951897127ae1ce09c8f61f86a64b301fec.
2024-09-16 17:03:26 +02:00
Nicolas Patry
26634f9697
No tmate. 2024-09-16 17:03:26 +02:00
Nicolas Patry
a533d086f0
Tmate to find cache. 2024-09-16 17:03:26 +02:00
Nicolas Patry
a5b81ab457
Home. 2024-09-16 17:03:26 +02:00
Nicolas Patry
98f2241a88
Put back libnvidia-ml 2024-09-16 17:03:26 +02:00
Nicolas Patry
72a805d50d
Remove tmate. 2024-09-16 17:03:26 +02:00
Nicolas Patry
45c0129976
Attempting something. 2024-09-16 17:03:25 +02:00
Nicolas Patry
2b18537f85
More tmate. 2024-09-16 17:03:25 +02:00
Nicolas Patry
12b88204b0
Putting the cuda package in the flake. 2024-09-16 17:03:25 +02:00
Nicolas Patry
d7333830b5
Tmate. 2024-09-16 17:03:25 +02:00
Nicolas Patry
c4bbe06bf1
Simpler command 2024-09-16 17:02:45 +02:00
Nicolas Patry
d0ae24a167
Release tests. 2024-09-16 17:02:25 +02:00
Nicolas Patry
5c4b2eaa30
Seeing the damage on the release tests. 2024-09-16 17:01:51 +02:00
Nicolas Patry
70f910bba6
Remove tmate. 2024-09-16 17:01:51 +02:00
Nicolas Patry
5adece6313
This doesn't seem needed. 2024-09-16 17:01:51 +02:00
Nicolas Patry
b7cb8d5145
Let's figure out the issue... 2024-09-16 17:01:30 +02:00
Nicolas Patry
3d7b81535a
Only link CUDA driver libraries. 2024-09-16 17:01:30 +02:00
Nicolas Patry
ce3efc83ed
Remove tmate. 2024-09-16 17:01:30 +02:00
Nicolas Patry
7f58f7dc61
Symlink all the things. 2024-09-16 17:01:29 +02:00
Nicolas Patry
42107de71f
Let's try to find libnvidia-ml 2024-09-16 17:01:29 +02:00
Nicolas Patry
edaa7f847d
Does this work ? 2024-09-16 17:01:29 +02:00
Nicolas Patry
d1e79ddae0
Fix override. 2024-09-16 17:01:29 +02:00
Nicolas Patry
db054b95df
Check the paths. 2024-09-16 17:01:29 +02:00
Nicolas Patry
afcd047a58
Yaml yaml. 2024-09-16 17:01:29 +02:00
Nicolas Patry
60db294f9a
Link cuda to nix ? 2024-09-16 17:01:28 +02:00
Nicolas Patry
8e7c7c61f1
Let's see what the issue is ? 2024-09-16 17:01:28 +02:00
Nicolas Patry
c227345878
Run on actual GPUs. 2024-09-16 17:01:28 +02:00
Nicolas Patry
f47cdc1fe1
Attempting rapidly the integration tests. 2024-09-16 17:01:26 +02:00
Nicolas Patry
d95c670ada
Add nix test. (#2513)
* Add nix test.

* Modifying yourself means you need to rerun.

* Fixing the test + adding click (needed for pre-commit hooks).

* Try this.

* Our runner + pure test (not written)

* Remove server.

* Root user.

* Different user ?

* Add the actual test target.

* Forgot this modification.

* Add a formatter.

* Add the secrets.

* Fixed the auth token ?

* Adding the other tests.

* Missing pre-commit.

* Test requires cargo for cargo fmt.

* Update it a bit.

* Up.

* Attempting to use a cache location for the models.

* Ignore the cache for now.
2024-09-12 14:54:56 +02:00
Nicolas Patry
dae3bf1d87
Fix tokenization yi (#2507)
* Fixing odd tokenization self modifications on the Rust side (load and
resave in Python).

* Fixing the builds ?

* Fix the gh action?

* Fixing the location ?

* Validation is odd.

* Try a faster runner

* Upgrade python version.

* Remove sccache

* No sccache.

* Getting libpython maybe ?

* List stuff.

* Monkey it up.

* Have no idea at this point.

* Tmp.

* Shot in the dark.

* Tmate the hell out of this.

* Desperation.

* WTF.

* -y.

* Apparently 3.10 is not available anymore.

* Updating the dockerfile to make libpython discoverable at runtime too.

* Put back rust tests.

* Why do we want mkl on AMD ?

* Forcing 3.11 ?
2024-09-11 22:41:56 +02:00
Nicolas Patry
e415b690a6
Lots of improvements (Still 2 allocators) (#2449)
* Making prefix/flashinfer the default and testing the full release tests.

* Include flashinfer in the docker.

* Using prebuilt.

* Allowing window_left_size (dummy version).

* Disabling flashinfer/prefix caching on odd head_dim

* Disable prefix caching for lora.

* More specific codes.

* Update lock

* Updating integration tests with new values with FI/FD.

Remove paged as a default too, and use FD everywhere.

* Update cargo lock ?

* Upgrade to 1.80 because of bitstream...

* Everywhere 1.80

* Forgot last default place.

* Apply suggestions from code review

Co-authored-by: drbh <david.richard.holtz@gmail.com>

* Updated flake lock

* Tmp

* Upgrade the resolution system for fewer resolution errors.

* Remove lambda for cleaner function.

* Handling debugger.

* Override the env in server tests.

* Is this enough to make it work ?

* This seems to be working.

* Downgrade some logs.

* Fixing the default for vlm.

* Don't enable prefix caching on VLM just yet.

* Change `add_special_tokens` in order to have the correct tokens for chat
input and not (since it's super important with the prefixing now)

* Fixing prefix caching for flashdecoding.

* Update all models.

* Fixed flashinfer version.

* add_special_tokens is internal only

* Fixing seqlen with the new vlms.

* Fixing the issue with `add_special_tokens` not being passed around.

* Fixing the test.

* Removing encoder_decoder (seq2seq).

* Update the chat test.

* Fixing the batching tokenization in flash causal lm.

* Truncating left for radix purposes.

* Oops this doesn't belong here.

* Put back default pure shell.

* Update server tests

- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room

* Only n_heads / process_group.size() are necessary.

* Revert the integration tests change (seems linked to head_size
modification).

* Adding error message when assert is violated.

* Fixing the free algorithm to handle times where the common prefix is
smaller.

* Apply suggestions from code review

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Update server/text_generation_server/layers/attention/common.py

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Fix disabling prefix caching - Fix windowing checks.

* Revert the Cohere tokenizer change (for now using a revision instead).

* Fmt.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2024-08-29 16:29:01 +02:00
Nicolas Patry
2788d41a76
Fixing CI. (#2462) 2024-08-27 15:33:02 +02:00
Nicolas Patry
e4201f44cf
All integration tests back everywhere (too many failed CI). (#2428)
* All integration tests back everywhere (too many failed CI).

* Upgrade integration tests after 12.4

* Attempt to remove the specified compute cap.

* Common arch list.

* Punica uses raw ASM which is not valid on 9.0 apparently.
2024-08-16 21:19:46 +02:00
Hugo Larcher
53729b74ac
doc: Add metrics documentation and add a 'Reference' section (#2230)
* doc: Add metrics documentation and add a 'Reference' section

* doc: Add API reference

* doc: Refactor API reference

* fix: Message API link

* Bad rebase

* Moving the docs.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-08-16 19:43:30 +02:00
Wang, Yi
b6bb1d5160
CPU docker image (#2367)
add intel-cpu docker image

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-08-12 14:10:30 +02:00
Daniël de Kok
22fb1be588
Fix cache block size for flash decoding (#2351)
* Fix cache block size for flash decoding

This seems to have been accidentally dropped during the TRT-LLM
PR rebase.

* Also run CI on changes to `backends`
2024-08-01 15:38:57 +02:00