Commit Graph

1175 Commits

Author SHA1 Message Date
Nicolas Patry
97d4bdd685 Cleanup Vertex + Chat (#2553)
* Cleanup Vertex + Chat

* logprobs defaults to false.

* Parameters are optional

* Fix docs.

* Changing back this logprobs default.

* Fixup doc.

* Let's debug that.

* Not unstable.

* Updating Cargo ?

* Wat?

* Dummy change.

* Trying some other install.

* Trying something.

* Revert everything.

* Update Cargo lock.

* Fixing the pre-commit after rebase.
2024-10-25 09:01:04 +00:00
Nicolas Patry
25e0edf337 Hotfixing main. (#2562) 2024-10-25 09:01:04 +00:00
Aritra Roy Gosthipaty
782130df17 Adding note for private models in quick-tour document (#2548)
* chore: adding note for private models in quicktour doc

* Update docs/source/quicktour.md

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update docs/source/quicktour.md

Co-authored-by: vb <vaibhavs10@gmail.com>

* Update docs/source/quicktour.md

Co-authored-by: vb <vaibhavs10@gmail.com>

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
2024-10-25 09:01:04 +00:00
Orhun Parmaksız
5247f8938d Simplify crossterm imports (#2545) 2024-10-25 09:01:04 +00:00
Orhun Parmaksız
8c6d3e074f Update the link to the Ratatui organization (#2546) 2024-10-25 09:01:04 +00:00
Daniël de Kok
d4f995e718 Add DenseMoELayer and wire it up in Mixtral/Deepseek V2 (#2537)
This replaces the custom layers in both models.
2024-10-25 09:01:04 +00:00
Daniël de Kok
32d50c2ea7 Add support for scalar FP8 weight scales (#2550)
* Add support for scalar FP8 weight scales

* Support LLM compressor FP8 checkpoints on H100

On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
However, we wouldn't pick up fp8 quantization for models quantized with
LLM compressor. This change adds enough parsing to detect if models have
FP8-quantized weights.

* Remove stray debug print
2024-10-25 09:01:04 +00:00
Nicolas Patry
68cfc94f40 Hotfixing main (#2556) 2024-10-25 08:53:47 +00:00
Nicolas Patry
79ac2b741d Micro cleanup. (#2555) 2024-10-25 08:53:47 +00:00
OlivierDehaene
73e6090d53 chore: Add old V2 backend (#2551)
* wip

* added v2
2024-10-25 08:53:36 +00:00
Daniël de Kok
9aed9d5f81 nix: remove unused _server.nix file (#2538) 2024-10-25 08:53:36 +00:00
yuanwu
b590310255 Add missing import package
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-25 08:52:24 +00:00
yuanwu
8ebe77b3be Simplify the warmup
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-25 08:38:59 +00:00
yuanwu2017
8686a0fc6d Merge branch 'habana-main' into 2.3.0 2024-10-23 16:32:12 +08:00
yuanwu
67ee45a270 Pass the max_batch_total_tokens to causal_lm
refine the warmup

Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-23 08:28:26 +00:00
Thanaji Rao Thakkalapelli
c5e3881051 Enables Flash Attention in TGI for gemma models (#235) 2024-10-18 09:20:42 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO)
9ae5ad5057 requirements name - cabelo@opensuse.org (#237) 2024-10-18 09:20:05 -07:00
Thanaji Rao Thakkalapelli
46b14e6b28 Remove all references to habana_quantization_toolkit for 1.18 (#229) 2024-10-18 10:59:59 +02:00
Thanaji Rao Thakkalapelli
21c13ff3a6 Remove references to torch compile mode in readme (#236) 2024-10-17 14:07:51 -07:00
Sun Choi
8ae5d4c7d6 Ignore EOS for benchmark by using TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN (#234) 2024-10-16 11:57:36 +02:00
Mandy Li
d07e7f4f62 Merge pull request #233 from huggingface/fix_sysntax
Fix syntax error in PR 232
2024-10-15 14:33:21 -07:00
Thanaji Rao Thakkalapelli
87a1cee32c Fix syntax error in PR 232 2024-10-15 13:23:48 -07:00
Thanaji Rao Thakkalapelli
e06320f64e Enabling Flash Attention support for falcon model (#232) 2024-10-15 19:50:17 +02:00
Sun Choi
0578bd917d Fix gpt_bigcode/starcoderbase-3b accuracy issue (#228)
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
2024-10-14 10:01:55 +02:00
Mohit Deopujari
fe8a373831 Enhancements to README (#226) 2024-10-02 12:22:33 +02:00
yuanwu
bab529c916 Make Gaudi adapt to the tgi 2.3.0
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-09-26 06:04:55 +00:00
yuanwu2017
e424752fa3 Enable the AutoGPTQ (#217)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-09-25 18:55:02 +02:00
yuanwu
14fdc4ae5e Add some missing modification of 2.3.0 because of conflict
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-09-25 07:49:49 +00:00
Nicolas Patry
514a5a737d Preparing for release. (#2540)
* Preparing for release.

* Upgrade version in docs.
2024-09-25 06:20:50 +00:00
OlivierDehaene
bd9675c8c7 fix: wrap python basic logs in debug assertion in launcher (#2539)
* fix: wrap python basic logs in debug assertion in launcher

* use level filters instead
2024-09-25 06:19:20 +00:00
Wang, Yi
3519398a14 hotfix: ipex fails since cuda moe kernel is not supported (#2532)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-09-25 06:19:20 +00:00
Daniël de Kok
b6ef2bfc1b doc: clarify that --quantize is not needed for pre-quantized models (#2536) 2024-09-25 06:19:20 +00:00
Daniël de Kok
c1a99e2f15 Update to moe-kernels 0.3.1 (#2535)
* Update to moe-kernels 0.3.1

* Attempt to fix apt failure
2024-09-25 06:19:20 +00:00
Nicolas Patry
2d470c8282 Stream options. (#2533)
* Stream options.

* Fetch stuff from nix integration test for easier testing.

* Adding the assert.

* Only send the usage when asked for.

* Update the docs.

* Impure test because we need network.

* develop.

* Optional usage.

* Fixes.

* Workflow
2024-09-25 06:19:20 +00:00
Daniël de Kok
29a93b78ba Move to moe-kernels package and switch to common MoE layer (#2511)
* Move to moe-kernels package and switch to common MoE layer

This change introduces the new `moe-kernels` package:

- Add `moe-kernels` as a dependency.
- Introduce a `SparseMoELayer` module that can be used by MoE
  models.
- Port over Mixtral and Deepseek.

* Make `cargo check` pass

* Update runner
2024-09-25 06:18:05 +00:00
OlivierDehaene
88b72c8eb3 fix: metrics unbounded memory (#2528) 2024-09-25 06:17:09 +00:00
Daniël de Kok
0ecbd61099 nix: pure Rust check/fmt/clippy/test (#2525)
Runs the tests in a Nix build sandbox.
2024-09-25 06:17:09 +00:00
Nicolas Patry
0110b83aff Adding a test for FD. (#2516)
* Adding a test for FD.

* Fixing flashdecoding (empty batch doesn't work).

* Fixing the invalid popping.

* Fixing radix with block_size > 1

* Last reference.

* Use an actual hash.

* Update hash for slice.len() == 1

* Update the locks.

* Increasing docker timeout.
2024-09-25 06:17:09 +00:00
Daniël de Kok
e8c329372b Add tests for Mixtral (#2520)
Disable by default because CI runners do not have enough GPUs.
2024-09-25 06:16:08 +00:00
Alex Strick van Linschoten
afe5cae8fc Use ratatui not (deprecated) tui (#2521)
* use ratatui not archived tui

* bump ratatui all the way with options
2024-09-25 06:16:07 +00:00
Wang, Yi
cbfe9e5185 hotfix: enable intel ipex cpu and xpu in python3.11 (#2517)
enable intel ipex cpu and xpu in python3.11

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-09-25 06:15:35 +00:00
drbh
5fc0e0c589 fix: pass missing revision arg for lora adapter when loading multiple… (#2510)
fix: pass missing revision arg for lora adapter when loading multiple adapters
2024-09-25 06:15:35 +00:00
Nicolas Patry
7d897188d5 Add nix test. (#2513)
* Add nix test.

* Modifying yourself means you need to rerun.

* Fixing the test + adding click (needed for pre-commit hooks).

* Try this.

* Our runner + pure test (not written)

* Remove server.

* Root user.

* Different user ?

* Add the actual test target.

* Forgot this modification.

* Add a formatter.

* Add the secrets.

* Fixed the auth token ?

* Adding the other tests.

* Missing pre-commit.

* Test requires cargo for cargo fmt.

* Update it a bit.

* Up.

* Attempting to use a cache location for the models.

* Ignore the cache for now.
2024-09-25 06:15:35 +00:00
Daniël de Kok
7be7ab7015 nix: support Python tokenizer conversion in the router (#2515)
Ideally we wouldn't have the router wrapper that this change adds,
but when I give PyO3 a Python interpreter with packages, it ends
up linking libpython from the Python interpreter rather than the
constructed environment and cannot pick up the Python modules as
a result.
2024-09-25 06:15:35 +00:00
Nicolas Patry
f32fa568b6 Fix truffle (#2514)
* Attempting to discard the trufflehog warning.

* Attempt to fix trufflehog.
2024-09-25 06:15:35 +00:00
Nicolas Patry
c6b568b892 Fix tokenization yi (#2507)
* Fixing odd tokenization self modifications on the Rust side (load and
resave in Python).

* Fixing the builds ?

* Fix the gh action?

* Fixing the location ?

* Validation is odd.

* Try a faster runner

* Upgrade python version.

* Remove sccache

* No sccache.

* Getting libpython maybe ?

* List stuff.

* Monkey it up.

* have no idea at this point

* Tmp.

* Shot in the dark.

* Tmate the hell out of this.

* Desperation.

* WTF.

* -y.

* Apparently 3.10 is not available anymore.

* Updating the dockerfile to make libpython discoverable at runtime too.

* Put back rust tests.

* Why do we want mkl on AMD ?

* Forcing 3.11 ?
2024-09-25 06:15:35 +00:00
Nicolas Patry
510d1c76c8 Prefix test - Different kind of load test to trigger prefix test bugs. (#2490)
* Adding prefix test.

* [WIP] tmp dump of integration load tests.

* Remove other tensor creation.

* Fixed the radix tree.

Used a slice everywhere in radix.rs to keep the cheap Arc cloning
instead of recomputing the input_ids.

* Fix parsing

* Is it really flashinfer version ?

* Remove some comments.

* Revert the max prefix hit.

* Adding numpy to diff.

* Upgraded flashinfer.

* Upgrading some stuff.

* Are we done yet ?

* Minor fixup

* Remove 1 log and put back the other.

* Add comment for why slot 0 is OK.

* Mounting on the job.

* Get me a debug branch

* Debugging CIs is fun.

* Attempt #28

* wip

* Tmate.

* Praying.

* Updating VLM causal model with updated context.

* Important line got squashed.

* Tmate again.

* Fingers crossed.

* We want only 1 run of integration tests.....

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2024-09-25 06:14:07 +00:00
Vallepu Vamsi Krishna
b67a0cd37b Add Directory Check to Prevent Redundant Cloning in Build Process (#2486)
Update Makefile-fbgemm

Added Directory check for FBGEMM repository cloning.
2024-09-25 06:14:07 +00:00
Nicolas Patry
eb54d956ef Fixing more correctly the invalid drop of the batch. (#2498) 2024-09-25 06:14:07 +00:00
Martin Iglesias Goyanes
7c2ed55b2e Add links to Adyen blogpost (#2500)
* Add links to Adyen blogpost

* Adding to toctree.

* Update external.md

* Update _toctree.yml

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-09-25 06:14:07 +00:00