baptiste
e2dba5c0ad
feat(upstream): add deprecation message for the tgi-gaudi fork due to Gaudi upstreaming
2025-03-10 10:43:09 +00:00
Yuan Wu
aba419a0cc
Fix crash issue of llava-next fp8 ( #286 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-07 10:31:58 +01:00
Yuan Wu
cd57fea11b
Fix Llava next crash issue ( #285 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-06 10:12:21 +01:00
Yuan Wu
20ea73c6d4
Fix mistralai/Mistral-7B-Instruct failure ( #284 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-05 17:01:23 +01:00
Yuan Wu
c35810d6f0
Fix the loading issue of 90B ( #283 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-02-28 11:20:55 +01:00
Yuan Wu
1d3a4ab851
Enable mllama ( #272 )
...
Signed-off-by: Yuan Wu <yuan.wu@intel.com>
2025-02-27 16:12:15 +01:00
Tomasz Thaddey
17f0d57581
Unpin rustc version and set it to 'stable' ( #269 )
2025-02-13 10:49:09 +01:00
kaixuanliu
b52164d38a
Complete padding of CausalLMBatch when batch bucketing is used ( #261 )
...
Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
2025-01-30 10:19:13 +01:00
Yuan Wu
fe7594e369
Fix the warmup issue of prefill batch_size ( #268 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-23 17:26:17 +01:00
Yuan Wu
63c64bb307
Use the default value in globals.py ( #265 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-21 10:10:23 +01:00
Karol Damaszke
8de110ae9f
Fix warmup with SKIP_TOKENIZER_IN_TGI=true ( #266 )
2025-01-21 10:09:49 +01:00
Yuan Wu
7d106477d6
Fix router input validation for SKIP_TOKENIZER_IN_TGI=true ( #267 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-21 10:08:53 +01:00
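The two commits above both touch the SKIP_TOKENIZER_IN_TGI switch; a minimal sketch of enabling it before launching the server (only the variable name comes from the commits, the surrounding usage is illustrative):

```shell
# Sketch: opt into the code path that skips tokenization inside TGI.
# SKIP_TOKENIZER_IN_TGI is taken from the commit subjects above;
# set it in the environment the launcher inherits.
export SKIP_TOKENIZER_IN_TGI=true
echo "SKIP_TOKENIZER_IN_TGI=${SKIP_TOKENIZER_IN_TGI}"
```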
Yuan Wu
6d6acca5eb
Update the README for 2.3.1 ( #260 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-03 10:55:14 +01:00
Yuan Wu
46b556805b
Upgrade to SynapseAI 1.19 ( #259 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-26 17:33:24 +01:00
regisss
5291f652a1
Merge pull request #225 from yuanwu2017/2.3.0
2024-12-19 11:42:59 -06:00
yuanwu
8e2e5d8e15
Fix benchmark build error
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-17 05:38:10 +00:00
yuanwu
eaeef6e7a4
Remove the useless modifications
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-17 02:08:12 +00:00
yuanwu
15de6c9195
Merge branch 'habana-main' into 2.3.0
2024-12-17 02:06:22 +00:00
Sun Choi
61309b2832
Remove the default max_tokens for /v1/chat/completions ( #251 )
2024-12-16 09:32:57 +01:00
Sun Choi
cc2ca4ac22
HF_TOKEN replaces HUGGING_FACE_HUB_TOKEN as it is deprecated ( #253 )
2024-12-15 09:59:58 +01:00
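The commit above switches to HF_TOKEN because HUGGING_FACE_HUB_TOKEN is deprecated in huggingface_hub; a minimal sketch of the new usage (the token value is a placeholder):

```shell
# Sketch: HF_TOKEN is the current variable; older setups exported
# HUGGING_FACE_HUB_TOKEN instead. The value below is a placeholder, not a real token.
export HF_TOKEN=hf_xxxxxxxx
echo "token set: ${HF_TOKEN:+yes}"
```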
yuanwu
c3b8899f10
Revert "Use optimum-habana v1.15-release branch"
...
This reverts commit c6f023a06b.
2024-12-11 08:17:17 +00:00
yuanwu
c922ef9534
Fix the warmup issue of llama2-7B.
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-09 07:20:48 +00:00
yuanwu
c6f023a06b
Use optimum-habana v1.15-release branch
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-08 13:02:31 +00:00
yuanwu
1b659788b5
Add the --no-deps flag to pip install
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-08 12:14:38 +00:00
yuanwu
73e6e3b871
Remove the error log
...
Subsequent updates will remove this code
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-08 11:55:13 +00:00
yuanwu
9f356ce045
Refine the warmup process
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-07 09:56:16 +00:00
yuanwu
253a992447
Remove the CI workflows we don't currently support
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-02 08:45:36 +00:00
yuanwu
0228bd0260
Don't run the prefill warmup when limit_hpu_graph=true
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-01 21:29:41 +00:00
yuanwu
4586325a34
Fix the StarCoder warmup issue
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-01 06:14:00 +00:00
Yuan Wu
b83419a769
Merge branch 'habana-main' into 2.3.0
2024-11-28 12:38:36 +08:00
yuanwu
636cdb4c43
Fix StarCoder issue
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-11-26 08:55:42 +00:00
srajabos
d49ce00f40
With this change, bucketing/padding of input is applied to the health check ( #245 )
2024-11-18 22:38:30 +01:00
yuanwu2017
56c3eb4adb
Remove the torch package in requirements.txt ( #246 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-11-07 09:22:24 -08:00
yuanwu2017
c345c734a7
Merge branch 'habana-main' into 2.3.0
2024-11-01 11:24:40 +08:00
yuanwu
fcf2e3a338
Fix the prefill warmup issue
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-11-01 05:08:52 +02:00
Thanaji Rao Thakkalapelli
6ba3d1d6e5
updated release docker image version in readme to 2.0.6 ( #242 )
2024-10-31 15:44:16 -07:00
yuanwu2017
8d84ffabf2
Upgrade to SynapseAI 1.18 ( #227 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
2024-10-31 20:14:44 +01:00
Thanaji Rao Thakkalapelli
7fb4af9a87
updated supported models list table in readme ( #241 )
...
* updated supported models list table in readme
* updated read me
* updated read me
2024-10-29 23:28:45 -07:00
yuanwu
4c9856f9e5
Add missing package
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-28 07:04:56 +00:00
yuanwu2017
c23584f626
Merge branch 'habana-main' into 2.3.0
2024-10-28 04:37:07 +08:00
yuanwu
372e071135
Fix the issues of tgi-gaudi for v2.3.1
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-27 20:40:36 +00:00
Nicolas Patry
7e282b4153
V2.3.1
2024-10-27 04:14:35 +00:00
Nicolas Patry
34e98b14ef
New release 2.3.1 ( #2604 )
...
* New release 2.3.1
* Update doc number
2024-10-27 04:14:35 +00:00
drbh
902f526d69
Unroll notify error into generate response ( #2597 )
...
* feat: unroll notify_error if no tool is chosen
* fix: expect simple message when no tool is selected
* fix: improve test to avoid notify_error
* fix: improve docs and indicate change in expected response
* fix: adjust linting in test file
2024-10-27 04:03:57 +00:00
drbh
7664d2e2b3
CI (2592): Allow LoRA adapter revision in server launcher ( #2602 )
...
allow revision for lora adapters from launcher
Co-authored-by: Sida <sida@kulamind.com>
Co-authored-by: teamclouday <teamclouday@gmail.com>
2024-10-27 04:03:57 +00:00
Nicolas Patry
967e67111d
Max token capacity metric ( #2595 )
...
* adding max_token_capacity_metric
* added tgi to name of metric
* Adding max capacity metric.
* Add description for the metrics
---------
Co-authored-by: Edwinhr716 <Edandres249@gmail.com>
2024-10-27 04:03:57 +00:00
Nicolas Patry
51506aa57a
Mllama flash version ( #2585 )
...
* Working loading state.
* Preprocessing.
* Working state ? (Broke idefics1 temporarily).
* Cleaner condition.
* Fix idefics.
* Updating config, removing TODO
* Mllama
* Upgrade transformers 4.45
* Flashing mllama.
* Starting to get there.
* Working state.
* Integration tests for mllama (cutting to 10 tokens because there seems to be instability after, meaning the size of the batch matters).
* Updating model link.
* Earlier assert.
* Fix vlm ?
* remove log.
* Force ignore all images but last.
* Default dtype bfloat16.
* Update integration test after switch to bf16.
* Remove dead code.
* Removed dead code.
* Upgrade the flake to latest transformers/tokenizers
* Move to hf tgi-nix
* Upgrade to 0.5.0
2024-10-27 04:03:57 +00:00
Daniël de Kok
fa964f82d3
nix: experimental support for building a Docker container ( #2470 )
...
* nix: experimental support for building a Docker image
Run using something like:
```
docker run \
--device nvidia.com/gpu=all \
-it --rm -p 8080:80 \
-v $PWD/data:/data \
-v $PWD/tmp:/tmp \
tgi-docker:latest \
--model-id <model_id>
```
* Example of building the Docker image using Nix inside Docker
* Stream to make the builder image smaller
This avoids storing a Docker image tarball in the image. Instead,
stream the layers while doing `docker run`.
* Don't spam journalctl on Linux
* Other dockerfile.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-10-25 09:12:03 +00:00
Daniël de Kok
775e5f4c64
MoE Marlin: support desc_act for groupsize != -1 ( #2590 )
...
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
2024-10-25 09:12:03 +00:00
Daniël de Kok
692f8ddb69
Move flake back to tgi-nix main
( #2586 )
2024-10-25 09:12:03 +00:00