text-generation-inference/server/text_generation_server/models
drbh bdc47394d2 feat: support phi3.5 moe (#2479)
* feat: support phi3.5 moe model loading

* fix: prefer llama base model and improve rotary logic

* feat: return reasonable generation and add integration test

* fix: run lint and update docs

* fix: rerun lint for openapi docs

* fix: prefer do_sample false unless temp is set by user, and update chat tests

* fix: small typo adjustments

* fix: consolidate long rope paths

* fix: revert greedy by default and test changes

* Vendor configuration so that we don't have to `trust_remote_code`

* Use SparseMoELayer

* Add support for dense MoE

* Some type annotations

* Add the usual model tests

* Ruff.

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-10-25 09:12:03 +00:00
custom_modeling feat: support phi3.5 moe (#2479) 2024-10-25 09:12:03 +00:00
__init__.py Add missing import package 2024-10-25 08:52:24 +00:00
bloom.py Make Gaudi adapt to the tgi 2.3.0 2024-09-26 06:04:55 +00:00
causal_lm.py Simplify the warmup 2024-10-25 08:38:59 +00:00
flash_causal_lm.py Update ROCM libs and improvements (#2579) 2024-10-25 09:01:04 +00:00
galactica.py feat: add ruff and resolve issue (#2262) 2024-09-25 05:46:24 +00:00
globals.py Make Gaudi adapt to the tgi 2.3.0 2024-09-26 06:04:55 +00:00
idefics_causal_lm.py Upgrading exl2. (#2415) 2024-09-25 06:07:40 +00:00
idefics.py Upgrading exl2. (#2415) 2024-09-25 06:07:40 +00:00
mamba.py Fixing exl2 and other quanize tests again. (#2419) 2024-09-25 06:08:38 +00:00
model.py Pass the max_batch_total_tokens to causal_lm 2024-10-23 08:28:26 +00:00
pali_gemma.py feat: add ruff and resolve issue (#2262) 2024-09-25 05:46:24 +00:00
seq2seq_lm.py Fixing exl2 and other quanize tests again. (#2419) 2024-09-25 06:08:38 +00:00
starcoder.py Make Gaudi adapt to the tgi 2.3.0 2024-09-26 06:04:55 +00:00
types.py feat: add ruff and resolve issue (#2262) 2024-09-25 05:46:24 +00:00
vlm_causal_lm.py Merge branch 'habana-main' into 2.3.0 2024-10-23 16:32:12 +08:00