This draft PR is a work-in-progress implementation of the Mamba model. It currently loads weights and produces correct logits after a single forward pass. The model still needs to be integrated so that it produces tokens as expected, and optimized to avoid unnecessary copies and operations at runtime.

References:

- [Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Albert Gu and Tri Dao)](https://arxiv.org/abs/2312.00752)
- https://github.com/johnma2006/mamba-minimal
- https://github.com/huggingface/candle/blob/main/candle-examples/examples/mamba-minimal/model.rs
- https://github.com/huggingface/transformers/pull/28094

Notes: this dev work currently targets `state-spaces/mamba-130m`, so if you want to test, please use that model. Additionally, when starting the router, the prefill needs to be limited:

```bash
cargo run -- --max-batch-prefill-tokens 768 --max-input-length 768
```

Integration tests have been added, and basic functionality such as model loading is supported:

```bash
cd integration-tests
pytest -vv models/test_fused_kernel_mamba.py
```

- [x] add tests
- [x] load model
- [x] make simple request
- [ ] resolve warmup issue
- [ ] resolve output issues

Fetch the models tested during development:

```bash
text-generation-server download-weights state-spaces/mamba-130m
text-generation-server download-weights state-spaces/mamba-1.4b
text-generation-server download-weights state-spaces/mamba-2.8b
```

Run the server:

```bash
cd server
MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 python text_generation_server/cli.py serve state-spaces/mamba-2.8b
```

Run the router:

```bash
cargo run
```

Make a request:

```bash
curl -s localhost:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq
```

Response:

```json
{
  "generated_text": "\n\nDeep learning is a machine learning technique that uses a deep neural network to learn from data."
}
```
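Since the remaining work is in the decode path (producing tokens step by step), it may help to recall the per-token recurrence that path has to implement. The sketch below shows one selective-SSM step in the recurrent formulation described in the paper and in mamba-minimal; the function name, parameter names, and shapes are illustrative assumptions, not this PR's actual code:

```python
import torch

def selective_ssm_step(x_t, h, A, B_t, C_t, D, delta_t):
    """One per-token step of the selective SSM (illustrative sketch).

    Shapes (single sequence, no batch dim):
      x_t:     (d_inner,)          input activations for the current token
      h:       (d_inner, d_state)  recurrent state, carried across decode steps
      A:       (d_inner, d_state)  state matrix (negative entries for stability)
      B_t:     (d_state,)          input-dependent input projection
      C_t:     (d_state,)          input-dependent output projection
      D:       (d_inner,)          skip connection
      delta_t: (d_inner,)          input-dependent step size (post-softplus)
    """
    # Discretize A and B with the zero-order-hold rule from the paper.
    dA = torch.exp(delta_t[:, None] * A)                    # (d_inner, d_state)
    dB_x = delta_t[:, None] * B_t[None, :] * x_t[:, None]   # (d_inner, d_state)
    # Recurrence: h_t = dA * h_{t-1} + dB_x
    h = dA * h + dB_x
    # Output: y_t = C_t . h_t + D * x_t
    y_t = h @ C_t + D * x_t                                 # (d_inner,)
    return y_t, h
```

Because `h` has a fixed size per layer, decoding only needs this small recurrent state per sequence rather than a growing KV cache, which is also why eliminating per-step copies matters for throughput.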
---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>