mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-10 15:35:24 +00:00

Daniël de Kok 47447ef017 Unify attention output handling (#2343)	2024-08-01 17:03:28 +02:00
- Always return the hidden states.
- Create the output tensor inside the `attention` and `paged_attention` functions.

This removes the difference between how the output is handled between attention (output parameter) and paged attention (return value). This also removes the assumption that the attention implementation can write to an output tensor (in preparation of FlashInfer).
custom_kernels	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
exllama_kernels	MI300 compatibility (#1764)	2024-05-17 15:30:47 +02:00
exllamav2_kernels	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
tests	feat: add ruff and resolve issue (#2262)	2024-07-26 10:29:09 -04:00
text_generation_server	Unify attention output handling (#2343)	2024-08-01 17:03:28 +02:00
.gitignore	Impl simple mamba model (#1480)	2024-02-08 10:19:45 +01:00
Makefile	hotfix: update nccl	2024-07-23 23:31:28 +02:00
Makefile-awq	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
Makefile-eetq	Upgrade EETQ (Fixes the cuda graphs). (#1729)	2024-04-12 08:15:28 +02:00
Makefile-fbgemm	chore: update to torch 2.4 (#2259)	2024-07-23 20:39:43 +00:00
Makefile-flash-att	Hotfixing `make install`. (#2008)	2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2	Softcapping for gemma2. (#2273)	2024-07-22 18:27:10 +02:00
Makefile-lorax-punica	Enable multiple LoRa adapters (#2010)	2024-06-25 14:46:27 -04:00
Makefile-selective-scan	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
Makefile-vllm	Add support for Deepseek V2 (#2224)	2024-07-19 17:23:20 +02:00
poetry.lock	Install Marlin from standalone package (#2320)	2024-07-29 15:37:10 +02:00
pyproject.toml	Install Marlin from standalone package (#2320)	2024-07-29 15:37:10 +02:00
README.md	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
requirements_cuda.txt	hotfix: pin numpy (#2289)	2024-07-23 17:53:19 +02:00
requirements_intel.txt	hotfix: pin numpy (#2289)	2024-07-23 17:53:19 +02:00
requirements_rocm.txt	hotfix: pin numpy (#2289)	2024-07-23 17:53:19 +02:00

README.md

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

Install

make install

Run

make run-dev