text-generation-inference/integration-tests/models
Daniël de Kok 093a27c528
Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters into the
Marlin quantizer format.
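
As a rough illustration of the supported configurations listed above, here is a
minimal sketch of a compatibility check. The function name and structure are
hypothetical and not part of TGI's actual API; it only encodes the bits,
groupsize, and desc_act values from this commit message.

```python
# Hypothetical sketch: check whether a GPTQ checkpoint's quantization config
# falls within the configurations the GPTQ Marlin kernels support, per the
# list above. Not TGI's actual API.

SUPPORTED_BITS = {4, 8}
SUPPORTED_GROUPSIZES = {-1, 32, 64, 128}


def gptq_marlin_compatible(bits: int, groupsize: int, desc_act: bool) -> bool:
    """Return True if a GPTQ config can be handled by the GPTQ Marlin kernels."""
    # desc_act (activation reordering) is supported either way; only bits and
    # groupsize restrict compatibility.
    return bits in SUPPORTED_BITS and groupsize in SUPPORTED_GROUPSIZES


# Example: a typical 4-bit, groupsize-128 GPTQ checkpoint is supported,
# while a 3-bit checkpoint is not.
assert gptq_marlin_compatible(bits=4, groupsize=128, desc_act=True)
assert not gptq_marlin_compatible(bits=3, groupsize=128, desc_act=False)
```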

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
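
The test added by this commit, test_flash_llama_gptq_marlin.py (listed below),
follows the same pytest pattern as the other test modules in this directory:
launch the server via a fixture, generate against it, and compare the output
with the JSON snapshots stored under __snapshots__. The sketch below is
modeled on that pattern; the fixture names, model id, launcher arguments, and
assertions are illustrative assumptions, and the launcher and response_snapshot
fixtures are provided by the suite's conftest.py rather than defined here.

```python
import pytest


@pytest.fixture(scope="module")
def flash_llama_gptq_marlin_handle(launcher):
    # Placeholder model id; the real test pins a specific GPTQ-quantized
    # checkpoint that the GPTQ Marlin kernels can load.
    with launcher("some-org/some-llama-gptq", quantize="gptq") as handle:
        yield handle


@pytest.fixture(scope="module")
async def flash_llama_gptq_marlin(flash_llama_gptq_marlin_handle):
    # Wait for the server to become healthy before running tests.
    await flash_llama_gptq_marlin_handle.health(300)
    return flash_llama_gptq_marlin_handle.client


@pytest.mark.asyncio
async def test_flash_llama_gptq_marlin(flash_llama_gptq_marlin, response_snapshot):
    response = await flash_llama_gptq_marlin.generate(
        "Test request", max_new_tokens=10, decoder_input_details=True
    )

    # Compare the generation against the stored snapshot in __snapshots__.
    assert response.details.generated_tokens == 10
    assert response == response_snapshot
```
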
__snapshots__ Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
test_bloom_560m_sharded.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_bloom_560m.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_chat_llama.py Fix seeded output. (#1949) 2024-05-24 15:36:13 +02:00
test_completion_prompts.py feat: accept list as prompt and use first string (#1702) 2024-04-17 10:41:12 +02:00
test_flash_awq_sharded.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_awq.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_falcon.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_gemma_gptq.py Gemma GPTQ checks: skip logprob checks 2024-05-30 11:28:05 +02:00
test_flash_gemma.py Fix (flash) Gemma prefix and enable tests 2024-05-27 09:58:06 +02:00
test_flash_gpt2.py Add GPT-2 with flash attention (#1889) 2024-05-15 13:31:22 +02:00
test_flash_grammar_llama.py fix: correctly index into mask when applying grammar (#1618) 2024-03-01 18:22:01 +01:00
test_flash_llama_exl2.py Add support for exl2 quantization 2024-05-30 11:28:05 +02:00
test_flash_llama_gptq_marlin.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
test_flash_llama_gptq.py feat: add cuda memory fraction (#659) 2023-07-24 11:43:58 +02:00
test_flash_llama_marlin.py Add support for Marlin-quantized models 2024-06-06 13:16:52 +02:00
test_flash_llama.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_medusa.py Revamp medusa implementation so that every model can benefit. (#1588) 2024-02-26 19:49:28 +01:00
test_flash_mistral.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_neox_sharded.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_neox.py feat(server): add paged attention to flash models (#516) 2023-06-30 19:09:59 +02:00
test_flash_pali_gemma.py Pali gemma modeling (#1895) 2024-05-16 06:58:47 +02:00
test_flash_phi.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_qwen2.py feat: Qwen2 (#1608) 2024-02-28 15:50:31 +01:00
test_flash_santacoder.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_starcoder2.py feat: starcoder2 (#1605) 2024-02-28 12:07:08 +01:00
test_flash_starcoder_gptq.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_starcoder.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_grammar_llama.py fix: correctly index into mask when applying grammar (#1618) 2024-03-01 18:22:01 +01:00
test_grammar_response_format_llama.py Support chat response format (#2046) 2024-06-11 10:44:56 -04:00
test_idefics2.py Idefics2. (#1756) 2024-04-23 23:04:44 +02:00
test_idefics.py Adding Llava-Next (Llava 1.6) with full support. (#1709) 2024-04-09 21:32:00 +02:00
test_llava_next.py Adding Llava-Next (Llava 1.6) with full support. (#1709) 2024-04-09 21:32:00 +02:00
test_mamba.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_mpt.py feat(server): Add Non flash MPT. (#514) 2023-07-03 13:01:46 +02:00
test_mt0_base.py Adding Llava-Next (Llava 1.6) with full support. (#1709) 2024-04-09 21:32:00 +02:00
test_neox_sharded.py feat(server): Rework model loading (#344) 2023-06-08 14:51:52 +02:00
test_neox.py feat(server): Rework model loading (#344) 2023-06-08 14:51:52 +02:00
test_t5_sharded.py Improve the defaults for the launcher (#1727) 2024-04-12 14:20:31 +02:00
test_tools_llama.py feat: improve tools to include name and add tests (#1693) 2024-04-16 09:02:46 -04:00