text-generation-inference/integration-tests/models
Daniël de Kok 093a27c528
Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters into the
Marlin quantizer format.
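
As a rough illustration of the supported configurations listed above, here is a
minimal sketch of a compatibility check. The function name and structure are
hypothetical and not part of TGI's actual API; it only encodes the bits,
groupsize, and desc_act values from this commit message.

```python
# Hypothetical sketch: check whether a GPTQ checkpoint's quantization config
# falls within the configurations the GPTQ Marlin kernels support, per the
# list above. Not TGI's actual API.

SUPPORTED_BITS = {4, 8}
SUPPORTED_GROUPSIZES = {-1, 32, 64, 128}


def gptq_marlin_compatible(bits: int, groupsize: int, desc_act: bool) -> bool:
    """Return True if a GPTQ config can be handled by the GPTQ Marlin kernels."""
    # desc_act (activation reordering) is supported either way; only bits and
    # groupsize restrict compatibility.
    return bits in SUPPORTED_BITS and groupsize in SUPPORTED_GROUPSIZES


# Example: a typical 4-bit, groupsize-128 GPTQ checkpoint is supported,
# while a 3-bit checkpoint is not.
assert gptq_marlin_compatible(bits=4, groupsize=128, desc_act=True)
assert not gptq_marlin_compatible(bits=3, groupsize=128, desc_act=False)
```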

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
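
The test added by this commit, test_flash_llama_gptq_marlin.py (listed below),
follows the same pytest pattern as the other test modules in this directory:
launch the server via a fixture, generate against it, and compare the output
with the JSON snapshots stored under __snapshots__. The sketch below is
modeled on that pattern; the fixture names, model id, launcher arguments, and
assertions are illustrative assumptions, and the launcher and response_snapshot
fixtures are provided by the suite's conftest.py rather than defined here.

```python
import pytest


@pytest.fixture(scope="module")
def flash_llama_gptq_marlin_handle(launcher):
    # Placeholder model id; the real test pins a specific GPTQ-quantized
    # checkpoint that the GPTQ Marlin kernels can load.
    with launcher("some-org/some-llama-gptq", quantize="gptq") as handle:
        yield handle


@pytest.fixture(scope="module")
async def flash_llama_gptq_marlin(flash_llama_gptq_marlin_handle):
    # Wait for the server to become healthy before running tests.
    await flash_llama_gptq_marlin_handle.health(300)
    return flash_llama_gptq_marlin_handle.client


@pytest.mark.asyncio
async def test_flash_llama_gptq_marlin(flash_llama_gptq_marlin, response_snapshot):
    response = await flash_llama_gptq_marlin.generate(
        "Test request", max_new_tokens=10, decoder_input_details=True
    )

    # Compare the generation against the stored snapshot in __snapshots__.
    assert response.details.generated_tokens == 10
    assert response == response_snapshot
```
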
__snapshots__ Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
test_bloom_560m_sharded.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_bloom_560m.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_chat_llama.py Fix seeded output. (#1949) 2024-05-24 15:36:13 +02:00
test_completion_prompts.py feat: accept list as prompt and use first string (#1702) 2024-04-17 10:41:12 +02:00
test_flash_awq_sharded.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_awq.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_falcon.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_gemma_gptq.py Gemma GPTQ checks: skip logprob checks 2024-05-30 11:28:05 +02:00
test_flash_gemma.py Fix (flash) Gemma prefix and enable tests 2024-05-27 09:58:06 +02:00
test_flash_gpt2.py Add GPT-2 with flash attention (#1889) 2024-05-15 13:31:22 +02:00
test_flash_grammar_llama.py fix: correctly index into mask when applying grammar (#1618) 2024-03-01 18:22:01 +01:00
test_flash_llama_exl2.py Add support for exl2 quantization 2024-05-30 11:28:05 +02:00
test_flash_llama_gptq_marlin.py Add support for GPTQ Marlin (#2052) 2024-06-14 09:45:42 +02:00
test_flash_llama_gptq.py feat: add cuda memory fraction (#659) 2023-07-24 11:43:58 +02:00
test_flash_llama_marlin.py Add support for Marlin-quantized models 2024-06-06 13:16:52 +02:00
test_flash_llama.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_medusa.py Revamp medusa implementation so that every model can benefit. (#1588) 2024-02-26 19:49:28 +01:00
test_flash_mistral.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_neox_sharded.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_neox.py feat(server): add paged attention to flash models (#516) 2023-06-30 19:09:59 +02:00
test_flash_pali_gemma.py Pali gemma modeling (#1895) 2024-05-16 06:58:47 +02:00
test_flash_phi.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_qwen2.py feat: Qwen2 (#1608) 2024-02-28 15:50:31 +01:00
test_flash_santacoder.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_flash_starcoder2.py feat: starcoder2 (#1605) 2024-02-28 12:07:08 +01:00
test_flash_starcoder_gptq.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_flash_starcoder.py feat(server): only compute prefill logprobs when asked (#406) 2023-06-02 17:12:30 +02:00
test_grammar_llama.py fix: correctly index into mask when applying grammar (#1618) 2024-03-01 18:22:01 +01:00
test_grammar_response_format_llama.py Support chat response format (#2046) 2024-06-11 10:44:56 -04:00
test_idefics2.py Idefics2. (#1756) 2024-04-23 23:04:44 +02:00
test_idefics.py Adding Llava-Next (Llava 1.6) with full support. (#1709) 2024-04-09 21:32:00 +02:00
test_llava_next.py Adding Llava-Next (Llava 1.6) with full support. (#1709) 2024-04-09 21:32:00 +02:00
test_mamba.py fix(router): fix openapi and add jsonschema validation (#1578) 2024-02-21 11:05:32 +01:00
test_mpt.py feat(server): Add Non flash MPT. (#514) 2023-07-03 13:01:46 +02:00
test_mt0_base.py Adding Llava-Next (Llava 1.6) with full support. (#1709) 2024-04-09 21:32:00 +02:00
test_neox_sharded.py feat(server): Rework model loading (#344) 2023-06-08 14:51:52 +02:00
test_neox.py feat(server): Rework model loading (#344) 2023-06-08 14:51:52 +02:00
test_t5_sharded.py Improve the defaults for the launcher (#1727) 2024-04-12 14:20:31 +02:00
test_tools_llama.py feat: improve tools to include name and add tests (#1693) 2024-04-16 09:02:46 -04:00