Commit Graph

  • 9a79c2f867 feat: support logprobs in streaming and non streaming chat drbh 2024-01-09 14:04:31 -0500
  • 65c913b55d feat: support FinishReason in streaming and non streaming chat drbh 2024-01-09 13:47:54 -0500
  • 8c4ab53780 feat: support repetition_penalty and improve non stream response drbh 2024-01-09 13:31:15 -0500
  • fba1953eb6 fix: add prompt_token_count to InferResponse for chat completions drbh 2024-01-09 13:04:29 -0500
  • adad67e3d0 fix: prefer apply_chat_template logic in HubTokenizerConfig struct drbh 2024-01-09 12:27:01 -0500
  • 65db02f192 fix: use TORCH_NCCL_AVOID_RECORD_STREAMS=1 fix/avoid_record_streams OlivierDehaene 2024-01-09 17:59:16 +0100
  • 446b3b6af7 fix: prefer index on StreamResponse drbh 2024-01-09 11:59:11 -0500
  • f82ff3f64a fix: adds index, model id, system fingerprint and updates do_sample param drbh 2024-01-09 11:54:20 -0500
  • 91d7267534 Fix missing make target platform for local install: 'install-flash-attention-v2' (#1414) R. P. Ruiz 2024-01-09 10:19:31 -0500
  • 55605a1247 remove log OlivierDehaene 2024-01-09 15:35:34 +0100
  • 3d082ccba4 fix: follow base model for tokenizer in router OlivierDehaene 2024-01-09 15:28:05 +0100
  • 564f2a3b75 fix: fix local loading for .bin models (#1419) OlivierDehaene 2024-01-09 15:21:00 +0100
  • 3f9b3f4539 docs: update required CUDA version to 12.2 OlivierDehaene 2024-01-09 14:28:55 +0100
  • 7ffe9023da Fix local load for Medusa PYNing 2024-01-09 19:42:03 +0800
  • 84aa6ff5aa fix: fix local loading for .bin models OlivierDehaene 2024-01-09 11:48:57 +0100
  • ddf7412a6b fix: remove ChatTemplateError and add index to stream messages drbh 2024-01-08 08:52:01 -0500
  • 716fe00d92 Fix missing make target: https://github.com/huggingface/text-generation-inference/issues/1397 deepily 2024-01-05 17:29:35 -0500
  • 3ae9cd655d feat: supports openai chat completions API drbh 2024-01-05 15:33:42 -0500
  • 2358a35485 feat: add mocked http request tests drbh 2024-01-03 16:13:50 -0500
  • 252ccde104 Control prefill and decode batch size separately (#6) Karol Damaszke 2024-01-02 18:21:01 +0100
  • 9ce069445e removing personal git workflow Łukasz Olszewski 2024-01-02 17:55:46 +0100
  • c46dd7e78b restoring original README Łukasz Olszewski 2024-01-02 17:53:54 +0100
  • ad7f839673 fix vllm import error Zeyu Li 2023-12-30 14:26:37 +0800
  • 74d9dfa89e Fix incorrect use of bias in awq chenxichen 2023-12-27 03:25:47 +0000
  • 76590818a3 fixing ram exhaustion during build issue Łukasz Olszewski 2023-12-23 13:19:25 +0100
  • 1be2d9a8ec Batch size bucketing (#5) Karol Damaszke 2023-12-22 21:53:01 +0100
  • 43277c6c6a fixing requirements Łukasz Olszewski 2023-12-22 16:00:28 +0100
  • ca49490e07 Update docker-image.yml Lukasz Olszewski 2023-12-22 15:56:27 +0100
  • 91dfb2272a Create docker-image.yml Lukasz Olszewski 2023-12-22 15:49:28 +0100
  • 630800eed3 v1.3.4 v1.3.4 OlivierDehaene 2023-12-22 15:46:04 +0100
  • d158e75435 updating benchmark.rc Łukasz Olszewski 2023-12-22 15:45:50 +0100
  • b223ac70b6 Merge branch 'huggingface:main' into main Lukasz Olszewski 2023-12-22 14:38:26 +0100
  • 6e43e80b50 experimental new features Łukasz Olszewski 2023-12-22 14:36:13 +0100
  • e3dcd7f2c2 Disable tensor caching in HPU Graph execution (#4) jkaniecki 2023-12-22 13:51:16 +0100
  • d84b38e30d adding guidance and extra parameters for token bias Łukasz Olszewski 2023-12-22 11:14:06 +0100
  • 529d7c2591 Fix local load for peft (#1373) Nicolas Patry 2023-12-21 17:29:23 +0100
  • fad3a40102 Updating hub test. Nicolas Patry 2023-12-21 16:25:34 +0000
  • 83f81a6b89 Fix local load for peft Nicolas Patry 2023-12-21 15:35:15 +0000
  • 564199bab3 feat: update exllamav2 kernels (#1370) OlivierDehaene 2023-12-21 17:25:22 +0100
  • d3b5ae27b0 Fix santacoder. Nicolas Patry 2023-12-21 15:38:40 +0000
  • 9f42e5f6fd Preventing using exllama when act_order=True Nicolas Patry 2023-12-21 15:05:05 +0000
  • 238cc311f1 back to v2 by def OlivierDehaene 2023-12-21 15:46:07 +0100
  • 96a520ec78 remove v2 for now OlivierDehaene 2023-12-21 15:24:41 +0100
  • 672f290901 fmt OlivierDehaene 2023-12-21 11:35:06 +0100
  • 38df4c1d67 feat: update exllamav2 kernels OlivierDehaene 2023-12-21 11:32:55 +0100
  • 987c959f73 docs: Change URL for Habana Gaudi support in doc (#1343) regisss 2023-12-21 11:05:35 +0100
  • 1108560745 fixing requirements Łukasz Olszewski 2023-12-21 10:48:08 +0100
  • eb8923a97e Peft safetensors. (#1364) Nicolas Patry 2023-12-20 15:37:14 +0100
  • e749d0cc5a Adding CFG (context free grammar) to TGI Łukasz Olszewski 2023-12-20 12:42:56 +0100
  • d5db3433c8 Peft safetensors. Nicolas Patry 2023-12-20 08:20:25 +0000
  • c4c799137f Update ldcache to include libcuda.so during docker build Blair Johnson 2023-12-19 01:51:10 -0500
  • d077150eb7 fix: fix gpt-q with groupsize = -1 (#1358) OlivierDehaene 2023-12-18 16:07:05 +0100
  • ff3a79f272 fix: fix gpt-q with groupsize = -1 OlivierDehaene 2023-12-18 12:42:00 +0100
  • 8428ed1011 fix: fix offline (#1341) (#1347) OlivierDehaene 2023-12-18 10:20:08 +0100
  • 1b1bfa49b0 fix: fix logic if sliding window key is not present in config (#1352) OlivierDehaene 2023-12-15 14:56:17 +0100
  • 50bfdfc003 fix: fix logic if sliding window key is not present in config OlivierDehaene 2023-12-15 14:15:40 +0100
  • 9b56d3fbf5 feat: relax mistral requirements (#1351) OlivierDehaene 2023-12-15 12:52:24 +0100
  • d5b7e6e38f Reuse the same function to list local weights everywhere Raphael Glon 2023-12-15 12:48:05 +0100
  • 29f87920a8 Adapt unit tests to commit 28821bf Raphael Glon 2023-12-14 16:42:52 +0100
  • 492c95dcbf Text generation inference, fix offline Raphael Glon 2023-12-13 14:26:45 +0100
  • 5b6367f87c fix imports OlivierDehaene 2023-12-15 11:35:31 +0100
  • 68990a5635 feat: relax mistral requirements OlivierDehaene 2023-12-15 11:15:49 +0100
  • 16c6f2a893 Update README for proper usage of LIMIT_HPU_GRAPH Harish Subramony 2023-12-14 23:34:10 -0800
  • f3aea78fb6 v1.3.3 v1.3.3 OlivierDehaene 2023-12-15 01:20:42 +0100
  • 37555cf4e8 fix: max_past default value must be -1, not 0 (#1348) OlivierDehaene 2023-12-15 01:18:39 +0100
  • 7bce6032a8 stronger parameter validation OlivierDehaene 2023-12-15 00:38:27 +0100
  • f75bbbcc63 fix: max_past default value must be -1, not 0 OlivierDehaene 2023-12-15 00:10:58 +0100
  • 9b78a6eee3 fix: only keep stop sequence buffer if we have some OlivierDehaene 2023-12-14 17:04:58 +0100
  • 80a69204c1 fix: slice stopping criteria buffer OlivierDehaene 2023-12-14 17:01:43 +0100
  • 083c2de9f8 fix: fix quant linear autotune OlivierDehaene 2023-12-14 16:45:47 +0100
  • 773aabdda6 fix: fix triton OutOfResources import OlivierDehaene 2023-12-14 16:04:26 +0100
  • 50b495f3d8 feat: add more latency metrics in forward (#1346) OlivierDehaene 2023-12-14 15:59:38 +0100
  • 4b0bd2d7c3 fix tests validation OlivierDehaene 2023-12-14 15:38:20 +0100
  • 1e1408054b fix validation OlivierDehaene 2023-12-14 15:02:22 +0100
  • 6f9366556a fix tests OlivierDehaene 2023-12-14 14:46:44 +0100
  • 248eda7b20 fix decode timing OlivierDehaene 2023-12-14 12:41:10 +0100
  • 701dd7da67 feat: add more latency metrics in forward OlivierDehaene 2023-12-14 12:13:26 +0100
  • 44b267ab22 fix: fix gpt-q params loading OlivierDehaene 2023-12-14 11:02:16 +0100
  • 2aa262dbe5 Change URL for Habana Gaudi support in doc regisss 2023-12-13 19:05:28 +0100
  • b0b76ce711 Text generation inference, fix offline Raphael Glon 2023-12-13 14:26:45 +0100
  • 28821bfd5d fix: default max_new_tokens to 100 OlivierDehaene 2023-12-13 09:19:19 +0100
  • 88aae2595d v1.3.2 v1.3.2 OlivierDehaene 2023-12-12 18:10:22 +0100
  • 82670d9786 feat: add quant to mixtral (#1337) OlivierDehaene 2023-12-12 17:55:03 +0100
  • 378986e30e feat: add quant to mixtral OlivierDehaene 2023-12-12 17:15:49 +0100
  • 4353423102 Modify default for max_new_tokens in python client freitng 2023-12-12 15:24:01 +0100
  • ec6d4592d5 v1.3.1 v1.3.1 OlivierDehaene 2023-12-11 16:46:44 +0100
  • d0841cc8eb v1.3.0 v1.3.0 OlivierDehaene 2023-12-11 14:55:03 +0100
  • 72ee382ded chore: formatting OlivierDehaene 2023-12-11 14:49:52 +0100
  • 3a521c92b3 feat: mixtral (#1328) OlivierDehaene 2023-12-11 14:43:40 +0100
  • 6c2ac3b5fb support h100 OlivierDehaene 2023-12-11 13:40:50 +0100
  • 008733313c fix megablocks install OlivierDehaene 2023-12-11 13:34:51 +0100
  • d0aff8e5e2 copy correct conda env OlivierDehaene 2023-12-11 12:54:44 +0100
  • e55870b03e rebase OlivierDehaene 2023-12-11 12:53:33 +0100
  • 66238a1c94 update megablocks commit OlivierDehaene 2023-12-11 12:50:46 +0100
  • fdd8577bcc add git OlivierDehaene 2023-12-11 11:49:25 +0100
  • b5448af381 move install megablocks to its own command OlivierDehaene 2023-12-11 11:20:11 +0100
  • 5799e5cae9 transformers format OlivierDehaene 2023-12-11 11:14:40 +0100
  • e69eed8ea3 remove a tad of cpu bottleneck OlivierDehaene 2023-12-11 10:32:13 +0100
  • af1989459c wip OlivierDehaene 2023-12-10 10:38:23 +0100
  • 9ecfa16b12 Speculative (#1308) Nicolas Patry 2023-12-11 12:46:30 +0100
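
Several commits in this log (3ae9cd655d, f82ff3f64a, 65c913b55d, 9a79c2f867) build out an OpenAI-compatible chat completions API in the router. As a rough illustration only, here is a minimal Python sketch of exercising that endpoint; the /v1/chat/completions route, the OpenAI-style request/response fields, and the localhost:8080 address are assumptions based on the OpenAI schema these commits target, not details confirmed by the log itself.

```python
# Minimal sketch: call the OpenAI-compatible chat endpoint added in 3ae9cd655d.
# Route, port, and field names are assumed, not verified against this revision.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed default TGI address
    json={
        "model": "tgi",  # placeholder id; f82ff3f64a adds model id to responses
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
        "stream": False,  # streaming responses are covered by 9a79c2f867
    },
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])
print(body["choices"][0]["finish_reason"])  # FinishReason support: 65c913b55d
```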