Commit Graph

  • 17a4ee92b0 fix: avoid cargo lock tweak drbh 2024-06-14 09:25:41 -0400
  • 210b9cc717 fix: avoid cargo lock changes drbh 2024-06-14 09:20:52 -0400
  • 96b7b40ca3 Update the link for qwen2 (#2068) Tiezhen WANG 2024-06-14 17:59:33 +0800
  • f5c10d4174 Support different image sizes in prefill in VLMs Daniël de Kok 2024-06-13 15:06:29 +0200
  • 093a27c528 Add support for GPTQ Marlin (#2052) Daniël de Kok 2024-06-14 09:45:42 +0200
  • af5041a7cf Fix too eager staging Daniël de Kok 2024-06-14 09:40:52 +0200
  • 332e16db26 Fix Qwen2 model URL in model table Daniël de Kok 2024-06-14 09:32:08 +0200
  • f085355fbe Update the link for qwen2 Tiezhen WANG 2024-06-14 14:35:23 +0800
  • 7396248379 Update server/text_generation_server/utils/import_utils.py Wang, Yi 2024-06-14 08:48:39 +0800
  • aa88c4fd3a fix: add lora kernel to dockerfile, support running without kernels and refactors drbh 2024-06-14 00:35:07 +0000
  • f433f1f770 implement Open Inference Protocol endpoints (#1942) drbh 2024-06-13 12:51:51 -0400
  • 42aa8ee1bb PR #2049 CI run (#2054) drbh 2024-06-13 11:53:49 -0400
  • 5d2b93ba42 Fixup residual, initial block attention config feature/phi-3-small Daniël de Kok 2024-06-13 10:38:56 +0200
  • 64182534b6 debug debug-gpt2 Felix Marty 2024-06-13 07:48:18 +0000
  • 31b8cc4386 debug Felix Marty 2024-06-13 07:41:46 +0000
  • 8f1de30b0f debug Felix Marty 2024-06-13 07:31:11 +0000
  • b3e9a13e27 fix idefics2 tests Felix Marty 2024-06-13 07:09:48 +0000
  • 60ee0b5178 Add support for GPTQ Marlin kernels Daniël de Kok 2024-06-12 13:58:36 +0200
  • 98adc45401 fix typo Stefan Daniel Schwarz 2024-06-13 00:46:35 +0200
  • 86b42f5f6d docker-compose Stefan Daniel Schwarz 2024-06-12 23:35:40 +0200
  • abe521204e fix tests OlivierDehaene 2024-06-12 18:54:25 +0200
  • 05eb4dcb17 allocate 16 by 16 OlivierDehaene 2024-06-12 18:53:14 +0200
  • 90184df79c fix(layers): fix SuRotaryEmbedding (#2060) OlivierDehaene 2024-06-12 18:24:47 +0200
  • 521de6cacd fix(server): fix OPT implementation (#2061) OlivierDehaene 2024-06-12 18:22:20 +0200
  • bbebdffa6a fix(server): fix OPT implementation OlivierDehaene 2024-06-12 18:11:27 +0200
  • 82302262ca remove logs OlivierDehaene 2024-06-12 17:50:53 +0200
  • 9775facbf7 change arange OlivierDehaene 2024-06-12 17:47:46 +0200
  • 9cc16725bf fix(layers): fix SuRotaryEmbedding OlivierDehaene 2024-06-12 17:09:13 +0200
  • 4ed551abba WIP, many bits are still missing... Daniël de Kok 2024-06-12 17:03:55 +0200
  • c0f201c9d3 Factor out sharding of packed tensors Daniël de Kok 2024-06-12 16:20:51 +0200
  • 3bf8e8e466 Merge pull request #158 from kdamaszk/rebase-tgi-2-0-2 regisss 2024-06-12 15:48:31 +0200
  • 9ac7b7bc52 remove slots from grpc OlivierDehaene 2024-06-12 11:50:31 +0200
  • ed1d28731b add CPU tgi support Wang, Yi A 2024-06-11 17:56:50 -0700
  • 884ebabfd3 fix: cargo fmt lint for pre commit drbh 2024-06-11 18:46:30 +0000
  • c2fb459bc1 fix windowing OlivierDehaene 2024-06-11 18:40:38 +0200
  • 37266e2dbb fix rust and python unit-tests OlivierDehaene 2024-06-11 17:11:16 +0200
  • e6e87a2e26 Use minijinja's pycompat mode for python methods Armin Ronacher 2024-06-11 11:56:05 +0200
  • 376a0b7ada Support chat response format (#2046) drbh 2024-06-11 10:44:56 -0400
  • 7c7470542d fix tests fxmarty 2024-06-11 13:40:35 +0000
  • a6e4d63c86 Update LLMM1 bound (#2050) fxmarty 2024-06-11 13:30:29 +0200
  • 7ee9c1af51 update commit fxmarty 2024-06-11 11:26:04 +0000
  • dadfff621e update fxmarty 2024-06-11 11:25:14 +0000
  • 73c3903214 FlashCausalLM implem OlivierDehaene 2024-06-11 12:38:07 +0200
  • 6983ec9537 small refactor OlivierDehaene 2024-06-10 11:44:50 +0200
  • 713d70b443 re-working logic, wip OlivierDehaene 2024-06-07 13:39:42 +0200
  • 298bf31e69 add terminated_generations OlivierDehaene 2024-06-07 11:26:17 +0200
  • 3c596983ba fix python tests OlivierDehaene 2024-06-06 10:18:26 +0200
  • 51fa606875 fix OlivierDehaene 2024-06-05 21:32:46 +0200
  • 35f27cbcc1 working example OlivierDehaene 2024-06-05 18:47:16 +0200
  • 1cc86930a6 wip OlivierDehaene 2024-06-05 17:01:06 +0200
  • 18e77a5cc7 wip OlivierDehaene 2024-06-05 15:28:10 +0200
  • a4bebdc281 Use minijinja's pycompat mode for python methods Armin Ronacher 2024-06-11 11:56:05 +0200
  • 73b067d193 skip exl2 tests on rocm fxmarty 2024-06-11 09:29:08 +0000
  • b452620c04 fix gptq tests, LLMM1 matrix bound fxmarty 2024-06-11 07:27:14 +0000
  • b0c0be48cf use xpu-smi to dump used memory xpu use "ZE_AFFINITY_MASK" to control card, usage is like CUDA_VISIBLE_DEVICES Wang, Yi A 2024-06-10 19:33:52 -0700
  • 4ce8494ceb fix: add trufflehog lint drbh 2024-06-10 19:16:55 +0000
  • 8c24b1282b fix: adjust typos drbh 2024-06-10 18:59:22 +0000
  • bcf2b29577 feat: support response_format in chat drbh 2024-06-10 18:33:13 +0000
  • dfca1dfc5e fix(ci): remove unnecessary permissions (#2045) Luc Georges 2024-06-10 18:16:53 +0200
  • 992d6c63e0 fix(ci): remove unnecessary permissions Luc Georges 2024-06-10 18:14:22 +0200
  • 4e74ec09a8 feat(ci): add trufflehog secrets detection (#2038) Luc Georges 2024-06-10 17:54:13 +0200
  • 2dc6e70b1a doc: add architecture to toctree Alvaro Moran 2024-06-10 17:23:28 +0200
  • b5704427fe doc: adding architecture document Alvaro Moran 2024-06-07 16:03:58 +0200
  • d6cf63ca53 Update lora.md Derek 2024-06-10 06:56:37 +0400
  • 1be1ebc438 Update lora.md Derek 2024-06-10 06:53:34 +0400
  • ce40ad26fd fix: add model_id to IdeficsCausalLM drbh 2024-06-07 04:36:32 +0000
  • 101b95adc4 fix: update all models forwards to include adapter_data drbh 2024-06-07 03:58:03 +0000
  • 1deb372564 fix: add adapter_data param to phi and neox drbh 2024-06-07 03:28:15 +0000
  • b1169273fd fix: add adapter_data param and avoid missing layers drbh 2024-06-07 03:03:15 +0000
  • 91f407226d feat: support if vlm models drbh 2024-06-07 02:21:06 +0000
  • a563a93113 fix: rename doc to retry ci build drbh 2024-06-07 01:23:52 +0000
  • 611225f017 feat: support base model generation and refactors drbh 2024-06-07 01:20:41 +0000
  • 43ec9dfe32 feat: bump launcher and add new lora docs drbh 2024-06-06 23:49:07 +0000
  • 81707bfbfa fix: include rust code for adapter id drbh 2024-06-06 23:23:17 +0000
  • 68399c1ae3 feat: prefer model id in request drbh 2024-06-06 23:21:10 +0000
  • de56a81c5c feat: add lora support to mistral and refactors drbh 2024-06-06 22:44:58 +0000
  • 9c45d34983 fix: add model_id to model test drbh 2024-06-06 21:24:29 +0000
  • dc0f76553c fix: pass model_id for all causal and seq2seq lms drbh 2024-06-06 21:13:14 +0000
  • 88bd5c2c92 fix: pass model_id for all flash causal lms drbh 2024-06-06 21:02:03 +0000
  • 73eb2ae255 fix: refactor and move changes to v3 proto drbh 2024-06-06 20:31:27 +0000
  • c927376725 fix: adjust adapter_segments logic when in batch drbh 2024-06-06 18:53:18 +0000
  • ad088d51fa fix: adjust batch for bgmv drbh 2024-06-06 17:45:08 +0000
  • 8984ce6c69 feat: perfer loraxs custom punica kernels and add mlp loras drbh 2024-06-06 15:57:00 +0000
  • d5f21d57d1 fix: prefer adapter_data and refactors drbh 2024-06-06 14:35:59 +0000
  • 8b50f4b779 feat: prefer lorax implementation and port loading logic drbh 2024-06-05 23:56:04 +0000
  • c661631225 feat: baseline impl single request multi lora support drbh 2024-06-04 20:07:28 +0000
  • a046c303f7 fix: refactor and reduce lora math drbh 2024-06-04 05:01:52 +0000
  • 0a6ea7fb57 feat: load weights within layer and refactor lora pass drbh 2024-06-04 01:38:43 +0000
  • db3d8e6518 feat: first draft load multiple lora drbh 2024-05-30 19:16:15 +0000
  • a1cbfc16e9 Contributing guide & Code of Conduct Lysandre 2024-06-10 15:47:17 +0200
  • d3c7f63416 Merge branch 'main' into amd-ci-fx amd-ci-fx fxmarty 2024-06-10 15:10:04 +0200
  • de6f2cd08d disable marlin tests on rocm/xpu fxmarty 2024-06-10 13:06:11 +0000
  • 05caaa2f31 fix: split docs and start conceptual page (#1836) drbh 2024-05-01 03:03:25 -0400
  • d9108dd7b2 (chore): torch 2.3.0 (#1833) Nicolas Patry 2024-04-30 18:15:35 +0200
  • e43e511c8d chore: update torch (#1730) OlivierDehaene 2024-04-30 14:04:28 +0200
  • c5e3357293 Handle images in chat api (#1828) drbh 2024-04-30 06:18:32 -0400
  • 5ceaeb9f31 feat: add vlm docs and simple examples (#1812) drbh 2024-04-30 06:14:39 -0400
  • 8c847f2b60 Fixing frequency penalty (#1811) Martin Iglesias Goyanes 2024-04-30 12:13:23 +0200
  • cd72a57123 feat: add how it works section (#1773) drbh 2024-04-30 05:45:49 -0400
  • 91352b1b71 fix: use get_speculate to the number of layers (#1737) OlivierDehaene 2024-04-30 11:45:26 +0200