Default Branch

c6071749db · Fix mask passed to flashinfer (#3324) · Updated 2025-09-08 17:47:03 +00:00

Branches

ddf0b02240 · All the assertions. · Updated 2025-03-04 12:32:05 +00:00
187 behind · 2 ahead

f72547c9fb · feat(metrics): remove ngrok mandatory feature for backendv3 crate · Updated 2025-02-27 21:56:04 +00:00
147 behind · 5 ahead

efb20054aa · feat: consolidate streaming and event creation logic and add tests for streaming generations · Updated 2025-02-27 16:12:51 +00:00
147 behind · 22 ahead

7e60666711 · ?? · Updated 2025-02-21 09:18:56 +00:00
157 behind · 12 ahead

95d1172347 · fix: bump ci build yaml · Updated 2025-02-17 15:24:25 +00:00
174 behind · 5 ahead

b7250f0473 · Revert "fix: expand logic for different hardware" · Updated 2025-02-11 16:14:02 +00:00
178 behind · 4 ahead

09631bc8a2 · fix: bump prompt · Updated 2025-02-11 15:15:29 +00:00
177 behind · 2 ahead

eb0194a9c1 · fix qwen2 vl crash in continous batching · Updated 2025-02-10 09:54:45 +00:00
178 behind · 1 ahead

408663e61a · fix triton to 3.1.0 to fix ipex import issue · Updated 2025-02-06 08:54:03 +00:00
182 behind · 1 ahead

463228ebfc · Update version number. · Updated 2025-01-31 13:24:45 +00:00
187 behind · 1 ahead

5452c1294c · backend(vllm): disable metrics for now · Updated 2025-01-31 09:56:54 +00:00
212 behind · 9 ahead

4e1c68e6f8 · Increase session time · Updated 2025-01-30 08:53:28 +00:00
194 behind · 8 ahead

b0b855fecd · update doc · Updated 2025-01-29 12:46:03 +00:00
217 behind · 5 ahead

c871d74b46 · More logs in the allocator. · Updated 2025-01-28 10:19:37 +00:00
198 behind · 1 ahead

bafbd06744 · Update transformers_flash_causal_lm.py · Updated 2025-01-24 14:06:50 +00:00
199 behind · 2 ahead

b70f29d729 · Bypasse perm issue. · Updated 2025-01-24 11:12:47 +00:00
201 behind · 2 ahead

6d335ca7ce · Remove modifications in Lock. · Updated 2025-01-22 12:37:17 +00:00
211 behind · 2 ahead

17192c9a0e · fix: remove test debug params · Updated 2025-01-17 16:19:02 +00:00
246 behind · 54 ahead

b4187d6022 · Add tgi_batch_current_size and tgi_batch_current_size as response header · Updated 2025-01-17 14:48:02 +00:00
221 behind · 1 ahead

bde5f9ad82 · nix: update to PyTorch 2.5.1 · Updated 2025-01-17 06:44:21 +00:00
225 behind · 1 ahead