text-generation-inference/backends/v3/src
Daniël de Kok c7b495f97d hotfix: avoid non-prefilled block use when using prefix caching (#2489)
The minimum batch size logic could cause prefix blocks to be
deallocated without prefill. The next allocation of the same
prefix would then use garbage blocks.
2024-09-25 06:13:11 +00:00
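The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in backend.rs or radix.rs: `ToyPrefixCache`, `Allocation`, `allocate`, and `release` are hypothetical names, and the `prefilled` flag is an assumed way of tracking whether the blocks were ever written. The idea is that blocks released before prefill go back to the free list instead of being registered as a reusable prefix.

```rust
use std::collections::HashMap;

/// Toy prefix cache (hypothetical, for illustration only).
/// Maps a token prefix to the block ids that hold its cached entries.
struct ToyPrefixCache {
    cached: HashMap<Vec<u32>, Vec<u32>>,
    free: Vec<u32>,
}

/// Blocks handed out for one request (hypothetical type).
struct Allocation {
    prefix: Vec<u32>,
    blocks: Vec<u32>,
    /// True when the blocks came from the prefix cache and already hold data.
    cache_hit: bool,
}

impl ToyPrefixCache {
    fn new(num_blocks: u32) -> Self {
        Self {
            cached: HashMap::new(),
            free: (0..num_blocks).collect(),
        }
    }

    /// Reuse cached blocks for `prefix` if present, else take fresh ones.
    fn allocate(&mut self, prefix: &[u32], n_blocks: usize) -> Option<Allocation> {
        if let Some(blocks) = self.cached.remove(prefix) {
            return Some(Allocation { prefix: prefix.to_vec(), blocks, cache_hit: true });
        }
        if self.free.len() < n_blocks {
            return None;
        }
        let at = self.free.len() - n_blocks;
        let blocks = self.free.split_off(at);
        Some(Allocation { prefix: prefix.to_vec(), blocks, cache_hit: false })
    }

    /// Release an allocation. Only blocks that were actually prefilled are
    /// registered for reuse; unfilled blocks return to the free list so a
    /// later request for the same prefix cannot pick up garbage.
    fn release(&mut self, alloc: Allocation, prefilled: bool) {
        if prefilled {
            if let Some(old) = self.cached.insert(alloc.prefix, alloc.blocks) {
                // Avoid leaking blocks if the prefix was already cached.
                self.free.extend(old);
            }
        } else {
            self.free.extend(alloc.blocks);
        }
    }
}
```

In this model, a request that is dropped before prefill and released with `prefilled = false` leaves no cache entry behind, so the next allocation of the same prefix is a miss and gets fresh blocks rather than a false hit.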
client Lots of improvements (Still 2 allocators) (#2449) 2024-09-25 06:13:11 +00:00
backend.rs hotfix: avoid non-prefilled block use when using prefix caching (#2489) 2024-09-25 06:13:11 +00:00
block_allocator.rs Lots of improvements (Still 2 allocators) (#2449) 2024-09-25 06:13:11 +00:00
lib.rs Keeping the benchmark somewhere (#2401) 2024-09-25 06:05:43 +00:00
main.rs Pr 2352 ci branch (#2382) 2024-09-25 06:01:59 +00:00
queue.rs Lots of improvements (Still 2 allocators) (#2449) 2024-09-25 06:13:11 +00:00
radix.rs Lots of improvements (Still 2 allocators) (#2449) 2024-09-25 06:13:11 +00:00