text-generation-inference/launcher/src
fxmarty 5e035063cf ROCm and sliding windows fixes (#2033)
* update vllm commit & fix models using sliding window

* update

* update commit

* fix bug where tunableop is bound to cuda graph even when cuda graphs are disabled

* enable tunableop by default

* fix sliding window

* address review

* dead code

* precise comment

* is it flaky?
2024-09-24 03:42:29 +00:00
env_runtime.rs Integrate flash attention for starcoder2 tgi through habana and some fixes, enabling (#198) 2024-08-07 22:06:05 +02:00
main.rs ROCm and sliding windows fixes (#2033) 2024-09-24 03:42:29 +00:00