text-generation-inference/backends/v3
OlivierDehaene 8e0c161d0a
fix: incomplete generations w/ single tokens generations and models that did not support chunking (#2770)
* Incomplete generation stream fix (#2754)

entries.len() can exceed batch.size during prefill, so the entries need to be filtered as well.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* entries were wrongly extended for models that did not support chunking (both fixes are sketched below)

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
2024-11-21 16:37:55 +00:00
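
The commit body above boils down to two guards in the router's prefill path. The following is a minimal sketch of both ideas under simplified assumptions: `Entry`, `Batch`, `filter_to_batch`, and `extend_if_chunking` are hypothetical stand-ins for illustration, not the actual types or functions in backends/v3/src.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for an in-flight request tracked by the router.
struct Entry {
    input: String,
}

/// Hypothetical stand-in for the batch returned by prefill.
struct Batch {
    /// Request ids that were actually scheduled in this prefill batch.
    request_ids: Vec<u64>,
}

/// Keep only the entries whose request ids appear in the batch returned by
/// prefill. When entries.len() exceeds batch.size, the leftover entries would
/// otherwise be tracked against a batch they are not part of and their
/// streams would never complete.
fn filter_to_batch(entries: &mut HashMap<u64, Entry>, batch: &Batch) {
    let scheduled: HashSet<u64> = batch.request_ids.iter().copied().collect();
    entries.retain(|id, _| scheduled.contains(id));
}

/// Merge newly prefilled entries into the running set only when the model
/// supports chunked prefill; otherwise hand them back so the caller can track
/// the new batch separately instead of wrongly extending the running one.
fn extend_if_chunking(
    running: &mut HashMap<u64, Entry>,
    new_entries: HashMap<u64, Entry>,
    support_chunking: bool,
) -> Option<HashMap<u64, Entry>> {
    if support_chunking {
        running.extend(new_entries);
        None
    } else {
        Some(new_entries)
    }
}

fn main() {
    let mut entries: HashMap<u64, Entry> = HashMap::new();
    entries.insert(0, Entry { input: "first prompt".into() });
    entries.insert(1, Entry { input: "second prompt".into() });

    // Prefill only scheduled request 0, so request 1 must not be tracked
    // against this batch.
    let batch = Batch { request_ids: vec![0] };
    filter_to_batch(&mut entries, &batch);
    assert_eq!(entries.len(), 1);

    // A model without chunking support keeps its new batch separate.
    let mut new_entries = HashMap::new();
    new_entries.insert(2, Entry { input: "third prompt".into() });
    let leftover = extend_if_chunking(&mut entries, new_entries, false);
    assert!(leftover.is_some());
    assert_eq!(entries.len(), 1);
}
```
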
Name         Last updated                 Last commit
benches      2024-08-12 15:22:02 +02:00   Keeping the benchmark somewhere (#2401)
src          2024-11-21 16:37:55 +00:00   fix: incomplete generations w/ single tokens generations and models that did not support chunking (#2770)
build.rs     2024-07-31 10:33:10 +02:00   Rebase TRT-llm (#2331)
Cargo.toml   2024-08-27 13:31:08 -04:00   fix: bump minijinja version and add test for llama 3.1 tools (#2463)