Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-10-11 16:05:24 +00:00)
batch.prefill_cache_indices is reset in generate_token instead of forward, so that position_id could be updated correctly. Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> | | |
---|---|---|
client | ||
gaudi | ||
grpc-metadata | ||
llamacpp | ||
neuron | ||
trtllm | ||
v2 | ||
v3 |