huggingface/text-generation-inference
Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-23 07:52:06 +00:00)
Path: text-generation-inference/router/src at commit e210e15e27
Latest commit: bf5263b88b by Karol Damaszke, "Disable watermark with FP8 quantization (#114)", 2024-03-27 13:32:20 +01:00
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
| File | Last commit | Date |
| --- | --- | --- |
| health.rs | Rebased #617 (#868) | 2023-08-28 11:43:47 +02:00 |
| infer.rs | Revert "Prefer prefill instead of decode when max_waiting_tokens==0 (#18)" (#45) (#76) | 2024-02-27 11:56:45 +01:00 |
| lib.rs | Exllama v2 (#1211) | 2023-11-25 22:38:38 +01:00 |
| main.rs | Adjust warmup to all possible bucket sizes and decode batch size = 1 (#113) | 2024-03-27 11:59:51 +01:00 |
| queue.rs | Heap based router queue (#63) (#88) | 2024-02-29 10:56:26 +01:00 |
| server.rs | Control prefill and decode batch size separately (#6) | 2024-01-02 18:21:01 +01:00 |
| validation.rs | Disable watermark with FP8 quantization (#114) | 2024-03-27 13:32:20 +01:00 |