Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-19 22:02:06 +00:00
* Choosing input/total tokens automatically based on available VRAM?
* Update doc.
* Remove generated files.
* Trying to fix non chunking targets.
* Attempt #2
* fix.
* QuantLinear is rocm compatible.
* Much simpler logic after the overhead.
* Updating logic + non flash.
* Revert doc text.
* Simple updates.
* Fix integration mt0 (transformers update).
26 lines
480 B
Plaintext
.idea
target
router/tokenizer.json
*__pycache__*

backends/v2/src/client/pb
backends/v3/src/client/pb
backends/client/src/v2/pb
backends/client/src/v3/pb

# ROCm auto-generated files
*.hip
server/exllamav2
server/exllama_kernels/exllama_kernels/hip/
server/exllama_kernels/exllama_kernels/hip_func/
*_hip.cuh
server/exllama_kernels/exllama_kernels/hip_buffers.cuh
server/exllama_kernels/exllama_kernels/exllama_ext_hip.cpp

data/
load_tests/*.json
server/fbgemmm

.direnv/
.venv/