Mirror of https://github.com/huggingface/text-generation-inference.git
Tested with

```
CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
EXLLAMA_VERSION=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
CUDA_VISIBLE_DEVICES="0,1" text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq
```

all with good and identical results on MI210.

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
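As a quick smoke test (an addition, not from the original commit message), a launched server can be queried through TGI's `/generate` endpoint; the launcher's default port 3000 and the prompt below are assumptions:

```
# Assumes the launcher above is running with its default --port 3000.
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}}'
```

Running the same request against each of the three launch configurations and diffing the returned `generated_text` is one way to confirm the "identical results" claim.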
.gitignore
.idea
target
router/tokenizer.json
*__pycache__*

# ROCm auto-generated files
*.hip
server/exllamav2_kernels/exllamav2_kernels/hip/
server/exllama_kernels/exllama_kernels/hip/
server/exllama_kernels/exllama_kernels/hip_func/
*_hip.cuh
server/exllama_kernels/exllama_kernels/hip_buffers.cuh
server/exllama_kernels/exllama_kernels/exllama_ext_hip.cpp
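For context (an addition, not part of the ignore file): the HIP paths above are build artifacts, typically produced by translating the CUDA kernel sources with ROCm's hipify tooling, which is why they are ignored rather than committed. A sketch of such a conversion; the CUDA input path is an assumption, since only the output path appears in the file:

```
# Hypothetical regeneration step: hipify-perl translates CUDA source to HIP
# and writes the result to stdout. Input path assumed for illustration.
hipify-perl server/exllama_kernels/exllama_kernels/exllama_ext.cpp \
    > server/exllama_kernels/exllama_kernels/exllama_ext_hip.cpp
```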