Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-11-18 23:15:59 +00:00)
Disable Cachix pushes (#3312)
* Disable Cachix pushes

  This is not safe until we have sandboxed builds. For TGI alone this might not be a huge issue, but with Cachix caching disabled in hf-nix, TGI CI would build all the packages and push them to our cache.

* fix: bump docs

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Parent: 8801ba12cf
Commit: 06d9d88b95
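The mechanism behind the change: the cachix/cachix-action step can only push build results when it has write credentials, so commenting out `authToken` leaves the step in pull-only mode — substitution from the `huggingface` binary cache keeps working, but nothing is uploaded. A minimal sketch of the resulting step (the action version pin and step placement are assumptions, not taken from this diff):

```yaml
# Hypothetical workflow fragment; the cachix-action version pin is an assumption.
- uses: cachix/cachix-action@v14
  with:
    name: huggingface            # cache is still used as a substituter for reads
    # Without an auth token the action cannot push, so builds stay read-only:
    # authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
  env:
    USER: github_runner
```

An alternative with a similar effect is the action's `skipPush: true` input, which keeps the token available (useful for private-cache reads) while still disabling uploads.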
.github/workflows/nix_build.yaml (vendored) — 2 changed lines

```diff
@@ -23,7 +23,7 @@ jobs:
       with:
         name: huggingface
         # If you chose signing key for write access
-        authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
+        # authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
       env:
         USER: github_runner
     - name: Build
```
.github/workflows/nix_cache.yaml (vendored) — 2 changed lines

```diff
@@ -22,7 +22,7 @@ jobs:
       with:
         name: huggingface
         # If you chose signing key for write access
-        authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
+        #authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
       env:
         USER: github_runner
     - name: Build impure devshell
```
.github/workflows/nix_tests.yaml (vendored) — 4 changed lines

```diff
@@ -27,9 +27,11 @@ jobs:
       with:
         name: huggingface
         # If you chose signing key for write access
-        authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
+        #authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
       env:
         USER: github_runner
+    - name: Nix info
+      run: nix-shell -p nix-info --run "nix-info -m"
     - name: Build
       run: nix develop .#test --command echo "Ok"
     - name: Pre-commit tests.
```
Launcher documentation:

````diff
@@ -59,8 +59,6 @@ Options:
 
           Marlin kernels will be used automatically for GPTQ/AWQ models.
 
-          [env: QUANTIZE=]
-
           Possible values:
           - awq: 4 bit quantization. Requires a specific AWQ quantized model: <https://hf.co/models?search=awq>. Should replace GPTQ models wherever possible because of the better latency
           - compressed-tensors: Compressed tensors, which can be a mixture of different quantization methods
@@ -73,6 +71,8 @@ Options:
           - bitsandbytes-fp4: Bitsandbytes 4bit. nf4 should be preferred in most cases but maybe this one has better perplexity performance for you model
           - fp8: [FP8](https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/) (e4m3) works on H100 and above This dtype has native ops should be the fastest if available. This is currently not the fastest because of local unpacking + padding to satisfy matrix multiplication limitations
 
+          [env: QUANTIZE=]
+
 ```
 ## SPECULATE
 ```shell
@@ -457,14 +457,14 @@ Options:
       --usage-stats <USAGE_STATS>
           Control if anonymous usage stats are collected. Options are "on", "off" and "no-stack" Defaul is on
 
-          [env: USAGE_STATS=]
-          [default: on]
-
           Possible values:
           - on: Default option, usage statistics are collected anonymously
           - off: Disables all collection of usage statistics
           - no-stack: Doesn't send the error stack trace or error type, but allows sending a crash event
 
+          [env: USAGE_STATS=]
+          [default: on]
+
 ```
 ## PAYLOAD_LIMIT
 ```shell
````