Disable Cachix pushes (#3312)

* Disable Cachix pushes

This is not safe until we have sandboxed builds. For TGI alone
this might not be a huge issue, but with Cachix caching disabled
in hf-nix, TGI CI would build all the packages and push them to
our cache.

* fix: bump docs

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Daniël de Kok 2025-08-26 19:27:57 +02:00 committed by GitHub
parent 8801ba12cf
commit 06d9d88b95
4 changed files with 10 additions and 8 deletions


@@ -23,7 +23,7 @@ jobs:
        with:
          name: huggingface
          # If you chose signing key for write access
-         authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
+         # authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
        env:
          USER: github_runner
      - name: Build
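For context, a minimal sketch of what a whole job might look like after this change, assuming the standard `cachix/cachix-action` usage shown in the hunk above; the workflow name, trigger, runner, action versions, and the final build command are illustrative assumptions, not part of this commit:

```yaml
# Hypothetical sketch: with authToken commented out, cachix-action only
# configures the "huggingface" cache as a substituter, so CI can pull
# cached store paths but cannot push new build results.
name: Nix build (sketch)

on:
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cachix/install-nix-action@v27
      - uses: cachix/cachix-action@v14
        with:
          name: huggingface
          # If you chose signing key for write access
          # authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
        env:
          USER: github_runner
      - name: Build
        # Placeholder build command; the real workflow's build step is not
        # shown in full in the hunk above.
        run: nix build .
```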


@@ -22,7 +22,7 @@ jobs:
        with:
          name: huggingface
          # If you chose signing key for write access
-         authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
+         #authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
        env:
          USER: github_runner
      - name: Build impure devshell


@@ -27,9 +27,11 @@ jobs:
        with:
          name: huggingface
          # If you chose signing key for write access
-         authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
+         #authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
        env:
          USER: github_runner
+     - name: Nix info
+       run: nix-shell -p nix-info --run "nix-info -m"
      - name: Build
        run: nix develop .#test --command echo "Ok"
      - name: Pre-commit tests.


@@ -59,8 +59,6 @@ Options:
          Marlin kernels will be used automatically for GPTQ/AWQ models.

-         [env: QUANTIZE=]
-
          Possible values:
          - awq: 4 bit quantization. Requires a specific AWQ quantized model: <https://hf.co/models?search=awq>. Should replace GPTQ models wherever possible because of the better latency
          - compressed-tensors: Compressed tensors, which can be a mixture of different quantization methods
@@ -73,6 +71,8 @@ Options:
          - bitsandbytes-fp4: Bitsandbytes 4bit. nf4 should be preferred in most cases but maybe this one has better perplexity performance for you model
          - fp8: [FP8](https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/) (e4m3) works on H100 and above This dtype has native ops should be the fastest if available. This is currently not the fastest because of local unpacking + padding to satisfy matrix multiplication limitations

+         [env: QUANTIZE=]
+
```
## SPECULATE
```shell
@@ -457,14 +457,14 @@ Options:
      --usage-stats <USAGE_STATS>
          Control if anonymous usage stats are collected. Options are "on", "off" and "no-stack" Defaul is on

-         [env: USAGE_STATS=]
-         [default: on]
-
          Possible values:
          - on: Default option, usage statistics are collected anonymously
          - off: Disables all collection of usage statistics
          - no-stack: Doesn't send the error stack trace or error type, but allows sending a crash event
+
+         [env: USAGE_STATS=]
+         [default: on]

```
## PAYLOAD_LIMIT
```shell