mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-10 11:54:52 +00:00)
add documentation for 4bit quantization options
This commit is contained in:
parent 4e0d8b2efb
commit c52a5d4456
README.md
@@ -239,6 +239,8 @@ You can also quantize the weights with bitsandbytes to reduce the VRAM requirement
 make run-bloom-quantize # Requires 8xA100 40GB
 ```
 
+4bit quantization is available using the [NF4 and FP4 data types from bitsandbytes](https://arxiv.org/pdf/2305.14314.pdf). It can be enabled by providing `--quantize bitsandbytes-nf4` or `--quantize bitsandbytes-fp4` as a command line argument to `text-generation-launcher`.
+
 ## Develop
 
 ```shell
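For illustration, the flags documented above would be passed to `text-generation-launcher` like this. The model id below is a placeholder chosen for this sketch, not something named in the commit:

```shell
# Illustrative invocation; substitute any model the launcher supports.
# NF4 4bit quantization:
text-generation-launcher \
    --model-id bigscience/bloom-560m \
    --quantize bitsandbytes-nf4

# FP4 4bit quantization:
text-generation-launcher \
    --model-id bigscience/bloom-560m \
    --quantize bitsandbytes-fp4
```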
|
launcher/src/main.rs
@@ -104,7 +104,8 @@ struct Args {
     num_shard: Option<usize>,
 
     /// Whether you want the model to be quantized. This will use `bitsandbytes` for
-    /// quantization on the fly, or `gptq`.
+    /// quantization on the fly, or `gptq`. 4bit quantization is available through
+    /// `bitsandbytes` by providing the `bitsandbytes-fp4` or `bitsandbytes-nf4` options.
     #[clap(long, env, value_enum)]
     quantize: Option<Quantization>,
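For readers unfamiliar with how clap wires this up, here is a minimal, self-contained sketch of how a `value_enum` field like `quantize` maps kebab-case CLI values to enum variants. The `Quantization` definition below is an assumption for illustration (its variant set mirrors the options named in this commit, not the launcher's actual source), and it assumes clap with the `derive` and `env` features enabled:

```rust
// Minimal sketch, not the launcher's actual code. clap's ValueEnum derive
// turns each variant name into a kebab-case CLI value, so `BitsandbytesNf4`
// is selected with `--quantize bitsandbytes-nf4` (or via the QUANTIZE env
// var, since `env` is set on the field).
use clap::{Parser, ValueEnum};

#[derive(Clone, Debug, ValueEnum)]
enum Quantization {
    Bitsandbytes,
    BitsandbytesNf4,
    BitsandbytesFp4,
    Gptq,
}

#[derive(Parser, Debug)]
struct Args {
    /// Whether you want the model to be quantized.
    #[clap(long, env, value_enum)]
    quantize: Option<Quantization>,
}

fn main() {
    let args = Args::parse();
    // Prints e.g. `Some(BitsandbytesNf4)` for `--quantize bitsandbytes-nf4`,
    // or `None` when the flag and env var are absent.
    println!("{:?}", args.quantize);
}
```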