Commit Graph

182 Commits

Author SHA1 Message Date
OlivierDehaene
17bc841b1b
feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene
c720555adc
v0.3.0 (#72) 2023-02-16 17:28:29 +01:00
OlivierDehaene
7b3d460d21
fix(launcher): copy current env vars to subprocesses (#70)
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene
68455353f5
feat(launcher): add disable_custom_kernels arg (#67) 2023-02-15 16:23:45 +01:00
OlivierDehaene
c5a4a1faf3
feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size (#60) 2023-02-08 17:53:33 +01:00
OlivierDehaene
2fe5e1b30e
V0.2.1 (#58) 2023-02-07 15:40:25 +01:00
OlivierDehaene
4acc42a605
fix(server): better handling of inference mode (#57) 2023-02-07 15:38:22 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text (#50)
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful; it is even
detrimental to usability.

We also remove the unused Vec.
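
Clients that relied on the old concatenated output can rebuild it on their side. A minimal sketch, assuming a hypothetical client-side helper named `legacy_generated_text` (it is not part of the router) and that the old output was the prompt followed directly by the generated text:

```rust
/// Hypothetical client-side helper: rebuilds the pre-#50 output by
/// prepending the prompt to the text now returned by /generate.
fn legacy_generated_text(prompt: &str, generated_text: &str) -> String {
    format!("{}{}", prompt, generated_text)
}

fn main() {
    let prompt = "The sky is";
    // After this change, the API returns only the newly generated text.
    let generated_text = " blue.";
    println!("new: {:?}", generated_text);
    println!("old: {:?}", legacy_generated_text(prompt, generated_text));
}
```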
2023-02-02 15:02:04 +01:00
OlivierDehaene
7b870e1e18
feat(router): use background task to manage request queue (#52)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene
775115e3a5
feat(server): allow the server to use a local weight cache (#49) 2023-02-01 16:22:10 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox (#39) 2023-01-31 18:53:56 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support (#41) 2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" (#40)
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support (#36)
Add token streaming using Server-Sent Events (SSE).

The signature of the SSE events is: 

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
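
As a rough illustration of how a client might fold these events into a final result, the sketch below mirrors the three structs using only the standard library. Several parts are assumptions not stated in the commit message: the fields of `Token` (`id`, `text`), the `StreamEvent` wrapper enum, the `collect_stream` helper, and the idea that `generated_text` and `details` are only set on the last event.

```rust
// Hypothetical mirror of the event types; `Token`'s fields are assumed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Token {
    id: u32,
    text: String,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

#[derive(Debug, Clone, PartialEq)]
struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

#[derive(Debug, Clone, PartialEq)]
struct ErrorResponse {
    error: String,
}

// Hypothetical wrapper: each decoded SSE event is either a token or an error.
#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    Message(StreamResponse),
    Error(ErrorResponse),
}

/// Consume events until one carries `generated_text` (assumed here to mark
/// the final event) or an error arrives.
fn collect_stream(events: &[StreamEvent]) -> Result<(String, Option<Details>), String> {
    let mut partial = String::new();
    for event in events {
        match event {
            StreamEvent::Error(e) => return Err(e.error.clone()),
            StreamEvent::Message(msg) => {
                partial.push_str(&msg.token.text);
                if let Some(text) = &msg.generated_text {
                    return Ok((text.clone(), msg.details.clone()));
                }
            }
        }
    }
    // No final event seen: fall back to the concatenated token texts.
    Ok((partial, None))
}

fn main() {
    let events = vec![
        StreamEvent::Message(StreamResponse {
            token: Token { id: 1, text: "Hello".to_string() },
            generated_text: None,
            details: None,
        }),
        StreamEvent::Message(StreamResponse {
            token: Token { id: 2, text: " world".to_string() },
            generated_text: Some("Hello world".to_string()),
            details: Some(Details {
                finish_reason: "length".to_string(),
                generated_tokens: 2,
                seed: None,
            }),
        }),
    ];
    match collect_stream(&events) {
        Ok((text, details)) => println!("text = {:?}, details = {:?}", text, details),
        Err(error) => println!("stream failed: {}", error),
    }
}
```

Returning early on `ErrorResponse` keeps the caller from treating a partially streamed text as a complete generation.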
2023-01-31 11:49:43 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder (#26) 2023-01-20 12:24:39 +01:00
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros/new_empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
fcc2c5fcbf
feat(launcher): Log server stdout (#19)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
OlivierDehaene
611e21cb13
fix(server): Fix stop sequences (#11) 2022-12-16 16:03:39 +01:00
OlivierDehaene
3e2e6240b8
feat(launcher): Add integration tests (#9) 2022-12-16 11:29:36 +01:00
OlivierDehaene
4236e41b0d feat(server): Improved doc 2022-11-07 12:53:56 +01:00
OlivierDehaene
cea6051eff feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard 2022-11-04 18:31:08 +01:00
OlivierDehaene
b3b7ea0d74 feat: Use json formatter by default in docker image 2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77 feat(server): Support all AutoModelForCausalLM on a best effort basis 2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9 feat(server): Support bitsandbytes 2022-10-27 14:25:29 +02:00
Nicolas Patry
c8ce9b2515
feat(server): Use safetensors
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
OlivierDehaene
c837893370 feat(router): Add max_waiting_tokens 2022-10-21 16:40:05 +02:00
Olivier Dehaene
f16f2f5ae1 v0.1.0 2022-10-20 19:14:44 +02:00