Commit Graph

12 Commits

Author SHA1 Message Date
Karol Damaszke
32acdd55b4
Add grammar support ()
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-05-20 11:16:34 +02:00
drbh
56670398f3 fix: handle batches with and without grammars ()
This PR correctly handles batches with a mixture of constrained and
non-constrained generations.

Currently, if a batch contains mixed generations, generation will throw
an error because it will incorrectly attempt to constrain a request with
an empty grammar.

We now handle `None` grammars and only apply the mask when needed

Fixes:
https://github.com/huggingface/text-generation-inference/issues/1643
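A minimal sketch of the fix described above, in plain Python with hypothetical helper names (the server's actual code operates on torch tensors): the grammar mask is applied per request, and requests whose grammar is `None` are left untouched.

```python
def apply_grammar_masks(batch_logits, grammars, allowed_sets):
    """Apply each request's grammar mask only when a grammar is present.

    batch_logits: list of per-request logit lists
    grammars: per-request grammar (None means unconstrained)
    allowed_sets: per-request set of allowed token ids (ignored when
        the matching grammar is None)
    """
    out = []
    for logits, grammar, allowed in zip(batch_logits, grammars, allowed_sets):
        if grammar is None:
            # Unconstrained request: leave the logits untouched.
            out.append(list(logits))
        else:
            # Constrained request: disallow tokens outside the grammar mask.
            out.append([score if tok in allowed else float("-inf")
                        for tok, score in enumerate(logits)])
    return out
```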
2024-04-25 14:06:48 +03:00
drbh
ab074c81b7 fix: improve tool type, bump pydantic and outlines ()
This PR resolves a couple of issues:

- [X] adjusts the tool response to align with openai's tools response
type
- [X] bumps pydantic to `2.6.4` in all apps (resolves dependency issue
when running tests)
- [X] bump `outlines` version and fix import for new name
2024-04-25 12:34:55 +03:00
drbh
d4aebbd10a fix: correctly index into mask when applying grammar ()
This PR fixes how the grammar mask is indexed when generating text and
adds a new test to ensure that grammars work with non-flash models
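A plain-Python sketch of the indexing the fix is about (hypothetical function, not the server's torch code): the mask row must be selected by each request's own FSM state rather than a shared position, which also keeps constrained decoding correct for non-flash, padded-batch models.

```python
def mask_logits_per_request(logits_rows, fsm_states, allowed_by_state):
    """Mask each batch row using that request's own FSM state.

    logits_rows: one logit list per request in the batch
    fsm_states: current grammar FSM state per request
    allowed_by_state: map from FSM state to the set of allowed token ids
    """
    masked = []
    for row_idx, logits in enumerate(logits_rows):
        state = fsm_states[row_idx]            # this request's state
        allowed = allowed_by_state[state]      # mask row for that state
        masked.append([score if tok in allowed else float("-inf")
                       for tok, score in enumerate(logits)])
    return masked
```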
2024-04-25 10:16:16 +03:00
OlivierDehaene
2ac1b55c95 v1.4.1 () 2024-04-24 15:42:59 +03:00
OlivierDehaene
31b5e37f49 chore: add pre-commit () 2024-04-24 15:32:02 +03:00
drbh
55acb86f42 Outlines guided generation ()
This WIP PR starts to add grammar support via outlines. Currently it
supports very simple regex grammars and does not optimize for
precompiling or caching grammar FSMs.

todo:
- [X] add simple outlines guidance to `NextTokenChooser`
- [X] update protos for grammar
- [X] update generation params API
- [X] constrain simple grammar
- [ ] support parsing more complex grammar into fsm
- [ ] support all grammar types that outlines supports
- [ ] explore optimizations to avoid recompiling grammars

guided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
    "inputs": "make an email for david: \n",
    "parameters": {
        "max_new_tokens": 6,
        "grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
    }
}' | jq
```
response
```json
{
  "generated_text": "david@example.com"
}
```

unguided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": "make an email for david: \n",
    "parameters": {
        "max_new_tokens": 6
    }
}' | jq
```
response
```json
{
  "generated_text": "    email = 'david"
}
```
2024-04-24 14:57:37 +03:00
OlivierDehaene
f1d8da3ba6 feat(server): add frequency penalty () 2024-04-24 08:43:50 +00:00
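A minimal sketch of an OpenAI-style frequency penalty (hypothetical standalone function, not the server's logits warper): each token's logit is reduced in proportion to how many times that token has already been generated.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_token_ids, penalty):
    """Penalize tokens by how often they already appear in the output."""
    counts = Counter(generated_token_ids)
    return [score - penalty * counts[tok]
            for tok, score in enumerate(logits)]
```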
regisss
cc744ba426 Add changes from Optimum Habana's TGI folder 2023-12-05 11:12:16 +01:00
Nick Hill
e4b26aa10b
fix(server): avoid errors for very small top_p values ()
See https://github.com/huggingface/transformers/pull/24111

I didn't add validation to the `__init__` method since it's not done for
other values/warpers.
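The linked fix makes top-p filtering always retain at least one token, so a very small `top_p` cannot mask out the entire distribution. A plain-Python sketch of that guard (hypothetical function; the real warper works on torch tensors):

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative mass reaches top_p, always retaining the top token."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)        # the top token is added before the threshold
        mass += probs[i]   # check, so it survives even if top_p is tiny
        if mass >= top_p:
            break
    return [p if i in keep else 0.0 for i, p in enumerate(probs)]
```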
2023-07-04 20:11:33 +02:00
OlivierDehaene
53aa9194c8
fix(server): fix warpers on CPU ()
Closes 
2023-06-20 11:06:10 +02:00
OlivierDehaene
62f91f78ac
feat(server): support vectorized warpers in flash causal lm ()
Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
2023-05-26 12:30:27 +02:00