text-generation-inference/router/src
drbh d5ed4c110b fix: adjust logprob response logic (#1682)
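This PR fixes a bug in `ChatCompletionLogprobs`: when
`top_tokens.len() == 0`, empty results were returned. With the fix, each
generated token is returned with its logprob and an empty `top_logprobs`
list, as in the example response below.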
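
As a rough illustration of how this kind of bug can arise, the sketch below pairs tokens with their top alternatives in two ways. It is an assumption, not the router's actual code: the types `Token` and `Logprob` and the functions `build_zip`, `build_with_fallback`, and `sample_tokens` are hypothetical and simplified. The point is that zipping two iterators stops at the shorter one, so an empty `top_tokens` list can discard every token.

```rust
// Hypothetical, simplified types; not the router's actual definitions.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    text: String,
    logprob: f32,
}

#[derive(Debug, PartialEq)]
struct Logprob {
    token: String,
    logprob: f32,
    top_logprobs: Vec<Token>,
}

/// Pairs each token with its alternatives using `zip`.
/// `zip` stops at the shorter input, so an empty `top_tokens`
/// produces an empty result even when `tokens` is non-empty.
fn build_zip(tokens: Vec<Token>, top_tokens: Vec<Vec<Token>>) -> Vec<Logprob> {
    tokens
        .into_iter()
        .zip(top_tokens)
        .map(|(t, top)| Logprob {
            token: t.text,
            logprob: t.logprob,
            top_logprobs: top,
        })
        .collect()
}

/// Iterates over `tokens` only and falls back to an empty
/// alternatives list when `top_tokens` runs out.
fn build_with_fallback(tokens: Vec<Token>, top_tokens: Vec<Vec<Token>>) -> Vec<Logprob> {
    let mut tops = top_tokens.into_iter();
    tokens
        .into_iter()
        .map(|t| Logprob {
            token: t.text,
            logprob: t.logprob,
            top_logprobs: tops.next().unwrap_or_default(),
        })
        .collect()
}

/// Two tokens taken from the example response in this description.
fn sample_tokens() -> Vec<Token> {
    vec![
        Token { text: "**".to_string(), logprob: -0.22558594 },
        Token { text: "Deep".to_string(), logprob: -0.0014877319 },
    ]
}
```

Calling `build_zip(sample_tokens(), vec![])` yields an empty vector, while `build_with_fallback(sample_tokens(), vec![])` keeps both tokens and gives each an empty `top_logprobs`, which matches the shape of the response shown in this description.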

```bash
curl http://localhost:3000/v1/chat/completions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "model": "tgi",
  "logprobs": true,
  "messages": [
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": false,
  "max_tokens": 20
}'
```

Response:

```json
{"id":"","object":"text_completion","created":1711588522,"model":"google/gemma-2b-it","system_fingerprint":"1.4.4-native","choices":[{"index":0,"message":{"role":"assistant","content":"**Deep learning** is a subset of machine learning (ML) that emphasizes the creation of **artificial"},"logprobs":{"content":[{"token":"**","logprob":-0.22558594,"top_logprobs":[]},{"token":"Deep","logprob":-0.0014877319,"top_logprobs":[]},{"token":" learning","logprob":-0.12695312,"top_logprobs":[]},{"token":"**","logprob":-0.055664062,"top_logprobs":[]},{"token":" is","logprob":-0.00090026855,"top_logprobs":[]},{"token":" a","logprob":-0.006072998,"top_logprobs":[]},{"token":" subset","logprob":-2.25,"top_logprobs":[]},{"token":" of","logprob":-0.00031089783,"top_logprobs":[]},{"token":" machine","logprob":-0.091308594,"top_logprobs":[]},{"token":" learning","logprob":-0.00002348423,"top_logprobs":[]},{"token":" (","logprob":-1.671875,"top_logprobs":[]},{"token":"ML","logprob":-0.00040626526,"top_logprobs":[]},{"token":")","logprob":-0.00016212463,"top_logprobs":[]},{"token":" that","logprob":-0.13769531,"top_logprobs":[]},{"token":" emphasizes","logprob":-4.03125,"top_logprobs":[]},{"token":" the","logprob":-0.2890625,"top_logprobs":[]},{"token":" creation","logprob":-3.109375,"top_logprobs":[]},{"token":" of","logprob":-0.00024032593,"top_logprobs":[]},{"token":" **","logprob":-1.2265625,"top_logprobs":[]},{"token":"artificial","logprob":-0.10546875,"top_logprobs":[]}]},"finish_reason":"length"}],"usage":{"prompt_tokens":15,"completion_tokens":20,"total_tokens":35}}
```
2024-04-25 14:06:44 +03:00
..
health.rs Outlines guided generation (#1539) 2024-04-24 14:57:37 +03:00
infer.rs feat: bump minijina and add test for core templates (#1626) 2024-04-25 12:32:23 +03:00
lib.rs fix: adjust logprob response logic (#1682) 2024-04-25 14:06:44 +03:00
main.rs v1.4.2 (#1585) 2024-04-24 18:10:10 +03:00
queue.rs Outlines guided generation (#1539) 2024-04-24 14:57:37 +03:00
server.rs fix: adjust logprob response logic (#1682) 2024-04-25 14:06:44 +03:00
validation.rs Inline images for multimodal models. (#1666) 2024-04-25 12:35:51 +03:00