diff --git a/README.md b/README.md
index 2bbb6583..279b03fe 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,17 @@
@@ -18,6 +19,10 @@
 A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co)
 to power LLMs api-inference widgets.
 
+## Note
+This is a fork of [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)
+before the restrictive license change. We will maintain this fork under the Apache 2.0 license. All contributions are welcome.
+
 ## Table of contents
 
 - [Features](#features)
@@ -42,7 +47,7 @@ to power LLMs api-inference widgets.
 - Serve the most popular Large Language Models with a simple launcher
 - Tensor Parallelism for faster inference on multiple GPUs
 - Token streaming using Server-Sent Events (SSE)
-- [Continuous batching of incoming requests](https://github.com/huggingface/text-generation-inference/tree/main/router) for increased total throughput
+- [Continuous batching of incoming requests](https://github.com/deepinfra/text-generation-inference/tree/main/router) for increased total throughput
 - Optimized transformers code for inference using [flash-attention](https://github.com/HazyResearch/flash-attention) and [Paged Attention](https://github.com/vllm-project/vllm) on the most popular architectures
 - Quantization with [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) and [GPT-Q](https://arxiv.org/abs/2210.17323)
 - [Safetensors](https://github.com/huggingface/safetensors) weight loading
@@ -132,7 +137,7 @@ print(text)
 ### API documentation
 
 You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route.
-The Swagger UI is also available at: [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference).
+The Swagger UI is also available at: [https://deepinfra.github.io/text-generation-inference](https://deepinfra.github.io/text-generation-inference).
 
 ### Using a private or gated model
 