Commit Graph

  • 7098f37ddd Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-22 23:30:22 +0300
  • 7dcd953969 Initial commit Merve Noyan 2023-08-22 23:26:08 +0300
  • ee19513cf7 Initial commit Merve Noyan 2023-08-22 23:02:17 +0300
  • 98afdbbc1d Add to toctree Merve Noyan 2023-08-22 22:04:38 +0300
  • 3a522aa6e1 Explained HBM & SRAM Merve Noyan 2023-08-22 21:49:23 +0300
  • 7037d0259f Update flash_attention.md Merve Noyan 2023-08-22 21:43:52 +0300
  • 3fe2836a54 paged attention initial commit Merve Noyan 2023-08-22 21:18:50 +0300
  • e7ec2ff282 added all the files rsnm2 2023-08-22 18:12:24 +0000
  • e7f3eac8d5 intermediate results from server rsnm2 2023-08-22 18:12:03 +0000
  • 172d262adf Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:26 +0300
  • 7ee207f75c Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:20 +0300
  • abc4fda615 Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 17:53:14 +0300
  • b8f7a0c690 minor cosmetic change rsnm2 2023-08-22 14:19:27 +0000
  • 27baaeffe0 Update tensor_parallelism.md Merve Noyan 2023-08-22 14:50:16 +0300
  • 1077cfc7d1 Do not allow Pydantic 3 Jelle Zijlstra 2023-08-22 04:38:58 -0700
  • 856af1c03a Support Pydantic 2 Jelle Zijlstra 2023-08-22 04:35:36 -0700
  • 095b6d9178 Added to toctree Merve Noyan 2023-08-22 14:20:29 +0300
  • 048d44cfcd Added paper Merve Noyan 2023-08-22 14:19:17 +0300
  • df8330194f Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-22 13:57:33 +0300
  • 608c5c93b2 Expose ignore_eos_token to client caiyesd 2023-08-12 16:14:05 +0000
  • cef8ca442f added readme rsnm2 2023-08-22 04:07:46 +0000
  • ffbc64cd7c added ipynb rsnm2 2023-08-22 04:07:11 +0000
  • 010739bba1 updated ipynb for simulating interacting with the server rsnm2 2023-08-22 04:06:52 +0000
  • a972a7870f re-added files rsnm2 2023-08-22 02:59:48 +0000
  • 6a739e5142 made service for deepsparse rsnm2 2023-08-22 02:59:18 +0000
  • 7c394a9214 restructured directory rsnm2 2023-08-21 19:32:33 +0000
  • fec0b1dce5 baseline implementation complete, working on sample server for example rsnm2 2023-08-21 19:27:07 +0000
  • 8e52a7fb3a Merge branch 'huggingface:main' into main Marcus Dunn 2023-08-21 08:53:08 -0700
  • f8565cd915 finished deepsparse_model.py implementation rsnm2 2023-08-21 15:17:19 +0000
  • ecbc0a0f4a Add ignore_eos_token Chirag Jain 2023-08-21 14:06:17 +0530
  • 2035f3b7bc Update docs/source/conceptual/flash_attention.md Merve Noyan 2023-08-21 11:31:30 +0300
  • 181dcb6219 Nits Merve Noyan 2023-08-21 11:14:05 +0300
  • 4bac76241d Update server.rs self-generating-docs Merve Noyan 2023-08-21 11:10:57 +0300
  • 00cd4d0b2f Update server.rs Merve Noyan 2023-08-21 11:09:45 +0300
  • 6541d8d8d9 Update server.rs Merve Noyan 2023-08-21 11:08:38 +0300
  • 09fee2f6fb fix Merve Noyan 2023-08-21 10:43:15 +0300
  • bb8c24f5b7 Update docs/source/conceptual/tensor_parallelism.md Merve Noyan 2023-08-21 10:39:14 +0300
  • 08bf10ca17 initial commit Merve Noyan 2023-08-21 00:23:10 +0300
  • 2fa5e31839 Update server.rs Merve Noyan 2023-08-20 23:13:29 +0300
  • 11d5c603ee Add to toctree Merve Noyan 2023-08-20 23:02:06 +0300
  • 03fda99ee1 updated to include interally managed kv-cache rsnm2 2023-08-20 13:50:18 +0000
  • 79ea7ff02f rewrote pipeline with simpler interface, working on adding to tgi now rsnm2 2023-08-20 02:23:44 +0000
  • 4446d0f838 Create tensor_parallelism.md Merve Noyan 2023-08-20 01:42:56 +0300
  • 2416cc66cf Remove redundant content Omar Sanseviero 2023-08-19 18:17:35 +0200
  • 3318a66a6d finished launch readme rsnm2 2023-08-19 13:56:17 +0000
  • 5f2ea449f3 confirmed everything is working and installed rsnm2 2023-08-19 13:54:11 +0000
  • d93e2ab5b6 Adding small benchmark script. Nicolas Patry 2023-08-18 17:28:31 +0000
  • 8f1d266e69 Update consuming_tgi.md Merve Noyan 2023-08-18 17:14:52 +0300
  • 3a2a13ecd5 Added diff and dark/light mode for demo Merve Noyan 2023-08-18 17:13:31 +0300
  • 117425564e Added space and replaced screenshots with llama Merve Noyan 2023-08-18 16:13:20 +0300
  • e49ecbf4e5 Update docs/source/basic_tutorials/consuming_tgi.md Merve Noyan 2023-08-18 16:03:23 +0300
  • 08593dc180 Addressed Omar's comments Merve Noyan 2023-08-18 14:05:33 +0300
  • cf43528538 remove stream since its a separate PR improve-docs philschmid 2023-08-18 12:57:36 +0200
  • 7b349f9b13 Update docs/source/basic_tutorials/request_parameters.md Philipp Schmid 2023-08-18 12:54:57 +0200
  • 50f0d3827f Show how to do streaming with JavaScript osanseviero 2023-08-18 11:48:58 +0200
  • 2c6e07395f remove image philschmid 2023-08-18 09:27:48 +0200
  • 52cacff4a4 fix wording philschmid 2023-08-18 09:27:13 +0200
  • eccb8a0099 fix library philschmid 2023-08-18 09:16:26 +0200
  • 69c3d79a1c add docs philschmid 2023-08-18 09:13:39 +0200
  • a361cd2b53 Update streaming.md Omar Sanseviero 2023-08-17 17:27:06 +0200
  • 2248dd8e18 Apply suggestions from code review Omar Sanseviero 2023-08-17 17:25:45 +0200
  • 452f8f3c2b Update consuming_tgi.md Merve Noyan 2023-08-17 16:31:11 +0300
  • 6c699d86bf Backport to transformers==4.31. Nicolas Patry 2023-08-17 07:28:14 +0000
  • bfa62eb0d8 Upgrading versions. Nicolas Patry 2023-08-17 07:03:18 +0000
  • 308ab7d5b9 Update integration test + shard them Nicolas Patry 2023-08-17 06:41:28 +0000
  • a16ea64f78 fix lol VictorSanh 2023-08-16 21:04:12 +0000
  • 7e20b8cb50 "Fix" for rw-1b. Nicolas Patry 2023-08-16 19:58:30 +0000
  • 06ebc7220b Fix osanseviero 2023-08-16 18:23:01 +0200
  • 138ffa2a92 Remove unexistent gif osanseviero 2023-08-16 18:17:48 +0200
  • a0b2a09cf3 Update gifs + add dark one osanseviero 2023-08-16 18:07:34 +0200
  • f3266b8a4a Switch async client example to use stream osanseviero 2023-08-16 17:17:15 +0200
  • 3dfa7d33eb Apply suggestions from code review Omar Sanseviero 2023-08-16 17:14:35 +0200
  • 603b3f1020 Adding integration test for idefics. Nicolas Patry 2023-08-16 14:57:08 +0000
  • aa6b7aaf25 Wording osanseviero 2023-08-16 16:55:19 +0200
  • 4ca853e1de Fix height of iframe osanseviero 2023-08-16 16:42:09 +0200
  • c9b76c6715 Fix ToC osanseviero 2023-08-16 16:31:58 +0200
  • 24ea07cd6d Using markdown to send image. Nicolas Patry 2023-08-16 14:26:26 +0000
  • 64cbf288e4 Add streaming guide osanseviero 2023-08-16 16:21:19 +0200
  • 5ba141e5d9 cleaning and fixes VictorSanh 2023-08-16 03:43:43 +0000
  • 24cdcbc995 Merge branch 'main' into new-main Geoffrey Angus 2023-08-15 16:17:12 -0700
  • 996416c814 changed call to tokenizer from encode to __call__ marcusdunn 2023-08-15 16:14:56 -0700
  • 1d18dbd47e add test Geoffrey Angus 2023-08-15 16:03:37 -0700
  • ae5beb9d7b add logs Geoffrey Angus 2023-08-15 15:37:54 -0700
  • ab0937b90c works end-to-end Geoffrey Angus 2023-08-15 15:20:57 -0700
  • c70dea3802 added missing imports of SequenceBiasLogitsProcessor and typings.Dict marcusdunn 2023-08-15 15:13:07 -0700
  • 25c48f5679 added a tokenizer to HeterogeneousNextTokenChooser marcusdunn 2023-08-15 15:12:32 -0700
  • 03975ccb3e Cleanup osanseviero 2023-08-16 00:04:09 +0200
  • 16a679390e Add mention that it just works with api-hosted models osanseviero 2023-08-15 23:53:32 +0200
  • f1fb15f4ae Add mention of async client osanseviero 2023-08-15 23:52:09 +0200
  • f65d703de4 Merge branch 'huggingface:main' into main Marcus Dunn 2023-08-15 14:52:04 -0700
  • 11400bd8dc Changes to InferenceClient osanseviero 2023-08-15 23:48:53 +0200
  • bbca2211ce removed visibility modifier for parse_key_val marcusdunn 2023-08-15 14:05:57 -0700
  • f97b07cf68 removed dbg! marcusdunn 2023-08-15 14:04:49 -0700
  • ed2efe3dd9 fixed clap arg parsing. marcusdunn 2023-08-15 14:03:27 -0700
  • 9b3be8f79b added logit_bias to benchmarks. marcusdunn 2023-08-15 13:48:39 -0700
  • 034e39185f run server run-dev and plumbs through adapter-id Geoffrey Angus 2023-08-15 13:43:00 -0700
  • a06b681673 added logit_bias to python client marcusdunn 2023-08-15 13:35:19 -0700
  • 8666df0f41 Fixing watermark. Nicolas Patry 2023-08-15 21:21:57 +0200
  • 782b3e5d86 Update README.md Adarsh Shirawalmath 2023-08-15 22:42:07 +0530
  • 8cda9ca2f7 Defaulting to bf16. Nicolas Patry 2023-08-15 18:38:42 +0200