Default Branch

8f8819795f · Fixing CI (#3184) · Updated 2025-04-18 11:07:18 +00:00

Branches

5b340a5ffd · Dump work. · Updated 2023-11-30 22:05:51 +00:00    Leaf

863 behind · 8 ahead

8a7771a33c · Use cuda devel instead · Updated 2023-10-11 08:40:06 +00:00    Leaf

890 behind · 2 ahead

56de96abe9 · missing arg · Updated 2023-10-05 13:14:17 +00:00    Leaf

893 behind · 3 ahead
dev

c35f39cf83 · Add AWQ quantization inference support (#1019) · Updated 2023-09-25 07:58:02 +00:00    Leaf

925 behind · 1 ahead

33958e0989 · Start. · Updated 2023-09-11 18:25:49 +00:00    Leaf

930 behind · 1 ahead

4bac76241d · Update server.rs · Updated 2023-08-21 08:10:57 +00:00    Leaf

954 behind · 5 ahead

cf43528538 · remove stream since its a separate PR · Updated 2023-08-18 10:57:36 +00:00    Leaf

956 behind · 6 ahead

89a4e723d2 · Attempting to fix torch leak. · Updated 2023-08-12 07:06:49 +00:00    Leaf

991 behind · 1 ahead

4a9615e8ff · Add to ToC · Updated 2023-08-11 13:05:10 +00:00    Leaf

994 behind · 2 ahead

43ed6c217a · Dummy commit · Updated 2023-08-10 08:33:52 +00:00    Leaf

977 behind · 1 ahead

4ddb6681ac · Add workflow to upload documentation · Updated 2023-08-08 05:49:45 +00:00    Leaf

980 behind · 1 ahead

e994ad1172 · Added InferenceClient · Updated 2023-08-02 14:57:01 +00:00    Leaf

994 behind · 11 ahead

7766fee9b1 · fix typo for dynamic rotary (#745) · Updated 2023-07-31 16:58:46 +00:00    Leaf

988 behind · 0 ahead
Included

f555dabca8 · Putting back header inclusion (seems unused but still) · Updated 2023-07-20 15:46:51 +00:00    Leaf

1015 behind · 21 ahead

bfa3920aec · BNB 4bits. · Updated 2023-07-12 12:42:43 +00:00    Leaf

1041 behind · 7 ahead

db4efbf4bc · fix(server): T5 weights names. (#582) · Updated 2023-07-12 08:01:42 +00:00    Leaf

1037 behind · 0 ahead
Included

a4fd6905d8 · fmt · Updated 2023-06-23 13:01:05 +00:00    Leaf

1064 behind · 2 ahead

dca0fe2585 · Adding GPTQ integration tests. · Updated 2023-06-19 12:14:17 +00:00    Leaf

1067 behind · 19 ahead

17837b1e51 · Adding docs about GPTQ usage. · Updated 2023-06-15 17:41:04 +00:00    Leaf

1067 behind · 19 ahead

fb0840944c · Reducing number of reps while autotuning. · Updated 2023-06-06 11:56:10 +00:00    Leaf

1118 behind · 9 ahead