README.md (0 additions, 2 deletions)
@@ -11,8 +11,6 @@
 ## News 🔥
-- [2025/08/08] Now we support [gpt-oss-20b](./apps/Android/MnnLlmChat/README.md#releases).
-- [2025/08/05] MNN Chat Android is available on [GooglePlay](https://play.google.com/store/apps/details?id=com.alibaba.mnnllm.android.release)!
 - [2025/06/11] New app MNN TaoAvatar released: talk with a 3D avatar offline, with LLM, ASR, TTS, A2BS and NNR models all running locally on your device! [MNN TaoAvatar](./apps/Android/Mnn3dAvatar/README.md)
docs/transformers/llm.md (13 additions, 0 deletions)
@@ -106,17 +106,23 @@ optional arguments:
                         mnn quant bit, 4 or 8, default is 4.
   --quant_block QUANT_BLOCK
                         mnn quant block, 0 means channel-wise, default is 128.
+  --visual_quant_bit VISUAL_QUANT_BIT
+                        mnn visual model quant bit, 4 or 8, default is set in utils/vision.py per ViT model.
+  --visual_quant_block VISUAL_QUANT_BLOCK
+                        mnn visual model quant block, 0 means channel-wise, default is set in utils/vision.py per ViT model.
   --lm_quant_bit LM_QUANT_BIT
                         mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
   --mnnconvert MNNCONVERT
                         local mnnconvert path; if invalid, pymnn is used.
   --ppl                 Whether or not to get all logits of input tokens.
   --awq                 Whether or not to use awq quant.
   --sym                 Whether or not to use symmetric quant (without zeropoint), default is False.
+  --visual_sym          Whether or not to use symmetric quant (without zeropoint) for the visual model, default is False.
   --seperate_embed      For an lm/embed shared model, whether or not to separate embed to avoid quant, default is False; if True, the embed weight is written separately to embeddingbf16.bin.
   --lora_split          Whether or not to export lora split, default is False.
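For context, a minimal sketch of how the newly documented visual-quantization flags might be combined with the existing export options. The model path and name are placeholders, and the `--path`/`--export` options are assumed from the existing llmexport usage; the `--visual_*` flags behave as described in the help text above.

```bash
# Hypothetical export of a vision-language model (path is a placeholder):
# language-model weights quantized to 4 bit with block size 128,
# visual (ViT) weights quantized to 8 bit, channel-wise (block 0), symmetric.
python llmexport.py \
    --path ./Qwen2-VL-2B-Instruct \
    --export mnn \
    --quant_bit 4 \
    --quant_block 128 \
    --visual_quant_bit 8 \
    --visual_quant_block 0 \
    --visual_sym
```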