Commit 6875952 (verified) by lpalbou · 1 Parent(s): 7ac1c82

Update TI2V-5B q8 package to current mixed q8/BF16 policy

README.md CHANGED
@@ -51,13 +51,14 @@ weights at BF16 runtime precision.
  Measured on 2026-06-04 with `mlx-gen 0.18.10` on an Apple M5 Max with 128 GiB unified memory.

  Validation profile: `1280x704`, 17 frames, 20 denoising steps, guidance `5`, 24 fps, seed `321`,
- explicit empty negative prompt.
+ explicit empty negative prompt. This is a large normal-cache profile, not a `--low-ram` profile and
+ not comparable to the A14B short low-RAM rows as a model-size memory statement.

- | Layout | Storage | Logical Model | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Output |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | --- |
- | Upstream source snapshot | 31.9 GiB | 10.6 GiB | 102.7 GiB | 13.7 GiB | 58.5 GiB | 216.2 s | [base-source.mp4](validation/ti2v5b-clean/base-source.mp4) |
- | Prepared BF16 package | 21.2 GiB | 10.6 GiB | 102.6 GiB | 14.5 GiB | 58.5 GiB | 261.6 s | [prepared-bf16.mp4](validation/ti2v5b-clean/prepared-bf16.mp4) |
- | This mixed q8/BF16 package | 16.9 GiB | 6.3 GiB | 103.7 GiB | 13.8 GiB | 54.2 GiB | 243.4 s | [mixed-q8-bf16.mp4](validation/ti2v5b-clean/mixed-q8-bf16.mp4) |
+ | Layout | Storage | Wan MLX Model | MLX Active After Generation | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Output |
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- |
+ | Upstream source snapshot | 31.9 GiB | 10.6 GiB | 10.3 GiB | 102.7 GiB | 13.7 GiB | 58.5 GiB | 216.2 s | [base-source.mp4](validation/ti2v5b-clean/base-source.mp4) |
+ | Prepared BF16 package | 21.2 GiB | 10.6 GiB | 10.3 GiB | 102.6 GiB | 14.5 GiB | 58.5 GiB | 261.6 s | [prepared-bf16.mp4](validation/ti2v5b-clean/prepared-bf16.mp4) |
+ | This mixed q8/BF16 package | 16.9 GiB | 6.3 GiB | 6.1 GiB | 103.7 GiB | 13.8 GiB | 54.2 GiB | 243.4 s | [mixed-q8-bf16.mp4](validation/ti2v5b-clean/mixed-q8-bf16.mp4) |

  This package reduces storage, logical model bytes, active MLX model bytes, and MLX allocator peak in
  the validation profile. It did not reduce full-process physical peak memory in this profile because
@@ -66,9 +67,12 @@ transient video-generation allocations dominated the run.
  The source and prepared BF16 package produced byte-identical decoded MP4 frames. This mixed q8/BF16
  package stayed visually in the same family with mean frame MAE `1.66` versus source/BF16.

- `Storage` is the Hugging Face repository total. `Logical Model` is the loaded Wan transformer plus
- VAE tensor footprint measured from MLX arrays. `Full-Process Physical Peak` is Darwin
- `phys_footprint` sampled from model initialization through MP4 save and health validation.
+ `Storage` is the Hugging Face repository total. `Wan MLX Model` is the loaded Wan transformer plus
+ VAE tensor footprint measured from MLX arrays; it excludes the UMT5 text encoder and video/save
+ buffers. `MLX Active After Generation` is the live MLX allocator footprint after `generate_video()`
+ returns, before cleanup. `Full-Process Physical Peak` is Darwin `phys_footprint` sampled from model
+ initialization through MP4 save and health validation. `Max RSS` can under-report Apple
+ unified-memory/Metal pressure, and `MLX Peak` is only the MLX allocator high-water mark.

  Validation assets:
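Reviewer note: the README hunk above quotes a "mean frame MAE" of `1.66` without defining it. One plausible reading, offered here only as a sketch, is the per-frame mean absolute pixel difference on 8-bit values, averaged over all frames. The function names and the nested-list frame layout below are hypothetical and are not taken from the package's validation code.

```python
from typing import Sequence

# A frame is modelled as rows of 8-bit pixel values (single channel for brevity).
Frame = Sequence[Sequence[int]]


def frame_mae(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two frames of equal shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same shape")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def mean_frame_mae(video_a: Sequence[Frame], video_b: Sequence[Frame]) -> float:
    """Average of the per-frame MAE over two videos with the same frame count."""
    if len(video_a) != len(video_b):
        raise ValueError("videos must have the same number of frames")
    return sum(frame_mae(a, b) for a, b in zip(video_a, video_b)) / len(video_a)


# Tiny synthetic example: two 2x2 frames per video.
reference = [[[0, 0], [0, 0]], [[10, 10], [10, 10]]]
candidate = [[[1, 1], [1, 1]], [[10, 12], [10, 12]]]
score = mean_frame_mae(reference, candidate)
identical = mean_frame_mae(reference, reference)
```

Under this definition, byte-identical decoded frames give exactly `0.0`, which matches the README's statement about the source and prepared BF16 outputs.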
 
transformer/0.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ba9f08cfc6b9e245483c0d91ee5088205f7ceaa302f3326c5e29b3c831b02b97
- size 2137450610
+ oid sha256:3252d7ac118bc33774e11dee2fef4c2ff6e9a5b302348b860318488db51473d7
+ size 2143814575
transformer/1.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:16b8e7120b9ba90bc1976fd16bdb88b1a09f002f429bc5e8f335845ff10a4a49
- size 2144437513
+ oid sha256:113895039a17b78f4cfc408b75c5b90799de573ace006df7638182f8d488629d
+ size 2117693374
transformer/2.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:da4867a6c46f7fa1f6dcd2642ae2a7bb9520759bd56a2ed09396266322446d9f
- size 1034389766
+ oid sha256:7d03bae2c8a838a77f0d648b8d0c21799cefe5a7b19a3fc65c8e000bce927d3d
+ size 1138634137
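Reviewer note: the three transformer pointer files above are plain Git LFS pointers, so the on-disk size change can be read straight from their `size` fields. The sketch below parses the new pointers and compares the total with the old sizes shown in the diff. `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its 'key value' fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# New pointer contents, copied from the diff above.
new_pointers = {
    "transformer/0.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:3252d7ac118bc33774e11dee2fef4c2ff6e9a5b302348b860318488db51473d7\n"
        "size 2143814575\n"
    ),
    "transformer/1.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:113895039a17b78f4cfc408b75c5b90799de573ace006df7638182f8d488629d\n"
        "size 2117693374\n"
    ),
    "transformer/2.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:7d03bae2c8a838a77f0d648b8d0c21799cefe5a7b19a3fc65c8e000bce927d3d\n"
        "size 1138634137\n"
    ),
}

# Old sizes in bytes, from the removed lines of the same three hunks.
old_sizes = [2137450610, 2144437513, 1034389766]

new_total = sum(int(parse_lfs_pointer(t)["size"]) for t in new_pointers.values())
old_total = sum(old_sizes)
delta_bytes = new_total - old_total
new_total_gib = new_total / 2**30

print(f"new transformer shards: {new_total_gib:.2f} GiB ({delta_bytes:+d} bytes vs. old)")
```

The transformer shards grow slightly in this commit, which is at least consistent with some layers no longer being stored as q8 in the index diff that follows.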
transformer/model.safetensors.index.json CHANGED
@@ -1,7 +1,7 @@
  {
  "metadata": {
  "quantization_level": "8",
- "mflux_version": "0.18.6"
+ "mflux_version": "0.18.9"
  },
  "weight_map": {
  "rope.freqs_cos": "0.safetensors",
@@ -9,24 +9,14 @@
  "patch_embedding.weight": "0.safetensors",
  "patch_embedding.bias": "0.safetensors",
  "condition_embedder.time_embedder.linear_1.weight": "0.safetensors",
- "condition_embedder.time_embedder.linear_1.scales": "0.safetensors",
- "condition_embedder.time_embedder.linear_1.biases": "0.safetensors",
  "condition_embedder.time_embedder.linear_1.bias": "0.safetensors",
  "condition_embedder.time_embedder.linear_2.weight": "0.safetensors",
- "condition_embedder.time_embedder.linear_2.scales": "0.safetensors",
- "condition_embedder.time_embedder.linear_2.biases": "0.safetensors",
  "condition_embedder.time_embedder.linear_2.bias": "0.safetensors",
  "condition_embedder.time_proj.weight": "0.safetensors",
- "condition_embedder.time_proj.scales": "0.safetensors",
- "condition_embedder.time_proj.biases": "0.safetensors",
  "condition_embedder.time_proj.bias": "0.safetensors",
  "condition_embedder.text_embedder.linear_1.weight": "0.safetensors",
- "condition_embedder.text_embedder.linear_1.scales": "0.safetensors",
- "condition_embedder.text_embedder.linear_1.biases": "0.safetensors",
  "condition_embedder.text_embedder.linear_1.bias": "0.safetensors",
  "condition_embedder.text_embedder.linear_2.weight": "0.safetensors",
- "condition_embedder.text_embedder.linear_2.scales": "0.safetensors",
- "condition_embedder.text_embedder.linear_2.biases": "0.safetensors",
  "condition_embedder.text_embedder.linear_2.bias": "0.safetensors",
  "blocks.0.attn1.to_q.weight": "0.safetensors",
  "blocks.0.attn1.to_q.scales": "0.safetensors",
@@ -567,26 +557,26 @@
  "blocks.11.attn2.to_q.scales": "0.safetensors",
  "blocks.11.attn2.to_q.biases": "0.safetensors",
  "blocks.11.attn2.to_q.bias": "0.safetensors",
- "blocks.11.attn2.to_k.weight": "0.safetensors",
- "blocks.11.attn2.to_k.scales": "0.safetensors",
- "blocks.11.attn2.to_k.biases": "0.safetensors",
- "blocks.11.attn2.to_k.bias": "0.safetensors",
- "blocks.11.attn2.to_v.weight": "0.safetensors",
- "blocks.11.attn2.to_v.scales": "0.safetensors",
- "blocks.11.attn2.to_v.biases": "0.safetensors",
- "blocks.11.attn2.to_v.bias": "0.safetensors",
- "blocks.11.attn2.to_out.0.weight": "0.safetensors",
- "blocks.11.attn2.to_out.0.scales": "0.safetensors",
- "blocks.11.attn2.to_out.0.biases": "0.safetensors",
- "blocks.11.attn2.to_out.0.bias": "0.safetensors",
- "blocks.11.attn2.norm_q.weight": "0.safetensors",
- "blocks.11.attn2.norm_k.weight": "0.safetensors",
- "blocks.11.norm2.weight": "0.safetensors",
- "blocks.11.norm2.bias": "0.safetensors",
- "blocks.11.ffn.net.0.weight": "0.safetensors",
- "blocks.11.ffn.net.0.scales": "0.safetensors",
- "blocks.11.ffn.net.0.biases": "0.safetensors",
- "blocks.11.ffn.net.0.bias": "0.safetensors",
+ "blocks.11.attn2.to_k.weight": "1.safetensors",
+ "blocks.11.attn2.to_k.scales": "1.safetensors",
+ "blocks.11.attn2.to_k.biases": "1.safetensors",
+ "blocks.11.attn2.to_k.bias": "1.safetensors",
+ "blocks.11.attn2.to_v.weight": "1.safetensors",
+ "blocks.11.attn2.to_v.scales": "1.safetensors",
+ "blocks.11.attn2.to_v.biases": "1.safetensors",
+ "blocks.11.attn2.to_v.bias": "1.safetensors",
+ "blocks.11.attn2.to_out.0.weight": "1.safetensors",
+ "blocks.11.attn2.to_out.0.scales": "1.safetensors",
+ "blocks.11.attn2.to_out.0.biases": "1.safetensors",
+ "blocks.11.attn2.to_out.0.bias": "1.safetensors",
+ "blocks.11.attn2.norm_q.weight": "1.safetensors",
+ "blocks.11.attn2.norm_k.weight": "1.safetensors",
+ "blocks.11.norm2.weight": "1.safetensors",
+ "blocks.11.norm2.bias": "1.safetensors",
+ "blocks.11.ffn.net.0.weight": "1.safetensors",
+ "blocks.11.ffn.net.0.scales": "1.safetensors",
+ "blocks.11.ffn.net.0.biases": "1.safetensors",
+ "blocks.11.ffn.net.0.bias": "1.safetensors",
  "blocks.11.ffn.net.1.weight": "1.safetensors",
  "blocks.11.ffn.net.1.scales": "1.safetensors",
  "blocks.11.ffn.net.1.biases": "1.safetensors",
@@ -1147,19 +1137,19 @@
  "blocks.23.attn2.norm_k.weight": "1.safetensors",
  "blocks.23.norm2.weight": "1.safetensors",
  "blocks.23.norm2.bias": "1.safetensors",
- "blocks.23.ffn.net.0.weight": "1.safetensors",
- "blocks.23.ffn.net.0.scales": "1.safetensors",
- "blocks.23.ffn.net.0.biases": "1.safetensors",
- "blocks.23.ffn.net.0.bias": "1.safetensors",
- "blocks.23.ffn.net.1.weight": "1.safetensors",
- "blocks.23.ffn.net.1.scales": "1.safetensors",
- "blocks.23.ffn.net.1.biases": "1.safetensors",
- "blocks.23.ffn.net.1.bias": "1.safetensors",
- "blocks.23.scale_shift_table": "1.safetensors",
- "blocks.24.attn1.to_q.weight": "1.safetensors",
- "blocks.24.attn1.to_q.scales": "1.safetensors",
- "blocks.24.attn1.to_q.biases": "1.safetensors",
- "blocks.24.attn1.to_q.bias": "1.safetensors",
+ "blocks.23.ffn.net.0.weight": "2.safetensors",
+ "blocks.23.ffn.net.0.scales": "2.safetensors",
+ "blocks.23.ffn.net.0.biases": "2.safetensors",
+ "blocks.23.ffn.net.0.bias": "2.safetensors",
+ "blocks.23.ffn.net.1.weight": "2.safetensors",
+ "blocks.23.ffn.net.1.scales": "2.safetensors",
+ "blocks.23.ffn.net.1.biases": "2.safetensors",
+ "blocks.23.ffn.net.1.bias": "2.safetensors",
+ "blocks.23.scale_shift_table": "2.safetensors",
+ "blocks.24.attn1.to_q.weight": "2.safetensors",
+ "blocks.24.attn1.to_q.scales": "2.safetensors",
+ "blocks.24.attn1.to_q.biases": "2.safetensors",
+ "blocks.24.attn1.to_q.bias": "2.safetensors",
  "blocks.24.attn1.to_k.weight": "2.safetensors",
  "blocks.24.attn1.to_k.scales": "2.safetensors",
  "blocks.24.attn1.to_k.biases": "2.safetensors",
@@ -1439,8 +1429,6 @@
  "blocks.29.ffn.net.1.bias": "2.safetensors",
  "blocks.29.scale_shift_table": "2.safetensors",
  "proj_out.weight": "2.safetensors",
- "proj_out.scales": "2.safetensors",
- "proj_out.biases": "2.safetensors",
  "proj_out.bias": "2.safetensors",
  "scale_shift_table": "2.safetensors"
  }
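Reviewer note: in this index, a layer that carries `.scales` and `.biases` entries next to its `.weight` appears to be stored quantized, while a layer with only `.weight` and `.bias` appears to be stored at full precision. The sketch below applies that reading to a small excerpt of the new `weight_map`. The helper name `quantized_layers` is hypothetical, and the naming convention is inferred from the diff rather than from documented loader behavior.

```python
def quantized_layers(weight_map: dict) -> set:
    """Return layer prefixes that have both '.scales' and '.biases' entries."""
    keys = set(weight_map)
    prefixes = {key.rsplit(".", 1)[0] for key in keys if "." in key}
    return {
        prefix
        for prefix in prefixes
        if f"{prefix}.scales" in keys and f"{prefix}.biases" in keys
    }


# Excerpt of the new weight_map, copied from the added and context lines above.
excerpt = {
    "condition_embedder.time_proj.weight": "0.safetensors",
    "condition_embedder.time_proj.bias": "0.safetensors",
    "blocks.11.attn2.to_k.weight": "1.safetensors",
    "blocks.11.attn2.to_k.scales": "1.safetensors",
    "blocks.11.attn2.to_k.biases": "1.safetensors",
    "blocks.11.attn2.to_k.bias": "1.safetensors",
    "proj_out.weight": "2.safetensors",
    "proj_out.bias": "2.safetensors",
}

q8 = quantized_layers(excerpt)
all_layers = {key.rsplit(".", 1)[0] for key in excerpt}
full_precision = all_layers - q8

print("quantized:", sorted(q8))
print("full precision:", sorted(full_precision))
```

Run against the full index, the same check would list the condition embedder and `proj_out` layers as no longer quantized after this commit, which is what the removed `.scales`/`.biases` lines suggest.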
vae/0.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c5de7cc01b4737345a64908c281a46b95a1b5a97ef0942f60df6ff1c7e851beb
- size 1409401417
+ oid sha256:8ba84d3014ef6a77ebdbbdc16c339054bdb298c37a992f0f3926fe6c5d6d4769
+ size 1409401388
vae/model.safetensors.index.json CHANGED
@@ -1,7 +1,7 @@
  {
  "metadata": {
- "quantization_level": "8",
- "mflux_version": "0.18.6"
+ "quantization_level": "None",
+ "mflux_version": "0.18.9"
  },
  "weight_map": {
  "encoder.conv_in.conv3d.weight": "0.safetensors",
@@ -88,10 +88,6 @@
  "encoder.norm_out.weight": "0.safetensors",
  "encoder.conv_out.conv3d.weight": "0.safetensors",
  "encoder.conv_out.conv3d.bias": "0.safetensors",
- "quant_conv.conv3d.weight": "0.safetensors",
- "quant_conv.conv3d.bias": "0.safetensors",
- "post_quant_conv.conv3d.weight": "0.safetensors",
- "post_quant_conv.conv3d.bias": "0.safetensors",
  "decoder.conv_in.conv3d.weight": "0.safetensors",
  "decoder.conv_in.conv3d.bias": "0.safetensors",
  "decoder.mid_block.resnets.0.norm1.weight": "0.safetensors",
@@ -199,6 +195,10 @@
  "decoder.up_blocks.3.resnets.2.conv2.conv3d.bias": "0.safetensors",
  "decoder.norm_out.weight": "0.safetensors",
  "decoder.conv_out.conv3d.weight": "0.safetensors",
- "decoder.conv_out.conv3d.bias": "0.safetensors"
+ "decoder.conv_out.conv3d.bias": "0.safetensors",
+ "quant_conv.conv3d.weight": "0.safetensors",
+ "quant_conv.conv3d.bias": "0.safetensors",
+ "post_quant_conv.conv3d.weight": "0.safetensors",
+ "post_quant_conv.conv3d.bias": "0.safetensors"
  }
  }
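Reviewer note: apart from the metadata change, the VAE index hunks only move the `quant_conv` and `post_quant_conv` entries to the end of `weight_map`. A quick way to confirm that a reordering like this leaves the tensor-to-shard mapping unchanged is to compare the two maps as dictionaries, since Python `dict` equality ignores key order. The excerpts below are abbreviated from the diff and the variable names are hypothetical.

```python
# Key order as it appeared before the commit (abbreviated excerpt).
old_map = {
    "encoder.conv_out.conv3d.bias": "0.safetensors",
    "quant_conv.conv3d.weight": "0.safetensors",
    "quant_conv.conv3d.bias": "0.safetensors",
    "post_quant_conv.conv3d.weight": "0.safetensors",
    "post_quant_conv.conv3d.bias": "0.safetensors",
    "decoder.conv_in.conv3d.weight": "0.safetensors",
    "decoder.conv_out.conv3d.bias": "0.safetensors",
}

# Key order after the commit: the four *quant_conv entries now come last.
new_map = {
    "encoder.conv_out.conv3d.bias": "0.safetensors",
    "decoder.conv_in.conv3d.weight": "0.safetensors",
    "decoder.conv_out.conv3d.bias": "0.safetensors",
    "quant_conv.conv3d.weight": "0.safetensors",
    "quant_conv.conv3d.bias": "0.safetensors",
    "post_quant_conv.conv3d.weight": "0.safetensors",
    "post_quant_conv.conv3d.bias": "0.safetensors",
}

same_mapping = old_map == new_map              # order-insensitive comparison
same_order = list(old_map) == list(new_map)    # order-sensitive comparison
moved = [key for key in new_map if list(old_map).index(key) != list(new_map).index(key)]

print(f"same mapping: {same_mapping}, same order: {same_order}, moved keys: {len(moved)}")
```

This only checks the index; the VAE shard itself also changed (new `oid`, 29 fewer bytes), so the tensor bytes would need a separate comparison.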