gkalstn0 committed on
Commit 9437a91 · 1 Parent(s): 6e3f286

Add quality comparison videos + 50-step benchmark (#5)

- Add quality comparison videos + 50-step benchmark (9a614b190c9fb4b5a39935ebe21f73dba96358fb)
- Remove source mp4s from PR (keep only quality_row*.mp4 used in README) (f1b9a9c1982b793f0d50dfbd32f7f19d224a91b1)
- Switch quality embeds to WebP (mp4 stripped by README sanitizer) (3989e5415b98e8dc4fe1b091c4ac03e036ea02bd)

.gitattributes CHANGED
@@ -44,3 +44,5 @@ motifv-2b-dev-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
 motifv-2b-dev-BF16.gguf filter=lfs diff=lfs merge=lfs -text
 motifv-2b-dev-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 motifv-2b-dev-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.webp filter=lfs diff=lfs merge=lfs -text
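
Review note: the two added wildcard lines route every `.mp4` and `.webp` file through Git LFS, which is why the new `assets/quality_row*.webp` files appear as LFS pointers later in this commit. The sketch below illustrates the matching idea only. It assumes basename matching with Python's `fnmatch`, which approximates but does not fully reproduce Git's pattern rules, and the helper names `lfs_patterns` and `is_lfs_tracked` are hypothetical.

```python
from fnmatch import fnmatch

# One existing per-file entry plus the two patterns added by this commit.
GITATTRIBUTES = """\
motifv-2b-dev-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the patterns whose attribute list includes filter=lfs."""
    patterns = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_lfs_tracked(path, patterns):
    """Simplified check: match the file's basename against each pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in patterns)


patterns = lfs_patterns(GITATTRIBUTES)
print(is_lfs_tracked("assets/quality_row1.webp", patterns))  # True
print(is_lfs_tracked("README.md", patterns))  # False
```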
README.md CHANGED
@@ -39,30 +39,34 @@ These files are intended for use with the `diffusers` library and allow you to r
 **Prerequisites:** PyTorch with CUDA support must be installed first. See [pytorch.org](https://pytorch.org/get-started/locally/) for your CUDA version.
 
 ```bash
-pip install "transformers>=5.5.4" accelerate ftfy einops sentencepiece regex Pillow imageio imageio-ffmpeg
+pip install "transformers>=5.5.4" accelerate ftfy einops sentencepiece regex Pillow imageio imageio-ffmpeg gguf
 pip install git+https://github.com/waitingcheung/diffusers.git@feat/motif-video
-pip install gguf
 ```
 
-> **Note:** `einops` is required for optimal performance. Without it, inference speed degrades by ~2x.
->
-> GGUF support for Motif-Video requires a development branch of `diffusers` (PR [#13551](https://github.com/huggingface/diffusers/pull/13551)) and will be available in a future official release.
-
 ## Usage
 
 ```python
 import torch
 from diffusers import (
     AdaptiveProjectedGuidance,
+    DPMSolverMultistepScheduler,
     GGUFQuantizationConfig,
     MotifVideoPipeline,
     MotifVideoTransformer3DModel,
 )
-from diffusers.training_utils import set_seed
 from diffusers.utils import export_to_video
 from huggingface_hub import hf_hub_download
 
-# Configure the guider (Adaptive Projected Guidance)
+
+# DPMSolver++ subclass that ignores pipeline-supplied sigmas and builds its own flow-matching schedule.
+class FlowDPMSolver(DPMSolverMultistepScheduler):
+    def set_timesteps(self, num_inference_steps=None, device=None,
+                      sigmas=None, mu=None, timesteps=None):
+        if sigmas is not None and num_inference_steps is None:
+            num_inference_steps = len(sigmas)
+        super().set_timesteps(num_inference_steps=num_inference_steps, device=device, timesteps=timesteps)
+
+
 guider = AdaptiveProjectedGuidance(
     guidance_scale=8.0,
     adaptive_projected_guidance_rescale=12.0,
@@ -71,10 +75,8 @@ guider = AdaptiveProjectedGuidance(
     normalization_dims="spatial",
 )
 
-# Choose quantization variant
 variant = "Q4_K_M" # options: Q4_0, Q4_1, Q4_K_M, Q5_0, Q5_1, Q5_K_M, Q6_K, Q8_0, BF16
 
-# Download GGUF file and load quantized transformer
 ckpt_path = hf_hub_download(
     "Motif-Technologies/Motif-Video-2B-GGUF",
     filename=f"motifv-2b-dev-{variant}.gguf",
@@ -88,7 +90,6 @@ transformer = MotifVideoTransformer3DModel.from_single_file(
     torch_dtype=torch.bfloat16,
 )
 
-# Load the full pipeline with the quantized transformer
 pipe = MotifVideoPipeline.from_pretrained(
     "Motif-Technologies/Motif-Video-2B",
     revision="diffusers-integration",
@@ -96,40 +97,82 @@ pipe = MotifVideoPipeline.from_pretrained(
     guider=guider,
     transformer=transformer,
 )
-pipe = pipe.to("cuda")
 
-# Generate a video
-prompt = "A cat walking on a sunny beach"
-set_seed(0)
+# Replace default Euler scheduler with DPMSolver++ (flow matching).
+flow_shift = 15.0 # bias sampling toward earlier (high-noise) sigmas.
+pipe.scheduler = FlowDPMSolver(
+    num_train_timesteps=pipe.scheduler.config.get("num_train_timesteps", 1000),
+    algorithm_type="dpmsolver++",
+    solver_order=2,
+    prediction_type="flow_prediction",
+    use_flow_sigmas=True,
+    flow_shift=flow_shift,
+)
+
+pipe.enable_model_cpu_offload()
+
+prompt = (
+    "A woman standing in a sunlit field as flower petals swirl around her in slow motion. "
+    "Each petal floats gently through the golden light, casting tiny shadows. "
+    "Her hair moves like water, and time seems to stand still."
+)
+negative_prompt = (
+    "text overlay, graphic overlay, watermark, logo, subtitles, timestamp, "
+    "broadcast graphics, UI elements, random letters, frozen pose, rigid, static expression, "
+    "jerky motion, mechanical motion, discontinuous motion, flat framing, depthless, dull lighting, "
+    "monotone, crushed shadows, blown-out highlights, shifting background, fading background, "
+    "poor continuity, identity drift, deformation, flickering, ghosting, smearing, duplication, "
+    "mutated proportions, inconsistent clothing, flat colors, desaturated, tonally compressed, "
+    "poor background separation, exposure shift, uneven brightness, color balance shift"
+)
+
+generator = torch.Generator(device="cuda").manual_seed(42)
 output = pipe(
     prompt=prompt,
+    negative_prompt=negative_prompt,
     height=736,
     width=1280,
-    num_frames=33,
+    num_frames=121,
     num_inference_steps=50,
+    generator=generator,
+    frame_rate=24,
+    use_linear_quadratic_schedule=False,
 )
 export_to_video(output.frames[0], "output.mp4", fps=24)
 ```
 
 ## Benchmark
 
-Measured on NVIDIA H200, 1280x736, 121 frames, 4 steps:
+Measured on NVIDIA H200, 1280x736, 121 frames, 50 steps:
+
+| Variant | Speed (s/it) | Peak alloc (GB) | Peak rsv (GB) | Total (s) | VRAM saved vs BF16 (rsv) |
+|---------|-------------|-----------------|----------------|-----------|--------------------------|
+| BF16 | 23.22 | 14.78 | 24.93 | 1176.1 | — |
+| Q8_0 | 23.24 | 13.10 | 23.14 | 1177.0 | 1.79 |
+| Q6_K | 23.34 | 12.62 | 22.72 | 1181.7 | 2.21 |
+| Q5_K_M | 23.37 | 12.39 | 22.45 | 1183.0 | 2.48 |
+| Q5_1 | 23.35 | 12.47 | 22.66 | 1182.4 | 2.27 |
+| Q5_0 | 23.35 | 12.37 | 22.55 | 1181.9 | 2.38 |
+| Q4_K_M | 23.34 | 12.19 | 22.22 | 1181.5 | 2.71 |
+| Q4_1 | 23.29 | 12.26 | 22.26 | 1179.2 | 2.67 |
+| Q4_0 | 23.31 | 12.14 | 22.18 | 1179.8 | 2.75 |
 
-| Variant | Speed (s/it) | Peak VRAM (GB) | VRAM Saved vs BF16 |
-|---------|-------------|----------------|-------------------|
-| Q4_0 | 23.22 | 20.35 | 2.62 GB |
-| Q4_1 | 23.22 | 20.46 | 2.51 GB |
-| Q4_K_M | 23.27 | 20.40 | 2.57 GB |
-| Q5_0 | 23.32 | 20.58 | 2.39 GB |
-| Q5_1 | 23.28 | 20.69 | 2.28 GB |
-| Q5_K_M | 23.30 | 20.60 | 2.37 GB |
-| Q6_K | 23.28 | 20.82 | 2.15 GB |
-| Q8_0 | 23.17 | 21.26 | 1.71 GB |
-| BF16 | 23.16 | 22.97 | — |
+- **Peak alloc** = peak GPU memory occupied by live tensors (model weights + activations), via `torch.cuda.max_memory_allocated`.
+- **Peak rsv** = peak GPU memory reserved by PyTorch's caching allocator (alloc + cached free blocks), via `torch.cuda.max_memory_reserved`. Use this as the effective VRAM footprint when planning headroom.
 
 **Key findings:**
-- Speed is identical across all quantizations (~23.2 s/it) — no dequantization overhead.
-- VRAM savings scale with quantization level: Q4 saves ~2.6 GB, Q8 saves ~1.7 GB.
+- Speed near-identical across all quantizations (~23.2~23.4 s/it) — no dequantization overhead.
+- VRAM savings scale with quant level: Q4 saves ~2.7 GB, Q8 saves ~1.8 GB (reserved).
+
+## Quality Comparison
+
+Same prompt and seed across all variants (1280x736, 121 frames, 50 steps, NVIDIA H200). BF16 baseline at top, quantized variants paired below (4-bit → 8-bit). Each video is rendered at 1/2 resolution (640x368 per cell) at the original 24 fps.
+
+![BF16](assets/quality_row1.webp)
+![Q4_0 / Q4_1](assets/quality_row2.webp)
+![Q4_K_M / Q5_0](assets/quality_row3.webp)
+![Q5_1 / Q5_K_M](assets/quality_row4.webp)
+![Q6_K / Q8_0](assets/quality_row5.webp)
 
 ## Notes
 
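
Review note: the "VRAM saved vs BF16 (rsv)" column in the new 50-step table can be cross-checked from the "Peak rsv (GB)" column alone. The sketch below copies the reserved-memory figures from the table and recomputes the savings. The function name `vram_saved_vs_bf16` is hypothetical, and the percentage printed next to each value is a derived convenience figure that is not part of the README.

```python
# Peak reserved memory (GB), copied from the 50-step benchmark table.
PEAK_RESERVED_GB = {
    "BF16": 24.93,
    "Q8_0": 23.14,
    "Q6_K": 22.72,
    "Q5_K_M": 22.45,
    "Q5_1": 22.66,
    "Q5_0": 22.55,
    "Q4_K_M": 22.22,
    "Q4_1": 22.26,
    "Q4_0": 22.18,
}


def vram_saved_vs_bf16(table):
    """Return GB saved by each quantized variant relative to the BF16 row."""
    baseline = table["BF16"]
    return {
        variant: round(baseline - reserved, 2)
        for variant, reserved in table.items()
        if variant != "BF16"
    }


saved = vram_saved_vs_bf16(PEAK_RESERVED_GB)
for variant, gb in saved.items():
    share = gb / PEAK_RESERVED_GB["BF16"]
    print(f"{variant}: {gb:.2f} GB saved ({share:.1%} of the BF16 footprint)")
```

The recomputed values agree with the table (for example 1.79 GB for Q8_0 and 2.75 GB for Q4_0), which supports the "Q4 saves ~2.7 GB, Q8 saves ~1.8 GB" summary in the key findings.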
assets/quality_row1.webp ADDED

Git LFS Details

  • SHA256: b646395d784b992b4d5f5d79d00b4d3f9f5c528ebbac6a23320bf0760ccfbc8a
  • Pointer size: 132 Bytes
  • Size of remote file: 3.15 MB
assets/quality_row2.webp ADDED

Git LFS Details

  • SHA256: 66cb7c8a0f08deb0ade7ee499038c86c3a4889f9d4bb7f8e136880a3c6dc5528
  • Pointer size: 132 Bytes
  • Size of remote file: 4.59 MB
assets/quality_row3.webp ADDED

Git LFS Details

  • SHA256: ece496ea8a79c551b225b542471c97843d0493cf184f9042c34a728d9c586e9a
  • Pointer size: 132 Bytes
  • Size of remote file: 5.87 MB
assets/quality_row4.webp ADDED

Git LFS Details

  • SHA256: f09c1c7af0d642fb5f28ff72ec8bdc5ee0034803fe2c00a9605b3ad8dd6dd93c
  • Pointer size: 132 Bytes
  • Size of remote file: 6.13 MB
assets/quality_row5.webp ADDED

Git LFS Details

  • SHA256: 05ee919677c66bcd6af3ee9de10ed1f6265380bc753258d06e21e3b3e878eb1b
  • Pointer size: 132 Bytes
  • Size of remote file: 6.04 MB
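
Review note: every asset above reports a pointer size of 132 bytes even though the remote files differ in size. That is consistent with the Git LFS pointer format, a small text file holding a version line, a SHA-256 object ID, and the byte size. The sketch below rebuilds such a pointer for `assets/quality_row1.webp`. The exact byte count is not shown on this page (only "3.15 MB"), so `3_150_000` is a hypothetical placeholder, and `lfs_pointer` is an illustrative helper rather than part of any library.

```python
def lfs_pointer(sha256_hex, size_bytes):
    """Build the text of a Git LFS pointer file (spec v1)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 listed for assets/quality_row1.webp in this commit.
oid = "b646395d784b992b4d5f5d79d00b4d3f9f5c528ebbac6a23320bf0760ccfbc8a"

# Hypothetical byte count: the page only reports the rounded "3.15 MB".
pointer = lfs_pointer(oid, 3_150_000)
print(len(pointer.encode("utf-8")))  # 132
```

Any seven-digit byte count gives the same 132-byte pointer, which would explain why all five WebP files, each between roughly 3 MB and 6 MB, share one pointer size.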