Initial Commit
- .gitattributes +1 -0
- CITATION.cff +20 -0
- LICENSE +21 -0
- README.md +146 -0
- example_grid.jpg +3 -0
- qwen-360-diffusion-int4-bf16-v1-b.safetensors +3 -0
- qwen-360-diffusion-int4-bf16-v1.safetensors +3 -0
- qwen-360-diffusion-int8-bf16-v1.safetensors +3 -0
- run_qwen_image_nf4.py +124 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+example_grid.jpg filter=lfs diff=lfs merge=lfs -text
CITATION.cff
ADDED
@@ -0,0 +1,20 @@
cff-version: 1.2.0
message: "If you use this software, please cite it using the metadata in this file."
title: "Qwen 360 Diffusion"
authors:
- family-names: "Egan"
  given-names: "Ben"
- name: "XWAVE"
- name: "Jimmy Carter"
date-released: "2025-12-10"
keywords:
  - "text to image"
  - "Qwen"
  - "qwen image"
  - "diffusion model"
  - "equirectangular"
  - "equirectangular projection"
  - "360 image"
  - "360 degree"
url: "https://huggingface.co/ProGamerGov/qwen-360-diffusion"
type: software
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Ben Egan, XWAVE, Jimmy Carter

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
ADDED
@@ -0,0 +1,146 @@
---
language:
- en
base_model:
- Qwen/Qwen-Image
pipeline_tag: text-to-image
tags:
- '360'
- '360°'
- '360-degree'
- '360-image'
- equirectangular
- equirectangular-projection
- image-generation
- text-to-image
---

# Qwen 360 Diffusion



## General

Qwen 360 Diffusion is a rank-128 LoRA built on top of a 20B-parameter MMDiT (Multimodal Diffusion Transformer) model, designed to generate 360 degree equirectangular-projection images from text descriptions.

The model was trained from the [Qwen Image model](https://huggingface.co/Qwen/Qwen-Image) on an extremely diverse dataset of tens of thousands of equirectangular images depicting landscapes, interiors, humans, animals, and objects. All images were resized to 2048x1024 before training.

The model was also trained on a diverse dataset of ordinary photos for regularization, which makes it double as a realism finetune when prompted accordingly.

In extensive testing, the model's capabilities vastly exceeded those of all other currently available text-to-image 360 generation models, so given the right prompt it should be capable of producing almost anything you want.

The model is also designed to produce equirectangular images for non-VR purposes such as general imagery, photography, artwork, architecture, portraiture, and many other concepts.

---


## Usage

To activate panoramic generation, include one of the following **trigger phrases**, or a variation of one or more of them, in your prompt:

> `"equirectangular"`, `"360 image"`, `"360 panorama"`, or `"360 degree panorama with equirectangular projection"`


Note that even viewing a 360 image in a viewer on a flat 2D screen can create the feeling of actually being inside the scene, known in psychology as a sense of 'presence'.

### Recommended Settings

- **Aspect ratio:** For best results, use the `2:1` resolution of `2048×1024`. Smaller 2:1 resolutions such as `1024×512` and `1536×768` may cause the model to struggle with generating proper horizons.
- **Prompt tips:** Include the desired **medium or style**, such as _photograph_, _oil painting_, _illustration_, or _digital art_.
- **360-specific considerations:** Remember that 360 images wrap around with no borders: the left edge connects to the right edge, while the top and bottom edges each converge to a single point at the poles of the sphere.
- **Human subject considerations:** For full-body shots, specify the head/face and footwear (e.g., "wearing boots") or the lack thereof to avoid incomplete or incorrectly distorted outputs.
- **Equirectangular distortion:** Outputs show increasing horizontal stretching as you move vertically away from the center. These distortions are not visible when the image is viewed in a 360 viewer. A minimal generation sketch using these settings follows this list.
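A minimal sketch of these settings with [Diffusers](https://github.com/huggingface/diffusers), assuming diffusers' `QwenImagePipeline` and enough VRAM for the bf16 base model; see `run_qwen_image_nf4.py` in this repo for a quantized setup.

```python
# Minimal sketch, assuming diffusers' QwenImagePipeline and enough VRAM for
# the bf16 base model; see run_qwen_image_nf4.py for a quantized setup.
import torch
from diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.load_lora_weights(
    "ProGamerGov/qwen-360-diffusion",
    weight_name="qwen-360-diffusion-int8-bf16-v1.safetensors",
)
pipe.to("cuda")

image = pipe(
    prompt="360 degree panorama with equirectangular projection, a photograph of a sunlit forest clearing",
    width=2048,   # recommended 2:1 resolution
    height=1024,
    num_inference_steps=25,
    true_cfg_scale=4.0,
).images[0]
image.save("forest_360.png")
```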

Once generated, you can upscale your panoramas for use as **photographs**, **artwork**, **skyboxes**, **virtual environments**, **VR experiences**, **VR therapy**, or **3D scene backgrounds**, or as part of a **text-to-video-to-3D-world pipeline**.

---

### Notes

#### FP8 inference

For maximum visual fidelity, it is **strongly recommended** to use the GGUF Q8 or int8 quantized versions of the Qwen Image transformer rather than FP8 quantization.

Transformer models in `fp8_e4m3fn` or `fp8_e5m2` precision, as well as low-precision models trained with "accuracy-fixing" methods (e.g., `ostris/ai-toolkit`), may cause **patch or grid artifacts** when combined with the int8-trained LoRA. Some have found this issue to be caused by downcasting directly from fp16 to fp8 without proper scaling and calibration.
→ To avoid this, use the int4-trained versions of the LoRA:
`qwen-360-diffusion-int4-bf16-v1.safetensors` or `qwen-360-diffusion-int4-bf16-v1-b.safetensors`.

- **Low-precision artifact mitigation:** If artifacts still appear when using the int4-trained LoRA on an `fp8_e4m3fn` or `fp8_e5m2` transformer quant, they can often be reduced by adjusting the **LoRA weight** and/or refining both the **positive and negative prompts**, as in the sketch below.
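Continuing from the earlier sketch's `pipe`, one way to adjust the LoRA weight is through diffusers' PEFT integration; the adapter name `"qwen360"` and the `0.8` scale below are illustrative assumptions to tune for your setup.

```python
# Illustrative sketch: lower the LoRA scale via diffusers' PEFT integration.
# The adapter name "qwen360" and the 0.8 scale are assumptions to tune.
pipe.load_lora_weights(
    "ProGamerGov/qwen-360-diffusion",
    weight_name="qwen-360-diffusion-int4-bf16-v1.safetensors",
    adapter_name="qwen360",
)
pipe.set_adapters(["qwen360"], adapter_weights=[0.8])  # try values below 1.0

image = pipe(
    prompt="equirectangular, 360 panorama, a photograph of a mountain lake at dawn",
    negative_prompt="seams, grid artifacts, patches",  # refine both prompts as needed
    width=2048,
    height=1024,
).images[0]
```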

---

## Additional Tools

### HTML 360 Viewer

To make viewing and sharing 360 images & videos easier, I built a browser-based HTML 360 viewer that runs locally on your device. It works in desktop and mobile browsers and has optional VR headset support.

* You can try it out here on GitHub Pages: https://progamergov.github.io/html-360-viewer/
* GitHub code: https://github.com/ProGamerGov/html-360-viewer
* You can append '`?url=`' followed by a link to your image to automatically load it into the 360 viewer, making it extremely easy to share your 360 creations.
* Example: https://progamergov.github.io/html-360-viewer/?url=https://upload.wikimedia.org/wikipedia/commons/7/76/Dauderi.jpg
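For programmatic sharing, a minimal sketch that percent-encodes the image link before appending it to the viewer's `?url=` parameter:

```python
# Build a shareable viewer link by percent-encoding the image URL.
from urllib.parse import quote

viewer = "https://progamergov.github.io/html-360-viewer/"
image_url = "https://upload.wikimedia.org/wikipedia/commons/7/76/Dauderi.jpg"
print(f"{viewer}?url={quote(image_url, safe=':/')}")
```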


### Recommended ComfyUI Nodes

If you are a user of [ComfyUI](https://github.com/comfyanonymous/ComfyUI), these sets of nodes can be useful for working with 360 images & videos.

* ComfyUI_preview360panorama
  * For viewing 360s inside of ComfyUI (may be slower than my browser-based viewer).
  * Link: https://github.com/ProGamerGov/ComfyUI_preview360panorama

* ComfyUI_pytorch360convert
  * For editing 360s, seam fixing, view rotation, and masking potential artifacts.
  * Link: https://github.com/ProGamerGov/ComfyUI_pytorch360convert

* ComfyUI_pytorch360convert_video
  * For generating sweep videos that rotate around the scene.
  * Link: https://github.com/ProGamerGov/ComfyUI_pytorch360convert_video

For those using diffusers and other libraries, you can make use of the [pytorch360convert](https://github.com/ProGamerGov/pytorch360convert) library when working with 360 media, as sketched below.
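As an illustration, a normal perspective view can be extracted from an equirectangular output; this sketch assumes pytorch360convert mirrors the py360convert-style `e2p` (equirectangular-to-perspective) call with positional arguments (image, FOV, yaw, pitch, output size), so check the library's README for the exact name, signature, and tensor layout.

```python
# Sketch only: assumes a py360convert-style e2p() (equirectangular to
# perspective) exists in pytorch360convert with positional arguments
# (image, fov_deg, yaw_deg, pitch_deg, out_hw); verify against the docs.
import numpy as np
import torch
from PIL import Image
from pytorch360convert import e2p

pano = torch.from_numpy(np.array(Image.open("forest_360.png"))).float() / 255.0
view = e2p(pano, 90.0, 30.0, 0.0, (768, 768))  # 90° FOV, 30° right of center
Image.fromarray((view.numpy() * 255).astype("uint8")).save("view.png")
```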

---

### Diffusers Example

An example script using [Diffusers](https://github.com/huggingface/diffusers) can be found [here](https://huggingface.co/ProGamerGov/qwen-360-diffusion/blob/main/run_qwen_image_nf4.py).

---

## Limitations

A large portion of the training data has the viewer oriented at 90 degrees to the direction of gravity (i.e., level with the horizon), so rotating outputs may be required to achieve different vertical viewing angles. A yaw-rotation sketch follows.
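Yaw (horizontal) rotation of an equirectangular image is an exact pixel roll along the width; pitch changes require a full spherical rotation (e.g., with pytorch360convert). A minimal yaw sketch:

```python
# Yaw-rotate an equirectangular image by rolling pixels horizontally;
# each degree of yaw corresponds to width / 360 pixels of shift.
import numpy as np
from PIL import Image

def yaw_rotate(img: Image.Image, degrees: float) -> Image.Image:
    arr = np.array(img)
    shift = int(arr.shape[1] * degrees / 360.0)
    return Image.fromarray(np.roll(arr, shift, axis=1))

yaw_rotate(Image.open("forest_360.png"), 90.0).save("forest_360_yaw90.png")
```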

---

## Contributors

- [Ben Egan](https://github.com/ProGamerGov)
- [XWAVE](https://twitter.com/XWAVEart)
- [Jimmy Carter](https://huggingface.co/jimmycarter)


## Citation Information

BibTeX

```
@software{Egan_Qwen_360_Diffusion_2025,
  author = {Egan, Ben and {XWAVE} and {Jimmy Carter}},
  license = {MIT},
  month = aug,
  title = {{Qwen 360 Diffusion}},
  url = {https://huggingface.co/ProGamerGov/qwen-360-diffusion},
  year = {2025}
}
```

APA

```
Egan, B., XWAVE, & Jimmy Carter. (2025). Qwen 360 Diffusion [Computer software]. https://huggingface.co/ProGamerGov/qwen-360-diffusion
```

Please refer to the [CITATION.cff](https://huggingface.co/ProGamerGov/qwen-360-diffusion/blob/main/CITATION.cff) for more information on how to cite this model.
example_grid.jpg
ADDED
Git LFS Details
qwen-360-diffusion-int4-bf16-v1-b.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf710d6e6e67545e32f979069ba3aaade158326278daf80b8afd251e22fc8ba7
size 755039336

qwen-360-diffusion-int4-bf16-v1.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0cee2e49e9976a374112225e3766c5d2b758bd98fc54733fc90771c6dccbecc
size 755039336

qwen-360-diffusion-int8-bf16-v1.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1dc0d8c164e92bda96a7ab4d55a846e7c36cec369ba37ee85d19ef1707210de3
size 377552312
run_qwen_image_nf4.py
ADDED
@@ -0,0 +1,124 @@
from PIL import Image
import torch
import numpy as np
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import Qwen2_5_VLForConditionalGeneration

from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel, QwenImageInpaintPipeline


# Generation settings.
prompt = "equirectangular, a woman and a man sitting at a cafe, the woman has red hair and she's wearing purple sweater with a black scarf and a white hat, the man is sitting on the other side of the table and he's wearing a white shirt with a purple scarf and red hat, both of them are sipping their coffee while in the table there's some cake slices on their respective plates, each with forks and knives at each side."
negative_prompt = ""
output_filename = "qwen_bnb_nf4.png"
width, height = 2048, 1024
true_cfg_scale = 4.0
num_inference_steps = 25
seed = 42

# LoRA to apply on top of the base model.
lora_model_id = "jimmycarter/qwen-3d-epoch-7"
lora_filename = "pytorch_lora_weights.safetensors"

# Pre-quantized NF4 base model.
model_id = "diffusers/qwen-image-nf4"
torch_dtype = torch.bfloat16
device = "cuda"

# Optionally inpaint over the left/right wrap-around seam after generation.
fix_seam = True
inpaint_strength, seam_width = 0.5, 0.10


def shift_equirect(img):
    """Horizontal 50% shift using torch.roll, moving the wrap-around seam to the image center."""
    t = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255.0
    t = torch.roll(t, shifts=(0, t.shape[2] // 2), dims=(1, 2))
    return Image.fromarray((t.permute(1, 2, 0).numpy() * 255).astype(np.uint8))


def create_seam_mask(w, h, frac=0.10):
    """Create a vertical center-seam mask as a PIL Image (white = region to inpaint)."""
    mask = torch.zeros((h, w))
    seam_w = max(1, int(w * frac))
    c = w // 2
    mask[:, c - seam_w // 2:c + seam_w // 2] = 1.0
    return Image.fromarray((mask.numpy() * 255).astype("uint8"), "L")


def load_pipeline(text_encoder, transformer, mode="t2i"):
    """Build a text-to-image or inpainting pipeline around the shared quantized components."""
    pipe_class = QwenImagePipeline if mode == "t2i" else QwenImageInpaintPipeline
    pipe = pipe_class.from_pretrained(
        model_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch_dtype
    )
    pipe.load_lora_weights(lora_model_id, weight_name=lora_filename)
    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()
    pipe.transformer.compile_repeated_blocks(
        fullgraph=True, dynamic=True
    )
    return pipe


def main():
    # NF4-quantized transformer (diffusers BitsAndBytesConfig).
    quantization_config = DiffusersBitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        llm_int8_skip_modules=["transformer_blocks.0.img_mod"],
    )
    transformer = QwenImageTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quantization_config,
        torch_dtype=torch_dtype,
    ).to("cpu")

    # NF4-quantized text encoder (transformers BitsAndBytesConfig).
    quantization_config = TransformersBitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id,
        subfolder="text_encoder",
        quantization_config=quantization_config,
        torch_dtype=torch_dtype,
    ).to("cpu")

    generator = torch.Generator(device=device).manual_seed(seed)
    pipe = load_pipeline(text_encoder, transformer, mode="t2i")

    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_inference_steps=num_inference_steps,
        true_cfg_scale=true_cfg_scale,
        generator=generator,
    ).images[0]

    image.save(output_filename)

    if fix_seam:
        # Free the text-to-image pipeline before loading the inpainting one.
        del pipe
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        shifted = shift_equirect(image)  # roll 50% to expose the seam at the center
        mask = create_seam_mask(width, height, frac=seam_width)

        pipe = load_pipeline(text_encoder, transformer, mode="i2i")
        image_fixed = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=shifted,
            mask_image=mask,
            strength=inpaint_strength,
            width=width,
            height=height,
            num_inference_steps=num_inference_steps,
            true_cfg_scale=true_cfg_scale,
            generator=generator,
        ).images[0]
        # Roll back so the inpainted region returns to the original seam position.
        image_fixed = shift_equirect(image_fixed)
        image_fixed.save(output_filename.replace(".png", "_seamfix.png"))


if __name__ == "__main__":
    main()