Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition
Abstract
Qwen-Image-Layered decomposes images into semantically disentangled RGBA layers using a diffusion model, enabling independent editing of each layer and improving decomposition quality and consistency.
Recent visual generative models often struggle with consistency during image editing due to the entangled nature of raster images, where all visual content is fused into a single canvas. In contrast, professional design tools employ layered representations, allowing isolated edits while preserving consistency. Motivated by this, we propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers, enabling inherent editability, where each RGBA layer can be independently manipulated without affecting other content. To support variable-length decomposition, we introduce three key components: (1) an RGBA-VAE to unify the latent representations of RGB and RGBA images; (2) a VLD-MMDiT (Variable Layers Decomposition MMDiT) architecture capable of decomposing a variable number of image layers; and (3) a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer. Furthermore, to address the scarcity of high-quality multilayer training images, we build a pipeline to extract and annotate multilayer images from Photoshop documents (PSD). Experiments demonstrate that our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing. Our code and models are released at https://github.com/QwenLM/Qwen-Image-Layered
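To make the layered-editing argument concrete, here is a minimal sketch of how disentangled RGBA layers recompose into a single RGB image via the standard alpha-over operator, and why an edit to one layer cannot perturb content owned by another. This is plain NumPy/Pillow; the file names and bottom-first layer ordering are my own assumptions for illustration, not the model's actual output format.

```python
import numpy as np
from PIL import Image

def over(fg: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Composite RGBA layer `fg` over `bg` (standard alpha-over operator).

    Both arguments are float arrays in [0, 1] with shape (H, W, 4).
    """
    fa, ba = fg[..., 3:4], bg[..., 3:4]
    out_a = fa + ba * (1.0 - fa)
    out_rgb = (fg[..., :3] * fa + bg[..., :3] * ba * (1.0 - fa)) / np.clip(out_a, 1e-8, None)
    return np.concatenate([out_rgb, out_a], axis=-1)

# Hypothetical decomposer output, bottom layer first (placeholder file names).
paths = ["layer0_background.png", "layer1_subject.png", "layer2_text.png"]
layers = [np.asarray(Image.open(p).convert("RGBA"), np.float32) / 255.0 for p in paths]

# An "isolated edit": tint only the text layer. Every other layer is
# untouched by construction, which is the consistency argument above.
layers[2][..., :3] *= np.array([1.0, 0.4, 0.4], np.float32)

# Recompose bottom-up into a single image.
canvas = layers[0]
for layer in layers[1:]:
    canvas = over(layer, canvas)
Image.fromarray((canvas * 255).round().astype(np.uint8), "RGBA").save("recomposed.png")
```

Because each edit touches exactly one layer and recomposition is deterministic, consistency holds by construction rather than being a property the generative model must learn to preserve.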
Community
arXiv lens breakdown of this paper: https://arxivlens.com/PaperView/Details/qwen-image-layered-towards-inherent-editability-via-layer-decomposition-9194-7a40c6da
- Executive Summary
- Detailed Breakdown
- Practical Applications
When will the code and model be released?
In the paper they point to a repo that does not exist: https://github.com/QwenLM/QwenImage-Layered
The repository exists; it's available at https://github.com/QwenLM/Qwen-Image-Layered (the URL in the paper is just a typo).
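While waiting on the official release, the PSD side of the paper's data pipeline (extracting per-layer RGBA images from Photoshop documents) is easy to prototype with the psd-tools library. A rough sketch follows; the input file name and the filtering heuristics are my own guesses, since the paper's actual extraction and annotation pipeline is not public.

```python
from psd_tools import PSDImage
from PIL import Image

psd = PSDImage.open("design.psd")  # hypothetical input file
width, height = psd.size

rgba_layers = []
for layer in psd.descendants():
    # Skip groups and hidden layers; the paper's real filtering/annotation
    # rules are unknown, so this is only a plausible starting point.
    if layer.is_group() or not layer.visible:
        continue
    rendered = layer.composite()  # PIL image of the layer, effects applied
    if rendered is None:
        continue
    # Place the layer on a transparent full-size canvas at its PSD offset,
    # so every extracted layer shares the document's coordinate frame.
    canvas = Image.new("RGBA", (width, height), (0, 0, 0, 0))
    canvas.paste(rendered.convert("RGBA"), layer.offset)
    rgba_layers.append((layer.name, canvas))

for i, (name, img) in enumerate(rgba_layers):
    img.save(f"layer{i:02d}_{name}.png")
```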
How to use it in Figma or Photoshop?
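The model's outputs are ordinary RGBA images, so the simplest bridge is to save each layer as a transparent PNG: Figma has no PSD import, but you can drag the PNGs onto the canvas and stack them directly. For Photoshop you can assemble a layered PSD, e.g. with the pytoshop library. A sketch from memory of pytoshop's nested_layers API (file names are placeholders, and you should verify the library's expected layer ordering, which I have not confirmed):

```python
import numpy as np
from PIL import Image
from pytoshop import enums
from pytoshop.user import nested_layers

# Hypothetical inputs: the RGBA PNGs produced by the decomposer, bottom first.
paths = ["layer0_background.png", "layer1_subject.png", "layer2_text.png"]

psd_layers = []
for path in paths:
    rgba = np.asarray(Image.open(path).convert("RGBA"))
    psd_layers.append(nested_layers.Image(
        name=path,
        top=0, left=0,
        channels={0: rgba[..., 0],    # R
                  1: rgba[..., 1],    # G
                  2: rgba[..., 2],    # B
                  -1: rgba[..., 3]},  # alpha
    ))

# Note: check whether your pytoshop version expects top-first or
# bottom-first ordering; reverse the list if the stack comes out flipped.
psd = nested_layers.nested_layers_to_psd(psd_layers, color_mode=enums.ColorMode.rgb)
with open("decomposed.psd", "wb") as fd:
    psd.write(fd)
```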
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OmniPSD: Layered PSD Generation with Diffusion Transformer (2025)
- Controllable Layer Decomposition for Reversible Multi-Layer Image Generation (2025)
- TAUE: Training-free Noise Transplant and Cultivation Diffusion Model (2025)
- Illustrator's Depth: Monocular Layer Index Prediction for Image Decomposition (2025)
- From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition (2025)
- V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties (2025)
- MatPedia: A Universal Generative Foundation for High-Fidelity Material Synthesis (2025)
