Complexity-Diffusion VAE

Variational Autoencoder for Complexity-Diffusion image generation pipeline.

Architecture

89M parameters | 256x256 images | 4-channel latent space

Encoder

$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$

Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression).

Decoder

$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$

Loss Function

$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \,\|\, p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$

Where:

  • $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
  • $D_{KL}$ regularizes latent to $\mathcal{N}(0, I)$
  • $\mathcal{L}_{\text{perceptual}}$ uses VGG features
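In PyTorch, the combined objective can be sketched as below. The function signature and the default β and λ values are assumptions for illustration, not the repository's actual API; the perceptual term is left as a pluggable callable (e.g. a VGG-feature distance such as LPIPS).

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1e-6, lam=0.1, perceptual=None):
    """Sketch of the combined VAE objective (hypothetical signature)."""
    # L1 reconstruction: ||x - x_hat||_1
    recon = F.l1_loss(x_hat, x)
    # Closed-form KL between q(z|x) = N(mu, sigma^2) and p(z) = N(0, I)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + beta * kl
    if perceptual is not None:
        # e.g. a VGG-feature distance (weights not bundled here)
        loss = loss + lam * perceptual(x_hat, x)
    return loss
```

With μ = 0 and log σ² = 0, the KL term vanishes and the loss reduces to the L1 reconstruction error, which is a quick sanity check for the implementation.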

Config

Parameter Value
Image size 256x256
Latent dim 4
Base channels 128
Channel mult [1, 2, 4, 4]
Res blocks 2
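The config values are consistent with the stated 8x compression, assuming the common convention (as in standard diffusion-VAE backbones) of one 2x downsample between consecutive channel-mult levels:

```python
base_channels = 128
channel_mult = [1, 2, 4, 4]

# Channel width at each resolution level
widths = [base_channels * m for m in channel_mult]  # [128, 256, 512, 512]

# One 2x downsample between consecutive levels -> 2^(levels - 1) compression
compression = 2 ** (len(channel_mult) - 1)  # 8, i.e. 256 -> 32
```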

Usage

import torch
from safetensors.torch import load_file
from complexity_diffusion.vae import ComplexityVAE

# Load weights
state_dict = load_file("model.safetensors")
vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)
vae.eval()

# Encode a batch of images in [B, 3, 256, 256]
with torch.no_grad():
    latents = vae.encode(images)  # [B, 4, 32, 32]

    # Decode
    reconstructed = vae.decode(latents)  # [B, 3, 256, 256]

Training

Trained on WikiArt (81K images) for 15K steps with:

  • Batch size: 16
  • Learning rate: 1e-4
  • Mixed precision: bf16
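A single training step with these settings might look like the following sketch. The tiny stand-in model, optimizer choice, and loss weighting are assumptions for illustration only; the real ComplexityVAE architecture and training loop are not shown in this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in model: the real ComplexityVAE API is an assumption here.
class TinyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 3, stride=8, padding=1)   # 4 ch mu + 4 ch logvar
        self.dec = nn.ConvTranspose2d(4, 3, 8, stride=8)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

vae = TinyVAE()
optimizer = torch.optim.Adam(vae.parameters(), lr=1e-4)  # lr from the card
device = "cuda" if torch.cuda.is_available() else "cpu"

images = torch.rand(2, 3, 64, 64)  # small batch for illustration (card uses 16)
with torch.autocast(device_type=device, dtype=torch.bfloat16):  # bf16 mixed precision
    x_hat, mu, logvar = vae(images)
    recon = F.l1_loss(x_hat, images)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + 1e-6 * kl  # KL weight is an assumed placeholder
optimizer.zero_grad()
loss.backward()
optimizer.step()
```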

Training Curves


Part of Complexity Deep Ecosystem

This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging:

  • INL Dynamics for stable latent space training
  • Token-Routed architecture for efficient processing

License

CC BY-NC 4.0 - Attribution-NonCommercial

Commercial use requires explicit permission from the author.
