Bidhan Roy committed
Commit 0b94e29 · Parent(s): 73872ef

Add README updates and images with Git LFS

Browse files
- .gitattributes +1 -0
- README.md +157 -67
- bagel_labs_logo.png +3 -0
- generated_images.png +3 -0
- training_architecture.png +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -6,108 +6,198 @@ tags:
 - multi-expert
 - dit
 - laion
 ---
-- **Router**: dit-based routing network
-- **Hidden Size**: 1152
-- **Layers**: 28
-- **Attention Heads**: 16
-- **Parameters per Expert**: ~0M
-- **Total Parameters**: ~3M
-- **Text Conditioning**: ✓ (CLIP ViT-L/14)
-- **Training Dataset**: LAION-Aesthetic
 ```python
-from
 # Load the pipeline
-pipeline =
 # Generate images
 images = pipeline(
     prompt="A beautiful sunset over Paris, oil painting style",
     num_inference_steps=50,
     guidance_scale=7.5,
-for i, img in enumerate(images):
-    img.save(f"output_{i}.png")
 ```
-- **Batch Size**: 16 per expert
-- **Learning Rate**: 2e-05
-- **Image Size**: 256x256 (32x32 latent space)
-- **VAE**: SD VAE (8x downsampling)
-- **Text Encoder**: CLIP ViT-L/14
-- **EMA**: True
-- **Mixed Precision**: True
-- The router network analyzes the noisy latent and timestep
-- Selects the most appropriate expert for denoising
-- Enables better quality and diversity compared to single models
-- Best results at 256x256 resolution
-- Requires GPU for inference (8GB+ VRAM recommended)
 ```bibtex
-@misc{
-year
-publisher
-url
 }
 ```
- multi-expert
- dit
- laion
- distributed
- decentralized
- flow-matching
---

<div align="center">

<img src="bagel_labs_logo.png" alt="Bagel Labs" width="120"/>

# Paris: A Decentralized Trained Open-Weight Diffusion Model

<a href="https://huggingface.co/bageldotcom/paris">
  <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Like%20this-model-yellow?style=for-the-badge" alt="Like on Hugging Face">
</a>
<a href="https://github.com/bageldotcom/paris">
  <img src="https://img.shields.io/github/stars/bageldotcom/paris?style=for-the-badge&logo=github&label=Star%20on%20GitHub" alt="Star on GitHub">
</a>
<a href="https://github.com/bageldotcom/Paris/blob/main/paper.pdf">
  <img src="https://img.shields.io/badge/📄%20Read-Technical%20Report-red?style=for-the-badge" alt="Read Technical Report">
</a>

</div>
<br>

Paris is the world's first diffusion model trained entirely through decentralized computation. It consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation, with no gradient, parameter, or intermediate-activation synchronization, and it achieves higher parallelism efficiency than traditional methods while using 14× less data and 16× less compute than baselines. [Read our technical report](https://github.com/bageldotcom/Paris/blob/main/paper.pdf) to learn more.

# Key Characteristics

- 8 independently trained expert diffusion models (605M parameters each, 4.84B total)
- No gradient synchronization, parameter sharing, or activation exchange among nodes during training
- Lightweight transformer router (~158M parameters) for dynamic expert selection
- 11M LAION-Aesthetic images across 120 A40 GPU-days
- 14× less training data than prior decentralized baselines
- 16× less compute than prior decentralized baselines
- Competitive generation quality (FID 12.45)
- Open weights for research and commercial use under the MIT license
---

# Examples

![Generated Images](generated_images.png)

*Text-conditioned image generation samples using Paris across diverse prompts and visual styles*

---

# Architecture Details

| Component | Specification |
|-----------|--------------|
| **Model Scale** | DiT-XL/2 |
| **Parameters per Expert** | 605M |
| **Total Expert Parameters** | 4.84B (8 experts) |
| **Router Parameters** | ~158M |
| **Hidden Dimensions** | 1152 |
| **Transformer Layers** | 28 |
| **Attention Heads** | 16 |
| **Patch Size** | 2×2 (latent space) |
| **Latent Resolution** | 32×32×4 |
| **Image Resolution** | 256×256 |
| **Text Conditioning** | CLIP ViT-L/14 |
| **VAE** | sd-vae-ft-mse (8× downsampling) |
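To make these dimensions concrete, here is a minimal configuration sketch for a single DiT-XL/2 expert. The `ExpertConfig` name and its fields are illustrative only, not the repository's actual API:

```python
from dataclasses import dataclass

@dataclass
class ExpertConfig:
    # Illustrative field names for the DiT-XL/2 dimensions listed above
    input_size: int = 32        # 32×32 latent grid (256px image / 8× VAE downsampling)
    in_channels: int = 4        # VAE latent channels
    patch_size: int = 2         # 2×2 latent patches
    hidden_size: int = 1152
    depth: int = 28             # transformer layers
    num_heads: int = 16
    text_embed_dim: int = 768   # CLIP ViT-L/14 text features

cfg = ExpertConfig()
num_tokens = (cfg.input_size // cfg.patch_size) ** 2
print(num_tokens)  # 256 tokens per 256×256 image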
---

# Training Approach

Paris implements fully decentralized training in which:

- Each expert trains independently on a semantically coherent data partition (DINOv2-based clustering; see the sketch below)
- No gradients, parameters, or activations are exchanged between experts during training
- Experts train asynchronously, at different speeds, across AWS, GCP, local clusters, and Runpod instances
- The router is trained post hoc on the full dataset for expert selection during inference
- Complete computational independence eliminates the need for specialized interconnects (InfiniBand, NVLink)
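A minimal sketch of the partitioning step described above, assuming DINOv2 image embeddings clustered with k-means into 8 groups; the `facebook/dinov2-base` checkpoint, `dataset_images`, and helper names are illustrative, not the exact pipeline used:

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, AutoModel
from sklearn.cluster import KMeans

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
dinov2 = AutoModel.from_pretrained("facebook/dinov2-base").eval()

@torch.no_grad()
def embed(pil_images):
    """Return one DINOv2 CLS embedding per image."""
    inputs = processor(images=pil_images, return_tensors="pt")
    return dinov2(**inputs).last_hidden_state[:, 0].numpy()

# `dataset_images` stands in for batches of LAION-Aesthetic images (PIL).
features = np.concatenate([embed(batch) for batch in dataset_images])
labels = KMeans(n_clusters=8, random_state=0).fit_predict(features)

# Expert k trains only on images with labels == k; the same labels later serve
# as the router's cross-entropy targets.
```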
![Training Architecture](training_architecture.png)

*Paris training phase showing complete asynchronous isolation across heterogeneous compute clusters. Unlike traditional parallelization strategies (Data/Pipeline/Model Parallelism), Paris requires zero communication during training.*

This zero-communication approach enables training on fragmented compute resources without specialized interconnects, eliminating the dedicated GPU cluster requirement of traditional diffusion model training.

**Comparison with Traditional Parallelization**

| **Strategy** | **Synchronization** | **Straggler Impact** | **Topology Requirements** |
|--------------|---------------------|---------------------|---------------------------|
| Data Parallel | Periodic all-reduce | Slowest worker blocks iteration | Latency-sensitive cluster |
| Model Parallel | Sequential layer transfers | Slowest layer blocks pipeline | Linear pipeline |
| Pipeline Parallel | Stage-to-stage per microbatch | Bubble overhead from slowest stage | Linear pipeline |
| **Paris** | **No synchronization** | **No blocking** | **Arbitrary** |

---

# Usage

```python
from diffusers import DiffusionPipeline
import torch

# Load the pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "bageldotcom/paris",
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipeline.to("cuda")

# Generate images
images = pipeline(
    prompt="A beautiful sunset over Paris, oil painting style",
    num_inference_steps=50,
    guidance_scale=7.5,
    height=256,
    width=256
).images

images[0].save("output.png")
```

### Routing Strategies

- **`top-1`** (default): Single best expert per step. Fastest inference, competitive quality.
- **`top-2`**: Weighted ensemble of the top-2 experts. Often the best quality, at 2× inference cost.
- **`full-ensemble`**: All 8 experts weighted by the router. Highest compute (8× cost). See the sketch below for how the weighting works.
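A minimal sketch of how top-k routing can combine expert predictions at a denoising step, assuming the router returns per-expert logits for the current noisy latent and timestep; function and variable names are illustrative, not the repository's API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def routed_prediction(router, experts, x_t, t, text_emb, k=2):
    """Combine the top-k experts' predictions, weighted by router probabilities."""
    logits = router(x_t, t, text_emb)            # (num_experts,) for a single sample
    probs = F.softmax(logits, dim=-1)
    weights, idx = probs.topk(k)                 # k most likely experts for this latent
    weights = weights / weights.sum()            # renormalize over the selected experts

    pred = torch.zeros_like(x_t)
    for w, i in zip(weights, idx):
        pred += w * experts[int(i)](x_t, t, text_emb)  # weighted noise/velocity estimate
    return pred                                  # k=1 -> top-1, k=8 -> full ensemble
```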
|
---

# Performance Metrics

**Multi-Expert vs. Monolithic on LAION-Art (DiT-B/2)**
| **Inference Strategy** | **FID-50K ↓** |
|------------------------|---------------|
| Monolithic (single model) | 29.64 |
| Paris Top-1 | 30.60 |
| **Paris Top-2** | **22.60** |
| Paris Full Ensemble | 47.89 |

*Top-2 routing achieves a 7.04-point FID improvement over the monolithic baseline, validating that targeted expert collaboration outperforms both single models and naive ensemble averaging.*
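For context on the metric, here is a rough sketch of how an FID score like those above is typically computed with `torchmetrics`; it is illustrative only, not the evaluation harness used for this table, and `real_loader`/`generated_loader` are placeholders:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# FID compares Inception-feature statistics of a real and a generated image set.
fid = FrechetInceptionDistance(feature=2048)

for real_batch in real_loader:          # uint8 tensors, shape (N, 3, H, W)
    fid.update(real_batch, real=True)
for fake_batch in generated_loader:     # e.g. 50k samples from the model
    fid.update(fake_batch, real=False)

print(f"FID: {fid.compute().item():.2f}")
```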
---

# Training Details

**Hyperparameters (DiT-XL/2)**
| **Parameter** | **Value** |
|---------------|-----------|
| Dataset | LAION-Aesthetic (11M images) |
| Clustering | DINOv2 semantic features |
| Batch Size | 16 per expert (effective 32 with 2-step accumulation) |
| Learning Rate | 2e-5 (AdamW, no scheduling) |
| Training Steps | ~120k total across experts (asynchronous) |
| EMA Decay | 0.9999 |
| Mixed Precision | FP16 with automatic loss scaling |
| Initialization | ImageNet-pretrained DiT-XL/2 |
| Conditioning | AdaLN-Single (23% parameter reduction) |
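Putting a few of these rows together, a condensed sketch of what one expert's optimization step could look like (AdamW at 2e-5, FP16 loss scaling, 2-step gradient accumulation, EMA at 0.9999). This only illustrates the listed hyperparameters; `expert`, `partition_loader`, and `diffusion_loss` are placeholders, not the project's training script:

```python
import torch

optimizer = torch.optim.AdamW(expert.parameters(), lr=2e-5)    # no LR scheduling
scaler = torch.cuda.amp.GradScaler()                           # automatic loss scaling
ema_params = [p.detach().clone() for p in expert.parameters()] # EMA copy of weights
ACCUM, EMA_DECAY = 2, 0.9999

for step, (latents, text_emb) in enumerate(partition_loader):  # this expert's cluster only
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = diffusion_loss(expert, latents, text_emb) / ACCUM
    scaler.scale(loss).backward()

    if (step + 1) % ACCUM == 0:                                # effective batch size 32
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
        with torch.no_grad():                                  # update EMA weights
            for p, p_ema in zip(expert.parameters(), ema_params):
                p_ema.mul_(EMA_DECAY).add_(p, alpha=1 - EMA_DECAY)
```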
**Router Training**

| **Parameter** | **Value** |
|---------------|-----------|
| Architecture | DiT-B (smaller than experts) |
| Batch Size | 64 with 4-step accumulation (effective 256) |
| Learning Rate | 5e-5 with cosine annealing (25 epochs) |
| Loss | Cross-entropy on cluster assignments |
| Training | Post-hoc on full dataset |
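Since the router is trained with cross-entropy against the cluster assignments, the post-hoc phase might look roughly like the following sketch; the `router` model and `full_dataset_loader` are assumptions for illustration, and the 4-step gradient accumulation is omitted for brevity:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(router.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)  # 25 epochs

for epoch in range(25):
    for latents, timesteps, text_emb, cluster_id in full_dataset_loader:
        logits = router(latents, timesteps, text_emb)   # (batch, 8) expert scores
        loss = F.cross_entropy(logits, cluster_id)      # target = DINOv2 cluster label
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
    scheduler.step()                                    # cosine annealing per epoch
```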
---

# Citation
```bibtex
@misc{paris2025,
  title={Paris: A Decentralized Trained Open-Weight Diffusion Model},
  author={Jiang, Zhiying and Seraj, Raihan and Villagra, Marcos and Roy, Bidhan},
  year={2025},
  publisher={Bagel Labs},
  url={https://huggingface.co/bageldotcom/paris}
}
```
---

# License

MIT License – Open for research and commercial use.

<div align="center">

Made with ❤️ by [Bagel Labs](https://bagel.com)

</div>
bagel_labs_logo.png ADDED (Git LFS)
generated_images.png ADDED (Git LFS)
training_architecture.png ADDED (Git LFS)