
	---
	library_name: diffusers
	license: openrail++
	base_model: stabilityai/stable-diffusion-xl-base-1.0
	tags:
	- stable-diffusion-xl
	- lora
	- text-to-image
	- ansi-art
	- pixel-art
	- retro
	- bbs
	pipeline_tag: text-to-image
	widget:
	- text: "acid-ansi-style, a menacing skull wreathed in flames on a black background"
	  output:
	    url: images/skull.png
	---

	# ACID ANSI LoRA

	A LoRA fine-tune of Stable Diffusion XL 1.0 that generates images in the style of classic ANSI art from the BBS era (1990s). Trained on rendered ANSI/RIP art from the ACiD Productions art packs.

	![Skull example](images/skull.png)

	## Usage

	```python
	import torch
	from diffusers import StableDiffusionXLPipeline

	# Load the SDXL base model in half precision and move it to the GPU.
	pipe = StableDiffusionXLPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.float16,
	).to("cuda")

	# Attach the LoRA weights from the Hub.
	pipe.load_lora_weights("cahlen/acid-ansi-lora")

	# The prompt starts with the trigger word.
	prompt = "acid-ansi-style, a menacing skull wreathed in flames on a black background"
	negative_prompt = "blurry, photorealistic, photo, smooth gradients, 3d render, watermark"

	image = pipe(
	    prompt,
	    negative_prompt=negative_prompt,
	    num_inference_steps=30,
	    guidance_scale=7.5,
	    cross_attention_kwargs={"scale": 0.7},  # LoRA scale
	).images[0]

	image.save("output.png")
	```

	## Trigger Word

	The trigger word is `acid-ansi-style`. Prepend it to every prompt to activate the style.
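A small helper can make sure the trigger word is always present. This is a minimal sketch: the function name `with_trigger` is hypothetical and not part of this repository or of diffusers.

```python
TRIGGER = "acid-ansi-style"


def with_trigger(prompt: str, trigger: str = TRIGGER) -> str:
    """Return the prompt with the trigger word prepended, unless it already leads."""
    prompt = prompt.strip()
    if prompt.lower().startswith(trigger.lower()):
        return prompt
    return f"{trigger}, {prompt}"


print(with_trigger("a dragon coiled around a tower"))
```

The result can be passed as `prompt` to the pipeline call shown in the Usage section.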

	## Recommended Settings

	| Parameter | Value |
	|---|---|
	| Inference steps | 30 |
	| Guidance scale | 7.5 |
	| LoRA scale | 0.6-0.8 (0.7 recommended) |
	| Negative prompt | blurry, photorealistic, photo, smooth gradients, 3d render, watermark |

	### LoRA Scale

	The LoRA scale controls style strength at inference time. Lower values (0.5-0.6) preserve more of the base model's composition, while higher values (0.8-1.0) push harder into the ANSI style but may cause repetition artifacts. The recommended default is 0.7.
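To compare scales side by side, one option is to render the same prompt at several values. The sketch below assumes `pipe` is the pipeline from the Usage section with the LoRA loaded; `sweep_lora_scale` and `make_generator` are hypothetical names introduced here for illustration.

```python
def sweep_lora_scale(pipe, prompt, scales=(0.5, 0.6, 0.7, 0.8, 1.0),
                     make_generator=None, **call_kwargs):
    """Render `prompt` once per LoRA scale and return {scale: image}.

    `make_generator` is an optional zero-argument callable that returns a
    freshly seeded generator, so every scale starts from the same noise.
    """
    results = {}
    for scale in scales:
        kwargs = dict(call_kwargs)
        if make_generator is not None:
            kwargs["generator"] = make_generator()
        output = pipe(prompt, cross_attention_kwargs={"scale": scale}, **kwargs)
        results[scale] = output.images[0]
    return results


# Example call (needs a GPU and the loaded pipeline, so it is left commented out):
# import torch
# images = sweep_lora_scale(
#     pipe,
#     "acid-ansi-style, a menacing skull wreathed in flames on a black background",
#     make_generator=lambda: torch.Generator("cuda").manual_seed(42),
#     num_inference_steps=30,
#     guidance_scale=7.5,
# )
```

Re-seeding for every scale keeps the starting noise identical, so differences between the outputs come from the scale alone.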

	## Samples

	All samples were generated from the step-3500 checkpoint at LoRA scale 0.7 with seed 42.

	| Skull | Dragon |
	|---|---|
	| ![skull](images/skull.png) | ![dragon](images/dragon.png) |

	| BBS Login | Cityscape |
	|---|---|
	| ![bbs](images/bbs.png) | ![cityscape](images/cityscape.png) |

	### Baseline Comparison (No LoRA)

	| Without LoRA | With LoRA (scale 0.7) |
	|---|---|
	| ![baseline](images/baseline_skull.png) | ![with lora](images/skull.png) |

	## Training Details

	| Parameter | Value |
	|---|---|
	| Base model | stabilityai/stable-diffusion-xl-base-1.0 |
	| LoRA rank | 32 |
	| LoRA alpha | 32 |
	| Optimizer | Prodigy (lr=1.0, constant scheduler) |
	| Training steps | 3500 |
	| Batch size | 1 (gradient accumulation 4) |
	| Precision | bf16 |
	| SNR gamma | 5 |
	| Noise offset | 0.05 |
	| Caption dropout | 0.15 |
	| Training framework | SimpleTuner |
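A few derived quantities follow from the table by simple arithmetic. The sketch below assumes that "Training steps" counts optimizer updates (not confirmed by this card) and uses the 276-image dataset size from the Training Data section; the variable names are illustrative.

```python
lora_rank = 32
lora_alpha = 32
batch_size = 1
grad_accum = 4
train_steps = 3500
num_images = 276  # dataset size from the Training Data section

# LoRA updates are conventionally scaled by alpha / rank before being added to the base weights.
weight_scale = lora_alpha / lora_rank

# Samples contributing to each optimizer update.
effective_batch = batch_size * grad_accum

# ASSUMPTION: each training step is one optimizer update over the effective batch.
samples_seen = train_steps * effective_batch
passes_over_data = samples_seen / num_images

print(f"alpha/rank scale: {weight_scale}")
print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen} (about {passes_over_data:.1f} passes over the data)")
```

With alpha equal to rank, the trained update is applied at a factor of 1.0, so the inference-time LoRA scale (for example 0.7) is the only multiplier on the style.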

	### Training Data

	The training set contains 276 images derived from 50 curated ANSI/RIP art renders from the ACiD Productions art packs (1993-1996). Preparation details:
	- Nearest-neighbor upscaling to preserve blocky pixel edges
	- Multi-crop extraction for tall/narrow ANSI art (overlapping 1440px sections)
	- VGA 16-color palette quantization as data augmentation
	- Instance prompt only (no per-image captions)
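The image preparation steps above can be sketched in plain Python. This is an illustration under assumptions, not the author's pipeline: the function names, the 360px overlap, and the Euclidean nearest-color rule are hypothetical. Only the 1440px window, the nearest-neighbor upscaling, and the 16-color VGA palette come from the list above. The palette values are the standard VGA text-mode defaults.

```python
# Standard 16-color VGA text-mode palette (RGB).
VGA_PALETTE = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]


def upscale_nearest(pixels, factor):
    """Repeat every pixel `factor` times in both directions (keeps hard edges)."""
    return [
        [p for p in row for _ in range(factor)]
        for row in pixels
        for _ in range(factor)
    ]


def crop_offsets(height, window=1440, overlap=360):
    """Top offsets of overlapping `window`-tall crops covering a tall image.

    ASSUMPTION: the overlap size is not stated in the card; 360 is a placeholder.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    if height <= window:
        return [0]
    stride = window - overlap
    offsets = list(range(0, height - window, stride))
    offsets.append(height - window)  # final crop is aligned to the bottom edge
    return offsets


def quantize_to_vga(rgb):
    """Map an RGB triple to the closest palette color (squared Euclidean distance)."""
    return min(
        VGA_PALETTE,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(rgb, c)),
    )


print(crop_offsets(3600))
print(quantize_to_vga((200, 10, 10)))
```

In practice an imaging library would do this work on whole images. The pure-Python version only shows the logic of each step.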