ZP_Test

Team
community
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

sayakpaulΒ 
posted an update 6 months ago
view post
Post
2071
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping for avoiding recompilation during swapping new LoRAs 🀯

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090. We achieve at least a *2x speedup* in either of the GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served. So, we hope this will be beneficial to the community πŸ€—

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.

Learn the details and the full code here:
https://huggingface.co/blog/lora-fast
  • 3 replies
Β·
sayakpaulΒ 
posted an update 8 months ago
view post
Post
2925
Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.

So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.

Give it a go here:
https://lnkd.in/gf8Pi4-2
  • 2 replies
Β·
sayakpaulΒ 
posted an update 8 months ago
view post
Post
1864
Despite the emergence of combining LLM and DiT architectures for T2I synthesis, its design remains severely understudied.

This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code β™₯️

We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.

Despite its compelling results and other performance virtues, it remains unexplored, which is what we want to improve in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives. PixArt-Alpha like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on the above findings, we arrive at FuseDiT with the following components on top of the base architecture from the findings of our experiments.

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly

We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.

To know more (code, models, all are available), please check out the paper:
https://lnkd.in/gg6qyqZX.
MrDragonFoxΒ 
posted an update 9 months ago
view post
Post
4523
as a few of you know - i am working on a rather more elaborate-tts that can produce more interesting sounds in context of rp

early sneak peak is here -

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

its based on orpheus - but really the model is irrelevant as i focus mostly on data augmentation / prep / pipelineing - its just the way to show progress

should be able to express fine even in a sfw context

probably the last release for a few weeks as i go back to the data pipeline and improve there ..

in the mean time, please do test and report problems or enjoyable generations you found - we have a growing discord community and i love to see what you get out of that early release !

(small colab is provided on the model page if you dont have the gpu to run that your self)
MrDragonFoxΒ 
posted an update 9 months ago
view post
Post
5599
yet a other audio datasets pre classified for events + audio aestetics

this time for german - 680h sampled from emilia yodas

timestamps for asr training or other fancier things available as nc in the raw repo

MrDragonFox/DE_Emilia_Yodas_680h

cc by 4.0 as by emilia yodas

raw events / transcriptions are cc by NC 4.0

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

the coming days i should push about 600h english + some japanese too same format