sayakpaul 
posted an update Jul 24
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials demonstrating how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, even though LoRAs are integral to how these models are adopted.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯
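To give an intuition for point 4: a compiled graph specializes on tensor objects and shapes, so hotswapping copies a new adapter's values into pre-allocated buffers (padded to a maximum rank) instead of creating new tensors. A minimal plain-Python sketch of this idea, with illustrative names that are not the Diffusers API:

```python
# Conceptual sketch of hotswapping: copy new LoRA values into a fixed-size
# slot so the buffer object (and shape) never changes, which is what lets a
# compiler avoid recompiling. Class and method names are hypothetical.

class LoraSlot:
    """A fixed-size buffer pair (A, B) that LoRA weights are copied into."""

    def __init__(self, max_rank, dim):
        # Reserve the largest rank up front; smaller LoRAs are zero-padded.
        self.max_rank, self.dim = max_rank, dim
        self.A = [[0.0] * dim for _ in range(max_rank)]
        self.B = [[0.0] * max_rank for _ in range(dim)]

    def hotswap(self, A, B):
        # In-place copy: id(self.A) / id(self.B) stay the same afterwards.
        r = len(A)  # rank of the incoming adapter
        assert r <= self.max_rank, "new LoRA rank exceeds the reserved slot"
        for i in range(self.max_rank):
            for j in range(self.dim):
                self.A[i][j] = A[i][j] if i < r else 0.0
        for i in range(self.dim):
            for j in range(self.max_rank):
                self.B[i][j] = B[i][j] if j < r else 0.0

slot = LoraSlot(max_rank=4, dim=3)
buffer_id = id(slot.A)
slot.hotswap(A=[[1.0, 2.0, 3.0]], B=[[0.5], [0.5], [0.5]])  # rank-1 adapter
assert id(slot.A) == buffer_id  # same buffer object, new adapter values
```

The padding to `max_rank` is the price of stability: every adapter, whatever its rank, presents the same shapes to the compiled graph.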

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090, achieving at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served. So, we hope this will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.

Learn the details and the full code here:
https://huggingface.co/blog/lora-fast
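For readers new to LoRA itself: an adapter adds a low-rank update to a frozen weight, W' = W + (alpha/r) · B·A, which is why it can be merged or swapped cheaply at serving time. A minimal pure-Python sketch of that update (illustrative, not the PEFT implementation):

```python
# Pure-Python sketch of the LoRA update W' = W + (alpha / r) * (B @ A).
# Shapes and values are illustrative, not taken from PEFT or Diffusers.

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, alpha):
    r = len(A)                 # LoRA rank = number of rows of A
    delta = matmul(B, A)       # (out, r) @ (r, in) -> (out, in)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0, 1.0]]               # rank-1 A: shape (1, 2)
B = [[2.0], [0.0]]             # rank-1 B: shape (2, 1)
W_merged = apply_lora(W, A, B, alpha=1.0)
# W_merged == [[3.0, 2.0], [0.0, 1.0]]
```

Because the update is a plain matrix addition, a merged LoRA costs nothing extra at inference; the serving challenge the post addresses is doing this fast for *many* adapters without recompiling.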

Raspbarry_Tensorflow_Robot
A Raspberry Pi image-recognition robot based on TensorFlow.

Features

1. Use the Raspberry Pi GPIO ports to drive the robot forward, backward, left, and right, control the robotic arm, and tilt the camera up and down via a stepper motor (with non-blocking input).

2. Use TensorFlow for image recognition.

3. Use the RPi Cam Web Interface to stream video to the web front end in real time.

4. Pick the most likely label from the recognition results and speak it aloud.
