Key-Locked Rank One Editing for Text-to-Image Personalization
Project page: https://research.nvidia.com/labs/par/Perfusion/
Perfusion is a text-to-image personalization method that creatively portrays personalized objects with significant changes in appearance while preserving their identity. It introduces a mechanism called "Key-Locking" that maintains high visual fidelity while allowing creative control, combining multiple personalized concepts, and keeping the model size small. Perfusion works by applying dynamic rank-1 updates to the underlying text-to-image model and locking a new concept's cross-attention Keys to those of its superordinate category. A gated rank-1 formulation controls the influence of each learned concept at inference time and allows multiple concepts to be combined, enabling a runtime trade-off between visual fidelity and textual alignment.

With a trained model of just 100KB, Perfusion covers different operating points across the Pareto front without additional training. It outperforms strong baselines both qualitatively and quantitatively, enables personalization even in one-shot settings, and can portray personalized objects interacting in novel ways. The method also allows efficient control of visual-textual alignment and demonstrates different variations of key-locking. Additionally, Perfusion concepts trained with a vanilla diffusion model generalize to fine-tuned variants of that model.
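The gated rank-1 idea can be illustrated with a minimal sketch. This is not Perfusion's actual implementation (which edits the cross-attention Key/Value projections of a diffusion model with concept-specific directions); it only shows, in numpy, why a rank-1 edit is tiny to store and why a gate gives inference-time control. All names and dimensions here are hypothetical.

```python
import numpy as np


def gated_rank1_update(W, u, v, gate=1.0):
    """Return W + gate * outer(u, v): a gated rank-1 edit of W.

    Only u, v, and the scalar gate need to be stored per concept
    (d_out + d_in + 1 numbers instead of d_out * d_in), which is why
    a personalized model can be on the order of 100KB.
    """
    return W + gate * np.outer(u, v)


# Hypothetical shapes for a cross-attention projection matrix.
d_out, d_in = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))

# u: output-space direction for the new concept;
# v: input-embedding direction of the concept token.
u = rng.standard_normal(d_out)
v = rng.standard_normal(d_in)

W_edited = gated_rank1_update(W, u, v, gate=0.7)

# The difference is exactly rank 1.
assert np.linalg.matrix_rank(W_edited - W) == 1

# gate=0 recovers the original weights, so the gate dials the
# concept's influence up or down at inference time.
assert np.allclose(gated_rank1_update(W, u, v, gate=0.0), W)
```

Combining several concepts amounts to summing their gated rank-1 terms onto the same base weights, with each gate traded off independently.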
PyTorch implementation: https://github.com/lucidrains/perfusion-pytorch