Composer: Creative and Controllable Image Synthesis with Composable Conditions

https://damo-vilab.github.io/composer-page/

Composer is a generative model designed to address the limited controllability of existing large-scale generative models. It decomposes an image into representative factors and then uses a diffusion model to recompose the image with those factors as conditions, which allows flexible control over the output while maintaining quality and creativity. The conditions span several levels, such as a text description, a depth map, a sketch, and a color histogram. Composer also serves as a general framework for classical generative tasks without requiring retraining. The composition results show that conditions can be combined freely, for example text with depth; a masked image with text; or sketch, depth, and embedding together. The model also supports image variations, interpolations, reconfigurations, and region-specific editing, as well as tasks such as image translation, style transfer, pose transfer, and virtual try-on.
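
The decomposition step can be illustrated with a minimal Python sketch. It extracts two of the factors named above, a color histogram and a sketch, from an image array, and then keeps only a chosen subset as conditions. Everything here is an assumption made for illustration: the function names (`color_histogram`, `sketch`, `decompose`, `select_conditions`) are hypothetical, the extractors are simple NumPy stand-ins rather than the ones Composer uses, and the diffusion model that recomposes the image is not shown.

```python
import numpy as np


def color_histogram(image, bins=4):
    """Normalized joint RGB histogram with bins**3 entries (illustrative stand-in)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()


def sketch(image):
    """Crude edge map: grayscale gradient magnitude scaled to [0, 1] (illustrative stand-in)."""
    gray = image.astype(np.float64).mean(axis=2)
    grad_rows, grad_cols = np.gradient(gray)
    magnitude = np.hypot(grad_rows, grad_cols)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude


def decompose(image):
    """Split an H x W x 3 uint8 image into a dictionary of named factors."""
    return {"color_histogram": color_histogram(image), "sketch": sketch(image)}


def select_conditions(factors, keep):
    """Keep the chosen factors as conditions; the others are set to None (dropped)."""
    return {name: (value if name in keep else None) for name, value in factors.items()}
```

Calling `select_conditions(decompose(image), {"sketch"})` would keep the sketch and drop the color histogram, which mirrors the idea of conditioning on any subset of factors. How the retained factors are actually fed to the diffusion model is not specified here.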