Controllable Layered Image Generation for Real-World Editing

¹UC Santa Cruz, ²Adobe Research

* This work was done when Jinrui Yang was a research intern at Adobe Research.


Overview of LASAGNA



LASAGNA supports three generation modes: background-conditioned foreground generation, foreground-conditioned background generation, and text-to-all-layer generation. These modes flexibly handle different inputs and jointly synthesize coherent, high-quality composites, backgrounds, and transparent foregrounds with realistic visual effects.


LASAGNA Framework



We formulate the joint generation of composite images, backgrounds, and foregrounds as a flexible, layer-conditional denoising task. This single framework supports multiple workflows, including FG_Gen, BG_Gen, and Text2All. We use a unified input representation with learnable embeddings that distinguish different roles of visual latents (noise, BG, FG, and mask) across tasks, enabling the model to adapt its behavior under various generation settings. This allows a single attention-based model to flexibly process varied combinations of inputs and targets simultaneously.
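The role-distinguishing embeddings described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the latent dimension, role names, initialization scale, and sequence layout are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # latent channel dimension (hypothetical; the actual model's dims are unspecified)

# One learnable embedding per token role; the role names mirror the text
# (noise, BG, FG, mask) but the values here are toy random initializations.
ROLES = ["noise", "bg", "fg", "mask"]
role_emb = {r: rng.standard_normal(d) * 0.02 for r in ROLES}

def build_sequence(latents_by_role):
    """Concatenate visual latents from different roles into one token
    sequence, adding the role embedding to each token so a single
    attention-based model can tell the roles apart."""
    tokens = []
    for role, lat in latents_by_role.items():
        tokens.append(lat + role_emb[role])  # broadcast embedding over tokens
    return np.concatenate(tokens, axis=0)

# FG_Gen-style setup: background latents are given; the layer to be
# generated enters as noise tokens.
bg = rng.standard_normal((16, d))
noise = rng.standard_normal((16, d))
seq = build_sequence({"bg": bg, "noise": noise})
print(seq.shape)  # (32, 8)
```

Under this framing, switching between FG_Gen, BG_Gen, and Text2All only changes which roles carry given latents versus noise, while the model itself stays unchanged.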


Data Construction Pipeline



Starting from existing datasets, we implement a four-stage data construction pipeline that leverages off-the-shelf models together with a custom-trained data curator. This yields a high-quality dataset that serves as the foundation for subsequent model training.
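The stage-then-filter structure of such a pipeline can be sketched generically. Everything here is a toy placeholder: the actual stages, the curator model, and the quality threshold are not specified in this summary.

```python
def run_pipeline(samples, stages, curator, threshold):
    """Apply each construction stage in order, then keep only the samples
    the curator scores at or above the quality threshold."""
    for stage in stages:
        samples = [stage(s) for s in samples]
    return [s for s in samples if curator(s) >= threshold]

# Toy demo: two "stages" transform numbers; the "curator" is the identity
# score and the threshold keeps values >= 5.
stages = [lambda x: x * 2, lambda x: x + 1]
kept = run_pipeline([1, 2, 3], stages, curator=lambda x: x, threshold=5)
print(kept)  # [5, 7]
```

The point of the shape, curation as a final gating step rather than an inline heuristic, is that a single trained curator can enforce one quality bar across heterogeneous off-the-shelf stages.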


Samples of Our Dataset and Benchmark



Each row shows a composite image, a clean background, and a foreground layer with visual effects, along with corresponding captions for all components.


Experimental Results


LASAGNA vs General Models



We compare LASAGNA with Flux.1 [1], Qwen-Image-Edit [2], and gpt-image-1 (High) [3]. (a) Across three distinct generation tasks, LASAGNA consistently achieves superior inter-layer coherence and consistency, whereas competing models often fail to maintain these properties. (b) Moreover, by generating foregrounds with faithfully preserved visual effects, LASAGNA enables diverse post-generation editing operations directly on individual layers, a capability not supported by existing models.


LASAGNA vs Expert Model



We compare LASAGNA with LayerDiffuse [4]. In FG_Gen, our model produces objects with appropriate size and position, along with realistic shadows consistent with the background. In BG_Gen and Text2All, our model produces visually consistent results across all layers. Furthermore, it can generate new foregrounds with corresponding visual effects, enabling flexible and realistic post-editing.


Layer Editing with Visual Effects



We demonstrate the benefits of explicit layer representations with visual effects by comparing three paradigms: Instruct Editing, Layer Editing, and Layer Editing with Visual Effects (LASAGNA). Across recoloring, spatial, and compositional editing tasks, the lack of explicit layer representations makes Instruct Editing prone to unintended changes and less responsive to spatial instructions, while Layer Editing with Visual Effects yields more coherent and photorealistic results.


Creative Applications Powered by LASAGNA



Diverse creative applications driven by our model. We leverage both Text2All and FG_Gen modes to jointly guide the synthesis, unlocking a broader range of editing possibilities and producing diverse, visually appealing results.


More Visualization Results


Background-conditioned Foreground Generation



Foreground-conditioned Background Generation



References

[1] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. Flux.1 Kontext: Flow matching for in-context image generation and editing in latent space. arXiv preprint arXiv:2506.15742, 2025.

[2] Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-Image technical report. arXiv preprint arXiv:2508.02324, 2025.

[3] OpenAI. GPT Image 1. https://platform.openai.com/docs/models/gpt-image-1, 2025. Accessed: 2025-11-13.

[4] Lvmin Zhang and Maneesh Agrawala. Transparent image layer diffusion using latent transparency. arXiv preprint arXiv:2402.17113, 2024.