While researching for a master's thesis topic, I came across the paper 'T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models'.
From my understanding, the authors show that by providing additional guidance (such as keypose or depth maps) to a text-to-image model, the generated image is more acceptable to the user.
For more details, check Huggingface.co.
For code: the T2I-Adapter code.
At the end of the paper, the authors mention the following limitation:
One limitation of our method is that in the case of multi-adapter control, the combination of guidance features requires manual adjustment.
Below is code using MultiAdapter. As seen in the code below, adapter_conditioning_scale=[0.8, 0.8] is set manually.
import torch
from diffusers import MultiAdapter, StableDiffusionAdapterPipeline, T2IAdapter
from diffusers.utils import make_image_grid

# Combine a keypose adapter and a depth adapter into a single MultiAdapter.
adapters = MultiAdapter(
    [
        T2IAdapter.from_pretrained("TencentARC/t2iadapter_keypose_sd14v1"),
        T2IAdapter.from_pretrained("TencentARC/t2iadapter_depth_sd14v1"),
    ]
)
adapters = adapters.to(torch.float16)

pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    adapter=adapters,
).to("cuda")

# prompt, cond_keypose, and cond_depth (the text prompt and the two PIL
# conditioning images) are prepared beforehand; each adapter gets its own scale.
cond = [cond_keypose, cond_depth]
image = pipe(prompt, cond, adapter_conditioning_scale=[0.8, 0.8]).images[0]
make_image_grid([cond_keypose, cond_depth, image], rows=1, cols=3)
My Question:
What ML technique can we use to automatically find the best possible values for adapter_conditioning_scale, instead of setting them by hand?
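To make the question concrete, here is one framing I was considering: treat the two scales as hyperparameters and search them with Bayesian optimization. The sketch below uses Optuna for the search and scores each candidate image by CLIP similarity between the prompt and the output. This is a minimal sketch based on my own assumptions, not anything from the paper: the choice of Optuna, the CLIP-score objective, and the [0.0, 1.5] search range are all assumptions.

import numpy as np
import optuna
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

# Assumes pipe, prompt, and cond are already defined as in the snippet above.
clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16").to("cuda")

def objective(trial):
    # Sample one scale per adapter; the [0.0, 1.5] range is an assumption.
    scales = [
        trial.suggest_float("keypose_scale", 0.0, 1.5),
        trial.suggest_float("depth_scale", 0.0, 1.5),
    ]
    # Fix the seed so trials differ only in the scales, not in the noise.
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(
        prompt,
        cond,
        adapter_conditioning_scale=scales,
        generator=generator,
    ).images[0]
    # CLIPScore expects a (C, H, W) tensor with values in [0, 255].
    img = torch.from_numpy(np.array(image)).permute(2, 0, 1).to("cuda")
    return clip_metric(img, prompt).item()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print("Best scales found:", study.best_params)

One caveat with this framing: CLIP score only measures how well the output matches the prompt, not how faithfully it follows the keypose and depth conditions, so a different objective might be needed. That is part of what I am asking.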