Tag Archive for python, encoding, stable-diffusion, latent-diffusion

Replacing the Stable Diffusion v2.1 text encoder with an image encoder

I’m trying to replace the text encoder of Stable Diffusion with the corresponding image encoder so that I can condition generation on images instead of text. The Stable Diffusion documentation on Hugging Face says the model uses a pretrained text encoder from the OpenCLIP ViT/H model. Since the text encoder and image encoder of CLIP share the same latent space, my assumption is that I can swap the text encoder for the image encoder and the model should work without any further training.
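
Here is a rough sketch of what the swap might look like with the diffusers library, mainly to make the tensor shapes explicit. It assumes `pipe` is an already loaded `StableDiffusionPipeline` for v2.1 and `image_embeds` is a pooled, projected CLIP image embedding of shape `(batch, 1024)` (for example the `image_embeds` output of transformers' `CLIPVisionModelWithProjection`). The helper names `pooled_to_sequence_shape` and `generate_from_image_embedding` are made up for illustration, and using an all-zero tensor as the unconditional embedding is an assumption on my part. One caveat: the v2.1 UNet was trained on the per-token hidden states of the text encoder, shape `(batch, 77, 1024)`, not on the pooled embedding that lives in CLIP's shared space, so this may run without errors and still not produce images that match the input.

```python
# Sketch only. The dimensions are those of Stable Diffusion v2.1;
# the wiring itself is an assumption, not a confirmed method.
CROSS_ATTENTION_DIM = 1024   # width the v2.1 UNet cross-attention expects
TEXT_SEQUENCE_LENGTH = 77    # tokens per prompt from the text encoder (for reference)


def pooled_to_sequence_shape(pooled_shape):
    """Map a pooled embedding shape (batch, dim) to (batch, 1, dim).

    Hypothetical helper: it only validates shapes, so that a mismatch
    fails early instead of deep inside the UNet.
    """
    if len(pooled_shape) != 2:
        raise ValueError(f"expected (batch, dim), got {pooled_shape}")
    batch, dim = pooled_shape
    if dim != CROSS_ATTENTION_DIM:
        raise ValueError(
            f"embedding width {dim} != cross-attention width {CROSS_ATTENTION_DIM}"
        )
    return (batch, 1, dim)


def generate_from_image_embedding(pipe, image_embeds, **pipe_kwargs):
    """Pass a pooled CLIP image embedding to the pipeline in place of text.

    Assumes `pipe` is a diffusers StableDiffusionPipeline and
    `image_embeds` is a torch tensor of shape (batch, 1024).
    """
    expected = pooled_to_sequence_shape(tuple(image_embeds.shape))

    # Cross-attention wants (batch, seq_len, dim); treat the single
    # image embedding as a sequence of length 1.
    cond = image_embeds.unsqueeze(1)
    assert tuple(cond.shape) == expected

    # Assumption: zeros stand in for the "empty prompt" embedding used by
    # classifier-free guidance. Same shape, dtype, and device as `cond`.
    uncond = cond.new_zeros(cond.shape)

    # `prompt_embeds` / `negative_prompt_embeds` bypass the text encoder.
    return pipe(
        prompt_embeds=cond,
        negative_prompt_embeds=uncond,
        **pipe_kwargs,
    )
```

Calling `generate_from_image_embedding(pipe, image_embeds, num_inference_steps=30)` should return the usual pipeline output; whether the resulting images resemble the input image is exactly the open question here.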