latent diffusion – Unet: Sizes of tensors must match except in dimension 1
I’m training a latent diffusion model on audio encodings of shape (batch 16, channels 256, n_frames 501, n_frequency 6).
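One common cause of this error, offered as an assumption since the model code isn't shown, is that the UNet concatenates skip connections along dimension 1 and the spatial sizes (501 and 6 here) don't survive repeated halving and doubling. Here is a minimal pure-Python sketch of that size arithmetic. The helper names, the floor-division downsampling, and `levels=2` are all hypothetical and not taken from any particular UNet implementation.

```python
def down_sizes(size, levels):
    """Sizes after each stride-2 downsampling step (assumes floor division)."""
    sizes = [size]
    for _ in range(levels):
        sizes.append(sizes[-1] // 2)
    return sizes


def skip_mismatches(size, levels):
    """List (skip_level, skip_size, upsampled_size) wherever x2 upsampling
    of the deeper level does not reproduce the skip connection's size."""
    sizes = down_sizes(size, levels)
    mismatches = []
    for level in range(levels, 0, -1):
        upsampled = sizes[level] * 2   # assumed x2 upsampling in the decoder
        skip = sizes[level - 1]        # encoder feature to concatenate with
        if upsampled != skip:
            mismatches.append((level - 1, skip, upsampled))
    return mismatches


def padded_size(size, levels):
    """Smallest size >= `size` that is divisible by 2**levels."""
    multiple = 2 ** levels
    return -(-size // multiple) * multiple


levels = 2  # hypothetical number of downsampling stages
print(skip_mismatches(501, levels))  # [(0, 501, 500)]
print(skip_mismatches(6, levels))    # [(1, 3, 2)]
print(padded_size(501, levels), padded_size(6, levels))  # 504 8
```

If this is what's going on in your model, padding the input so n_frames and n_frequency are multiples of 2**levels (e.g. 504 and 8 for two stages) and cropping the output back afterwards should make the concatenated sizes line up.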