I am using the Vision Transformer as part of the CLIP model and I keep getting the following warning:
..\site-packages\torch\nn\functional.py:5504: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)
The code works, but I assume it is not running as fast as it could be, since flash attention is not being used.
I have tried to force flash attention by wrapping the ViT forward pass in:
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.FLASH_ATTENTION):
but I still get the same warning.
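For context, here is a minimal sketch of what I am doing (the real code runs CLIP's ViT; the tensors and shapes below are just placeholders I chose to hit the same scaled_dot_product_attention code path):

import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Placeholder q/k/v standing in for the attention inside CLIP's ViT;
# shapes are arbitrary (batch, heads, tokens, head_dim).
q = torch.randn(1, 8, 197, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 197, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 197, 64, device="cuda", dtype=torch.float16)

# Restrict SDPA to the flash attention backend for this call only.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)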
For reference, I’m using Windows 11 with Python 3.11.9 and torch 2.3.1+cu121.