Trying to use forward AD with _scaled_dot_product_flash_attention that does not support it because it has not been implemented yet. · Issue #128971 · pytorch/pytorch · GitHub
Trying to use forward AD with _scaled_dot_product_flash_attention that does not support it because it has not been implemented yet. #128971
Open
Description

@lciti

🚀 The feature, motivation and pitch

As suggested in the error message, I am reporting this error so that forward-AD support for these operators can be prioritized.

Trying to use forward AD with _scaled_dot_product_flash_attention that does not support it because it has not been implemented yet.

Trying to use forward AD with _scaled_dot_product_flash_attention_for_cpu that does not support it because it has not been implemented yet.
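
A minimal sketch of the kind of call that hits these errors (assuming torch.func.jvp as the forward-AD entry point; shapes and values below are illustrative):

```python
# Illustrative sketch: forward-mode AD through scaled_dot_product_attention,
# which dispatches to the flash attention kernels named in the errors above.
import torch
import torch.nn.functional as F

q = torch.randn(1, 4, 8, 16)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# torch.func.jvp runs the function under forward-mode AD; the flash attention
# backends have no forward-AD formula, so this raises the error quoted above.
out, jvp_out = torch.func.jvp(
    lambda q_: F.scaled_dot_product_attention(q_, k, v),
    (q,),
    (torch.randn_like(q),),
)
```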

Alternatives

No response

Additional context

torch.__version__: '2.3.1+cu121'
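
A possible workaround until a forward-AD formula is implemented (assuming the torch.nn.attention API available in 2.3) is to force the math backend, which decomposes into ordinary differentiable ops:

```python
# Workaround sketch (assumption, not verified here): select the math SDPA
# backend, which is built from regular ops and should support forward-mode AD.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

with sdpa_kernel(SDPBackend.MATH):
    out, jvp_out = torch.func.jvp(
        lambda q_: F.scaled_dot_product_attention(q_, k, v),
        (q,),
        (torch.randn_like(q),),
    )
```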

cc @ezyang @albanD @gqchen @pearu @nikitaved @soulitzer @Varal7

Metadata

Assignees

No one assigned

    Labels

    actionable
    module: autograd (Related to torch.autograd, and the autograd engine in general)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
