Does joint matrix support a similar operation?
bmma_sync
Waits until all warp lanes have executed bmma_sync, and then performs the warp-synchronous bit matrix multiply-accumulate operation D = (A op B) + C, where op consists of a logical operation bmmaBitOp followed by the accumulation defined by bmmaAccumulateOp. The available operations are:
- bmmaBitOpXOR, a 128-bit XOR of a row in matrix_a with the 128-bit column of matrix_b
- bmmaBitOpAND, a 128-bit AND of a row in matrix_a with the 128-bit column of matrix_b, available on devices with compute capability 8.0 and higher.

The accumulate op is always bmmaAccumulateOpPOPC, which counts the number of set bits.
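For context, this is roughly how the operation is used on the CUDA side through the experimental WMMA API. This is a minimal sketch based on the CUDA Programming Guide, not code from this repository; the 8x8x128 single-bit shape with XOR/POPC needs compute capability 7.5+ (the AND variant needs 8.0+), and the kernel name and pointer layout here are just illustrative:

```cuda
#include <mma.h>
using namespace nvcuda::wmma;
using namespace nvcuda::wmma::experimental;

// One warp computes D = popc(A XOR B) + C on an 8x8x128 single-bit tile.
// a and b are bit-packed (32 bits per unsigned); c and d are 32-bit ints.
__global__ void bmma_xor_popc(const unsigned *a, const unsigned *b,
                              const int *c, int *d) {
    fragment<matrix_a, 8, 8, 128, precision::b1, row_major> a_frag;
    fragment<matrix_b, 8, 8, 128, precision::b1, col_major> b_frag;
    fragment<accumulator, 8, 8, 128, int> c_frag, d_frag;

    // Leading dimensions are given in elements (i.e. bits for b1 matrices).
    load_matrix_sync(a_frag, a, 128);
    load_matrix_sync(b_frag, b, 128);
    load_matrix_sync(c_frag, c, 8, mem_row_major);

    // Warp-synchronous bit matrix multiply-accumulate: D = popc(A ^ B) + C.
    bmma_sync(d_frag, a_frag, b_frag, c_frag, bmmaBitOpXOR, bmmaAccumulateOpPOPC);

    store_matrix_sync(d, d_frag, 8, mem_row_major);
}
```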
There is a draft implementation of bmma here that is fully working within its branch: #5363
The point, however, is that the only notable hardware that supports bitwise MMA so far is Nvidia's, and it is still experimental. Therefore a oneAPI extension, which must support multiple hardware vendors, is not currently possible. Note also that the XOR operator you mentioned is deprecated and unsupported on the latest Nvidia hardware.
I'm not aware of any notable libraries that currently support bmma (although I suspect there may be some, since it has been a long time since I last investigated). It has some usage, but it doesn't seem to be widely adopted at the moment.
More generally, I get the impression that although bmma has been shown to work for certain practical applications, the question of preferred quantized data types for inference (or backpropagation) has not been settled.
Hence, as far as we are concerned, it is fairly low priority. If users wish to experiment with it, they can do so via the above-mentioned branch, or on the latest hardware via inline PTX, as advised by the Nvidia docs.
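For anyone taking the inline PTX route, here is a rough sketch of what such a wrapper could look like. It uses the m8n8k128 single-bit mma instruction with the AND/POPC variant (the non-deprecated one, requiring sm_80+); the per-lane layout of the a, b, c and d fragments follows the PTX ISA and is left to the caller. The function name and signature are hypothetical, not an existing API:

```cuda
// Illustrative wrapper: issues one warp-wide m8n8k128 single-bit MMA
// with AND + POPC via inline PTX (compute capability 8.0+).
// Each lane supplies 32 bits of A (row-major) and 32 bits of B
// (column-major) plus two s32 accumulator values, per the PTX ISA
// fragment layout for this shape.
__device__ void bmma_m8n8k128_and_popc(int &d0, int &d1,
                                       unsigned a, unsigned b,
                                       int c0, int c1) {
    asm volatile(
        "mma.sync.aligned.m8n8k128.row.col.s32.b1.b1.s32.and.popc "
        "{%0, %1}, {%2}, {%3}, {%4, %5};"
        : "=r"(d0), "=r"(d1)
        : "r"(a), "r"(b), "r"(c0), "r"(c1));
}
```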