Available on x86-64 and target features avx512vpclmulqdq,avx512f only.
Performs a carry-less multiplication of two 64-bit polynomials over the finite field GF(2) in each of the four 128-bit lanes, producing one 128-bit product per lane.
The immediate byte selects which 64-bit half of each lane of a and b is used:
bit 0 selects the half of a and bit 4 selects the half of b (0 for the low half, 1 for the high half). Immediate bits other than 0 and 4 are ignored.
All lanes share the same immediate byte.
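
The lane-wise behaviour described above can be sketched with a portable scalar model. This is only an illustration of the semantics, not the intrinsic itself and not how the hardware computes the result. The names `clmul64`, `clmul_lane`, and `clmul_4lanes` are hypothetical helpers, and representing the 512-bit vector as `[u128; 4]` with index 0 as the lowest lane is an assumption made for readability.

```rust
/// Carry-less multiplication of two 64-bit polynomials over GF(2).
/// Partial products are combined with XOR instead of addition,
/// so no carries propagate between bit positions.
fn clmul64(a: u64, b: u64) -> u128 {
    let mut acc: u128 = 0;
    for i in 0..64 {
        if (b >> i) & 1 == 1 {
            acc ^= (a as u128) << i;
        }
    }
    acc
}

/// Hypothetical model of a single 128-bit lane.
/// Bit 0 of `imm8` picks the low (0) or high (1) 64-bit half of `a`;
/// bit 4 does the same for `b`. All other bits are ignored.
fn clmul_lane(a: u128, b: u128, imm8: u8) -> u128 {
    let a_half = if imm8 & 0x01 != 0 { (a >> 64) as u64 } else { a as u64 };
    let b_half = if imm8 & 0x10 != 0 { (b >> 64) as u64 } else { b as u64 };
    clmul64(a_half, b_half)
}

/// Hypothetical model of the full operation: the same immediate byte
/// is applied to each of the four lanes (assumed layout: index 0 = lowest lane).
fn clmul_4lanes(a: [u128; 4], b: [u128; 4], imm8: u8) -> [u128; 4] {
    let mut out = [0u128; 4];
    for lane in 0..4 {
        out[lane] = clmul_lane(a[lane], b[lane], imm8);
    }
    out
}
```

In this model, an immediate of `0x00` multiplies the low halves of both operands, `0x01` multiplies the high half of a by the low half of b, `0x10` multiplies the low half of a by the high half of b, and `0x11` multiplies the two high halves.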