Available on x86-64 only.
Platform-specific intrinsics for the x86_64 platform.
See the module documentation for more details.
Structs§
- CpuidResult x86 or x86-64 Result of the cpuid instruction.
- __m128 x86 or x86-64 128-bit wide set of four f32 types, x86-specific
- __m256 x86 or x86-64 256-bit wide set of eight f32 types, x86-specific
- __m512 x86 or x86-64 512-bit wide set of sixteen f32 types, x86-specific
- __m128d x86 or x86-64 128-bit wide set of two f64 types, x86-specific
- __m128i x86 or x86-64 128-bit wide integer vector type, x86-specific
- __m256d x86 or x86-64 256-bit wide set of four f64 types, x86-specific (see the usage sketch after this list)
- __m256i x86 or x86-64 256-bit wide integer vector type, x86-specific
- __m512d x86 or x86-64 512-bit wide set of eight f64 types, x86-specific
- __m512i x86 or x86-64 512-bit wide integer vector type, x86-specific
- __m128bh Experimental x86 or x86-64 128-bit wide set of eight u16 types, x86-specific
- __m128h Experimental x86 or x86-64 128-bit wide set of 8 f16 types, x86-specific
- __m256bh Experimental x86 or x86-64 256-bit wide set of 16 u16 types, x86-specific
- __m256h Experimental x86 or x86-64 256-bit wide set of 16 f16 types, x86-specific
- __m512bh Experimental x86 or x86-64 512-bit wide set of 32 u16 types, x86-specific
- __m512h Experimental x86 or x86-64 512-bit wide set of 32 f16 types, x86-specific
- bf16 Experimental x86 or x86-64 The BFloat16 type used in AVX-512 intrinsics.
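The vector structs above are opaque register-sized types that are produced and consumed by the intrinsics listed under Functions below. As a minimal illustrative sketch (not part of this module's documentation), the code below assumes a std build on x86_64 and uses runtime feature detection before touching __m256d; the helper names add_f64x4 and add_f64x4_avx are hypothetical.

```rust
// Sketch: adding two [f64; 4] arrays through __m256d, with a scalar fallback.
// `add_f64x4` / `add_f64x4_avx` are illustrative names, not part of core::arch.
#[cfg(target_arch = "x86_64")]
fn add_f64x4(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    if is_x86_feature_detected!("avx") {
        // SAFETY: AVX support was verified at runtime just above.
        unsafe { add_f64x4_avx(a, b) }
    } else {
        [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_f64x4_avx(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    use std::arch::x86_64::*;
    // Unaligned loads/stores: the arrays carry no 32-byte alignment guarantee.
    let va = _mm256_loadu_pd(a.as_ptr());
    let vb = _mm256_loadu_pd(b.as_ptr());
    let sum = _mm256_add_pd(va, vb);
    let mut out = [0.0f64; 4];
    _mm256_storeu_pd(out.as_mut_ptr(), sum);
    out
}
```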
Constants§
- _CMP_EQ_OQ x86 or x86-64 Equal (ordered, non-signaling)
- _CMP_EQ_OS x86 or x86-64 Equal (ordered, signaling)
- _CMP_EQ_UQ x86 or x86-64 Equal (unordered, non-signaling)
- _CMP_EQ_US x86 or x86-64 Equal (unordered, signaling)
- _CMP_FALSE_OQ x86 or x86-64 False (ordered, non-signaling)
- _CMP_FALSE_OS x86 or x86-64 False (ordered, signaling)
- _CMP_GE_OQ x86 or x86-64 Greater-than-or-equal (ordered, non-signaling)
- _CMP_GE_OS x86 or x86-64 Greater-than-or-equal (ordered, signaling)
- _CMP_GT_OQ x86 or x86-64 Greater-than (ordered, non-signaling); see the sketch after this list
- _CMP_GT_OS x86 or x86-64 Greater-than (ordered, signaling)
- _CMP_LE_OQ x86 or x86-64 Less-than-or-equal (ordered, non-signaling)
- _CMP_LE_OS x86 or x86-64 Less-than-or-equal (ordered, signaling)
- _CMP_LT_OQ x86 or x86-64 Less-than (ordered, non-signaling)
- _CMP_LT_OS x86 or x86-64 Less-than (ordered, signaling)
- _CMP_NEQ_OQ x86 or x86-64 Not-equal (ordered, non-signaling)
- _CMP_NEQ_OS x86 or x86-64 Not-equal (ordered, signaling)
- _CMP_NEQ_UQ x86 or x86-64 Not-equal (unordered, non-signaling)
- _CMP_NEQ_US x86 or x86-64 Not-equal (unordered, signaling)
- _CMP_NGE_UQ x86 or x86-64 Not-greater-than-or-equal (unordered, non-signaling)
- _CMP_NGE_US x86 or x86-64 Not-greater-than-or-equal (unordered, signaling)
- _CMP_NGT_UQ x86 or x86-64 Not-greater-than (unordered, non-signaling)
- _CMP_NGT_US x86 or x86-64 Not-greater-than (unordered, signaling)
- _CMP_NLE_UQ x86 or x86-64 Not-less-than-or-equal (unordered, non-signaling)
- _CMP_NLE_US x86 or x86-64 Not-less-than-or-equal (unordered, signaling)
- _CMP_NLT_UQ x86 or x86-64 Not-less-than (unordered, non-signaling)
- _CMP_NLT_US x86 or x86-64 Not-less-than (unordered, signaling)
- _CMP_ORD_Q x86 or x86-64 Ordered (non-signaling)
- _CMP_ORD_S x86 or x86-64 Ordered (signaling)
- _CMP_TRUE_UQ x86 or x86-64 True (unordered, non-signaling)
- _CMP_TRUE_US x86 or x86-64 True (unordered, signaling)
- _CMP_UNORD_Q x86 or x86-64 Unordered (non-signaling)
- _CMP_UNORD_S x86 or x86-64 Unordered (signaling)
- _MM_EXCEPT_DENORM x86 or x86-64 See _mm_setcsr
- _MM_EXCEPT_DIV_ZERO x86 or x86-64 See _mm_setcsr
- _MM_EXCEPT_INEXACT x86 or x86-64 See _mm_setcsr
- _MM_EXCEPT_INVALID x86 or x86-64 See _mm_setcsr
- _MM_EXCEPT_MASK x86 or x86-64
- _MM_EXCEPT_OVERFLOW x86 or x86-64 See _mm_setcsr
- _MM_EXCEPT_UNDERFLOW x86 or x86-64 See _mm_setcsr
- _MM_FLUSH_ZERO_MASK x86 or x86-64
- _MM_FLUSH_ZERO_OFF x86 or x86-64 See _mm_setcsr
- _MM_FLUSH_ZERO_ON x86 or x86-64 See _mm_setcsr
- _MM_FROUND_CEIL x86 or x86-64 round up and do not suppress exceptions
- _MM_FROUND_CUR_DIRECTION x86 or x86-64 use MXCSR.RC; see vendor::_MM_SET_ROUNDING_MODE
- _MM_FROUND_FLOOR x86 or x86-64 round down and do not suppress exceptions
- _MM_FROUND_NEARBYINT x86 or x86-64 use MXCSR.RC and suppress exceptions; see vendor::_MM_SET_ROUNDING_MODE
- _MM_FROUND_NINT x86 or x86-64 round to nearest and do not suppress exceptions
- _MM_FROUND_NO_EXC x86 or x86-64 suppress exceptions
- _MM_FROUND_RAISE_EXC x86 or x86-64 do not suppress exceptions
- _MM_FROUND_RINT x86 or x86-64 use MXCSR.RC and do not suppress exceptions; see vendor::_MM_SET_ROUNDING_MODE
- _MM_FROUND_TO_NEAREST_INT x86 or x86-64 round to nearest
- _MM_FROUND_TO_NEG_INF x86 or x86-64 round down
- _MM_FROUND_TO_POS_INF x86 or x86-64 round up
- _MM_FROUND_TO_ZERO x86 or x86-64 truncate
- _MM_FROUND_TRUNC x86 or x86-64 truncate and do not suppress exceptions
- _MM_HINT_ET0 x86 or x86-64 See _mm_prefetch.
- _MM_HINT_ET1 x86 or x86-64 See _mm_prefetch.
- _MM_HINT_NTA x86 or x86-64 See _mm_prefetch.
- _MM_HINT_T0 x86 or x86-64 See _mm_prefetch.
- _MM_HINT_T1 x86 or x86-64 See _mm_prefetch.
- _MM_HINT_T2 x86 or x86-64 See _mm_prefetch.
- _MM_MASK_DENORM x86 or x86-64 See _mm_setcsr
- _MM_MASK_DIV_ZERO x86 or x86-64 See _mm_setcsr
- _MM_MASK_INEXACT x86 or x86-64 See _mm_setcsr
- _MM_MASK_INVALID x86 or x86-64 See _mm_setcsr
- _MM_MASK_MASK x86 or x86-64
- _MM_MASK_OVERFLOW x86 or x86-64 See _mm_setcsr
- _MM_MASK_UNDERFLOW x86 or x86-64 See _mm_setcsr
- _MM_ROUND_DOWN x86 or x86-64 See _mm_setcsr
- _MM_ROUND_MASK x86 or x86-64
- _MM_ROUND_NEAREST x86 or x86-64 See _mm_setcsr
- _MM_ROUND_TOWARD_ZERO x86 or x86-64 See _mm_setcsr
- _MM_ROUND_UP x86 or x86-64 See _mm_setcsr
- _SIDD_BIT_MASK x86 or x86-64 Mask only: return the bit mask
- _SIDD_CMP_EQUAL_ANY x86 or x86-64 For each character in a, find if it is in b (Default)
- _SIDD_CMP_EQUAL_EACH x86 or x86-64 The strings defined by a and b are equal
- _SIDD_CMP_EQUAL_ORDERED x86 or x86-64 Search for the defined substring in the target
- _SIDD_CMP_RANGES x86 or x86-64 For each character in a, determine if b[0] <= c <= b[1] or b[2] <= c <= b[3]...
- _SIDD_LEAST_SIGNIFICANT x86 or x86-64 Index only: return the least significant bit (Default)
- _SIDD_MASKED_NEGATIVE_POLARITY x86 or x86-64 Negates results only before the end of the string
- _SIDD_MASKED_POSITIVE_POLARITY x86 or x86-64 Do not negate results before the end of the string
- _SIDD_MOST_SIGNIFICANT x86 or x86-64 Index only: return the most significant bit
- _SIDD_NEGATIVE_POLARITY x86 or x86-64 Negates results
- _SIDD_POSITIVE_POLARITY x86 or x86-64 Do not negate results (Default)
- _SIDD_SBYTE_OPS x86 or x86-64 String contains signed 8-bit characters
- _SIDD_SWORD_OPS x86 or x86-64 String contains signed 16-bit characters
- _SIDD_UBYTE_OPS x86 or x86-64 String contains unsigned 8-bit characters (Default)
- _SIDD_UNIT_MASK x86 or x86-64 Mask only: return the byte mask
- _SIDD_UWORD_OPS x86 or x86-64 String contains unsigned 16-bit characters
- _XCR_XFEATURE_ENABLED_MASK x86 or x86-64 XFEATURE_ENABLED_MASK for XCR
- _MM_CMPINT_EQ Experimental x86 or x86-64 Equal
- _MM_CMPINT_FALSE Experimental x86 or x86-64 False
- _MM_CMPINT_LE Experimental x86 or x86-64 Less-than-or-equal
- _MM_CMPINT_LT Experimental x86 or x86-64 Less-than
- _MM_CMPINT_NE Experimental x86 or x86-64 Not-equal
- _MM_CMPINT_NLE Experimental x86 or x86-64 Not less-than-or-equal
- _MM_CMPINT_NLT Experimental x86 or x86-64 Not less-than
- _MM_CMPINT_TRUE Experimental x86 or x86-64 True
- _MM_MANT_NORM_1_2 Experimental x86 or x86-64 interval [1, 2)
- _MM_MANT_NORM_P5_1 Experimental x86 or x86-64 interval [0.5, 1)
- _MM_MANT_NORM_P5_2 Experimental x86 or x86-64 interval [0.5, 2)
- _MM_MANT_NORM_P75_1P5 Experimental x86 or x86-64 interval [0.75, 1.5)
- _MM_MANT_SIGN_NAN Experimental x86 or x86-64 DEST = NaN if sign(SRC) = 1
- _MM_MANT_SIGN_SRC Experimental x86 or x86-64 sign = sign(SRC)
- _MM_MANT_SIGN_ZERO Experimental x86 or x86-64 sign = 0
- _MM_PERM_AAAA Experimental x86 or x86-64
- _MM_PERM_AAAB Experimental x86 or x86-64
- _MM_PERM_AAAC Experimental x86 or x86-64
- _MM_PERM_AAAD Experimental x86 or x86-64
- _MM_PERM_AABA Experimental x86 or x86-64
- _MM_PERM_AABB Experimental x86 or x86-64
- _MM_PERM_AABC Experimental x86 or x86-64
- _MM_PERM_AABD Experimental x86 or x86-64
- _MM_PERM_AACA Experimental x86 or x86-64
- _MM_PERM_AACB Experimental x86 or x86-64
- _MM_PERM_AACC Experimental x86 or x86-64
- _MM_PERM_AACD Experimental x86 or x86-64
- _MM_PERM_AADA Experimental x86 or x86-64
- _MM_PERM_AADB Experimental x86 or x86-64
- _MM_PERM_AADC Experimental x86 or x86-64
- _MM_PERM_AADD Experimental x86 or x86-64
- _MM_PERM_ABAA Experimental x86 or x86-64
- _MM_PERM_ABAB Experimental x86 or x86-64
- _MM_PERM_ABAC Experimental x86 or x86-64
- _MM_PERM_ABAD Experimental x86 or x86-64
- _MM_PERM_ABBA Experimental x86 or x86-64
- _MM_PERM_ABBB Experimental x86 or x86-64
- _MM_PERM_ABBC Experimental x86 or x86-64
- _MM_PERM_ABBD Experimental x86 or x86-64
- _MM_PERM_ABCA Experimental x86 or x86-64
- _MM_PERM_ABCB Experimental x86 or x86-64
- _MM_PERM_ABCC Experimental x86 or x86-64
- _MM_PERM_ABCD Experimental x86 or x86-64
- _MM_PERM_ABDA Experimental x86 or x86-64
- _MM_PERM_ABDB Experimental x86 or x86-64
- _MM_PERM_ABDC Experimental x86 or x86-64
- _MM_PERM_ABDD Experimental x86 or x86-64
- _MM_PERM_ACAA Experimental x86 or x86-64
- _MM_PERM_ACAB Experimental x86 or x86-64
- _MM_PERM_ACAC Experimental x86 or x86-64
- _MM_PERM_ACAD Experimental x86 or x86-64
- _MM_PERM_ACBA Experimental x86 or x86-64
- _MM_PERM_ACBB Experimental x86 or x86-64
- _MM_PERM_ACBC Experimental x86 or x86-64
- _MM_PERM_ACBD Experimental x86 or x86-64
- _MM_PERM_ACCA Experimental x86 or x86-64
- _MM_PERM_ACCB Experimental x86 or x86-64
- _MM_PERM_ACCC Experimental x86 or x86-64
- _MM_PERM_ACCD Experimental x86 or x86-64
- _MM_PERM_ACDA Experimental x86 or x86-64
- _MM_PERM_ACDB Experimental x86 or x86-64
- _MM_PERM_ACDC Experimental x86 or x86-64
- _MM_PERM_ACDD Experimental x86 or x86-64
- _MM_PERM_ADAA Experimental x86 or x86-64
- _MM_PERM_ADAB Experimental x86 or x86-64
- _MM_PERM_ADAC Experimental x86 or x86-64
- _MM_PERM_ADAD Experimental x86 or x86-64
- _MM_PERM_ADBA Experimental x86 or x86-64
- _MM_PERM_ADBB Experimental x86 or x86-64
- _MM_PERM_ADBC Experimental x86 or x86-64
- _MM_PERM_ADBD Experimental x86 or x86-64
- _MM_PERM_ADCA Experimental x86 or x86-64
- _MM_PERM_ADCB Experimental x86 or x86-64
- _MM_PERM_ADCC Experimental x86 or x86-64
- _MM_PERM_ADCD Experimental x86 or x86-64
- _MM_PERM_ADDA Experimental x86 or x86-64
- _MM_PERM_ADDB Experimental x86 or x86-64
- _MM_PERM_ADDC Experimental x86 or x86-64
- _MM_PERM_ADDD Experimental x86 or x86-64
- _MM_PERM_BAAA Experimental x86 or x86-64
- _MM_PERM_BAAB Experimental x86 or x86-64
- _MM_PERM_BAAC Experimental x86 or x86-64
- _MM_PERM_BAAD Experimental x86 or x86-64
- _MM_PERM_BABA Experimental x86 or x86-64
- _MM_PERM_BABB Experimental x86 or x86-64
- _MM_PERM_BABC Experimental x86 or x86-64
- _MM_PERM_BABD Experimental x86 or x86-64
- _MM_PERM_BACA Experimental x86 or x86-64
- _MM_PERM_BACB Experimental x86 or x86-64
- _MM_PERM_BACC Experimental x86 or x86-64
- _MM_PERM_BACD Experimental x86 or x86-64
- _MM_PERM_BADA Experimental x86 or x86-64
- _MM_PERM_BADB Experimental x86 or x86-64
- _MM_PERM_BADC Experimental x86 or x86-64
- _MM_PERM_BADD Experimental x86 or x86-64
- _MM_PERM_BBAA Experimental x86 or x86-64
- _MM_PERM_BBAB Experimental x86 or x86-64
- _MM_PERM_BBAC Experimental x86 or x86-64
- _MM_PERM_BBAD Experimental x86 or x86-64
- _MM_PERM_BBBA Experimental x86 or x86-64
- _MM_PERM_BBBB Experimental x86 or x86-64
- _MM_PERM_BBBC Experimental x86 or x86-64
- _MM_PERM_BBBD Experimental x86 or x86-64
- _MM_PERM_BBCA Experimental x86 or x86-64
- _MM_PERM_BBCB Experimental x86 or x86-64
- _MM_PERM_BBCC Experimental x86 or x86-64
- _MM_PERM_BBCD Experimental x86 or x86-64
- _MM_PERM_BBDA Experimental x86 or x86-64
- _MM_PERM_BBDB Experimental x86 or x86-64
- _MM_PERM_BBDC Experimental x86 or x86-64
- _MM_PERM_BBDD Experimental x86 or x86-64
- _MM_PERM_BCAA Experimental x86 or x86-64
- _MM_PERM_BCAB Experimental x86 or x86-64
- _MM_PERM_BCAC Experimental x86 or x86-64
- _MM_PERM_BCAD Experimental x86 or x86-64
- _MM_PERM_BCBA Experimental x86 or x86-64
- _MM_PERM_BCBB Experimental x86 or x86-64
- _MM_PERM_BCBC Experimental x86 or x86-64
- _MM_PERM_BCBD Experimental x86 or x86-64
- _MM_PERM_BCCA Experimental x86 or x86-64
- _MM_PERM_BCCB Experimental x86 or x86-64
- _MM_PERM_BCCC Experimental x86 or x86-64
- _MM_PERM_BCCD Experimental x86 or x86-64
- _MM_PERM_BCDA Experimental x86 or x86-64
- _MM_PERM_BCDB Experimental x86 or x86-64
- _MM_PERM_BCDC Experimental x86 or x86-64
- _MM_PERM_BCDD Experimental x86 or x86-64
- _MM_PERM_BDAA Experimental x86 or x86-64
- _MM_PERM_BDAB Experimental x86 or x86-64
- _MM_PERM_BDAC Experimental x86 or x86-64
- _MM_PERM_BDAD Experimental x86 or x86-64
- _MM_PERM_BDBA Experimental x86 or x86-64
- _MM_PERM_BDBB Experimental x86 or x86-64
- _MM_PERM_BDBC Experimental x86 or x86-64
- _MM_PERM_BDBD Experimental x86 or x86-64
- _MM_PERM_BDCA Experimental x86 or x86-64
- _MM_PERM_BDCB Experimental x86 or x86-64
- _MM_PERM_BDCC Experimental x86 or x86-64
- _MM_PERM_BDCD Experimental x86 or x86-64
- _MM_PERM_BDDA Experimental x86 or x86-64
- _MM_PERM_BDDB Experimental x86 or x86-64
- _MM_PERM_BDDC Experimental x86 or x86-64
- _MM_PERM_BDDD Experimental x86 or x86-64
- _MM_PERM_CAAA Experimental x86 or x86-64
- _MM_PERM_CAAB Experimental x86 or x86-64
- _MM_PERM_CAAC Experimental x86 or x86-64
- _MM_PERM_CAAD Experimental x86 or x86-64
- _MM_PERM_CABA Experimental x86 or x86-64
- _MM_PERM_CABB Experimental x86 or x86-64
- _MM_PERM_CABC Experimental x86 or x86-64
- _MM_PERM_CABD Experimental x86 or x86-64
- _MM_PERM_CACA Experimental x86 or x86-64
- _MM_PERM_CACB Experimental x86 or x86-64
- _MM_PERM_CACC Experimental x86 or x86-64
- _MM_PERM_CACD Experimental x86 or x86-64
- _MM_PERM_CADA Experimental x86 or x86-64
- _MM_PERM_CADB Experimental x86 or x86-64
- _MM_PERM_CADC Experimental x86 or x86-64
- _MM_PERM_CADD Experimental x86 or x86-64
- _MM_PERM_CBAA Experimental x86 or x86-64
- _MM_PERM_CBAB Experimental x86 or x86-64
- _MM_PERM_CBAC Experimental x86 or x86-64
- _MM_PERM_CBAD Experimental x86 or x86-64
- _MM_PERM_CBBA Experimental x86 or x86-64
- _MM_PERM_CBBB Experimental x86 or x86-64
- _MM_PERM_CBBC Experimental x86 or x86-64
- _MM_PERM_CBBD Experimental x86 or x86-64
- _MM_PERM_CBCA Experimental x86 or x86-64
- _MM_PERM_CBCB Experimental x86 or x86-64
- _MM_PERM_CBCC Experimental x86 or x86-64
- _MM_PERM_CBCD Experimental x86 or x86-64
- _MM_PERM_CBDA Experimental x86 or x86-64
- _MM_PERM_CBDB Experimental x86 or x86-64
- _MM_PERM_CBDC Experimental x86 or x86-64
- _MM_PERM_CBDD Experimental x86 or x86-64
- _MM_PERM_CCAA Experimental x86 or x86-64
- _MM_PERM_CCAB Experimental x86 or x86-64
- _MM_PERM_CCAC Experimental x86 or x86-64
- _MM_PERM_CCAD Experimental x86 or x86-64
- _MM_PERM_CCBA Experimental x86 or x86-64
- _MM_PERM_CCBB Experimental x86 or x86-64
- _MM_PERM_CCBC Experimental x86 or x86-64
- _MM_PERM_CCBD Experimental x86 or x86-64
- _MM_PERM_CCCA Experimental x86 or x86-64
- _MM_PERM_CCCB Experimental x86 or x86-64
- _MM_PERM_CCCC Experimental x86 or x86-64
- _MM_PERM_CCCD Experimental x86 or x86-64
- _MM_PERM_CCDA Experimental x86 or x86-64
- _MM_PERM_CCDB Experimental x86 or x86-64
- _MM_PERM_CCDC Experimental x86 or x86-64
- _MM_PERM_CCDD Experimental x86 or x86-64
- _MM_PERM_CDAA Experimental x86 or x86-64
- _MM_PERM_CDAB Experimental x86 or x86-64
- _MM_PERM_CDAC Experimental x86 or x86-64
- _MM_PERM_CDAD Experimental x86 or x86-64
- _MM_PERM_CDBA Experimental x86 or x86-64
- _MM_PERM_CDBB Experimental x86 or x86-64
- _MM_PERM_CDBC Experimental x86 or x86-64
- _MM_PERM_CDBD Experimental x86 or x86-64
- _MM_PERM_CDCA Experimental x86 or x86-64
- _MM_PERM_CDCB Experimental x86 or x86-64
- _MM_PERM_CDCC Experimental x86 or x86-64
- _MM_PERM_CDCD Experimental x86 or x86-64
- _MM_PERM_CDDA Experimental x86 or x86-64
- _MM_PERM_CDDB Experimental x86 or x86-64
- _MM_PERM_CDDC Experimental x86 or x86-64
- _MM_PERM_CDDD Experimental x86 or x86-64
- _MM_PERM_DAAA Experimental x86 or x86-64
- _MM_PERM_DAAB Experimental x86 or x86-64
- _MM_PERM_DAAC Experimental x86 or x86-64
- _MM_PERM_DAAD Experimental x86 or x86-64
- _MM_PERM_DABA Experimental x86 or x86-64
- _MM_PERM_DABB Experimental x86 or x86-64
- _MM_PERM_DABC Experimental x86 or x86-64
- _MM_PERM_DABD Experimental x86 or x86-64
- _MM_PERM_DACA Experimental x86 or x86-64
- _MM_PERM_DACB Experimental x86 or x86-64
- _MM_PERM_DACC Experimental x86 or x86-64
- _MM_PERM_DACD Experimental x86 or x86-64
- _MM_PERM_DADA Experimental x86 or x86-64
- _MM_PERM_DADB Experimental x86 or x86-64
- _MM_PERM_DADC Experimental x86 or x86-64
- _MM_PERM_DADD Experimental x86 or x86-64
- _MM_PERM_DBAA Experimental x86 or x86-64
- _MM_PERM_DBAB Experimental x86 or x86-64
- _MM_PERM_DBAC Experimental x86 or x86-64
- _MM_PERM_DBAD Experimental x86 or x86-64
- _MM_PERM_DBBA Experimental x86 or x86-64
- _MM_PERM_DBBB Experimental x86 or x86-64
- _MM_PERM_DBBC Experimental x86 or x86-64
- _MM_PERM_DBBD Experimental x86 or x86-64
- _MM_PERM_DBCA Experimental x86 or x86-64
- _MM_PERM_DBCB Experimental x86 or x86-64
- _MM_PERM_DBCC Experimental x86 or x86-64
- _MM_PERM_DBCD Experimental x86 or x86-64
- _MM_PERM_DBDA Experimental x86 or x86-64
- _MM_PERM_DBDB Experimental x86 or x86-64
- _MM_PERM_DBDC Experimental x86 or x86-64
- _MM_PERM_DBDD Experimental x86 or x86-64
- _MM_PERM_DCAA Experimental x86 or x86-64
- _MM_PERM_DCAB Experimental x86 or x86-64
- _MM_PERM_DCAC Experimental x86 or x86-64
- _MM_PERM_DCAD Experimental x86 or x86-64
- _MM_PERM_DCBA Experimental x86 or x86-64
- _MM_PERM_DCBB Experimental x86 or x86-64
- _MM_PERM_DCBC Experimental x86 or x86-64
- _MM_PERM_DCBD Experimental x86 or x86-64
- _MM_PERM_DCCA Experimental x86 or x86-64
- _MM_PERM_DCCB Experimental x86 or x86-64
- _MM_PERM_DCCC Experimental x86 or x86-64
- _MM_PERM_DCCD Experimental x86 or x86-64
- _MM_PERM_DCDA Experimental x86 or x86-64
- _MM_PERM_DCDB Experimental x86 or x86-64
- _MM_PERM_DCDC Experimental x86 or x86-64
- _MM_PERM_DCDD Experimental x86 or x86-64
- _MM_PERM_DDAA Experimental x86 or x86-64
- _MM_PERM_DDAB Experimental x86 or x86-64
- _MM_PERM_DDAC Experimental x86 or x86-64
- _MM_PERM_DDAD Experimental x86 or x86-64
- _MM_PERM_DDBA Experimental x86 or x86-64
- _MM_PERM_DDBB Experimental x86 or x86-64
- _MM_PERM_DDBC Experimental x86 or x86-64
- _MM_PERM_DDBD Experimental x86 or x86-64
- _MM_PERM_DDCA Experimental x86 or x86-64
- _MM_PERM_DDCB Experimental x86 or x86-64
- _MM_PERM_DDCC Experimental x86 or x86-64
- _MM_PERM_DDCD Experimental x86 or x86-64
- _MM_PERM_DDDA Experimental x86 or x86-64
- _MM_PERM_DDDB Experimental x86 or x86-64
- _MM_PERM_DDDC Experimental x86 or x86-64
- _MM_PERM_DDDD Experimental x86 or x86-64
- _XABORT_CAPACITY Experimental x86 or x86-64 Transaction abort due to the transaction using too much memory.
- _XABORT_CONFLICT Experimental x86 or x86-64 Transaction abort due to a memory conflict with another thread.
- _XABORT_DEBUG Experimental x86 or x86-64 Transaction abort due to a debug trap.
- _XABORT_EXPLICIT Experimental x86 or x86-64 Transaction explicitly aborted with xabort. The parameter passed to xabort is available with _xabort_code(status).
- _XABORT_NESTED Experimental x86 or x86-64 Transaction abort in an inner nested transaction.
- _XABORT_RETRY Experimental x86 or x86-64 Transaction retry is possible.
- _XBEGIN_STARTED Experimental x86 or x86-64 Transaction successfully started.
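Many of the comparison, rounding, and string constants above are meant to be passed as const generic arguments to the intrinsics listed in the next section. As a hedged sketch (assuming AVX, and using the _CMP_GT_OQ predicate referenced in that constant's entry above), the hypothetical helper below builds a per-lane greater-than bitmask with _mm256_cmp_ps and _mm256_movemask_ps.

```rust
// Sketch only: collect a per-lane "a > b" bitmask from two __m256 vectors.
// `gt_mask` is an illustrative name; the predicate constant comes from this list.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn gt_mask(a: std::arch::x86_64::__m256, b: std::arch::x86_64::__m256) -> i32 {
    use std::arch::x86_64::*;
    // Ordered, non-signaling greater-than comparison: each matching lane becomes
    // all-ones, then movemask gathers the lane sign bits into bits 0..=7.
    let cmp = _mm256_cmp_ps::<_CMP_GT_OQ>(a, b);
    _mm256_movemask_ps(cmp)
}
```

The caller is expected to verify AVX support (for example with is_x86_feature_detected!("avx")) before invoking such a function.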
Functions§
- _MM_GET_EXCEPTION_MASK ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_GET_EXCEPTION_STATE ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_GET_FLUSH_ZERO_MODE ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_GET_ROUNDING_MODE ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_SET_EXCEPTION_MASK ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_SET_EXCEPTION_STATE ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_SET_FLUSH_ZERO_MODE ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_SET_ROUNDING_MODE ⚠ Deprecated (x86 or x86-64) and sse See _mm_setcsr
- _MM_TRANSPOSE4_PS ⚠ (x86 or x86-64) and sse Transpose the 4x4 matrix formed by 4 rows of __m128 in place.
- __cpuid ⚠ x86 or x86-64 See __cpuid_count.
- __cpuid_count ⚠ x86 or x86-64 Returns the result of the cpuid instruction for a given leaf (EAX) and sub_leaf (ECX).
- __get_cpuid_max ⚠ x86 or x86-64 Returns the highest-supported leaf (EAX) and sub-leaf (ECX) cpuid values.
- __rdtscp ⚠ x86 or x86-64 Reads the current value of the processor’s time-stamp counter and the IA32_TSC_AUX MSR.
- _addcarry_u32 ⚠ x86 or x86-64 Adds unsigned 32-bit integers a and b with unsigned 8-bit carry-in c_in (carry or overflow flag), stores the unsigned 32-bit result in out, and returns the carry-out (carry or overflow flag).
- _addcarry_u64 ⚠ Adds unsigned 64-bit integers a and b with unsigned 8-bit carry-in c_in (carry or overflow flag), stores the unsigned 64-bit result in out, and returns the carry-out (carry or overflow flag).
- _addcarryx_u32 ⚠ (x86 or x86-64) and adx Adds unsigned 32-bit integers a and b with unsigned 8-bit carry-in c_in (carry or overflow flag), stores the unsigned 32-bit result in out, and returns the carry-out (carry or overflow flag).
- _addcarryx_u64 ⚠ adx Adds unsigned 64-bit integers a and b with unsigned 8-bit carry-in c_in (carry or overflow flag), stores the unsigned 64-bit result in out, and returns the carry-out (carry or overflow flag).
- _andn_u32 ⚠ (x86 or x86-64) and bmi1 Bitwise logical AND of inverted a with b.
- _andn_u64 ⚠ bmi1 Bitwise logical AND of inverted a with b.
- _bextr2_u32 ⚠ (x86 or x86-64) and bmi1 Extracts bits of a specified by control into the least significant bits of the result.
- _bextr2_u64 ⚠ bmi1 Extracts bits of a specified by control into the least significant bits of the result.
- _bextr_u32 ⚠ (x86 or x86-64) and bmi1 Extracts bits in range [start, start+length) from a into the least significant bits of the result.
- _bextr_u64 ⚠ bmi1 Extracts bits in range [start, start+length) from a into the least significant bits of the result.
- _bextri_u32 ⚠ (x86 or x86-64) and tbm Extracts bits of a specified by control into the least significant bits of the result.
- _bextri_u64 ⚠ tbm Extracts bits of a specified by control into the least significant bits of the result.
- _bittest ⚠ x86 or x86-64 Returns the bit in position b of the memory addressed by p.
- _bittest64 ⚠ Returns the bit in position b of the memory addressed by p.
- _bittestandcomplement ⚠ x86 or x86-64 Returns the bit in position b of the memory addressed by p, then inverts that bit.
- _bittestandcomplement64 ⚠ Returns the bit in position b of the memory addressed by p, then inverts that bit.
- _bittestandreset ⚠ x86 or x86-64 Returns the bit in position b of the memory addressed by p, then resets that bit to 0.
- _bittestandreset64 ⚠ Returns the bit in position b of the memory addressed by p, then resets that bit to 0.
- _bittestandset ⚠ x86 or x86-64 Returns the bit in position b of the memory addressed by p, then sets the bit to 1.
- _bittestandset64 ⚠ Returns the bit in position b of the memory addressed by p, then sets the bit to 1.
- _blcfill_u32 ⚠ (x86 or x86-64) and tbm Clears all bits below the least significant zero bit of x.
- _blcfill_u64 ⚠ tbm Clears all bits below the least significant zero bit of x.
- _blci_u32 ⚠ (x86 or x86-64) and tbm Sets all bits of x to 1 except for the least significant zero bit.
- _blci_u64 ⚠ tbm Sets all bits of x to 1 except for the least significant zero bit.
- _blcic_u32 ⚠ (x86 or x86-64) and tbm Sets the least significant zero bit of x and clears all other bits.
- _blcic_u64 ⚠ tbm Sets the least significant zero bit of x and clears all other bits.
- _blcmsk_u32 ⚠ (x86 or x86-64) and tbm Sets the least significant zero bit of x and clears all bits above that bit.
- _blcmsk_u64 ⚠ tbm Sets the least significant zero bit of x and clears all bits above that bit.
- _blcs_u32 ⚠ (x86 or x86-64) and tbm Sets the least significant zero bit of x.
- _blcs_u64 ⚠ tbm Sets the least significant zero bit of x.
- _blsfill_u32 ⚠ (x86 or x86-64) and tbm Sets all bits of x below the least significant one.
- _blsfill_u64 ⚠ tbm Sets all bits of x below the least significant one.
- _blsi_u32 ⚠ (x86 or x86-64) and bmi1 Extracts lowest set isolated bit.
- _blsi_u64 ⚠ bmi1 Extracts lowest set isolated bit.
- _blsic_u32 ⚠ (x86 or x86-64) and tbm Clears least significant bit and sets all other bits.
- _blsic_u64 ⚠ tbm Clears least significant bit and sets all other bits.
- _blsmsk_u32 ⚠ (x86 or x86-64) and bmi1 Gets mask up to lowest set bit.
- _blsmsk_u64 ⚠ bmi1 Gets mask up to lowest set bit.
- _blsr_u32 ⚠ (x86 or x86-64) and bmi1 Resets the lowest set bit of x.
- _blsr_u64 ⚠ bmi1 Resets the lowest set bit of x.
- _bswap ⚠ x86 or x86-64 Returns an integer with the reversed byte order of x
- _bswap64 ⚠ Returns an integer with the reversed byte order of x
- _bzhi_u32 ⚠ (x86 or x86-64) and bmi2 Zeroes higher bits of a >= index.
- _bzhi_u64 ⚠ bmi2 Zeroes higher bits of a >= index.
- _fxrstor ⚠ (x86 or x86-64) and fxsr Restores the XMM, MMX, MXCSR, and x87 FPU registers from the 512-byte-long 16-byte-aligned memory region mem_addr.
- _fxrstor64 ⚠ fxsr Restores the XMM, MMX, MXCSR, and x87 FPU registers from the 512-byte-long 16-byte-aligned memory region mem_addr.
- _fxsave ⚠ (x86 or x86-64) and fxsr Saves the x87 FPU, MMX technology, XMM, and MXCSR registers to the 512-byte-long 16-byte-aligned memory region mem_addr.
- _fxsave64 ⚠ fxsr Saves the x87 FPU, MMX technology, XMM, and MXCSR registers to the 512-byte-long 16-byte-aligned memory region mem_addr.
- _lzcnt_u32 ⚠ (x86 or x86-64) and lzcnt Counts the leading most significant zero bits.
- _lzcnt_u64 ⚠ lzcnt Counts the leading most significant zero bits.
- _mm256_abs_epi8 ⚠ (x86 or x86-64) and avx2 Computes the absolute values of packed 8-bit integers in a.
- _mm256_abs_epi16 ⚠ (x86 or x86-64) and avx2 Computes the absolute values of packed 16-bit integers in a.
- _mm256_abs_epi32 ⚠ (x86 or x86-64) and avx2 Computes the absolute values of packed 32-bit integers in a.
- _mm256_add_epi8 ⚠ (x86 or x86-64) and avx2 Adds packed 8-bit integers in a and b.
- _mm256_add_epi16 ⚠ (x86 or x86-64) and avx2 Adds packed 16-bit integers in a and b.
- _mm256_add_epi32 ⚠ (x86 or x86-64) and avx2 Adds packed 32-bit integers in a and b.
- _mm256_add_epi64 ⚠ (x86 or x86-64) and avx2 Adds packed 64-bit integers in a and b.
- _mm256_add_pd ⚠ (x86 or x86-64) and avx Adds packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_add_ps ⚠ (x86 or x86-64) and avx Adds packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_adds_epi8 ⚠ (x86 or x86-64) and avx2 Adds packed 8-bit integers in a and b using saturation.
- _mm256_adds_epi16 ⚠ (x86 or x86-64) and avx2 Adds packed 16-bit integers in a and b using saturation.
- _mm256_adds_epu8 ⚠ (x86 or x86-64) and avx2 Adds packed unsigned 8-bit integers in a and b using saturation.
- _mm256_adds_epu16 ⚠ (x86 or x86-64) and avx2 Adds packed unsigned 16-bit integers in a and b using saturation.
- _mm256_addsub_pd ⚠ (x86 or x86-64) and avx Alternatively adds and subtracts packed double-precision (64-bit) floating-point elements in a to/from packed elements in b.
- _mm256_addsub_ps ⚠ (x86 or x86-64) and avx Alternatively adds and subtracts packed single-precision (32-bit) floating-point elements in a to/from packed elements in b.
- _mm256_alignr_epi8 ⚠ (x86 or x86-64) and avx2 Concatenates pairs of 16-byte blocks in a and b into a 32-byte temporary result, shifts the result right by n bytes, and returns the low 16 bytes.
- _mm256_and_pd ⚠ (x86 or x86-64) and avx Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_and_ps ⚠ (x86 or x86-64) and avx Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_and_si256 ⚠ (x86 or x86-64) and avx2 Computes the bitwise AND of 256 bits (representing integer data) in a and b.
- _mm256_andnot_pd ⚠ (x86 or x86-64) and avx Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in a, and then AND with b.
- _mm256_andnot_ps ⚠ (x86 or x86-64) and avx Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
- _mm256_andnot_si256 ⚠ (x86 or x86-64) and avx2 Computes the bitwise NOT of 256 bits (representing integer data) in a and then AND with b.
- _mm256_avg_epu8 ⚠ (x86 or x86-64) and avx2 Averages packed unsigned 8-bit integers in a and b.
- _mm256_avg_epu16 ⚠ (x86 or x86-64) and avx2 Averages packed unsigned 16-bit integers in a and b.
- _mm256_blend_epi16 ⚠ (x86 or x86-64) and avx2 Blends packed 16-bit integers from a and b using control mask IMM8.
- _mm256_blend_epi32 ⚠ (x86 or x86-64) and avx2 Blends packed 32-bit integers from a and b using control mask IMM8.
- _mm256_blend_pd ⚠ (x86 or x86-64) and avx Blends packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.
- _mm256_blend_ps ⚠ (x86 or x86-64) and avx Blends packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.
- _mm256_blendv_epi8 ⚠ (x86 or x86-64) and avx2 Blends packed 8-bit integers from a and b using mask.
- _mm256_blendv_pd ⚠ (x86 or x86-64) and avx Blends packed double-precision (64-bit) floating-point elements from a and b using c as a mask.
- _mm256_blendv_ps ⚠ (x86 or x86-64) and avx Blends packed single-precision (32-bit) floating-point elements from a and b using c as a mask.
- _mm256_broadcast_pd ⚠ (x86 or x86-64) and avx Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector.
- _mm256_broadcast_ps ⚠ (x86 or x86-64) and avx Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector.
- _mm256_broadcast_sd ⚠ (x86 or x86-64) and avx Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector.
- _mm256_broadcast_ss ⚠ (x86 or x86-64) and avx Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- _mm256_broadcastb_epi8 ⚠ (x86 or x86-64) and avx2 Broadcasts the low packed 8-bit integer from a to all elements of the 256-bit returned value.
- _mm256_broadcastd_epi32 ⚠ (x86 or x86-64) and avx2 Broadcasts the low packed 32-bit integer from a to all elements of the 256-bit returned value.
- _mm256_broadcastq_epi64 ⚠ (x86 or x86-64) and avx2 Broadcasts the low packed 64-bit integer from a to all elements of the 256-bit returned value.
- _mm256_broadcastsd_pd ⚠ (x86 or x86-64) and avx2 Broadcasts the low double-precision (64-bit) floating-point element from a to all elements of the 256-bit returned value.
- _mm256_broadcastsi128_si256 ⚠ (x86 or x86-64) and avx2 Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
- _mm256_broadcastss_ps ⚠ (x86 or x86-64) and avx2 Broadcasts the low single-precision (32-bit) floating-point element from a to all elements of the 256-bit returned value.
- _mm256_broadcastw_epi16 ⚠ (x86 or x86-64) and avx2 Broadcasts the low packed 16-bit integer from a to all elements of the 256-bit returned value
- _mm256_bslli_epi128 ⚠ (x86 or x86-64) and avx2 Shifts 128-bit lanes in a left by imm8 bytes while shifting in zeros.
- _mm256_bsrli_epi128 ⚠ (x86 or x86-64) and avx2 Shifts 128-bit lanes in a right by imm8 bytes while shifting in zeros.
- _mm256_castpd128_pd256 ⚠ (x86 or x86-64) and avx Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
- _mm256_castpd256_pd128 ⚠ (x86 or x86-64) and avx Casts vector of type __m256d to type __m128d.
- _mm256_castpd_ps ⚠ (x86 or x86-64) and avx Cast vector of type __m256d to type __m256.
- _mm256_castpd_si256 ⚠ (x86 or x86-64) and avx Casts vector of type __m256d to type __m256i.
- _mm256_castps128_ps256 ⚠ (x86 or x86-64) and avx Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
- _mm256_castps256_ps128 ⚠ (x86 or x86-64) and avx Casts vector of type __m256 to type __m128.
- _mm256_castps_pd ⚠ (x86 or x86-64) and avx Cast vector of type __m256 to type __m256d.
- _mm256_castps_si256 ⚠ (x86 or x86-64) and avx Casts vector of type __m256 to type __m256i.
- _mm256_castsi128_si256 ⚠ (x86 or x86-64) and avx Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.
- _mm256_castsi256_pd ⚠ (x86 or x86-64) and avx Casts vector of type __m256i to type __m256d.
- _mm256_castsi256_ps ⚠ (x86 or x86-64) and avx Casts vector of type __m256i to type __m256.
- _mm256_castsi256_si128 ⚠ (x86 or x86-64) and avx Casts vector of type __m256i to type __m128i.
- _mm256_ceil_pd ⚠ (x86 or x86-64) and avx Rounds packed double-precision (64-bit) floating point elements in a toward positive infinity.
- _mm256_ceil_ps ⚠ (x86 or x86-64) and avx Rounds packed single-precision (32-bit) floating point elements in a toward positive infinity.
- _mm256_cmp_pd ⚠ (x86 or x86-64) and avx Compares packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm256_cmp_ps ⚠ (x86 or x86-64) and avx Compares packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by IMM5.
- _mm256_cmpeq_epi8 ⚠ (x86 or x86-64) and avx2 Compares packed 8-bit integers in a and b for equality.
- _mm256_cmpeq_epi16 ⚠ (x86 or x86-64) and avx2 Compares packed 16-bit integers in a and b for equality.
- _mm256_cmpeq_epi32 ⚠ (x86 or x86-64) and avx2 Compares packed 32-bit integers in a and b for equality.
- _mm256_cmpeq_epi64 ⚠ (x86 or x86-64) and avx2 Compares packed 64-bit integers in a and b for equality.
- _mm256_cmpgt_epi8 ⚠ (x86 or x86-64) and avx2 Compares packed 8-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi16 ⚠ (x86 or x86-64) and avx2 Compares packed 16-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi32 ⚠ (x86 or x86-64) and avx2 Compares packed 32-bit integers in a and b for greater-than.
- _mm256_cmpgt_epi64 ⚠ (x86 or x86-64) and avx2 Compares packed 64-bit integers in a and b for greater-than.
- _mm256_cvtepi8_epi16 ⚠ (x86 or x86-64) and avx2 Sign-extend 8-bit integers to 16-bit integers.
- _mm256_cvtepi8_epi32 ⚠ (x86 or x86-64) and avx2 Sign-extend 8-bit integers to 32-bit integers.
- _mm256_cvtepi8_epi64 ⚠ (x86 or x86-64) and avx2 Sign-extend 8-bit integers to 64-bit integers.
- _mm256_cvtepi16_epi32 ⚠ (x86 or x86-64) and avx2 Sign-extend 16-bit integers to 32-bit integers.
- _mm256_cvtepi16_epi64 ⚠ (x86 or x86-64) and avx2 Sign-extend 16-bit integers to 64-bit integers.
- _mm256_cvtepi32_epi64 ⚠ (x86 or x86-64) and avx2 Sign-extend 32-bit integers to 64-bit integers.
- _mm256_cvtepi32_pd ⚠ (x86 or x86-64) and avx Converts packed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.
- _mm256_cvtepi32_ps ⚠ (x86 or x86-64) and avx Converts packed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.
- _mm256_cvtepu8_epi16 ⚠ (x86 or x86-64) and avx2 Zero-extend unsigned 8-bit integers in a to 16-bit integers.
- _mm256_cvtepu8_epi32 ⚠ (x86 or x86-64) and avx2 Zero-extend the lower eight unsigned 8-bit integers in a to 32-bit integers. The upper eight elements of a are unused.
- _mm256_cvtepu8_epi64 ⚠ (x86 or x86-64) and avx2 Zero-extend the lower four unsigned 8-bit integers in a to 64-bit integers. The upper twelve elements of a are unused.
- _mm256_cvtepu16_epi32 ⚠ (x86 or x86-64) and avx2 Zero-extends packed unsigned 16-bit integers in a to packed 32-bit integers, and stores the results in dst.
- _mm256_cvtepu16_epi64 ⚠ (x86 or x86-64) and avx2 Zero-extend the lower four unsigned 16-bit integers in a to 64-bit integers. The upper four elements of a are unused.
- _mm256_cvtepu32_epi64 ⚠ (x86 or x86-64) and avx2 Zero-extend unsigned 32-bit integers in a to 64-bit integers.
- _mm256_cvtpd_epi32 ⚠ (x86 or x86-64) and avx Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers.
- _mm256_cvtpd_ps ⚠ (x86 or x86-64) and avx Converts packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.
- _mm256_cvtph_ps ⚠ (x86 or x86-64) and f16c Converts the 8 x 16-bit half-precision float values in the 128-bit vector a into 8 x 32-bit float values stored in a 256-bit wide vector.
- _mm256_cvtps_epi32 ⚠ (x86 or x86-64) and avx Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
- _mm256_cvtps_pd ⚠ (x86 or x86-64) and avx Converts packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.
- _mm256_cvtps_ph ⚠ (x86 or x86-64) and f16c Converts the 8 x 32-bit float values in the 256-bit vector a into 8 x 16-bit half-precision float values stored in a 128-bit wide vector.
- _mm256_cvtsd_f64 ⚠ (x86 or x86-64) and avx Returns the first element of the input vector of [4 x double].
- _mm256_cvtsi256_si32 ⚠ (x86 or x86-64) and avx Returns the first element of the input vector of [8 x i32].
- _mm256_cvtss_f32 ⚠ (x86 or x86-64) and avx Returns the first element of the input vector of [8 x float].
- _mm256_cvttpd_epi32 ⚠ (x86 or x86-64) and avx Converts packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm256_cvttps_epi32 ⚠ (x86 or x86-64) and avx Converts packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
- _mm256_div_pd ⚠ (x86 or x86-64) and avx Computes the division of each of the 4 packed 64-bit floating-point elements in a by the corresponding packed elements in b.
- _mm256_div_ps ⚠ (x86 or x86-64) and avx Computes the division of each of the 8 packed 32-bit floating-point elements in a by the corresponding packed elements in b.
- _mm256_dp_ps ⚠ (x86 or x86-64) and avx Conditionally multiplies the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products, and conditionally return the sum using the low 4 bits of imm8.
- _mm256_extract_epi8 ⚠ (x86 or x86-64) and avx2 Extracts an 8-bit integer from a, selected with INDEX. Returns a 32-bit integer containing the zero-extended integer data.
- _mm256_extract_epi16 ⚠ (x86 or x86-64) and avx2 Extracts a 16-bit integer from a, selected with INDEX. Returns a 32-bit integer containing the zero-extended integer data.
- _mm256_extract_epi32 ⚠ (x86 or x86-64) and avx Extracts a 32-bit integer from a, selected with INDEX.
- _mm256_extract_epi64 ⚠ avx Extracts a 64-bit integer from a, selected with INDEX.
- _mm256_extractf128_pd ⚠ (x86 or x86-64) and avx Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with imm8.
- _mm256_extractf128_ps ⚠ (x86 or x86-64) and avx Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8.
- _mm256_extractf128_si256 ⚠ (x86 or x86-64) and avx Extracts 128 bits (composed of integer data) from a, selected with imm8.
- _mm256_extracti128_si256 ⚠ (x86 or x86-64) and avx2 Extracts 128 bits (of integer data) from a selected with IMM1.
- _mm256_floor_pd ⚠ (x86 or x86-64) and avx Rounds packed double-precision (64-bit) floating point elements in a toward negative infinity.
- _mm256_floor_ps ⚠ (x86 or x86-64) and avx Rounds packed single-precision (32-bit) floating point elements in a toward negative infinity.
- _mm256_fmadd_pd ⚠ (x86 or x86-64) and fma Multiplies packed double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to packed elements in c.
- _mm256_fmadd_ps ⚠ (x86 or x86-64) and fma Multiplies packed single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to packed elements in c.
- _mm256_fmaddsub_pd ⚠ (x86 or x86-64) and fma Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternatively add and subtract packed elements in c to/from the intermediate result.
- _mm256_fmaddsub_ps ⚠ (x86 or x86-64) and fma Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternatively add and subtract packed elements in c to/from the intermediate result.
- _mm256_fmsub_pd ⚠ (x86 or x86-64) and fma Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result.
- _mm256_fmsub_ps ⚠ (x86 or x86-64) and fma Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result.
- _mm256_fmsubadd_pd ⚠ (x86 or x86-64) and fma Multiplies packed double-precision (64-bit) floating-point elements in a and b, and alternatively subtract and add packed elements in c from/to the intermediate result.
- _mm256_fmsubadd_ps ⚠ (x86 or x86-64) and fma Multiplies packed single-precision (32-bit) floating-point elements in a and b, and alternatively subtract and add packed elements in c from/to the intermediate result.
- _mm256_fnmadd_pd ⚠ (x86 or x86-64) and fma Multiplies packed double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to packed elements in c.
- _mm256_fnmadd_ps ⚠ (x86 or x86-64) and fma Multiplies packed single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to packed elements in c.
- _mm256_fnmsub_pd ⚠ (x86 or x86-64) and fma Multiplies packed double-precision (64-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result.
- _mm256_fnmsub_ps ⚠ (x86 or x86-64) and fma Multiplies packed single-precision (32-bit) floating-point elements in a and b, and subtract packed elements in c from the negated intermediate result.
- _mm256_hadd_epi16 ⚠ (x86 or x86-64) and avx2 Horizontally adds adjacent pairs of 16-bit integers in a and b.
- _mm256_hadd_epi32 ⚠ (x86 or x86-64) and avx2 Horizontally adds adjacent pairs of 32-bit integers in a and b.
- _mm256_hadd_pd ⚠ (x86 or x86-64) and avx Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
- _mm256_hadd_ps ⚠ (x86 or x86-64) and avx Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, sums of elements from a are returned in locations of indices 0, 1, 4, 5; while sums of elements from b are locations 2, 3, 6, 7.
- _mm256_hadds_epi16 ⚠ (x86 or x86-64) and avx2 Horizontally adds adjacent pairs of 16-bit integers in a and b using saturation.
- _mm256_hsub_epi16 ⚠ (x86 or x86-64) and avx2 Horizontally subtract adjacent pairs of 16-bit integers in a and b.
- _mm256_hsub_epi32 ⚠ (x86 or x86-64) and avx2 Horizontally subtract adjacent pairs of 32-bit integers in a and b.
- _mm256_hsub_pd ⚠ (x86 or x86-64) and avx Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points a and b. In the result, sums of elements from a are returned in even locations, while sums of elements from b are returned in odd locations.
- _mm256_hsub_ps ⚠ (x86 or x86-64) and avx Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points a and b. In the result, sums of elements from a are returned in locations of indices 0, 1, 4, 5; while sums of elements from b are locations 2, 3, 6, 7.
- _mm256_hsubs_epi16 ⚠ (x86 or x86-64) and avx2 Horizontally subtract adjacent pairs of 16-bit integers in a and b using saturation.
- _mm256_i32gather_epi32 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32gather_epi64 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32gather_pd ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i32gather_ps ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_epi32 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_epi64 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_pd ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_i64gather_ps ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8.
- _mm256_insert_epi8 ⚠ (x86 or x86-64) and avx Copies a to result, and inserts the 8-bit integer i into result at the location specified by index.
- _mm256_insert_epi16 ⚠ (x86 or x86-64) and avx Copies a to result, and inserts the 16-bit integer i into result at the location specified by index.
- _mm256_insert_epi32 ⚠ (x86 or x86-64) and avx Copies a to result, and inserts the 32-bit integer i into result at the location specified by index.
- _mm256_insert_epi64 ⚠ avx Copies a to result, and inserts the 64-bit integer i into result at the location specified by index.
- _mm256_insertf128_pd ⚠ (x86 or x86-64) and avx Copies a to result, then inserts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into result at the location specified by imm8.
- _mm256_insertf128_ps ⚠ (x86 or x86-64) and avx Copies a to result, then inserts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into result at the location specified by imm8.
- _mm256_insertf128_si256 ⚠ (x86 or x86-64) and avx Copies a to result, then inserts 128 bits from b into result at the location specified by imm8.
- _mm256_inserti128_si256 ⚠ (x86 or x86-64) and avx2 Copies a to dst, then insert 128 bits (of integer data) from b at the location specified by IMM1.
- _mm256_lddqu_si256 ⚠ (x86 or x86-64) and avx Loads 256-bits of integer data from unaligned memory into result. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
- _mm256_load_pd ⚠ (x86 or x86-64) and avx Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ps ⚠ (x86 or x86-64) and avx Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_si256 ⚠ (x86 or x86-64) and avx Loads 256-bits of integer data from memory into result. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_loadu2_m128 ⚠ (x86 or x86-64) and avx Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu2_m128d ⚠ (x86 or x86-64) and avx Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu2_m128i ⚠ (x86 or x86-64) and avx Loads two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.
- _mm256_loadu_pd ⚠ (x86 or x86-64) and avx Loads 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ps ⚠ (x86 or x86-64) and avx Loads 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_si256 ⚠ (x86 or x86-64) and avx Loads 256-bits of integer data from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm256_madd_epi16 ⚠ (x86 or x86-64) and avx2 Multiplies packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers.
- _mm256_maddubs_epi16 ⚠ (x86 or x86-64) and avx2 Vertically multiplies each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers
- _mm256_mask_i32gather_epi32 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i32gather_epi64 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i32gather_pd ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i32gather_ps ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i64gather_epi32 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i64gather_epi64 ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i64gather_pd ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_mask_i64gather_ps ⚠ (x86 or x86-64) and avx2 Returns values from slice at offsets determined by offsets * scale, where scale should be 1, 2, 4 or 8. If mask is set, load the value from src in that position instead.
- _mm256_maskload_epi32 ⚠ (x86 or x86-64) and avx2 Loads packed 32-bit integers from memory pointed by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm256_maskload_epi64 ⚠ (x86 or x86-64) and avx2 Loads packed 64-bit integers from memory pointed by mem_addr using mask (elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm256_maskload_pd ⚠ (x86 or x86-64) and avx Loads packed double-precision (64-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
- _mm256_maskload_ps ⚠ (x86 or x86-64) and avx Loads packed single-precision (32-bit) floating-point elements from memory into result using mask (elements are zeroed out when the high bit of the corresponding element is not set).
- _mm256_maskstore_epi32 ⚠ (x86 or x86-64) and avx2 Stores packed 32-bit integers from a into memory pointed by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
- _mm256_maskstore_epi64 ⚠ (x86 or x86-64) and avx2 Stores packed 64-bit integers from a into memory pointed by mem_addr using mask (elements are not stored when the highest bit is not set in the corresponding element).
- _mm256_maskstore_pd ⚠ (x86 or x86-64) and avx Stores packed double-precision (64-bit) floating-point elements from a into memory using mask.
- _mm256_maskstore_ps ⚠ (x86 or x86-64) and avx Stores packed single-precision (32-bit) floating-point elements from a into memory using mask.
- _mm256_max_epi8 ⚠ (x86 or x86-64) and avx2 Compares packed 8-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epi16 ⚠ (x86 or x86-64) and avx2 Compares packed 16-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epi32 ⚠ (x86 or x86-64) and avx2 Compares packed 32-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epu8 ⚠ (x86 or x86-64) and avx2 Compares packed unsigned 8-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epu16 ⚠ (x86 or x86-64) and avx2 Compares packed unsigned 16-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_epu32 ⚠ (x86 or x86-64) and avx2 Compares packed unsigned 32-bit integers in a and b, and returns the packed maximum values.
- _mm256_max_pd ⚠ (x86 or x86-64) and avx Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed maximum values
- _mm256_max_ps ⚠ (x86 or x86-64) and avx Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed maximum values
- _mm256_min_epi8 ⚠ (x86 or x86-64) and avx2 Compares packed 8-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epi16 ⚠ (x86 or x86-64) and avx2 Compares packed 16-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epi32 ⚠ (x86 or x86-64) and avx2 Compares packed 32-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epu8 ⚠ (x86 or x86-64) and avx2 Compares packed unsigned 8-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epu16 ⚠ (x86 or x86-64) and avx2 Compares packed unsigned 16-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_epu32 ⚠ (x86 or x86-64) and avx2 Compares packed unsigned 32-bit integers in a and b, and returns the packed minimum values.
- _mm256_min_pd ⚠ (x86 or x86-64) and avx Compares packed double-precision (64-bit) floating-point elements in a and b, and returns packed minimum values
- _mm256_min_ps ⚠ (x86 or x86-64) and avx Compares packed single-precision (32-bit) floating-point elements in a and b, and returns packed minimum values
- _mm256_movedup_pd ⚠ (x86 or x86-64) and avx Duplicate even-indexed double-precision (64-bit) floating-point elements from a, and returns the results.
- _mm256_movehdup_ps ⚠ (x86 or x86-64) and avx Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
- _mm256_moveldup_ps ⚠ (x86 or x86-64) and avx Duplicate even-indexed single-precision (32-bit) floating-point elements from a, and returns the results.
- _mm256_movemask_epi8 ⚠ (x86 or x86-64) and avx2 Creates mask from the most significant bit of each 8-bit element in a, return the result.
- _mm256_movemask_pd ⚠ (x86 or x86-64) and avx Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in a.
- _mm256_movemask_ps ⚠ (x86 or x86-64) and avx Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
- _mm256_mpsadbw_epu8 ⚠ (x86 or x86-64) and avx2 Computes the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and stores the 16-bit results in dst. Eight SADs are performed for each 128-bit lane using one quadruplet from b and eight quadruplets from a. One quadruplet is selected from b starting at the offset specified in imm8. Eight quadruplets are formed from sequential 8-bit integers selected from a starting at the offset specified in imm8.
- _mm256_mul_ ⚠epi32 (x86 or x86-64) and avx2Multiplies the low 32-bit integers from each packed 64-bit element inaandb
- _mm256_mul_ ⚠epu32 (x86 or x86-64) and avx2Multiplies the low unsigned 32-bit integers from each packed 64-bit element inaandb
- _mm256_mul_ ⚠pd (x86 or x86-64) and avxMultiplies packed double-precision (64-bit) floating-point elements inaandb.
- _mm256_mul_ ⚠ps (x86 or x86-64) and avxMultiplies packed single-precision (32-bit) floating-point elements inaandb.
- _mm256_mulhi_ ⚠epi16 (x86 or x86-64) and avx2Multiplies the packed 16-bit integers inaandb, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
- _mm256_mulhi_ ⚠epu16 (x86 or x86-64) and avx2Multiplies the packed unsigned 16-bit integers inaandb, producing intermediate 32-bit integers and returning the high 16 bits of the intermediate integers.
- _mm256_mulhrs_ ⚠epi16 (x86 or x86-64) and avx2Multiplies packed 16-bit integers inaandb, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and return bits[16:1].
- _mm256_mullo_ ⚠epi16 (x86 or x86-64) and avx2Multiplies the packed 16-bit integers inaandb, producing intermediate 32-bit integers, and returns the low 16 bits of the intermediate integers
- _mm256_mullo_ ⚠epi32 (x86 or x86-64) and avx2Multiplies the packed 32-bit integers inaandb, producing intermediate 64-bit integers, and returns the low 32 bits of the intermediate integers
- _mm256_or_ ⚠pd (x86 or x86-64) and avx Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.
- _mm256_or_ ⚠ps (x86 or x86-64) and avx Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.
- _mm256_or_ ⚠si256 (x86 or x86-64) and avx2Computes the bitwise OR of 256 bits (representing integer data) inaandb
- _mm256_packs_ ⚠epi16 (x86 or x86-64) and avx2Converts packed 16-bit integers fromaandbto packed 8-bit integers using signed saturation
- _mm256_packs_ ⚠epi32 (x86 or x86-64) and avx2Converts packed 32-bit integers fromaandbto packed 16-bit integers using signed saturation
- _mm256_packus_ ⚠epi16 (x86 or x86-64) and avx2Converts packed 16-bit integers fromaandbto packed 8-bit integers using unsigned saturation
- _mm256_packus_ ⚠epi32 (x86 or x86-64) and avx2Converts packed 32-bit integers fromaandbto packed 16-bit integers using unsigned saturation
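The pack intrinsics above narrow wider integers with saturation, but the 256-bit forms operate within each 128-bit lane, so the output interleaves the lanes of a and b rather than concatenating them. A minimal sketch of that behavior (the helper name narrow_saturating is an illustration of mine, not from this listing):

```rust
use std::arch::x86_64::*;

/// Narrows eight i32 values from `a` and eight from `b` to i16 with signed saturation.
#[target_feature(enable = "avx2")]
unsafe fn narrow_saturating(a: [i32; 8], b: [i32; 8]) -> [i16; 16] {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    // Per 128-bit lane: 4 values from `a`, then 4 from `b`, each saturated to i16.
    let packed = _mm256_packs_epi32(va, vb);
    let mut out = [0i16; 16];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, packed);
    out
}

fn main() {
    if is_x86_feature_detected!("avx2") {
        let a = [0, 1, 40_000, -40_000, 4, 5, 6, 7];
        let b = [8; 8];
        let out = unsafe { narrow_saturating(a, b) };
        assert_eq!(out[2], i16::MAX);  // 40_000 saturates to i16::MAX
        assert_eq!(out[3], i16::MIN);  // -40_000 saturates to i16::MIN
        // Lane interleaving: out[0..4]=a[0..4], out[4..8]=b[0..4],
        // out[8..12]=a[4..8], out[12..16]=b[4..8].
        assert_eq!(out[4], 8);
    }
}
```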
- _mm256_permute2f128_ ⚠pd (x86 or x86-64) and avxShuffles 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) selected byimm8fromaandb.
- _mm256_permute2f128_ ⚠ps (x86 or x86-64) and avxShuffles 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) selected byimm8fromaandb.
- _mm256_permute2f128_ ⚠si256 (x86 or x86-64) and avxShuffles 128-bits (composed of integer data) selected byimm8fromaandb.
- _mm256_permute2x128_ ⚠si256 (x86 or x86-64) and avx2Shuffles 128-bits of integer data selected byimm8fromaandb.
- _mm256_permute4x64_ ⚠epi64 (x86 or x86-64) and avx2Permutes 64-bit integers fromausing control maskimm8.
- _mm256_permute4x64_ ⚠pd (x86 or x86-64) and avx2Shuffles 64-bit floating-point elements inaacross lanes using the control inimm8.
- _mm256_permute_ ⚠pd (x86 or x86-64) and avxShuffles double-precision (64-bit) floating-point elements inawithin 128-bit lanes using the control inimm8.
- _mm256_permute_ ⚠ps (x86 or x86-64) and avxShuffles single-precision (32-bit) floating-point elements inawithin 128-bit lanes using the control inimm8.
- _mm256_permutevar8x32_ ⚠epi32 (x86 or x86-64) and avx2Permutes packed 32-bit integers fromaaccording to the content ofb.
- _mm256_permutevar8x32_ ⚠ps (x86 or x86-64) and avx2Shuffles eight 32-bit floating-point elements inaacross lanes using the corresponding 32-bit integer index inidx.
- _mm256_permutevar_ ⚠pd (x86 or x86-64) and avxShuffles double-precision (64-bit) floating-point elements inawithin 256-bit lanes using the control inb.
- _mm256_permutevar_ ⚠ps (x86 or x86-64) and avxShuffles single-precision (32-bit) floating-point elements inawithin 128-bit lanes using the control inb.
- _mm256_rcp_ ⚠ps (x86 or x86-64) and avxComputes the approximate reciprocal of packed single-precision (32-bit) floating-point elements ina, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_round_ ⚠pd (x86 or x86-64) and avxRounds packed double-precision (64-bit) floating point elements inaaccording to the flagROUNDING. The value ofROUNDINGmay be as follows:
- _mm256_round_ ⚠ps (x86 or x86-64) and avxRounds packed single-precision (32-bit) floating point elements inaaccording to the flagROUNDING. The value ofROUNDINGmay be as follows:
- _mm256_rsqrt_ ⚠ps (x86 or x86-64) and avxComputes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements ina, and returns the results. The maximum relative error for this approximation is less than 1.5*2^-12.
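Because _mm256_rcp_ps and _mm256_rsqrt_ps are approximations (relative error below 1.5*2^-12), a common follow-up (my assumption here, not something this listing prescribes) is one Newton-Raphson refinement step when more precision is needed. A minimal sketch:

```rust
use std::arch::x86_64::*;

/// Approximate reciprocal square root of 8 floats, refined with one
/// Newton-Raphson step: y' = y * (1.5 - 0.5 * x * y * y).
#[target_feature(enable = "avx")]
unsafe fn rsqrt_refined(x: __m256) -> __m256 {
    let y = _mm256_rsqrt_ps(x);                 // ~12-bit accurate estimate
    let half = _mm256_set1_ps(0.5);
    let three_halves = _mm256_set1_ps(1.5);
    let y2 = _mm256_mul_ps(y, y);
    let t = _mm256_sub_ps(three_halves, _mm256_mul_ps(_mm256_mul_ps(half, x), y2));
    _mm256_mul_ps(y, t)
}

fn main() {
    if is_x86_feature_detected!("avx") {
        unsafe {
            let r = rsqrt_refined(_mm256_set1_ps(4.0));
            let mut out = [0.0f32; 8];
            _mm256_storeu_ps(out.as_mut_ptr(), r);
            assert!((out[0] - 0.5).abs() < 1e-5); // 1/sqrt(4) = 0.5
        }
    }
}
```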
- _mm256_sad_ ⚠epu8 (x86 or x86-64) and avx2Computes the absolute differences of packed unsigned 8-bit integers inaandb, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of the 64-bit return value
- _mm256_set1_ ⚠epi8 (x86 or x86-64) and avx Broadcasts 8-bit integer a to all elements of the returned vector. This intrinsic may generate the vpbroadcastb instruction.
- _mm256_set1_ ⚠epi16 (x86 or x86-64) and avx Broadcasts 16-bit integer a to all elements of the returned vector. This intrinsic may generate the vpbroadcastw instruction.
- _mm256_set1_ ⚠epi32 (x86 or x86-64) and avx Broadcasts 32-bit integer a to all elements of the returned vector. This intrinsic may generate the vpbroadcastd instruction.
- _mm256_set1_ ⚠epi64x (x86 or x86-64) and avx Broadcasts 64-bit integer a to all elements of the returned vector. This intrinsic may generate the vpbroadcastq instruction.
- _mm256_set1_ ⚠pd (x86 or x86-64) and avxBroadcasts double-precision (64-bit) floating-point valueato all elements of returned vector.
- _mm256_set1_ ⚠ps (x86 or x86-64) and avxBroadcasts single-precision (32-bit) floating-point valueato all elements of returned vector.
- _mm256_set_ ⚠epi8 (x86 or x86-64) and avxSets packed 8-bit integers in returned vector with the supplied values.
- _mm256_set_ ⚠epi16 (x86 or x86-64) and avxSets packed 16-bit integers in returned vector with the supplied values.
- _mm256_set_ ⚠epi32 (x86 or x86-64) and avxSets packed 32-bit integers in returned vector with the supplied values.
- _mm256_set_ ⚠epi64x (x86 or x86-64) and avxSets packed 64-bit integers in returned vector with the supplied values.
- _mm256_set_ ⚠m128 (x86 or x86-64) and avxSets packed __m256 returned vector with the supplied values.
- _mm256_set_ ⚠m128d (x86 or x86-64) and avxSets packed __m256d returned vector with the supplied values.
- _mm256_set_ ⚠m128i (x86 or x86-64) and avxSets packed __m256i returned vector with the supplied values.
- _mm256_set_ ⚠pd (x86 or x86-64) and avxSets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values.
- _mm256_set_ ⚠ps (x86 or x86-64) and avxSets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values.
- _mm256_setr_ ⚠epi8 (x86 or x86-64) and avxSets packed 8-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ ⚠epi16 (x86 or x86-64) and avxSets packed 16-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ ⚠epi32 (x86 or x86-64) and avxSets packed 32-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ ⚠epi64x (x86 or x86-64) and avxSets packed 64-bit integers in returned vector with the supplied values in reverse order.
- _mm256_setr_ ⚠m128 (x86 or x86-64) and avx Sets packed __m256 returned vector with the supplied values; the two 128-bit halves are taken in the reverse (low half first) order of _mm256_set_m128.
- _mm256_setr_ ⚠m128d (x86 or x86-64) and avx Sets packed __m256d returned vector with the supplied values; the two 128-bit halves are taken in the reverse (low half first) order of _mm256_set_m128d.
- _mm256_setr_ ⚠m128i (x86 or x86-64) and avx Sets packed __m256i returned vector with the supplied values; the two 128-bit halves are taken in the reverse (low half first) order of _mm256_set_m128i.
- _mm256_setr_ ⚠pd (x86 or x86-64) and avxSets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order.
- _mm256_setr_ ⚠ps (x86 or x86-64) and avxSets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order.
- _mm256_setzero_ ⚠pd (x86 or x86-64) and avxReturns vector of type __m256d with all elements set to zero.
- _mm256_setzero_ ⚠ps (x86 or x86-64) and avxReturns vector of type __m256 with all elements set to zero.
- _mm256_setzero_ ⚠si256 (x86 or x86-64) and avxReturns vector of type __m256i with all elements set to zero.
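The difference between the set and setr constructors above is purely argument order: set lists elements from the most significant down to element 0, while setr lists them in memory (element 0 first) order. A small sketch making that visible (helper name set_vs_setr is illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx")]
unsafe fn set_vs_setr() -> ([i32; 8], [i32; 8]) {
    // `set` lists elements from the most significant down to element 0 ...
    let hi_first = _mm256_set_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    // ... while `setr` lists them in memory order, element 0 first.
    let lo_first = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    let (mut a, mut b) = ([0i32; 8], [0i32; 8]);
    _mm256_storeu_si256(a.as_mut_ptr() as *mut __m256i, hi_first);
    _mm256_storeu_si256(b.as_mut_ptr() as *mut __m256i, lo_first);
    (a, b)
}

fn main() {
    if is_x86_feature_detected!("avx") {
        let (a, b) = unsafe { set_vs_setr() };
        assert_eq!(a, [0, 1, 2, 3, 4, 5, 6, 7]); // set: last argument lands in element 0
        assert_eq!(b, [7, 6, 5, 4, 3, 2, 1, 0]); // setr: first argument lands in element 0
    }
}
```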
- _mm256_shuffle_ ⚠epi8 (x86 or x86-64) and avx2Shuffles bytes fromaaccording to the content ofb.
- _mm256_shuffle_ ⚠epi32 (x86 or x86-64) and avx2Shuffles 32-bit integers in 128-bit lanes ofausing the control inimm8.
- _mm256_shuffle_ ⚠pd (x86 or x86-64) and avxShuffles double-precision (64-bit) floating-point elements within 128-bit lanes using the control inimm8.
- _mm256_shuffle_ ⚠ps (x86 or x86-64) and avxShuffles single-precision (32-bit) floating-point elements inawithin 128-bit lanes using the control inimm8.
- _mm256_shufflehi_ ⚠epi16 (x86 or x86-64) and avx2Shuffles 16-bit integers in the high 64 bits of 128-bit lanes ofausing the control inimm8. The low 64 bits of 128-bit lanes ofaare copied to the output.
- _mm256_shufflelo_ ⚠epi16 (x86 or x86-64) and avx2Shuffles 16-bit integers in the low 64 bits of 128-bit lanes ofausing the control inimm8. The high 64 bits of 128-bit lanes ofaare copied to the output.
- _mm256_sign_ ⚠epi8 (x86 or x86-64) and avx2Negates packed 8-bit integers inawhen the corresponding signed 8-bit integer inbis negative, and returns the results. Results are zeroed out when the corresponding element inbis zero.
- _mm256_sign_ ⚠epi16 (x86 or x86-64) and avx2Negates packed 16-bit integers inawhen the corresponding signed 16-bit integer inbis negative, and returns the results. Results are zeroed out when the corresponding element inbis zero.
- _mm256_sign_ ⚠epi32 (x86 or x86-64) and avx2Negates packed 32-bit integers inawhen the corresponding signed 32-bit integer inbis negative, and returns the results. Results are zeroed out when the corresponding element inbis zero.
- _mm256_sll_ ⚠epi16 (x86 or x86-64) and avx2Shifts packed 16-bit integers inaleft bycountwhile shifting in zeros, and returns the result
- _mm256_sll_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaleft bycountwhile shifting in zeros, and returns the result
- _mm256_sll_ ⚠epi64 (x86 or x86-64) and avx2Shifts packed 64-bit integers inaleft bycountwhile shifting in zeros, and returns the result
- _mm256_slli_ ⚠epi16 (x86 or x86-64) and avx2 Shifts packed 16-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
- _mm256_slli_ ⚠epi32 (x86 or x86-64) and avx2 Shifts packed 32-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
- _mm256_slli_ ⚠epi64 (x86 or x86-64) and avx2 Shifts packed 64-bit integers in a left by IMM8 while shifting in zeros, and returns the results.
- _mm256_slli_ ⚠si256 (x86 or x86-64) and avx2Shifts 128-bit lanes inaleft byimm8bytes while shifting in zeros.
- _mm256_sllv_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaleft by the amount specified by the corresponding element incountwhile shifting in zeros, and returns the result.
- _mm256_sllv_ ⚠epi64 (x86 or x86-64) and avx2Shifts packed 64-bit integers inaleft by the amount specified by the corresponding element incountwhile shifting in zeros, and returns the result.
- _mm256_sqrt_ ⚠pd (x86 or x86-64) and avxReturns the square root of packed double-precision (64-bit) floating point elements ina.
- _mm256_sqrt_ ⚠ps (x86 or x86-64) and avxReturns the square root of packed single-precision (32-bit) floating point elements ina.
- _mm256_sra_ ⚠epi16 (x86 or x86-64) and avx2Shifts packed 16-bit integers inaright bycountwhile shifting in sign bits.
- _mm256_sra_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright bycountwhile shifting in sign bits.
- _mm256_srai_ ⚠epi16 (x86 or x86-64) and avx2Shifts packed 16-bit integers inaright byIMM8while shifting in sign bits.
- _mm256_srai_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright byIMM8while shifting in sign bits.
- _mm256_srav_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright by the amount specified by the corresponding element incountwhile shifting in sign bits.
- _mm256_srl_ ⚠epi16 (x86 or x86-64) and avx2Shifts packed 16-bit integers inaright bycountwhile shifting in zeros.
- _mm256_srl_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright bycountwhile shifting in zeros.
- _mm256_srl_ ⚠epi64 (x86 or x86-64) and avx2Shifts packed 64-bit integers inaright bycountwhile shifting in zeros.
- _mm256_srli_ ⚠epi16 (x86 or x86-64) and avx2Shifts packed 16-bit integers inaright byIMM8while shifting in zeros
- _mm256_srli_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright byIMM8while shifting in zeros
- _mm256_srli_ ⚠epi64 (x86 or x86-64) and avx2Shifts packed 64-bit integers inaright byIMM8while shifting in zeros
- _mm256_srli_ ⚠si256 (x86 or x86-64) and avx2Shifts 128-bit lanes inaright byimm8bytes while shifting in zeros.
- _mm256_srlv_ ⚠epi32 (x86 or x86-64) and avx2 Shifts packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and returns the results.
- _mm256_srlv_ ⚠epi64 (x86 or x86-64) and avx2 Shifts packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and returns the results.
- _mm256_store_ ⚠pd (x86 or x86-64) and avxStores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) fromainto memory.mem_addrmust be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠ps (x86 or x86-64) and avxStores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) fromainto memory.mem_addrmust be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠si256 (x86 or x86-64) and avxStores 256-bits of integer data fromainto memory.mem_addrmust be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_storeu2_ ⚠m128 (x86 or x86-64) and avxStores the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) fromainto memory two different 128-bit locations.hiaddrandloaddrdo not need to be aligned on any particular boundary.
- _mm256_storeu2_ ⚠m128d (x86 or x86-64) and avxStores the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) fromainto memory two different 128-bit locations.hiaddrandloaddrdo not need to be aligned on any particular boundary.
- _mm256_storeu2_ ⚠m128i (x86 or x86-64) and avxStores the high and low 128-bit halves (each composed of integer data) fromainto memory two different 128-bit locations.hiaddrandloaddrdo not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠pd (x86 or x86-64) and avxStores 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) fromainto memory.mem_addrdoes not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠ps (x86 or x86-64) and avxStores 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) fromainto memory.mem_addrdoes not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠si256 (x86 or x86-64) and avxStores 256-bits of integer data fromainto memory.mem_addrdoes not need to be aligned on any particular boundary.
- _mm256_stream_ ⚠load_ si256 (x86 or x86-64) and avx2Load 256-bits of integer data from memory into dst using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon)
- _mm256_stream_ ⚠pd (x86 or x86-64) and avxMoves double-precision values from a 256-bit vector of[4 x double]to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_ ⚠ps (x86 or x86-64) and avxMoves single-precision floating point values from a 256-bit vector of[8 x float]to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm256_stream_ ⚠si256 (x86 or x86-64) and avxMoves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon)
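The stream stores above require 32-byte alignment and bypass the cache, and because non-temporal stores are weakly ordered they are typically followed by a store fence. A minimal sketch under those assumptions (the Aligned wrapper and helper name are this example's own choices):

```rust
use std::arch::x86_64::*;

/// 32-byte aligned buffer so the streaming store cannot fault.
#[repr(align(32))]
struct Aligned([f32; 8]);

/// Fills a buffer with a constant using a non-temporal (cache-bypassing) store.
#[target_feature(enable = "avx")]
unsafe fn stream_fill(dst: &mut Aligned, value: f32) {
    let v = _mm256_set1_ps(value);
    // mem_addr must be 32-byte aligned; the Aligned wrapper guarantees that.
    _mm256_stream_ps(dst.0.as_mut_ptr(), v);
    // Non-temporal stores are weakly ordered; fence before others observe the data.
    _mm_sfence();
}

fn main() {
    if is_x86_feature_detected!("avx") {
        let mut buf = Aligned([0.0; 8]);
        unsafe { stream_fill(&mut buf, 1.5) };
        assert_eq!(buf.0, [1.5; 8]);
    }
}
```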
- _mm256_sub_ ⚠epi8 (x86 or x86-64) and avx2Subtract packed 8-bit integers inbfrom packed 8-bit integers ina
- _mm256_sub_ ⚠epi16 (x86 or x86-64) and avx2Subtract packed 16-bit integers inbfrom packed 16-bit integers ina
- _mm256_sub_ ⚠epi32 (x86 or x86-64) and avx2Subtract packed 32-bit integers inbfrom packed 32-bit integers ina
- _mm256_sub_ ⚠epi64 (x86 or x86-64) and avx2Subtract packed 64-bit integers inbfrom packed 64-bit integers ina
- _mm256_sub_ ⚠pd (x86 or x86-64) and avxSubtracts packed double-precision (64-bit) floating-point elements inbfrom packed elements ina.
- _mm256_sub_ ⚠ps (x86 or x86-64) and avxSubtracts packed single-precision (32-bit) floating-point elements inbfrom packed elements ina.
- _mm256_subs_ ⚠epi8 (x86 or x86-64) and avx2Subtract packed 8-bit integers inbfrom packed 8-bit integers inausing saturation.
- _mm256_subs_ ⚠epi16 (x86 or x86-64) and avx2Subtract packed 16-bit integers inbfrom packed 16-bit integers inausing saturation.
- _mm256_subs_ ⚠epu8 (x86 or x86-64) and avx2 Subtracts packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation.
- _mm256_subs_ ⚠epu16 (x86 or x86-64) and avx2 Subtracts packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation.
- _mm256_testc_ ⚠pd (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) inaandb, producing an intermediate 256-bit value, and setZFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setCFto 0. Return theCFvalue.
- _mm256_testc_ ⚠ps (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) inaandb, producing an intermediate 256-bit value, and setZFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setCFto 0. Return theCFvalue.
- _mm256_testc_ ⚠si256 (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing integer data) inaandb, and setZFto 1 if the result is zero, otherwise setZFto 0. Computes the bitwise NOT ofaand then AND withb, and setCFto 1 if the result is zero, otherwise setCFto 0. Return theCFvalue.
- _mm256_testnzc_ ⚠pd (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) inaandb, producing an intermediate 256-bit value, and setZFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setCFto 0. Return 1 if both theZFandCFvalues are zero, otherwise return 0.
- _mm256_testnzc_ ⚠ps (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) inaandb, producing an intermediate 256-bit value, and setZFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setCFto 0. Return 1 if both theZFandCFvalues are zero, otherwise return 0.
- _mm256_testnzc_ ⚠si256 (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing integer data) inaandb, and setZFto 1 if the result is zero, otherwise setZFto 0. Computes the bitwise NOT ofaand then AND withb, and setCFto 1 if the result is zero, otherwise setCFto 0. Return 1 if both theZFandCFvalues are zero, otherwise return 0.
- _mm256_testz_ ⚠pd (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) inaandb, producing an intermediate 256-bit value, and setZFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setCFto 0. Return theZFvalue.
- _mm256_testz_ ⚠ps (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) inaandb, producing an intermediate 256-bit value, and setZFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setCFto 0. Return theZFvalue.
- _mm256_testz_ ⚠si256 (x86 or x86-64) and avxComputes the bitwise AND of 256 bits (representing integer data) inaandb, and setZFto 1 if the result is zero, otherwise setZFto 0. Computes the bitwise NOT ofaand then AND withb, and setCFto 1 if the result is zero, otherwise setCFto 0. Return theZFvalue.
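The test intrinsics above set ZF/CF from an AND and an ANDN of the two operands; _mm256_testz_si256(a, b), for example, returns 1 exactly when a AND b is all zeros, which gives a cheap "is this vector all zero?" check. A minimal sketch (helper name all_zero is illustrative):

```rust
use std::arch::x86_64::*;

/// Returns true if all 32 bytes of `chunk` are zero.
#[target_feature(enable = "avx")]
unsafe fn all_zero(chunk: &[u8; 32]) -> bool {
    let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
    // Testing the vector against itself asks "is the vector itself all zeros?".
    _mm256_testz_si256(v, v) == 1
}

fn main() {
    if is_x86_feature_detected!("avx") {
        let mut buf = [0u8; 32];
        assert!(unsafe { all_zero(&buf) });
        buf[31] = 1;
        assert!(!unsafe { all_zero(&buf) });
    }
}
```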
- _mm256_undefined_ ⚠pd (x86 or x86-64) and avxReturns vector of type__m256dwith indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent tomem::MaybeUninit. In practice, this is equivalent tomem::zeroed.
- _mm256_undefined_ ⚠ps (x86 or x86-64) and avxReturns vector of type__m256with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent tomem::MaybeUninit. In practice, this is equivalent tomem::zeroed.
- _mm256_undefined_ ⚠si256 (x86 or x86-64) and avx Returns vector of type __m256i with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
- _mm256_unpackhi_ ⚠epi8 (x86 or x86-64) and avx2 Unpacks and interleaves 8-bit integers from the high half of each 128-bit lane in a and b.
- _mm256_unpackhi_ ⚠epi16 (x86 or x86-64) and avx2 Unpacks and interleaves 16-bit integers from the high half of each 128-bit lane of a and b.
- _mm256_unpackhi_ ⚠epi32 (x86 or x86-64) and avx2 Unpacks and interleaves 32-bit integers from the high half of each 128-bit lane of a and b.
- _mm256_unpackhi_ ⚠epi64 (x86 or x86-64) and avx2 Unpacks and interleaves 64-bit integers from the high half of each 128-bit lane of a and b.
- _mm256_unpackhi_ ⚠pd (x86 or x86-64) and avx Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.
- _mm256_unpackhi_ ⚠ps (x86 or x86-64) and avx Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.
- _mm256_unpacklo_ ⚠epi8 (x86 or x86-64) and avx2 Unpacks and interleaves 8-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ ⚠epi16 (x86 or x86-64) and avx2 Unpacks and interleaves 16-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ ⚠epi32 (x86 or x86-64) and avx2 Unpacks and interleaves 32-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ ⚠epi64 (x86 or x86-64) and avx2 Unpacks and interleaves 64-bit integers from the low half of each 128-bit lane of a and b.
- _mm256_unpacklo_ ⚠pd (x86 or x86-64) and avx Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.
- _mm256_unpacklo_ ⚠ps (x86 or x86-64) and avx Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
- _mm256_xor_ ⚠pd (x86 or x86-64) and avxComputes the bitwise XOR of packed double-precision (64-bit) floating-point elements inaandb.
- _mm256_xor_ ⚠ps (x86 or x86-64) and avxComputes the bitwise XOR of packed single-precision (32-bit) floating-point elements inaandb.
- _mm256_xor_ ⚠si256 (x86 or x86-64) and avx2Computes the bitwise XOR of 256 bits (representing integer data) inaandb
- _mm256_zeroall ⚠(x86 or x86-64) and avxZeroes the contents of all XMM or YMM registers.
- _mm256_zeroupper ⚠(x86 or x86-64) and avxZeroes the upper 128 bits of all YMM registers; the lower 128-bits of the registers are unmodified.
- _mm256_zextpd128_ ⚠pd256 (x86 or x86-64) and avxConstructs a 256-bit floating-point vector of[4 x double]from a 128-bit floating-point vector of[2 x double]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm256_zextps128_ ⚠ps256 (x86 or x86-64) and avxConstructs a 256-bit floating-point vector of[8 x float]from a 128-bit floating-point vector of[4 x float]. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm256_zextsi128_ ⚠si256 (x86 or x86-64) and avxConstructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero.
- _mm_abs_ ⚠epi8 (x86 or x86-64) and ssse3 Computes the absolute value of packed 8-bit signed integers in a and returns the unsigned results.
- _mm_abs_ ⚠epi16 (x86 or x86-64) and ssse3 Computes the absolute value of each of the packed 16-bit signed integers in a and returns the 16-bit unsigned results.
- _mm_abs_ ⚠epi32 (x86 or x86-64) and ssse3 Computes the absolute value of each of the packed 32-bit signed integers in a and returns the 32-bit unsigned results.
- _mm_add_ ⚠epi8 (x86 or x86-64) and sse2Adds packed 8-bit integers inaandb.
- _mm_add_ ⚠epi16 (x86 or x86-64) and sse2Adds packed 16-bit integers inaandb.
- _mm_add_ ⚠epi32 (x86 or x86-64) and sse2Adds packed 32-bit integers inaandb.
- _mm_add_ ⚠epi64 (x86 or x86-64) and sse2Adds packed 64-bit integers inaandb.
- _mm_add_ ⚠pd (x86 or x86-64) and sse2Adds packed double-precision (64-bit) floating-point elements inaandb.
- _mm_add_ ⚠ps (x86 or x86-64) and sseAdds packed single-precision (32-bit) floating-point elements inaandb.
- _mm_add_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the sum of the low elements ofaandb.
- _mm_add_ ⚠ss (x86 or x86-64) and sseAdds the first component ofaandb, the other components are copied froma.
- _mm_adds_ ⚠epi8 (x86 or x86-64) and sse2Adds packed 8-bit integers inaandbusing saturation.
- _mm_adds_ ⚠epi16 (x86 or x86-64) and sse2Adds packed 16-bit integers inaandbusing saturation.
- _mm_adds_ ⚠epu8 (x86 or x86-64) and sse2Adds packed unsigned 8-bit integers inaandbusing saturation.
- _mm_adds_ ⚠epu16 (x86 or x86-64) and sse2Adds packed unsigned 16-bit integers inaandbusing saturation.
- _mm_addsub_ ⚠pd (x86 or x86-64) and sse3Alternatively add and subtract packed double-precision (64-bit) floating-point elements inato/from packed elements inb.
- _mm_addsub_ ⚠ps (x86 or x86-64) and sse3Alternatively add and subtract packed single-precision (32-bit) floating-point elements inato/from packed elements inb.
- _mm_aesdec_ ⚠si128 (x86 or x86-64) and aesPerforms one round of an AES decryption flow on data (state) ina.
- _mm_aesdeclast_ ⚠si128 (x86 or x86-64) and aesPerforms the last round of an AES decryption flow on data (state) ina.
- _mm_aesenc_ ⚠si128 (x86 or x86-64) and aesPerforms one round of an AES encryption flow on data (state) ina.
- _mm_aesenclast_ ⚠si128 (x86 or x86-64) and aesPerforms the last round of an AES encryption flow on data (state) ina.
- _mm_aesimc_ ⚠si128 (x86 or x86-64) and aesPerforms theInvMixColumnstransformation ona.
- _mm_aeskeygenassist_ ⚠si128 (x86 or x86-64) and aesAssist in expanding the AES cipher key.
- _mm_alignr_ ⚠epi8 (x86 or x86-64) and ssse3Concatenate 16-byte blocks inaandbinto a 32-byte temporary result, shift the result right bynbytes, and returns the low 16 bytes.
- _mm_and_ ⚠pd (x86 or x86-64) and sse2Computes the bitwise AND of packed double-precision (64-bit) floating-point elements inaandb.
- _mm_and_ ⚠ps (x86 or x86-64) and sseBitwise AND of packed single-precision (32-bit) floating-point elements.
- _mm_and_ ⚠si128 (x86 or x86-64) and sse2Computes the bitwise AND of 128 bits (representing integer data) inaandb.
- _mm_andnot_ ⚠pd (x86 or x86-64) and sse2Computes the bitwise NOT ofaand then AND withb.
- _mm_andnot_ ⚠ps (x86 or x86-64) and sseBitwise AND-NOT of packed single-precision (32-bit) floating-point elements.
- _mm_andnot_ ⚠si128 (x86 or x86-64) and sse2Computes the bitwise NOT of 128 bits (representing integer data) inaand then AND withb.
- _mm_avg_ ⚠epu8 (x86 or x86-64) and sse2Averages packed unsigned 8-bit integers inaandb.
- _mm_avg_ ⚠epu16 (x86 or x86-64) and sse2Averages packed unsigned 16-bit integers inaandb.
- _mm_blend_ ⚠epi16 (x86 or x86-64) and sse4.1Blend packed 16-bit integers fromaandbusing the maskIMM8.
- _mm_blend_ ⚠epi32 (x86 or x86-64) and avx2Blends packed 32-bit integers fromaandbusing control maskIMM4.
- _mm_blend_ ⚠pd (x86 or x86-64) and sse4.1Blend packed double-precision (64-bit) floating-point elements fromaandbusing control maskIMM2
- _mm_blend_ ⚠ps (x86 or x86-64) and sse4.1Blend packed single-precision (32-bit) floating-point elements fromaandbusing maskIMM4
- _mm_blendv_ ⚠epi8 (x86 or x86-64) and sse4.1Blend packed 8-bit integers fromaandbusingmask
- _mm_blendv_ ⚠pd (x86 or x86-64) and sse4.1Blend packed double-precision (64-bit) floating-point elements fromaandbusingmask
- _mm_blendv_ ⚠ps (x86 or x86-64) and sse4.1Blend packed single-precision (32-bit) floating-point elements fromaandbusingmask
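The blendv intrinsics above select per element from a or b according to the most significant bit of each mask element, which makes them a natural branch-free "select" after a packed compare. A minimal sketch (equivalent here to _mm_max_ps; shown only to illustrate how a compare result drives the blend, and the helper name is illustrative):

```rust
use std::arch::x86_64::*;

/// Branch-free per-lane maximum of two f32x4 vectors via compare + blendv.
#[target_feature(enable = "sse4.1")]
unsafe fn select_larger(a: __m128, b: __m128) -> __m128 {
    // Lanes where b > a become all-ones (sign bit set), others all-zeros.
    let mask = _mm_cmpgt_ps(b, a);
    // blendv picks from `b` where the mask's sign bit is 1, else from `a`.
    _mm_blendv_ps(a, b, mask)
}

fn main() {
    if is_x86_feature_detected!("sse4.1") {
        unsafe {
            let a = _mm_setr_ps(1.0, 5.0, 3.0, 0.0);
            let b = _mm_setr_ps(2.0, 4.0, 3.0, -1.0);
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), select_larger(a, b));
            assert_eq!(out, [2.0, 5.0, 3.0, 0.0]);
        }
    }
}
```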
- _mm_broadcast_ ⚠ss (x86 or x86-64) and avxBroadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector.
- _mm_broadcastb_ ⚠epi8 (x86 or x86-64) and avx2Broadcasts the low packed 8-bit integer fromato all elements of the 128-bit returned value.
- _mm_broadcastd_ ⚠epi32 (x86 or x86-64) and avx2Broadcasts the low packed 32-bit integer fromato all elements of the 128-bit returned value.
- _mm_broadcastq_ ⚠epi64 (x86 or x86-64) and avx2Broadcasts the low packed 64-bit integer fromato all elements of the 128-bit returned value.
- _mm_broadcastsd_ ⚠pd (x86 or x86-64) and avx2Broadcasts the low double-precision (64-bit) floating-point element fromato all elements of the 128-bit returned value.
- _mm_broadcastsi128_ ⚠si256 (x86 or x86-64) and avx2Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value.
- _mm_broadcastss_ ⚠ps (x86 or x86-64) and avx2Broadcasts the low single-precision (32-bit) floating-point element fromato all elements of the 128-bit returned value.
- _mm_broadcastw_ ⚠epi16 (x86 or x86-64) and avx2Broadcasts the low packed 16-bit integer from a to all elements of the 128-bit returned value
- _mm_bslli_ ⚠si128 (x86 or x86-64) and sse2Shiftsaleft byIMM8bytes while shifting in zeros.
- _mm_bsrli_ ⚠si128 (x86 or x86-64) and sse2Shiftsaright byIMM8bytes while shifting in zeros.
- _mm_castpd_ ⚠ps (x86 or x86-64) and sse2Casts a 128-bit floating-point vector of[2 x double]into a 128-bit floating-point vector of[4 x float].
- _mm_castpd_ ⚠si128 (x86 or x86-64) and sse2Casts a 128-bit floating-point vector of[2 x double]into a 128-bit integer vector.
- _mm_castps_ ⚠pd (x86 or x86-64) and sse2Casts a 128-bit floating-point vector of[4 x float]into a 128-bit floating-point vector of[2 x double].
- _mm_castps_ ⚠si128 (x86 or x86-64) and sse2Casts a 128-bit floating-point vector of[4 x float]into a 128-bit integer vector.
- _mm_castsi128_ ⚠pd (x86 or x86-64) and sse2Casts a 128-bit integer vector into a 128-bit floating-point vector of[2 x double].
- _mm_castsi128_ ⚠ps (x86 or x86-64) and sse2Casts a 128-bit integer vector into a 128-bit floating-point vector of[4 x float].
- _mm_ceil_ ⚠pd (x86 or x86-64) and sse4.1Round the packed double-precision (64-bit) floating-point elements inaup to an integer value, and stores the results as packed double-precision floating-point elements.
- _mm_ceil_ ⚠ps (x86 or x86-64) and sse4.1Round the packed single-precision (32-bit) floating-point elements inaup to an integer value, and stores the results as packed single-precision floating-point elements.
- _mm_ceil_ ⚠sd (x86 or x86-64) and sse4.1Round the lower double-precision (64-bit) floating-point element inbup to an integer value, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copies the upper element fromato the upper element of the intrinsic result.
- _mm_ceil_ ⚠ss (x86 or x86-64) and sse4.1Round the lower single-precision (32-bit) floating-point element inbup to an integer value, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copies the upper 3 packed elements fromato the upper elements of the intrinsic result.
- _mm_clflush ⚠(x86 or x86-64) and sse2Invalidates and flushes the cache line that containspfrom all levels of the cache hierarchy.
- _mm_clmulepi64_ ⚠si128 (x86 or x86-64) and pclmulqdqPerforms a carry-less multiplication of two 64-bit polynomials over the finite field GF(2).
- _mm_cmp_ ⚠pd (x86 or x86-64) and avxCompares packed double-precision (64-bit) floating-point elements inaandbbased on the comparison operand specified byIMM5.
- _mm_cmp_ ⚠ps (x86 or x86-64) and avxCompares packed single-precision (32-bit) floating-point elements inaandbbased on the comparison operand specified byIMM5.
- _mm_cmp_ ⚠sd (x86 or x86-64) and avxCompares the lower double-precision (64-bit) floating-point element inaandbbased on the comparison operand specified byIMM5, store the result in the lower element of returned vector, and copies the upper element fromato the upper element of returned vector.
- _mm_cmp_ ⚠ss (x86 or x86-64) and avxCompares the lower single-precision (32-bit) floating-point element inaandbbased on the comparison operand specified byIMM5, store the result in the lower element of returned vector, and copies the upper 3 packed elements fromato the upper elements of returned vector.
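The AVX _mm_cmp_* forms above take the comparison predicate as a const generic parameter; the _CMP_* constants listed near the top of this module name the predicates. A minimal sketch using _CMP_LT_OQ (the helper name count_less_than is illustrative):

```rust
use std::arch::x86_64::*;

/// Counts how many lanes of `a` are strictly less than the matching lane of `b`.
#[target_feature(enable = "avx")]
unsafe fn count_less_than(a: __m128, b: __m128) -> u32 {
    // _CMP_LT_OQ: less-than, ordered, non-signaling (a NaN lane compares false).
    let mask = _mm_cmp_ps::<_CMP_LT_OQ>(a, b);
    // movemask extracts one bit per lane from the compare result.
    (_mm_movemask_ps(mask) as u32).count_ones()
}

fn main() {
    if is_x86_feature_detected!("avx") {
        unsafe {
            let a = _mm_setr_ps(1.0, 2.0, 3.0, f32::NAN);
            let b = _mm_setr_ps(2.0, 2.0, 9.0, 1.0);
            assert_eq!(count_less_than(a, b), 2); // lanes 0 and 2; the NaN lane is false
        }
    }
}
```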
- _mm_cmpeq_ ⚠epi8 (x86 or x86-64) and sse2Compares packed 8-bit integers inaandbfor equality.
- _mm_cmpeq_ ⚠epi16 (x86 or x86-64) and sse2Compares packed 16-bit integers inaandbfor equality.
- _mm_cmpeq_ ⚠epi32 (x86 or x86-64) and sse2Compares packed 32-bit integers inaandbfor equality.
- _mm_cmpeq_ ⚠epi64 (x86 or x86-64) and sse4.1Compares packed 64-bit integers inaandbfor equality
- _mm_cmpeq_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor equality.
- _mm_cmpeq_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input elements were equal, or0otherwise.
- _mm_cmpeq_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the equality comparison of the lower elements ofaandb.
- _mm_cmpeq_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for equality. The lowest 32 bits of the result will be0xffffffffif the two inputs are equal, or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpestra ⚠(x86 or x86-64) and sse4.2Compares packed strings inaandbwith lengthslaandlbusing the control inIMM8, and return1ifbdid not contain a null character and the resulting mask was zero, and0otherwise.
- _mm_cmpestrc ⚠(x86 or x86-64) and sse4.2Compares packed strings inaandbwith lengthslaandlbusing the control inIMM8, and return1if the resulting mask was non-zero, and0otherwise.
- _mm_cmpestri ⚠(x86 or x86-64) and sse4.2Compares packed stringsaandbwith lengthslaandlbusing the control inIMM8and return the generated index. Similar to_mm_cmpistriwith the exception that_mm_cmpistriimplicitly determines the length ofaandb.
- _mm_cmpestrm ⚠(x86 or x86-64) and sse4.2Compares packed strings inaandbwith lengthslaandlbusing the control inIMM8, and return the generated mask.
- _mm_cmpestro ⚠(x86 or x86-64) and sse4.2Compares packed strings inaandbwith lengthslaandlbusing the control inIMM8, and return bit0of the resulting bit mask.
- _mm_cmpestrs ⚠(x86 or x86-64) and sse4.2Compares packed strings inaandbwith lengthslaandlbusing the control inIMM8, and return1if any character in a was null, and0otherwise.
- _mm_cmpestrz ⚠(x86 or x86-64) and sse4.2Compares packed strings inaandbwith lengthslaandlbusing the control inIMM8, and return1if any character inbwas null, and0otherwise.
- _mm_cmpge_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor greater-than-or-equal.
- _mm_cmpge_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais greater than or equal to the corresponding element inb, or0otherwise.
- _mm_cmpge_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the greater-than-or-equal comparison of the lower elements ofaandb.
- _mm_cmpge_ ⚠ss (x86 or x86-64) and sse Compares the lowest f32 of both inputs for greater than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is greater than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmpgt_ ⚠epi8 (x86 or x86-64) and sse2Compares packed 8-bit integers inaandbfor greater-than.
- _mm_cmpgt_ ⚠epi16 (x86 or x86-64) and sse2Compares packed 16-bit integers inaandbfor greater-than.
- _mm_cmpgt_ ⚠epi32 (x86 or x86-64) and sse2Compares packed 32-bit integers inaandbfor greater-than.
- _mm_cmpgt_ ⚠epi64 (x86 or x86-64) and sse4.2Compares packed 64-bit integers inaandbfor greater-than, return the results.
- _mm_cmpgt_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor greater-than.
- _mm_cmpgt_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais greater than the corresponding element inb, or0otherwise.
- _mm_cmpgt_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the greater-than comparison of the lower elements ofaandb.
- _mm_cmpgt_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for greater than. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is greater thanb.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpistra ⚠(x86 or x86-64) and sse4.2Compares packed strings with implicit lengths inaandbusing the control inIMM8, and return1ifbdid not contain a null character and the resulting mask was zero, and0otherwise.
- _mm_cmpistrc ⚠(x86 or x86-64) and sse4.2Compares packed strings with implicit lengths inaandbusing the control inIMM8, and return1if the resulting mask was non-zero, and0otherwise.
- _mm_cmpistri ⚠(x86 or x86-64) and sse4.2Compares packed strings with implicit lengths inaandbusing the control inIMM8and return the generated index. Similar to_mm_cmpestriwith the exception that_mm_cmpestrirequires the lengths ofaandbto be explicitly specified.
- _mm_cmpistrm ⚠(x86 or x86-64) and sse4.2Compares packed strings with implicit lengths inaandbusing the control inIMM8, and return the generated mask.
- _mm_cmpistro ⚠(x86 or x86-64) and sse4.2Compares packed strings with implicit lengths inaandbusing the control inIMM8, and return bit0of the resulting bit mask.
- _mm_cmpistrs ⚠(x86 or x86-64) and sse4.2Compares packed strings with implicit lengths inaandbusing the control inIMM8, and returns1if any character inawas null, and0otherwise.
- _mm_cmpistrz ⚠(x86 or x86-64) and sse4.2 Compares packed strings with implicit lengths in a and b using the control in IMM8, and returns 1 if any character in b was null, and 0 otherwise.
- _mm_cmple_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor less-than-or-equal
- _mm_cmple_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais less than or equal to the corresponding element inb, or0otherwise.
- _mm_cmple_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the less-than-or-equal comparison of the lower elements ofaandb.
- _mm_cmple_ ⚠ss (x86 or x86-64) and sse Compares the lowest f32 of both inputs for less than or equal. The lowest 32 bits of the result will be 0xffffffff if a.extract(0) is less than or equal to b.extract(0), or 0 otherwise. The upper 96 bits of the result are the upper 96 bits of a.
- _mm_cmplt_ ⚠epi8 (x86 or x86-64) and sse2Compares packed 8-bit integers inaandbfor less-than.
- _mm_cmplt_ ⚠epi16 (x86 or x86-64) and sse2Compares packed 16-bit integers inaandbfor less-than.
- _mm_cmplt_ ⚠epi32 (x86 or x86-64) and sse2Compares packed 32-bit integers inaandbfor less-than.
- _mm_cmplt_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor less-than.
- _mm_cmplt_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais less than the corresponding element inb, or0otherwise.
- _mm_cmplt_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the less-than comparison of the lower elements ofaandb.
- _mm_cmplt_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for less than. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is less thanb.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpneq_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor not-equal.
- _mm_cmpneq_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input elements are not equal, or0otherwise.
- _mm_cmpneq_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the not-equal comparison of the lower elements ofaandb.
- _mm_cmpneq_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for inequality. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is not equal tob.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpnge_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor not-greater-than-or-equal.
- _mm_cmpnge_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais not greater than or equal to the corresponding element inb, or0otherwise.
- _mm_cmpnge_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the not-greater-than-or-equal comparison of the lower elements ofaandb.
- _mm_cmpnge_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for not-greater-than-or-equal. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is not greater than or equal tob.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpngt_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor not-greater-than.
- _mm_cmpngt_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais not greater than the corresponding element inb, or0otherwise.
- _mm_cmpngt_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the not-greater-than comparison of the lower elements ofaandb.
- _mm_cmpngt_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for not-greater-than. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is not greater thanb.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpnle_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor not-less-than-or-equal.
- _mm_cmpnle_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais not less than or equal to the corresponding element inb, or0otherwise.
- _mm_cmpnle_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the not-less-than-or-equal comparison of the lower elements ofaandb.
- _mm_cmpnle_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for not-less-than-or-equal. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is not less than or equal tob.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpnlt_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbfor not-less-than.
- _mm_cmpnlt_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. The result in the output vector will be0xffffffffif the input element inais not less than the corresponding element inb, or0otherwise.
- _mm_cmpnlt_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the not-less-than comparison of the lower elements ofaandb.
- _mm_cmpnlt_ ⚠ss (x86 or x86-64) and sseCompares the lowestf32of both inputs for not-less-than. The lowest 32 bits of the result will be0xffffffffifa.extract(0)is not less thanb.extract(0), or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpord_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbto see if neither isNaN.
- _mm_cmpord_ ⚠ps (x86 or x86-64) and sseCompares each of the four floats inato the corresponding element inb. Returns four floats that have one of two possible bit patterns. The element in the output vector will be0xffffffffif the input elements inaandbare ordered (i.e., neither of them is a NaN), or 0 otherwise.
- _mm_cmpord_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the result of comparing both of the lower elements ofaandbtoNaN. If neither are equal toNaNthen0xFFFFFFFFFFFFFFFFis used and0otherwise.
- _mm_cmpord_ ⚠ss (x86 or x86-64) and sseChecks if the lowestf32of both inputs are ordered. The lowest 32 bits of the result will be0xffffffffif neither ofa.extract(0)orb.extract(0)is a NaN, or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_cmpunord_ ⚠pd (x86 or x86-64) and sse2Compares corresponding elements inaandbto see if either isNaN.
- _mm_cmpunord_ ⚠ps (x86 or x86-64) and sse Compares each of the four floats in a to the corresponding element in b. Returns four floats that have one of two possible bit patterns. The element in the output vector will be 0xffffffff if the input elements in a and b are unordered (i.e., at least one of them is a NaN), or 0 otherwise.
- _mm_cmpunord_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the result of comparing both of the lower elements ofaandbtoNaN. If either is equal toNaNthen0xFFFFFFFFFFFFFFFFis used and0otherwise.
- _mm_cmpunord_ ⚠ss (x86 or x86-64) and sseChecks if the lowestf32of both inputs are unordered. The lowest 32 bits of the result will be0xffffffffif any ofa.extract(0)orb.extract(0)is a NaN, or0otherwise. The upper 96 bits of the result are the upper 96 bits ofa.
- _mm_comieq_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor equality.
- _mm_comieq_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if they are equal, or0otherwise.
- _mm_comige_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor greater-than-or-equal.
- _mm_comige_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais greater than or equal to the one fromb, or0otherwise.
- _mm_comigt_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor greater-than.
- _mm_comigt_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais greater than the one fromb, or0otherwise.
- _mm_comile_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor less-than-or-equal.
- _mm_comile_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais less than or equal to the one fromb, or0otherwise.
- _mm_comilt_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor less-than.
- _mm_comilt_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais less than the one fromb, or0otherwise.
- _mm_comineq_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor not-equal.
- _mm_comineq_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if they are not equal, or0otherwise.
- _mm_crc32_ ⚠u8 (x86 or x86-64) and sse4.2Starting with the initial value incrc, return the accumulated CRC32-C value for unsigned 8-bit integerv.
- _mm_crc32_ ⚠u16 (x86 or x86-64) and sse4.2Starting with the initial value incrc, return the accumulated CRC32-C value for unsigned 16-bit integerv.
- _mm_crc32_ ⚠u32 (x86 or x86-64) and sse4.2Starting with the initial value incrc, return the accumulated CRC32-C value for unsigned 32-bit integerv.
- _mm_crc32_ ⚠u64 sse4.2Starting with the initial value incrc, return the accumulated CRC32-C value for unsigned 64-bit integerv.
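The crc32 intrinsics above fold one value at a time into a running CRC32-C accumulator, so checksumming a buffer is a simple loop (the helper name crc32c is illustrative; real code would typically process 8 bytes per step with _mm_crc32_u64 on x86-64):

```rust
use std::arch::x86_64::*;

/// Accumulates a CRC32-C checksum over a byte slice, one byte per step.
#[target_feature(enable = "sse4.2")]
unsafe fn crc32c(mut crc: u32, data: &[u8]) -> u32 {
    for &byte in data {
        // Each call folds one more byte into the running checksum.
        crc = _mm_crc32_u8(crc, byte);
    }
    crc
}

fn main() {
    if is_x86_feature_detected!("sse4.2") {
        // Conventional usage: start from all-ones and invert the final value.
        let c = unsafe { crc32c(!0u32, b"123456789") };
        // Well-known CRC-32C check value for "123456789".
        assert_eq!(!c, 0xE3069283);
    }
}
```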
- _mm_cvt_ ⚠si2ss (x86 or x86-64) and sseAlias for_mm_cvtsi32_ss.
- _mm_cvt_ ⚠ss2si (x86 or x86-64) and sseAlias for_mm_cvtss_si32.
- _mm_cvtepi8_ ⚠epi16 (x86 or x86-64) and sse4.1Sign extend packed 8-bit integers inato packed 16-bit integers
- _mm_cvtepi8_ ⚠epi32 (x86 or x86-64) and sse4.1Sign extend packed 8-bit integers inato packed 32-bit integers
- _mm_cvtepi8_ ⚠epi64 (x86 or x86-64) and sse4.1Sign extend packed 8-bit integers in the low 8 bytes ofato packed 64-bit integers
- _mm_cvtepi16_ ⚠epi32 (x86 or x86-64) and sse4.1Sign extend packed 16-bit integers inato packed 32-bit integers
- _mm_cvtepi16_ ⚠epi64 (x86 or x86-64) and sse4.1Sign extend packed 16-bit integers inato packed 64-bit integers
- _mm_cvtepi32_ ⚠epi64 (x86 or x86-64) and sse4.1Sign extend packed 32-bit integers inato packed 64-bit integers
- _mm_cvtepi32_ ⚠pd (x86 or x86-64) and sse2Converts the lower two packed 32-bit integers inato packed double-precision (64-bit) floating-point elements.
- _mm_cvtepi32_ ⚠ps (x86 or x86-64) and sse2Converts packed 32-bit integers inato packed single-precision (32-bit) floating-point elements.
- _mm_cvtepu8_ ⚠epi16 (x86 or x86-64) and sse4.1 Zero-extends packed unsigned 8-bit integers in a to packed 16-bit integers
- _mm_cvtepu8_ ⚠epi32 (x86 or x86-64) and sse4.1 Zero-extends packed unsigned 8-bit integers in a to packed 32-bit integers
- _mm_cvtepu8_ ⚠epi64 (x86 or x86-64) and sse4.1 Zero-extends packed unsigned 8-bit integers in a to packed 64-bit integers
- _mm_cvtepu16_ ⚠epi32 (x86 or x86-64) and sse4.1 Zero-extends packed unsigned 16-bit integers in a to packed 32-bit integers
- _mm_cvtepu16_ ⚠epi64 (x86 or x86-64) and sse4.1 Zero-extends packed unsigned 16-bit integers in a to packed 64-bit integers
- _mm_cvtepu32_ ⚠epi64 (x86 or x86-64) and sse4.1 Zero-extends packed unsigned 32-bit integers in a to packed 64-bit integers
- _mm_cvtpd_ ⚠epi32 (x86 or x86-64) and sse2Converts packed double-precision (64-bit) floating-point elements inato packed 32-bit integers.
- _mm_cvtpd_ ⚠ps (x86 or x86-64) and sse2Converts packed double-precision (64-bit) floating-point elements inato packed single-precision (32-bit) floating-point elements
- _mm_cvtph_ ⚠ps (x86 or x86-64) and f16cConverts the 4 x 16-bit half-precision float values in the lowest 64-bit of the 128-bit vectorainto 4 x 32-bit float values stored in a 128-bit wide vector.
- _mm_cvtps_ ⚠epi32 (x86 or x86-64) and sse2Converts packed single-precision (32-bit) floating-point elements inato packed 32-bit integers.
- _mm_cvtps_ ⚠pd (x86 or x86-64) and sse2Converts packed single-precision (32-bit) floating-point elements inato packed double-precision (64-bit) floating-point elements.
- _mm_cvtps_ ⚠ph (x86 or x86-64) and f16cConverts the 4 x 32-bit float values in the 128-bit vectorainto 4 x 16-bit half-precision float values stored in the lowest 64-bit of a 128-bit vector.
- _mm_cvtsd_ ⚠f64 (x86 or x86-64) and sse2Returns the lower double-precision (64-bit) floating-point element ofa.
- _mm_cvtsd_ ⚠si32 (x86 or x86-64) and sse2Converts the lower double-precision (64-bit) floating-point element in a to a 32-bit integer.
- _mm_cvtsd_ ⚠si64 sse2Converts the lower double-precision (64-bit) floating-point element in a to a 64-bit integer.
- _mm_cvtsd_ ⚠si64x sse2Alias for_mm_cvtsd_si64
- _mm_cvtsd_ ⚠ss (x86 or x86-64) and sse2Converts the lower double-precision (64-bit) floating-point element inbto a single-precision (32-bit) floating-point element, store the result in the lower element of the return value, and copies the upper element fromato the upper element the return value.
- _mm_cvtsi32_ ⚠sd (x86 or x86-64) and sse2Returnsawith its lower element replaced bybafter converting it to anf64.
- _mm_cvtsi32_ ⚠si128 (x86 or x86-64) and sse2Returns a vector whose lowest element isaand all higher elements are0.
- _mm_cvtsi32_ ⚠ss (x86 or x86-64) and sseConverts a 32 bit integer to a 32 bit float. The result vector is the input vectorawith the lowest 32 bit float replaced by the converted integer.
- _mm_cvtsi64_ ⚠sd sse2Returnsawith its lower element replaced bybafter converting it to anf64.
- _mm_cvtsi64_ ⚠si128 sse2Returns a vector whose lowest element isaand all higher elements are0.
- _mm_cvtsi64_ ⚠ss sseConverts a 64 bit integer to a 32 bit float. The result vector is the input vectorawith the lowest 32 bit float replaced by the converted integer.
- _mm_cvtsi64x_ ⚠sd sse2Returnsawith its lower element replaced bybafter converting it to anf64.
- _mm_cvtsi64x_ ⚠si128 sse2Returns a vector whose lowest element isaand all higher elements are0.
- _mm_cvtsi128_ ⚠si32 (x86 or x86-64) and sse2Returns the lowest element ofa.
- _mm_cvtsi128_ ⚠si64 sse2Returns the lowest element ofa.
- _mm_cvtsi128_ ⚠si64x sse2Returns the lowest element ofa.
- _mm_cvtss_ ⚠f32 (x86 or x86-64) and sseExtracts the lowest 32 bit float from the input vector.
- _mm_cvtss_ ⚠sd (x86 or x86-64) and sse2Converts the lower single-precision (32-bit) floating-point element inbto a double-precision (64-bit) floating-point element, store the result in the lower element of the return value, and copies the upper element fromato the upper element the return value.
- _mm_cvtss_ ⚠si32 (x86 or x86-64) and sseConverts the lowest 32 bit float in the input vector to a 32 bit integer.
- _mm_cvtss_ ⚠si64 sseConverts the lowest 32 bit float in the input vector to a 64 bit integer.
- _mm_cvtt_ ⚠ss2si (x86 or x86-64) and sseAlias for_mm_cvttss_si32.
- _mm_cvttpd_ ⚠epi32 (x86 or x86-64) and sse2Converts packed double-precision (64-bit) floating-point elements inato packed 32-bit integers with truncation.
- _mm_cvttps_ ⚠epi32 (x86 or x86-64) and sse2Converts packed single-precision (32-bit) floating-point elements inato packed 32-bit integers with truncation.
- _mm_cvttsd_ ⚠si32 (x86 or x86-64) and sse2Converts the lower double-precision (64-bit) floating-point element inato a 32-bit integer with truncation.
- _mm_cvttsd_ ⚠si64 sse2Converts the lower double-precision (64-bit) floating-point element inato a 64-bit integer with truncation.
- _mm_cvttsd_ ⚠si64x sse2Alias for_mm_cvttsd_si64
- _mm_cvttss_ ⚠si32 (x86 or x86-64) and sseConverts the lowest 32 bit float in the input vector to a 32 bit integer with truncation.
- _mm_cvttss_ ⚠si64 sseConverts the lowest 32 bit float in the input vector to a 64 bit integer with truncation.
- _mm_div_ ⚠pd (x86 or x86-64) and sse2Divide packed double-precision (64-bit) floating-point elements inaby packed elements inb.
- _mm_div_ ⚠ps (x86 or x86-64) and sseDivides packed single-precision (32-bit) floating-point elements inaandb.
- _mm_div_ ⚠sd (x86 or x86-64) and sse2 Returns a new vector with the low element of a replaced by the result of dividing the lower element of a by the lower element of b.
- _mm_div_ ⚠ss (x86 or x86-64) and sse Divides the lowest element of a by the lowest element of b; the other components are copied from a.
- _mm_dp_ ⚠pd (x86 or x86-64) and sse4.1Returns the dot product of two __m128d vectors.
- _mm_dp_ ⚠ps (x86 or x86-64) and sse4.1Returns the dot product of two __m128 vectors.
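For _mm_dp_ps, the immediate's high nibble selects which lanes participate in the multiplication and the low nibble selects which result lanes receive the sum; 0xF1 uses all four lanes and writes the sum to lane 0 only. A minimal sketch (helper name dot4 is illustrative):

```rust
use std::arch::x86_64::*;

/// Dot product of two 4-lane f32 vectors via DPPS.
#[target_feature(enable = "sse4.1")]
unsafe fn dot4(a: __m128, b: __m128) -> f32 {
    // High nibble 0xF: multiply all four lanes; low nibble 0x1: store sum in lane 0.
    _mm_cvtss_f32(_mm_dp_ps::<0xF1>(a, b))
}

fn main() {
    if is_x86_feature_detected!("sse4.1") {
        unsafe {
            let a = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
            let b = _mm_setr_ps(10.0, 20.0, 30.0, 40.0);
            assert_eq!(dot4(a, b), 300.0); // 10 + 40 + 90 + 160
        }
    }
}
```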
- _mm_extract_ ⚠epi8 (x86 or x86-64) and sse4.1Extracts an 8-bit integer froma, selected withIMM8. Returns a 32-bit integer containing the zero-extended integer data.
- _mm_extract_ ⚠epi16 (x86 or x86-64) and sse2Returns theimm8element ofa.
- _mm_extract_ ⚠epi32 (x86 or x86-64) and sse4.1Extracts a 32-bit integer fromaselected withIMM8.
- _mm_extract_ ⚠epi64 sse4.1Extracts a 64-bit integer fromaselected withIMM1.
- _mm_extract_ ⚠ps (x86 or x86-64) and sse4.1Extracts a single-precision (32-bit) floating-point element froma, selected withIMM8. The returnedi32stores the float’s bit-pattern, and may be converted back to a floating point number via casting.
- _mm_extract_ ⚠si64 (x86 or x86-64) and sse4aExtracts the bit range specified byyfrom the lower 64 bits ofx.
- _mm_extracti_ ⚠si64 (x86 or x86-64) and sse4aExtracts the specified bits from the lower 64 bits of the 128-bit integer vector operand at the indexidxand of the lengthlen.
- _mm_floor_ ⚠pd (x86 or x86-64) and sse4.1Round the packed double-precision (64-bit) floating-point elements inadown to an integer value, and stores the results as packed double-precision floating-point elements.
- _mm_floor_ ⚠ps (x86 or x86-64) and sse4.1Round the packed single-precision (32-bit) floating-point elements inadown to an integer value, and stores the results as packed single-precision floating-point elements.
- _mm_floor_ ⚠sd (x86 or x86-64) and sse4.1Round the lower double-precision (64-bit) floating-point element inbdown to an integer value, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copies the upper element fromato the upper element of the intrinsic result.
- _mm_floor_ ⚠ss (x86 or x86-64) and sse4.1Round the lower single-precision (32-bit) floating-point element inbdown to an integer value, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copies the upper 3 packed elements fromato the upper elements of the intrinsic result.
- _mm_fmadd_ ⚠pd (x86 or x86-64) and fmaMultiplies packed double-precision (64-bit) floating-point elements inaandb, and add the intermediate result to packed elements inc.
- _mm_fmadd_ ⚠ps (x86 or x86-64) and fmaMultiplies packed single-precision (32-bit) floating-point elements inaandb, and add the intermediate result to packed elements inc.
- _mm_fmadd_ ⚠sd (x86 or x86-64) and fmaMultiplies the lower double-precision (64-bit) floating-point elements inaandb, and add the intermediate result to the lower element inc. Stores the result in the lower element of the returned value, and copy the upper element fromato the upper elements of the result.
- _mm_fmadd_ ⚠ss (x86 or x86-64) and fmaMultiplies the lower single-precision (32-bit) floating-point elements inaandb, and add the intermediate result to the lower element inc. Stores the result in the lower element of the returned value, and copy the 3 upper elements fromato the upper elements of the result.
- _mm_fmaddsub_ ⚠pd (x86 or x86-64) and fmaMultiplies packed double-precision (64-bit) floating-point elements inaandb, and alternatively add and subtract packed elements incto/from the intermediate result.
- _mm_fmaddsub_ ⚠ps (x86 or x86-64) and fmaMultiplies packed single-precision (32-bit) floating-point elements inaandb, and alternatively add and subtract packed elements incto/from the intermediate result.
- _mm_fmsub_ ⚠pd (x86 or x86-64) and fmaMultiplies packed double-precision (64-bit) floating-point elements inaandb, and subtract packed elements incfrom the intermediate result.
- _mm_fmsub_ ⚠ps (x86 or x86-64) and fmaMultiplies packed single-precision (32-bit) floating-point elements inaandb, and subtract packed elements incfrom the intermediate result.
- _mm_fmsub_ ⚠sd (x86 or x86-64) and fmaMultiplies the lower double-precision (64-bit) floating-point elements inaandb, and subtract the lower element incfrom the intermediate result. Store the result in the lower element of the returned value, and copy the upper element fromato the upper elements of the result.
- _mm_fmsub_ ⚠ss (x86 or x86-64) and fmaMultiplies the lower single-precision (32-bit) floating-point elements inaandb, and subtract the lower element incfrom the intermediate result. Store the result in the lower element of the returned value, and copy the 3 upper elements fromato the upper elements of the result.
- _mm_fmsubadd_ ⚠pd (x86 or x86-64) and fmaMultiplies packed double-precision (64-bit) floating-point elements inaandb, and alternatively subtract and add packed elements incfrom/to the intermediate result.
- _mm_fmsubadd_ ⚠ps (x86 or x86-64) and fmaMultiplies packed single-precision (32-bit) floating-point elements inaandb, and alternatively subtract and add packed elements incfrom/to the intermediate result.
- _mm_fnmadd_ ⚠pd (x86 or x86-64) and fmaMultiplies packed double-precision (64-bit) floating-point elements inaandb, and add the negated intermediate result to packed elements inc.
- _mm_fnmadd_ ⚠ps (x86 or x86-64) and fmaMultiplies packed single-precision (32-bit) floating-point elements inaandb, and add the negated intermediate result to packed elements inc.
- _mm_fnmadd_ ⚠sd (x86 or x86-64) and fmaMultiplies the lower double-precision (64-bit) floating-point elements inaandb, and add the negated intermediate result to the lower element inc. Store the result in the lower element of the returned value, and copy the upper element fromato the upper elements of the result.
- _mm_fnmadd_ ⚠ss (x86 or x86-64) and fmaMultiplies the lower single-precision (32-bit) floating-point elements inaandb, and add the negated intermediate result to the lower element inc. Store the result in the lower element of the returned value, and copy the 3 upper elements fromato the upper elements of the result.
- _mm_fnmsub_ ⚠pd (x86 or x86-64) and fmaMultiplies packed double-precision (64-bit) floating-point elements inaandb, and subtract packed elements incfrom the negated intermediate result.
- _mm_fnmsub_ ⚠ps (x86 or x86-64) and fmaMultiplies packed single-precision (32-bit) floating-point elements inaandb, and subtract packed elements incfrom the negated intermediate result.
- _mm_fnmsub_ ⚠sd (x86 or x86-64) and fmaMultiplies the lower double-precision (64-bit) floating-point elements inaandb, and subtract packed elements incfrom the negated intermediate result. Store the result in the lower element of the returned value, and copy the upper element fromato the upper elements of the result.
- _mm_fnmsub_ ⚠ss (x86 or x86-64) and fmaMultiplies the lower single-precision (32-bit) floating-point elements inaandb, and subtract packed elements incfrom the negated intermediate result. Store the result in the lower element of the returned value, and copy the 3 upper elements fromato the upper elements of the result.
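All of the FMA entries above follow the same fused a·b±c pattern. A minimal sketch of calling `_mm_fmadd_ps` behind runtime feature detection (the helper name `fused_mul_add` and the test values are illustrative, not part of the API):

```rust
use std::arch::x86_64::*;

// Per-lane a * b + c with a single fused operation. `fma` is not implied
// by the x86_64 baseline, so it must be enabled and detected explicitly.
#[target_feature(enable = "fma")]
unsafe fn fused_mul_add(a: __m128, b: __m128, c: __m128) -> __m128 {
    _mm_fmadd_ps(a, b, c)
}

fn main() {
    if is_x86_feature_detected!("fma") {
        unsafe {
            let a = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
            let b = _mm_set1_ps(10.0);
            let c = _mm_set1_ps(0.5);
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), fused_mul_add(a, b, c));
            assert_eq!(out, [10.5, 20.5, 30.5, 40.5]);
        }
    }
}
```

The same skeleton works for the `_mm_fmsub_*`, `_mm_fnmadd_*` and `_mm_fnmsub_*` variants; only the signs applied to the intermediate product and to `c` differ.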
- _mm_getcsr ⚠Deprecated (x86 or x86-64) and sseGets the unsigned 32-bit value of the MXCSR control and status register.
- _mm_hadd_ ⚠epi16 (x86 or x86-64) and ssse3Horizontally adds the adjacent pairs of values contained in 2 packed 128-bit vectors of[8 x i16].
- _mm_hadd_ ⚠epi32 (x86 or x86-64) and ssse3Horizontally adds the adjacent pairs of values contained in 2 packed 128-bit vectors of[4 x i32].
- _mm_hadd_ ⚠pd (x86 or x86-64) and sse3Horizontally adds adjacent pairs of double-precision (64-bit) floating-point elements inaandb, and pack the results.
- _mm_hadd_ ⚠ps (x86 or x86-64) and sse3Horizontally adds adjacent pairs of single-precision (32-bit) floating-point elements inaandb, and pack the results.
- _mm_hadds_ ⚠epi16 (x86 or x86-64) and ssse3Horizontally adds the adjacent pairs of values contained in 2 packed 128-bit vectors of[8 x i16]. Positive sums greater than 7FFFh are saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
- _mm_hsub_ ⚠epi16 (x86 or x86-64) and ssse3Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of[8 x i16].
- _mm_hsub_ ⚠epi32 (x86 or x86-64) and ssse3Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of[4 x i32].
- _mm_hsub_ ⚠pd (x86 or x86-64) and sse3Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements inaandb, and pack the results.
- _mm_hsub_ ⚠ps (x86 or x86-64) and sse3Horizontally subtracts adjacent pairs of single-precision (32-bit) floating-point elements inaandb, and pack the results.
- _mm_hsubs_ ⚠epi16 (x86 or x86-64) and ssse3Horizontally subtract the adjacent pairs of values contained in 2 packed 128-bit vectors of[8 x i16]. Positive differences greater than 7FFFh are saturated to 7FFFh. Negative differences less than 8000h are saturated to 8000h.
- _mm_i32gather_ ⚠epi32 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i32gather_ ⚠epi64 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i32gather_ ⚠pd (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i32gather_ ⚠ps (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠epi32 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠epi64 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠pd (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
- _mm_i64gather_ ⚠ps (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8.
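The eight gather entries above share one shape: a base pointer, a vector of offsets, and a compile-time scale. A minimal sketch using `_mm_i32gather_epi32`, assuming the const-generic `SCALE` form of the Rust signature (the helper name `gather4` and the sample data are illustrative); every gathered address must stay in bounds of the slice:

```rust
use std::arch::x86_64::*;

// Gather four i32 values from arbitrary element indices in one call.
// SCALE = 4 because the offsets are element indices into an i32 slice.
#[target_feature(enable = "avx2")]
unsafe fn gather4(data: &[i32], idx: __m128i) -> __m128i {
    _mm_i32gather_epi32::<4>(data.as_ptr(), idx)
}

fn main() {
    if is_x86_feature_detected!("avx2") {
        let data = [10, 11, 12, 13, 14, 15, 16, 17];
        unsafe {
            let idx = _mm_setr_epi32(7, 0, 3, 5); // all indices < data.len()
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, gather4(&data, idx));
            assert_eq!(out, [17, 10, 13, 15]);
        }
    }
}
```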
- _mm_insert_ ⚠epi8 (x86 or x86-64) and sse4.1Returns a copy ofawith the 8-bit integer fromiinserted at a location specified byIMM8.
- _mm_insert_ ⚠epi16 (x86 or x86-64) and sse2Returns a new vector where theimm8element ofais replaced withi.
- _mm_insert_ ⚠epi32 (x86 or x86-64) and sse4.1Returns a copy ofawith the 32-bit integer fromiinserted at a location specified byIMM8.
- _mm_insert_ ⚠epi64 sse4.1Returns a copy ofawith the 64-bit integer fromiinserted at a location specified byIMM1.
- _mm_insert_ ⚠ps (x86 or x86-64) and sse4.1Selects a single value inbto store at some position ina, then zeroes elements according toIMM8.
- _mm_insert_ ⚠si64 (x86 or x86-64) and sse4aInserts the[length:0]bits ofyintoxatindex.
- _mm_inserti_ ⚠si64 (x86 or x86-64) and sse4aInserts thelenleast-significant bits from the lower 64 bits of the 128-bit integer vector operandyinto the lower 64 bits of the 128-bit integer vector operandxat the indexidxand of the lengthlen.
- _mm_lddqu_ ⚠si128 (x86 or x86-64) and sse3Loads 128-bits of integer data from unaligned memory. This intrinsic may perform better than_mm_loadu_si128when the data crosses a cache line boundary.
- _mm_lfence ⚠(x86 or x86-64) and sse2Performs a serializing operation on all load-from-memory instructions that were issued prior to this instruction.
- _mm_load1_ ⚠pd (x86 or x86-64) and sse2Loads a double-precision (64-bit) floating-point element from memory into both elements of returned vector.
- _mm_load1_ ⚠ps (x86 or x86-64) and sseConstruct a__m128by duplicating the value read frompinto all elements.
- _mm_load_ ⚠pd (x86 or x86-64) and sse2Loads 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into the returned vector.mem_addrmust be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_load_ ⚠pd1 (x86 or x86-64) and sse2Loads a double-precision (64-bit) floating-point element from memory into both elements of returned vector.
- _mm_load_ ⚠ps (x86 or x86-64) and sseLoads fourf32values from aligned memory into a__m128. If the pointer is not aligned to a 128-bit boundary (16 bytes) a general protection fault will be triggered (fatal program crash).
- _mm_load_ ⚠ps1 (x86 or x86-64) and sseAlias for_mm_load1_ps
- _mm_load_ ⚠sd (x86 or x86-64) and sse2Loads a 64-bit double-precision value to the low element of a 128-bit vector of[2 x double]and clears the upper element.
- _mm_load_ ⚠si128 (x86 or x86-64) and sse2Loads 128-bits of integer data from memory into a new vector.
- _mm_load_ ⚠ss (x86 or x86-64) and sseConstruct a__m128with the lowest element read frompand the other elements set to zero.
- _mm_loaddup_ ⚠pd (x86 or x86-64) and sse3Loads a double-precision (64-bit) floating-point element from memory into both elements of return vector.
- _mm_loadh_ ⚠pd (x86 or x86-64) and sse2Loads a double-precision value into the high-order bits of a 128-bit vector of[2 x double]. The low-order bits are copied from the low-order bits of the first operand.
- _mm_loadl_ ⚠epi64 (x86 or x86-64) and sse2Loads 64-bit integer from memory into first element of returned vector.
- _mm_loadl_ ⚠pd (x86 or x86-64) and sse2Loads a double-precision value into the low-order bits of a 128-bit vector of[2 x double]. The high-order bits are copied from the high-order bits of the first operand.
- _mm_loadr_ ⚠pd (x86 or x86-64) and sse2Loads 2 double-precision (64-bit) floating-point elements from memory into the returned vector in reverse order.mem_addrmust be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_loadr_ ⚠ps (x86 or x86-64) and sseLoads fourf32values from aligned memory into a__m128in reverse order.
- _mm_loadu_ ⚠pd (x86 or x86-64) and sse2Loads 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into the returned vector.mem_addrdoes not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠ps (x86 or x86-64) and sseLoads fourf32values from memory into a__m128. There are no restrictions on memory alignment. For aligned memory_mm_load_psmay be faster.
- _mm_loadu_ ⚠si16 (x86 or x86-64) and sse2Loads unaligned 16-bits of integer data from memory into new vector.
- _mm_loadu_ ⚠si32 (x86 or x86-64) and sse2Loads unaligned 32-bits of integer data from memory into new vector.
- _mm_loadu_ ⚠si64 (x86 or x86-64) and sse2Loads unaligned 64-bits of integer data from memory into new vector.
- _mm_loadu_ ⚠si128 (x86 or x86-64) and sse2Loads 128-bits of integer data from memory into a new vector.
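The load entries above differ mainly in their alignment contract: `_mm_load_ps` requires a 16-byte-aligned pointer, `_mm_loadu_ps` does not. A minimal sketch (the `Aligned` wrapper type is illustrative):

```rust
use std::arch::x86_64::*;

// A 16-byte-aligned buffer makes `_mm_load_ps` sound; `_mm_loadu_ps`
// accepts the same pointer and would also accept an unaligned one.
#[repr(align(16))]
struct Aligned([f32; 4]);

fn main() {
    let buf = Aligned([1.0, 2.0, 3.0, 4.0]);
    // SSE/SSE2 are part of the x86_64 baseline, so no runtime detection here.
    unsafe {
        let a = _mm_load_ps(buf.0.as_ptr());
        let u = _mm_loadu_ps(buf.0.as_ptr());
        let (mut ra, mut ru) = ([0.0f32; 4], [0.0f32; 4]);
        _mm_storeu_ps(ra.as_mut_ptr(), a);
        _mm_storeu_ps(ru.as_mut_ptr(), u);
        assert_eq!(ra, ru);
        assert_eq!(ra, [1.0, 2.0, 3.0, 4.0]);
    }
}
```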
- _mm_madd_ ⚠epi16 (x86 or x86-64) and sse2Multiplies and then horizontally adds signed 16-bit integers inaandb.
- _mm_maddubs_ ⚠epi16 (x86 or x86-64) and ssse3Multiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, add pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination.
- _mm_mask_ ⚠i32gather_ epi32 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i32gather_ epi64 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i32gather_ pd (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i32gather_ ps (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i64gather_ epi32 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i64gather_ epi64 (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i64gather_ pd (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_mask_ ⚠i64gather_ ps (x86 or x86-64) and avx2Returns values fromsliceat offsets determined byoffsets * scale, wherescaleshould be 1, 2, 4 or 8. If mask is set, load the value fromsrcin that position instead.
- _mm_maskload_ ⚠epi32 (x86 or x86-64) and avx2Loads packed 32-bit integers from memory pointed bymem_addrusingmask(elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm_maskload_ ⚠epi64 (x86 or x86-64) and avx2Loads packed 64-bit integers from memory pointed bymem_addrusingmask(elements are zeroed out when the highest bit is not set in the corresponding element).
- _mm_maskload_ ⚠pd (x86 or x86-64) and avxLoads packed double-precision (64-bit) floating-point elements from memory into result usingmask(elements are zeroed out when the high bit of the corresponding element is not set).
- _mm_maskload_ ⚠ps (x86 or x86-64) and avxLoads packed single-precision (32-bit) floating-point elements from memory into result usingmask(elements are zeroed out when the high bit of the corresponding element is not set).
- _mm_maskmoveu_ ⚠si128 (x86 or x86-64) and sse2Conditionally store 8-bit integer elements fromainto memory usingmask.
- _mm_maskstore_ ⚠epi32 (x86 or x86-64) and avx2Stores packed 32-bit integers fromainto memory pointed bymem_addrusingmask(elements are not stored when the highest bit is not set in the corresponding element).
- _mm_maskstore_ ⚠epi64 (x86 or x86-64) and avx2Stores packed 64-bit integers fromainto memory pointed bymem_addrusingmask(elements are not stored when the highest bit is not set in the corresponding element).
- _mm_maskstore_ ⚠pd (x86 or x86-64) and avxStores packed double-precision (64-bit) floating-point elements fromainto memory usingmask.
- _mm_maskstore_ ⚠ps (x86 or x86-64) and avxStores packed single-precision (32-bit) floating-point elements fromainto memory usingmask.
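The masked loads and stores above use the high bit of each mask element as a per-lane enable. A minimal sketch with `_mm_maskload_epi32` (the helper name and sample data are illustrative); disabled lanes are never read and come back as zero:

```rust
use std::arch::x86_64::*;

// Load only the lanes whose mask element has its most significant bit set.
#[target_feature(enable = "avx2")]
unsafe fn masked_load(p: *const i32, mask: __m128i) -> __m128i {
    _mm_maskload_epi32(p, mask)
}

fn main() {
    if is_x86_feature_detected!("avx2") {
        let data = [1, 2, 3, 4];
        unsafe {
            // -1 has the high bit set (lane enabled); 0 leaves the lane off.
            let mask = _mm_setr_epi32(-1, 0, -1, 0);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, masked_load(data.as_ptr(), mask));
            assert_eq!(out, [1, 0, 3, 0]);
        }
    }
}
```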
- _mm_max_ ⚠epi8 (x86 or x86-64) and sse4.1Compares packed 8-bit integers inaandband returns packed maximum values in dst.
- _mm_max_ ⚠epi16 (x86 or x86-64) and sse2Compares packed 16-bit integers inaandb, and returns the packed maximum values.
- _mm_max_ ⚠epi32 (x86 or x86-64) and sse4.1Compares packed 32-bit integers inaandb, and returns packed maximum values.
- _mm_max_ ⚠epu8 (x86 or x86-64) and sse2Compares packed unsigned 8-bit integers inaandb, and returns the packed maximum values.
- _mm_max_ ⚠epu16 (x86 or x86-64) and sse4.1Compares packed unsigned 16-bit integers inaandb, and returns packed maximum.
- _mm_max_ ⚠epu32 (x86 or x86-64) and sse4.1Compares packed unsigned 32-bit integers inaandb, and returns packed maximum values.
- _mm_max_ ⚠pd (x86 or x86-64) and sse2Returns a new vector with the maximum values from corresponding elements inaandb.
- _mm_max_ ⚠ps (x86 or x86-64) and sseCompares packed single-precision (32-bit) floating-point elements inaandb, and return the corresponding maximum values.
- _mm_max_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the maximum of the lower elements ofaandb.
- _mm_max_ ⚠ss (x86 or x86-64) and sseCompares the first single-precision (32-bit) floating-point element ofaandb, and return the maximum value in the first element of the return value, the other elements are copied froma.
- _mm_mfence ⚠(x86 or x86-64) and sse2Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction.
- _mm_min_ ⚠epi8 (x86 or x86-64) and sse4.1Compares packed 8-bit integers inaandband returns packed minimum values in dst.
- _mm_min_ ⚠epi16 (x86 or x86-64) and sse2Compares packed 16-bit integers inaandb, and returns the packed minimum values.
- _mm_min_ ⚠epi32 (x86 or x86-64) and sse4.1Compares packed 32-bit integers inaandb, and returns packed minimum values.
- _mm_min_ ⚠epu8 (x86 or x86-64) and sse2Compares packed unsigned 8-bit integers inaandb, and returns the packed minimum values.
- _mm_min_ ⚠epu16 (x86 or x86-64) and sse4.1Compares packed unsigned 16-bit integers inaandb, and returns packed minimum.
- _mm_min_ ⚠epu32 (x86 or x86-64) and sse4.1Compares packed unsigned 32-bit integers inaandb, and returns packed minimum values.
- _mm_min_ ⚠pd (x86 or x86-64) and sse2Returns a new vector with the minimum values from corresponding elements inaandb.
- _mm_min_ ⚠ps (x86 or x86-64) and sseCompares packed single-precision (32-bit) floating-point elements inaandb, and return the corresponding minimum values.
- _mm_min_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the minimum of the lower elements ofaandb.
- _mm_min_ ⚠ss (x86 or x86-64) and sseCompares the first single-precision (32-bit) floating-point element ofaandb, and return the minimum value in the first element of the return value, the other elements are copied froma.
- _mm_minpos_ ⚠epu16 (x86 or x86-64) and sse4.1Finds the minimum unsigned 16-bit element in the 128-bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero.
- _mm_move_ ⚠epi64 (x86 or x86-64) and sse2Returns a vector where the low element is extracted fromaand its upper element is zero.
- _mm_move_ ⚠sd (x86 or x86-64) and sse2Constructs a 128-bit floating-point vector of[2 x double]. The lower 64 bits are set to the lower 64 bits of the second parameter. The upper 64 bits are set to the upper 64 bits of the first parameter.
- _mm_move_ ⚠ss (x86 or x86-64) and sseReturns a__m128with the first component fromband the remaining components froma.
- _mm_movedup_ ⚠pd (x86 or x86-64) and sse3Duplicate the low double-precision (64-bit) floating-point element froma.
- _mm_movehdup_ ⚠ps (x86 or x86-64) and sse3Duplicate odd-indexed single-precision (32-bit) floating-point elements froma.
- _mm_movehl_ ⚠ps (x86 or x86-64) and sseCombine higher half ofaandb. The higher half ofboccupies the lower half of result.
- _mm_moveldup_ ⚠ps (x86 or x86-64) and sse3Duplicate even-indexed single-precision (32-bit) floating-point elements froma.
- _mm_movelh_ ⚠ps (x86 or x86-64) and sseCombine lower half ofaandb. The lower half ofboccupies the higher half of result.
- _mm_movemask_ ⚠epi8 (x86 or x86-64) and sse2Returns a mask of the most significant bit of each element ina.
- _mm_movemask_ ⚠pd (x86 or x86-64) and sse2Returns a mask of the most significant bit of each element ina.
- _mm_movemask_ ⚠ps (x86 or x86-64) and sseReturns a mask of the most significant bit of each element ina.
- _mm_mpsadbw_ ⚠epu8 (x86 or x86-64) and sse4.1Computes absolute differences of 8-bit unsigned integer values inaandb, then returns sums of those absolute differences according to the bit fields in the immediate operand.
- _mm_mul_ ⚠epi32 (x86 or x86-64) and sse4.1Multiplies the low 32-bit integers from each packed 64-bit element inaandb, and returns the signed 64-bit result.
- _mm_mul_ ⚠epu32 (x86 or x86-64) and sse2Multiplies the low unsigned 32-bit integers from each packed 64-bit element inaandb.
- _mm_mul_ ⚠pd (x86 or x86-64) and sse2Multiplies packed double-precision (64-bit) floating-point elements inaandb.
- _mm_mul_ ⚠ps (x86 or x86-64) and sseMultiplies packed single-precision (32-bit) floating-point elements inaandb.
- _mm_mul_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by multiplying the low elements ofaandb.
- _mm_mul_ ⚠ss (x86 or x86-64) and sseMultiplies the first component ofaandb, the other components are copied froma.
- _mm_mulhi_ ⚠epi16 (x86 or x86-64) and sse2Multiplies the packed 16-bit integers inaandb.
- _mm_mulhi_ ⚠epu16 (x86 or x86-64) and sse2Multiplies the packed unsigned 16-bit integers inaandb.
- _mm_mulhrs_ ⚠epi16 (x86 or x86-64) and ssse3Multiplies packed 16-bit signed integer values, truncate the 32-bit product to the 18 most significant bits by right-shifting, round the truncated value by adding 1, and write bits[16:1]to the destination.
- _mm_mullo_ ⚠epi16 (x86 or x86-64) and sse2Multiplies the packed 16-bit integers inaandb.
- _mm_mullo_ ⚠epi32 (x86 or x86-64) and sse4.1Multiplies the packed 32-bit integers inaandb, producing intermediate 64-bit integers, and returns the lowest 32 bits of each product, reinterpreted as a signed integer. Whilepmulldon__m128i::splat(2)and__m128i::splat(2)returns the obvious__m128i::splat(4), due to wrapping arithmeticpmulldon__m128i::splat(i32::MAX)and__m128i::splat(2)would return a negative number.
- _mm_or_ ⚠pd (x86 or x86-64) and sse2Computes the bitwise OR ofaandb.
- _mm_or_ ⚠ps (x86 or x86-64) and sseBitwise OR of packed single-precision (32-bit) floating-point elements.
- _mm_or_ ⚠si128 (x86 or x86-64) and sse2Computes the bitwise OR of 128 bits (representing integer data) inaandb.
- _mm_packs_ ⚠epi16 (x86 or x86-64) and sse2Converts packed 16-bit integers fromaandbto packed 8-bit integers using signed saturation.
- _mm_packs_ ⚠epi32 (x86 or x86-64) and sse2Converts packed 32-bit integers fromaandbto packed 16-bit integers using signed saturation.
- _mm_packus_ ⚠epi16 (x86 or x86-64) and sse2Converts packed 16-bit integers fromaandbto packed 8-bit integers using unsigned saturation.
- _mm_packus_ ⚠epi32 (x86 or x86-64) and sse4.1Converts packed 32-bit integers fromaandbto packed 16-bit integers using unsigned saturation
- _mm_pause ⚠x86 or x86-64 Provides a hint to the processor that the code sequence is a spin-wait loop.
- _mm_permute_ ⚠pd (x86 or x86-64) and avxShuffles double-precision (64-bit) floating-point elements inausing the control inimm8.
- _mm_permute_ ⚠ps (x86 or x86-64) and avxShuffles single-precision (32-bit) floating-point elements inausing the control inimm8.
- _mm_permutevar_ ⚠pd (x86 or x86-64) and avxShuffles double-precision (64-bit) floating-point elements inausing the control inb.
- _mm_permutevar_ ⚠ps (x86 or x86-64) and avxShuffles single-precision (32-bit) floating-point elements inausing the control inb.
- _mm_prefetch ⚠(x86 or x86-64) and sseFetch the cache line that contains addresspusing the givenSTRATEGY.
- _mm_rcp_ ⚠ps (x86 or x86-64) and sseReturns the approximate reciprocal of packed single-precision (32-bit) floating-point elements ina.
- _mm_rcp_ ⚠ss (x86 or x86-64) and sseReturns the approximate reciprocal of the first single-precision (32-bit) floating-point element ina, the other elements are unchanged.
- _mm_round_ ⚠pd (x86 or x86-64) and sse4.1Round the packed double-precision (64-bit) floating-point elements inausing theROUNDINGparameter, and stores the results as packed double-precision floating-point elements. Rounding is done according to the rounding parameter, which can be one of:
- _mm_round_ ⚠ps (x86 or x86-64) and sse4.1Round the packed single-precision (32-bit) floating-point elements inausing theROUNDINGparameter, and stores the results as packed single-precision floating-point elements. Rounding is done according to the rounding parameter, which can be one of:
- _mm_round_ ⚠sd (x86 or x86-64) and sse4.1Round the lower double-precision (64-bit) floating-point element inbusing theROUNDINGparameter, store the result as a double-precision floating-point element in the lower element of the intrinsic result, and copies the upper element fromato the upper element of the intrinsic result. Rounding is done according to the rounding parameter, which can be one of:
- _mm_round_ ⚠ss (x86 or x86-64) and sse4.1Round the lower single-precision (32-bit) floating-point element inbusing theROUNDINGparameter, store the result as a single-precision floating-point element in the lower element of the intrinsic result, and copies the upper 3 packed elements fromato the upper elements of the intrinsic result. Rounding is done according to the rounding parameter, which can be one of:
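The four rounding entries above take the rounding mode as a compile-time parameter built from the `_MM_FROUND_*` constants. A minimal sketch rounding to the nearest integer with exceptions suppressed (the helper name and inputs are illustrative):

```rust
use std::arch::x86_64::*;

// Round each lane to the nearest integer, ties to even, without raising
// a precision exception; the mode is passed as a const generic.
#[target_feature(enable = "sse4.1")]
unsafe fn round_nearest(a: __m128) -> __m128 {
    _mm_round_ps::<{ _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC }>(a)
}

fn main() {
    if is_x86_feature_detected!("sse4.1") {
        unsafe {
            let a = _mm_setr_ps(1.2, 2.5, -1.7, 3.5);
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), round_nearest(a));
            // 2.5 and 3.5 round to the even neighbours 2.0 and 4.0.
            assert_eq!(out, [1.0, 2.0, -2.0, 4.0]);
        }
    }
}
```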
- _mm_rsqrt_ ⚠ps (x86 or x86-64) and sseReturns the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements ina.
- _mm_rsqrt_ ⚠ss (x86 or x86-64) and sseReturns the approximate reciprocal square root of the first single-precision (32-bit) floating-point element ina, the other elements are unchanged.
- _mm_sad_ ⚠epu8 (x86 or x86-64) and sse2Sum the absolute differences of packed unsigned 8-bit integers.
- _mm_set1_ ⚠epi8 (x86 or x86-64) and sse2Broadcasts 8-bit integerato all elements.
- _mm_set1_ ⚠epi16 (x86 or x86-64) and sse2Broadcasts 16-bit integerato all elements.
- _mm_set1_ ⚠epi32 (x86 or x86-64) and sse2Broadcasts 32-bit integerato all elements.
- _mm_set1_ ⚠epi64x (x86 or x86-64) and sse2Broadcasts 64-bit integerato all elements.
- _mm_set1_ ⚠pd (x86 or x86-64) and sse2Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value.
- _mm_set1_ ⚠ps (x86 or x86-64) and sseConstruct a__m128with all element set toa.
- _mm_set_ ⚠epi8 (x86 or x86-64) and sse2Sets packed 8-bit integers with the supplied values.
- _mm_set_ ⚠epi16 (x86 or x86-64) and sse2Sets packed 16-bit integers with the supplied values.
- _mm_set_ ⚠epi32 (x86 or x86-64) and sse2Sets packed 32-bit integers with the supplied values.
- _mm_set_ ⚠epi64x (x86 or x86-64) and sse2Sets packed 64-bit integers with the supplied values, from highest to lowest.
- _mm_set_ ⚠pd (x86 or x86-64) and sse2Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values.
- _mm_set_ ⚠pd1 (x86 or x86-64) and sse2Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value.
- _mm_set_ ⚠ps (x86 or x86-64) and sseConstruct a__m128from four floating point values highest to lowest.
- _mm_set_ ⚠ps1 (x86 or x86-64) and sseAlias for_mm_set1_ps
- _mm_set_ ⚠sd (x86 or x86-64) and sse2Copies double-precision (64-bit) floating-point elementato the lower element of the packed 64-bit return value.
- _mm_set_ ⚠ss (x86 or x86-64) and sseConstruct a__m128with the lowest element set toaand the rest set to zero.
- _mm_setcsr ⚠Deprecated (x86 or x86-64) and sseSets the MXCSR register with the 32-bit unsigned integer value.
- _mm_setr_ ⚠epi8 (x86 or x86-64) and sse2Sets packed 8-bit integers with the supplied values in reverse order.
- _mm_setr_ ⚠epi16 (x86 or x86-64) and sse2Sets packed 16-bit integers with the supplied values in reverse order.
- _mm_setr_ ⚠epi32 (x86 or x86-64) and sse2Sets packed 32-bit integers with the supplied values in reverse order.
- _mm_setr_ ⚠pd (x86 or x86-64) and sse2Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order.
- _mm_setr_ ⚠ps (x86 or x86-64) and sseConstruct a__m128from four floating point values lowest to highest.
- _mm_setzero_ ⚠pd (x86 or x86-64) and sse2Returns packed double-precision (64-bit) floating-point elements with all zeros.
- _mm_setzero_ ⚠ps (x86 or x86-64) and sseConstruct a__m128with all elements initialized to zero.
- _mm_setzero_ ⚠si128 (x86 or x86-64) and sse2Returns a vector with all elements set to zero.
- _mm_sfence ⚠(x86 or x86-64) and ssePerforms a serializing operation on all non-temporal (“streaming”) store instructions that were issued by the current thread prior to this instruction.
- _mm_sha1msg1_ ⚠epu32 (x86 or x86-64) and shaPerforms an intermediate calculation for the next four SHA1 message values (unsigned 32-bit integers) using previous message values fromaandb, and returning the result.
- _mm_sha1msg2_ ⚠epu32 (x86 or x86-64) and shaPerforms the final calculation for the next four SHA1 message values (unsigned 32-bit integers) using the intermediate result inaand the previous message values inb, and returns the result.
- _mm_sha1nexte_ ⚠epu32 (x86 or x86-64) and shaCalculate SHA1 state variable E after four rounds of operation from the current SHA1 state variablea, add that value to the scheduled values (unsigned 32-bit integers) inb, and returns the result.
- _mm_sha1rnds4_ ⚠epu32 (x86 or x86-64) and shaPerforms four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D) fromaand some pre-computed sum of the next 4 round message values (unsigned 32-bit integers), and state variable E fromb, and return the updated SHA1 state (A,B,C,D).FUNCcontains the logic functions and round constants.
- _mm_sha256msg1_ ⚠epu32 (x86 or x86-64) and shaPerforms an intermediate calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values fromaandb, and return the result.
- _mm_sha256msg2_ ⚠epu32 (x86 or x86-64) and shaPerforms the final calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values fromaandb, and return the result.
- _mm_sha256rnds2_ ⚠epu32 (x86 or x86-64) and shaPerforms 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) froma, an initial SHA256 state (A,B,E,F) fromb, and a pre-computed sum of the next 2 round message values (unsigned 32-bit integers) and the corresponding round constants fromk, and store the updated SHA256 state (A,B,E,F) in dst.
- _mm_shuffle_ ⚠epi8 (x86 or x86-64) and ssse3Shuffles bytes fromaaccording to the content ofb.
- _mm_shuffle_ ⚠epi32 (x86 or x86-64) and sse2Shuffles 32-bit integers inausing the control inIMM8.
- _mm_shuffle_ ⚠pd (x86 or x86-64) and sse2Constructs a 128-bit floating-point vector of[2 x double]from two 128-bit vector parameters of[2 x double], using the immediate-value parameter as a specifier.
- _mm_shuffle_ ⚠ps (x86 or x86-64) and sseShuffles packed single-precision (32-bit) floating-point elements inaandbusingMASK.
- _mm_shufflehi_ ⚠epi16 (x86 or x86-64) and sse2Shuffles 16-bit integers in the high 64 bits ofausing the control inIMM8.
- _mm_shufflelo_ ⚠epi16 (x86 or x86-64) and sse2Shuffles 16-bit integers in the low 64 bits ofausing the control inIMM8.
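The shuffle entries above encode their lane selection in an 8-bit immediate, two bits per destination lane (this is what the `_MM_SHUFFLE` helper listed further down builds). A minimal sketch reversing the lanes of a vector with `_mm_shuffle_epi32`, spelling the immediate out as a binary literal instead of using the helper:

```rust
use std::arch::x86_64::*;

fn main() {
    // SSE2 is part of the x86_64 baseline, so no feature detection is needed.
    unsafe {
        let a = _mm_setr_epi32(1, 2, 3, 4);
        // 0b00_01_10_11 selects a[3] into lane 0, a[2] into lane 1,
        // a[1] into lane 2 and a[0] into lane 3, i.e. a full reversal.
        let r = _mm_shuffle_epi32::<0b00_01_10_11>(a);
        let mut out = [0i32; 4];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        assert_eq!(out, [4, 3, 2, 1]);
    }
}
```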
- _mm_sign_ ⚠epi8 (x86 or x86-64) and ssse3Negates packed 8-bit integers inawhen the corresponding signed 8-bit integer inbis negative, and returns the result. Elements in result are zeroed out when the corresponding element inbis zero.
- _mm_sign_ ⚠epi16 (x86 or x86-64) and ssse3Negates packed 16-bit integers inawhen the corresponding signed 16-bit integer inbis negative, and returns the results. Elements in result are zeroed out when the corresponding element inbis zero.
- _mm_sign_ ⚠epi32 (x86 or x86-64) and ssse3Negates packed 32-bit integers inawhen the corresponding signed 32-bit integer inbis negative, and returns the results. Element in result are zeroed out when the corresponding element inbis zero.
- _mm_sll_ ⚠epi16 (x86 or x86-64) and sse2Shifts packed 16-bit integers inaleft bycountwhile shifting in zeros.
- _mm_sll_ ⚠epi32 (x86 or x86-64) and sse2Shifts packed 32-bit integers inaleft bycountwhile shifting in zeros.
- _mm_sll_ ⚠epi64 (x86 or x86-64) and sse2Shifts packed 64-bit integers inaleft bycountwhile shifting in zeros.
- _mm_slli_ ⚠epi16 (x86 or x86-64) and sse2Shifts packed 16-bit integers inaleft byIMM8while shifting in zeros.
- _mm_slli_ ⚠epi32 (x86 or x86-64) and sse2Shifts packed 32-bit integers inaleft byIMM8while shifting in zeros.
- _mm_slli_ ⚠epi64 (x86 or x86-64) and sse2Shifts packed 64-bit integers inaleft byIMM8while shifting in zeros.
- _mm_slli_ ⚠si128 (x86 or x86-64) and sse2Shiftsaleft byIMM8bytes while shifting in zeros.
- _mm_sllv_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaleft by the amount specified by the corresponding element incountwhile shifting in zeros, and returns the result.
- _mm_sllv_ ⚠epi64 (x86 or x86-64) and avx2Shifts packed 64-bit integers inaleft by the amount specified by the corresponding element incountwhile shifting in zeros, and returns the result.
- _mm_sqrt_ ⚠pd (x86 or x86-64) and sse2Returns a new vector with the square root of each of the values ina.
- _mm_sqrt_ ⚠ps (x86 or x86-64) and sseReturns the square root of packed single-precision (32-bit) floating-point elements ina.
- _mm_sqrt_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by the square root of the lower element ofb.
- _mm_sqrt_ ⚠ss (x86 or x86-64) and sseReturns the square root of the first single-precision (32-bit) floating-point element ina, the other elements are unchanged.
- _mm_sra_ ⚠epi16 (x86 or x86-64) and sse2Shifts packed 16-bit integers inaright bycountwhile shifting in sign bits.
- _mm_sra_ ⚠epi32 (x86 or x86-64) and sse2Shifts packed 32-bit integers inaright bycountwhile shifting in sign bits.
- _mm_srai_ ⚠epi16 (x86 or x86-64) and sse2Shifts packed 16-bit integers inaright byIMM8while shifting in sign bits.
- _mm_srai_ ⚠epi32 (x86 or x86-64) and sse2Shifts packed 32-bit integers inaright byIMM8while shifting in sign bits.
- _mm_srav_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright by the amount specified by the corresponding element incountwhile shifting in sign bits.
- _mm_srl_ ⚠epi16 (x86 or x86-64) and sse2Shifts packed 16-bit integers inaright bycountwhile shifting in zeros.
- _mm_srl_ ⚠epi32 (x86 or x86-64) and sse2Shifts packed 32-bit integers inaright bycountwhile shifting in zeros.
- _mm_srl_ ⚠epi64 (x86 or x86-64) and sse2Shifts packed 64-bit integers inaright bycountwhile shifting in zeros.
- _mm_srli_ ⚠epi16 (x86 or x86-64) and sse2Shifts packed 16-bit integers inaright byIMM8while shifting in zeros.
- _mm_srli_ ⚠epi32 (x86 or x86-64) and sse2Shifts packed 32-bit integers inaright byIMM8while shifting in zeros.
- _mm_srli_ ⚠epi64 (x86 or x86-64) and sse2Shifts packed 64-bit integers inaright byIMM8while shifting in zeros.
- _mm_srli_ ⚠si128 (x86 or x86-64) and sse2Shiftsaright byIMM8bytes while shifting in zeros.
- _mm_srlv_ ⚠epi32 (x86 or x86-64) and avx2Shifts packed 32-bit integers inaright by the amount specified by the corresponding element incountwhile shifting in zeros, and returns the result.
- _mm_srlv_ ⚠epi64 (x86 or x86-64) and avx2Shifts packed 64-bit integers inaright by the amount specified by the corresponding element incountwhile shifting in zeros, and returns the result.
- _mm_store1_ ⚠pd (x86 or x86-64) and sse2Stores the lower double-precision (64-bit) floating-point element fromainto 2 contiguous elements in memory.mem_addrmust be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store1_ ⚠ps (x86 or x86-64) and sseStores the lowest 32 bit float ofarepeated four times into aligned memory.
- _mm_store_ ⚠pd (x86 or x86-64) and sse2Stores 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) fromainto memory.mem_addrmust be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠pd1 (x86 or x86-64) and sse2Stores the lower double-precision (64-bit) floating-point element fromainto 2 contiguous elements in memory.mem_addrmust be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠ps (x86 or x86-64) and sseStores four 32-bit floats into aligned memory.
- _mm_store_ ⚠ps1 (x86 or x86-64) and sseAlias for_mm_store1_ps
- _mm_store_ ⚠sd (x86 or x86-64) and sse2Stores the lower 64 bits of a 128-bit vector of[2 x double]to a memory location.
- _mm_store_ ⚠si128 (x86 or x86-64) and sse2Stores 128-bits of integer data fromainto memory.
- _mm_store_ ⚠ss (x86 or x86-64) and sseStores the lowest 32 bit float ofainto memory.
- _mm_storeh_ ⚠pd (x86 or x86-64) and sse2Stores the upper 64 bits of a 128-bit vector of[2 x double]to a memory location.
- _mm_storel_ ⚠epi64 (x86 or x86-64) and sse2Stores the lower 64-bit integerato a memory location.
- _mm_storel_ ⚠pd (x86 or x86-64) and sse2Stores the lower 64 bits of a 128-bit vector of[2 x double]to a memory location.
- _mm_storer_ ⚠pd (x86 or x86-64) and sse2Stores 2 double-precision (64-bit) floating-point elements fromainto memory in reverse order.mem_addrmust be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_storer_ ⚠ps (x86 or x86-64) and sseStores four 32-bit floats into aligned memory in reverse order.
- _mm_storeu_ ⚠pd (x86 or x86-64) and sse2Stores 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) fromainto memory.mem_addrdoes not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠ps (x86 or x86-64) and sseStores four 32-bit floats into memory. There are no restrictions on memory alignment. For aligned memory_mm_store_psmay be faster.
- _mm_storeu_ ⚠si16 (x86 or x86-64) and sse2Store 16-bit integer from the first element of a into memory.
- _mm_storeu_ ⚠si32 (x86 or x86-64) and sse2Store 32-bit integer from the first element of a into memory.
- _mm_storeu_ ⚠si64 (x86 or x86-64) and sse2Store 64-bit integer from the first element of a into memory.
- _mm_storeu_ ⚠si128 (x86 or x86-64) and sse2Stores 128-bits of integer data fromainto memory.
- _mm_stream_ ⚠load_ si128 (x86 or x86-64) and sse4.1Load 128-bits of integer data from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon)
- _mm_stream_ ⚠pd (x86 or x86-64) and sse2Stores a 128-bit floating point vector of[2 x double]to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠ps (x86 or x86-64) and sseStoresainto the memory atmem_addrusing a non-temporal memory hint.
- _mm_stream_ ⚠sd (x86 or x86-64) and sse4aNon-temporal store ofa.0intop.
- _mm_stream_ ⚠si32 (x86 or x86-64) and sse2Stores a 32-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠si64 sse2Stores a 64-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠si128 (x86 or x86-64) and sse2Stores a 128-bit integer vector to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm_stream_ ⚠ss (x86 or x86-64) and sse4aNon-temporal store ofa.0intop.
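The streaming stores above bypass the cache hierarchy; they want an aligned destination and a fence before other code relies on the data. A minimal sketch with `_mm_stream_ps` and `_mm_sfence` (the `Aligned` wrapper is illustrative):

```rust
use std::arch::x86_64::*;

// `_mm_stream_ps` requires a 16-byte-aligned destination.
#[repr(align(16))]
struct Aligned([f32; 4]);

fn main() {
    let mut dst = Aligned([0.0; 4]);
    unsafe {
        let v = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
        _mm_stream_ps(dst.0.as_mut_ptr(), v);
        // Order the non-temporal store before anything that reads `dst`
        // from another thread.
        _mm_sfence();
    }
    assert_eq!(dst.0, [1.0, 2.0, 3.0, 4.0]);
}
```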
- _mm_sub_ ⚠epi8 (x86 or x86-64) and sse2Subtracts packed 8-bit integers inbfrom packed 8-bit integers ina.
- _mm_sub_ ⚠epi16 (x86 or x86-64) and sse2Subtracts packed 16-bit integers inbfrom packed 16-bit integers ina.
- _mm_sub_ ⚠epi32 (x86 or x86-64) and sse2Subtract packed 32-bit integers inbfrom packed 32-bit integers ina.
- _mm_sub_ ⚠epi64 (x86 or x86-64) and sse2Subtract packed 64-bit integers inbfrom packed 64-bit integers ina.
- _mm_sub_ ⚠pd (x86 or x86-64) and sse2Subtract packed double-precision (64-bit) floating-point elements inbfroma.
- _mm_sub_ ⚠ps (x86 or x86-64) and sseSubtracts packed single-precision (32-bit) floating-point elements inaandb.
- _mm_sub_ ⚠sd (x86 or x86-64) and sse2Returns a new vector with the low element ofareplaced by subtracting the low element ofbfrom the low element ofa.
- _mm_sub_ ⚠ss (x86 or x86-64) and sseSubtracts the first component ofbfroma, the other components are copied froma.
- _mm_subs_ ⚠epi8 (x86 or x86-64) and sse2Subtract packed 8-bit integers inbfrom packed 8-bit integers inausing saturation.
- _mm_subs_ ⚠epi16 (x86 or x86-64) and sse2Subtract packed 16-bit integers inbfrom packed 16-bit integers inausing saturation.
- _mm_subs_ ⚠epu8 (x86 or x86-64) and sse2Subtract packed unsigned 8-bit integers inbfrom packed unsigned 8-bit integers inausing saturation.
- _mm_subs_ ⚠epu16 (x86 or x86-64) and sse2Subtract packed unsigned 16-bit integers inbfrom packed unsigned 16-bit integers inausing saturation.
- _mm_test_ ⚠all_ ones (x86 or x86-64) and sse4.1Tests whether the specified bits ina128-bit integer vector are all ones.
- _mm_test_ ⚠all_ zeros (x86 or x86-64) and sse4.1Tests whether the specified bits in a 128-bit integer vector are all zeros.
- _mm_test_ ⚠mix_ ones_ zeros (x86 or x86-64) and sse4.1Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones.
- _mm_testc_ ⚠pd (x86 or x86-64) and avxComputes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) inaandb, producing an intermediate 128-bit value, and setZFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setCFto 0. Return theCFvalue.
- _mm_testc_ ⚠ps (x86 or x86-64) and avxComputes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) inaandb, producing an intermediate 128-bit value, and setZFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setCFto 0. Return theCFvalue.
- _mm_testc_ ⚠si128 (x86 or x86-64) and sse4.1Tests whether the specified bits in a 128-bit integer vector are all ones.
- _mm_testnzc_ ⚠pd (x86 or x86-64) and avxComputes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) inaandb, producing an intermediate 128-bit value, and setZFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setCFto 0. Return 1 if both theZFandCFvalues are zero, otherwise return 0.
- _mm_testnzc_ ⚠ps (x86 or x86-64) and avxComputes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) inaandb, producing an intermediate 128-bit value, and setZFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setCFto 0. Return 1 if both theZFandCFvalues are zero, otherwise return 0.
- _mm_testnzc_ ⚠si128 (x86 or x86-64) and sse4.1Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones.
- _mm_testz_ ⚠pd (x86 or x86-64) and avxComputes the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) inaandb, producing an intermediate 128-bit value, and setZFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise setCFto 0. Return theZFvalue.
- _mm_testz_ ⚠ps (x86 or x86-64) and avxComputes the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) inaandb, producing an intermediate 128-bit value, and setZFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setZFto 0. Compute the bitwise NOT ofaand then AND withb, producing an intermediate value, and setCFto 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise setCFto 0. Return theZFvalue.
- _mm_testz_ ⚠si128 (x86 or x86-64) and sse4.1Tests whether the specified bits in a 128-bit integer vector are all zeros.
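The `_mm_testz_*`/`_mm_testc_*`/`_mm_testnzc_*` entries above expose the ZF/CF results of PTEST-style instructions. A common use of `_mm_testz_si128` is a cheap "is this vector all zeros?" check, sketched below (the helper name is illustrative):

```rust
use std::arch::x86_64::*;

// `_mm_testz_si128(v, v)` returns 1 exactly when `v & v` has no bits set,
// i.e. when every bit of `v` is zero.
#[target_feature(enable = "sse4.1")]
unsafe fn is_all_zero(v: __m128i) -> bool {
    _mm_testz_si128(v, v) == 1
}

fn main() {
    if is_x86_feature_detected!("sse4.1") {
        unsafe {
            assert!(is_all_zero(_mm_setzero_si128()));
            assert!(!is_all_zero(_mm_set1_epi8(1)));
        }
    }
}
```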
- _mm_tzcnt_ ⚠32 (x86 or x86-64) and bmi1Counts the number of trailing least significant zero bits.
- _mm_tzcnt_ ⚠64 bmi1Counts the number of trailing least significant zero bits.
- _mm_ucomieq_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor equality.
- _mm_ucomieq_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if they are equal, or0otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomige_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor greater-than-or-equal.
- _mm_ucomige_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais greater than or equal to the one fromb, or0otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomigt_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor greater-than.
- _mm_ucomigt_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais greater than the one fromb, or0otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomile_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor less-than-or-equal.
- _mm_ucomile_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais less than or equal to the one fromb, or0otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomilt_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor less-than.
- _mm_ucomilt_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if the value fromais less than the one fromb, or0otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_ucomineq_ ⚠sd (x86 or x86-64) and sse2Compares the lower element ofaandbfor not-equal.
- _mm_ucomineq_ ⚠ss (x86 or x86-64) and sseCompares two 32-bit floats from the low-order bits ofaandb. Returns1if they are not equal, or0otherwise. This instruction will not signal an exception if either argument is a quiet NaN.
- _mm_undefined_ ⚠pd (x86 or x86-64) and sse2Returns vector of type __m128d with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent tomem::MaybeUninit. In practice, this is equivalent tomem::zeroed.
- _mm_undefined_ ⚠ps (x86 or x86-64) and sseReturns vector of type __m128 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent tomem::MaybeUninit. In practice, this is equivalent tomem::zeroed.
- _mm_undefined_ ⚠si128 (x86 or x86-64) and sse2Returns vector of type __m128i with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent tomem::MaybeUninit. In practice, this is equivalent tomem::zeroed.
- _mm_unpackhi_ ⚠epi8 (x86 or x86-64) and sse2Unpacks and interleave 8-bit integers from the high half ofaandb.
- _mm_unpackhi_ ⚠epi16 (x86 or x86-64) and sse2Unpacks and interleave 16-bit integers from the high half ofaandb.
- _mm_unpackhi_ ⚠epi32 (x86 or x86-64) and sse2Unpacks and interleave 32-bit integers from the high half ofaandb.
- _mm_unpackhi_ ⚠epi64 (x86 or x86-64) and sse2Unpacks and interleave 64-bit integers from the high half ofaandb.
- _mm_unpackhi_ ⚠pd (x86 or x86-64) and sse2The resulting__m128delement is composed by the high-order values of the two__m128dinterleaved input elements, i.e.:
- _mm_unpackhi_ ⚠ps (x86 or x86-64) and sseUnpacks and interleave single-precision (32-bit) floating-point elements from the higher half ofaandb.
- _mm_unpacklo_ ⚠epi8 (x86 or x86-64) and sse2Unpacks and interleave 8-bit integers from the low half ofaandb.
- _mm_unpacklo_ ⚠epi16 (x86 or x86-64) and sse2Unpacks and interleave 16-bit integers from the low half ofaandb.
- _mm_unpacklo_ ⚠epi32 (x86 or x86-64) and sse2Unpacks and interleave 32-bit integers from the low half ofaandb.
- _mm_unpacklo_ ⚠epi64 (x86 or x86-64) and sse2Unpacks and interleave 64-bit integers from the low half ofaandb.
- _mm_unpacklo_ ⚠pd (x86 or x86-64) and sse2The resulting__m128delement is composed by the low-order values of the two__m128dinterleaved input elements, i.e.:
- _mm_unpacklo_ ⚠ps (x86 or x86-64) and sseUnpacks and interleave single-precision (32-bit) floating-point elements from the lower half ofaandb.
- _mm_xor_ ⚠pd (x86 or x86-64) and sse2Computes the bitwise XOR ofaandb.
- _mm_xor_ ⚠ps (x86 or x86-64) and sseBitwise exclusive OR of packed single-precision (32-bit) floating-point elements.
- _mm_xor_ ⚠si128 (x86 or x86-64) and sse2Computes the bitwise XOR of 128 bits (representing integer data) inaandb.
- _mulx_u32 ⚠(x86 or x86-64) and bmi2Unsigned multiply without affecting flags.
- _mulx_u64 ⚠bmi2Unsigned multiply without affecting flags.
- _pdep_u32 ⚠(x86 or x86-64) and bmi2Scatter contiguous low order bits ofato the result at the positions specified by themask.
- _pdep_u64 ⚠bmi2Scatter contiguous low order bits ofato the result at the positions specified by themask.
- _pext_u32 ⚠(x86 or x86-64) and bmi2Gathers the bits ofxspecified by themaskinto the contiguous low order bit positions of the result.
- _pext_u64 ⚠bmi2Gathers the bits ofxspecified by themaskinto the contiguous low order bit positions of the result.
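`_pdep` and `_pext` above are inverses of each other over a fixed mask: `_pext` compresses the bits selected by the mask into the low bits, `_pdep` scatters low bits back out to the mask positions. A minimal sketch (the values are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "bmi2")]
unsafe fn pext_pdep_roundtrip() {
    let a: u64 = 0b1010_1100;
    let mask: u64 = 0b1111_0000;
    // Pull bits 4..=7 of `a` down into the low four bits.
    let packed = _pext_u64(a, mask);
    assert_eq!(packed, 0b1010);
    // Depositing them through the same mask restores the original layout.
    assert_eq!(_pdep_u64(packed, mask), 0b1010_0000);
}

fn main() {
    if is_x86_feature_detected!("bmi2") {
        unsafe { pext_pdep_roundtrip() }
    }
}
```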
- _popcnt32⚠(x86 or x86-64) and popcntCounts the bits that are set.
- _popcnt64⚠popcntCounts the bits that are set.
- _rdrand16_step ⚠(x86 or x86-64) and rdrandRead a hardware generated 16-bit random value and store the result in val. Returns 1 if a random value was generated, and 0 otherwise.
- _rdrand32_step ⚠(x86 or x86-64) and rdrandRead a hardware generated 32-bit random value and store the result in val. Returns 1 if a random value was generated, and 0 otherwise.
- _rdrand64_step ⚠rdrandRead a hardware generated 64-bit random value and store the result in val. Returns 1 if a random value was generated, and 0 otherwise.
- _rdseed16_step ⚠(x86 or x86-64) and rdseedRead a 16-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise.
- _rdseed32_step ⚠(x86 or x86-64) and rdseedRead a 32-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise.
- _rdseed64_step ⚠rdseedRead a 64-bit NIST SP800-90B and SP800-90C compliant random value and store in val. Return 1 if a random value was generated, and 0 otherwise.
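The RDRAND/RDSEED step functions above report success through their integer return value and may transiently fail, so callers normally retry a bounded number of times. A minimal sketch with `_rdrand64_step` (the helper name and retry count are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "rdrand")]
unsafe fn hardware_random_u64() -> Option<u64> {
    let mut v = 0u64;
    for _ in 0..10 {
        // Returns 1 when a random value was written into `v`.
        if _rdrand64_step(&mut v) == 1 {
            return Some(v);
        }
    }
    None // the hardware kept reporting "no value available"
}

fn main() {
    if is_x86_feature_detected!("rdrand") {
        if let Some(v) = unsafe { hardware_random_u64() } {
            println!("hardware random value: {v:#018x}");
        }
    }
}
```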
- _rdtsc⚠x86 or x86-64 Reads the current value of the processor’s time-stamp counter.
- _subborrow_u32 ⚠x86 or x86-64 Subtracts unsigned 32-bit integerband the unsigned 8-bit borrow-inc_in(carry or overflow flag) from unsigned 32-bit integera, stores the unsigned 32-bit result inout, and returns the borrow-out (carry or overflow flag).
- _subborrow_u64 ⚠Subtracts unsigned 64-bit integerband the unsigned 8-bit borrow-inc_in(carry or overflow flag) from unsigned 64-bit integera, stores the unsigned 64-bit result inout, and returns the borrow-out (carry or overflow flag).
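Chaining the borrow-out of one `_subborrow_u64` call into the borrow-in of the next implements multi-word subtraction. A minimal sketch for a 128-bit value stored as two u64 limbs, low limb first (the function name and limb layout are illustrative):

```rust
use std::arch::x86_64::*;

// (a - b) over 128 bits represented as [low_limb, high_limb].
fn sub128(a: [u64; 2], b: [u64; 2]) -> ([u64; 2], u8) {
    let (mut lo, mut hi) = (0u64, 0u64);
    unsafe {
        let borrow = _subborrow_u64(0, a[0], b[0], &mut lo);
        let borrow = _subborrow_u64(borrow, a[1], b[1], &mut hi);
        ([lo, hi], borrow)
    }
}

fn main() {
    // 2^64 - 1: the low limb wraps to u64::MAX and borrows from the high limb.
    let (r, borrow) = sub128([0, 1], [1, 0]);
    assert_eq!(r, [u64::MAX, 0]);
    assert_eq!(borrow, 0);
}
```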
- _t1mskc_u32 ⚠(x86 or x86-64) and tbmClears all bits below the least significant zero ofxand sets all other bits.
- _t1mskc_u64 ⚠tbmClears all bits below the least significant zero ofxand sets all other bits.
- _tzcnt_u16 ⚠(x86 or x86-64) and bmi1Counts the number of trailing least significant zero bits.
- _tzcnt_u32 ⚠(x86 or x86-64) and bmi1Counts the number of trailing least significant zero bits.
- _tzcnt_u64 ⚠bmi1Counts the number of trailing least significant zero bits.
- _tzmsk_u32 ⚠(x86 or x86-64) and tbmSets all bits below the least significant one ofxand clears all other bits.
- _tzmsk_u64 ⚠tbmSets all bits below the least significant one ofxand clears all other bits.
- _xgetbv⚠(x86 or x86-64) and xsaveReads the contents of the extended control registerXCRspecified inxcr_no.
- _xrstor⚠(x86 or x86-64) and xsavePerforms a full or partial restore of the enabled processor states using the state information stored in memory atmem_addr.
- _xrstor64⚠xsavePerforms a full or partial restore of the enabled processor states using the state information stored in memory atmem_addr.
- _xrstors⚠(x86 or x86-64) and xsave,xsavesPerforms a full or partial restore of the enabled processor states using the state information stored in memory atmem_addr.
- _xrstors64⚠xsave,xsavesPerforms a full or partial restore of the enabled processor states using the state information stored in memory atmem_addr.
- _xsave⚠(x86 or x86-64) and xsavePerforms a full or partial save of the enabled processor states to memory atmem_addr.
- _xsave64⚠xsavePerforms a full or partial save of the enabled processor states to memory atmem_addr.
- _xsavec⚠(x86 or x86-64) and xsave,xsavecPerforms a full or partial save of the enabled processor states to memory atmem_addr.
- _xsavec64⚠xsave,xsavecPerforms a full or partial save of the enabled processor states to memory atmem_addr.
- _xsaveopt⚠(x86 or x86-64) and xsave,xsaveoptPerforms a full or partial save of the enabled processor states to memory atmem_addr.
- _xsaveopt64⚠xsave,xsaveoptPerforms a full or partial save of the enabled processor states to memory atmem_addr.
- _xsaves⚠(x86 or x86-64) and xsave,xsavesPerforms a full or partial save of the enabled processor states to memory atmem_addr
- _xsaves64⚠xsave,xsavesPerforms a full or partial save of the enabled processor states to memory atmem_addr
- _xsetbv⚠(x86 or x86-64) and xsaveCopies 64-bits fromvalto the extended control register (XCR) specified bya.
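 A sketch of _xgetbv from the entries above, reading XCR0 to check whether the OS saves SSE and AVX register state (bits 1 and 2 of XCR0); the helper name is hypothetical:
```rust
#[cfg(target_arch = "x86_64")]
fn os_saves_avx_state() -> bool {
    if std::arch::is_x86_feature_detected!("xsave") {
        // SAFETY: the `xsave` feature was detected at runtime just above.
        let xcr0 = unsafe { std::arch::x86_64::_xgetbv(0) }; // XCR0
        xcr0 & 0b110 == 0b110 // bit 1: SSE state, bit 2: AVX state
    } else {
        false
    }
}
```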
- cmpxchg16b⚠cmpxchg16bCompares and exchanges 16 bytes (128 bits) of data atomically.
- _MM_SHUFFLE Experimental x86 or x86-64 A utility function for creating masks to use with Intel shuffle and permute intrinsics.
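 _MM_SHUFFLE packs four 2-bit lane selectors (z, y, x, w) into the immediate expected by shuffle and permute intrinsics. A minimal sketch reversing the lanes of a __m128 with _mm_shuffle_ps; the wrapper name is hypothetical, and since _MM_SHUFFLE is marked Experimental in this listing it may require a nightly toolchain:
```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse")]
unsafe fn reverse_lanes(v: std::arch::x86_64::__m128) -> std::arch::x86_64::__m128 {
    use std::arch::x86_64::{_mm_shuffle_ps, _MM_SHUFFLE};
    // _MM_SHUFFLE(0, 1, 2, 3) selects source lane 3 into position 0, lane 2 into 1,
    // lane 1 into 2, and lane 0 into 3, i.e. it reverses the four f32 lanes.
    _mm_shuffle_ps::<{ _MM_SHUFFLE(0, 1, 2, 3) }>(v, v)
}
```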
- _cvtmask8_u32 ⚠Experimental (x86 or x86-64) and avx512dqConvert 8-bit mask a to a 32-bit integer value and store the result in dst.
- _cvtmask16_u32 ⚠Experimental (x86 or x86-64) and avx512fConvert 16-bit mask a into an integer value, and store the result in dst.
- _cvtmask32_u32 ⚠Experimental (x86 or x86-64) and avx512bwConvert 32-bit mask a into an integer value, and store the result in dst.
- _cvtmask64_u64 ⚠Experimental avx512bwConvert 64-bit mask a into an integer value, and store the result in dst.
- _cvtu32_mask8 ⚠Experimental (x86 or x86-64) and avx512dqConvert 32-bit integer value a to an 8-bit mask and store the result in dst.
- _cvtu32_mask16 ⚠Experimental (x86 or x86-64) and avx512fConvert 32-bit integer value a to a 16-bit mask and store the result in dst.
- _cvtu32_mask32 ⚠Experimental (x86 or x86-64) and avx512bwConvert integer value a into a 32-bit mask, and store the result in k.
- _cvtu64_mask64 ⚠Experimental avx512bwConvert integer value a into a 64-bit mask, and store the result in k.
- _kadd_mask8 ⚠Experimental (x86 or x86-64) and avx512dqAdd 8-bit masks a and b, and store the result in dst.
- _kadd_mask16 ⚠Experimental (x86 or x86-64) and avx512dqAdd 16-bit masks a and b, and store the result in dst.
- _kadd_mask32 ⚠Experimental (x86 or x86-64) and avx512bwAdd 32-bit masks in a and b, and store the result in k.
- _kadd_mask64 ⚠Experimental (x86 or x86-64) and avx512bwAdd 64-bit masks in a and b, and store the result in k.
- _kand_mask8 ⚠Experimental (x86 or x86-64) and avx512dqBitwise AND of 8-bit masks a and b, and store the result in dst.
- _kand_mask16 ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of 16-bit masks a and b, and store the result in k.
- _kand_mask32 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of 32-bit masks a and b, and store the result in k.
- _kand_mask64 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of 64-bit masks a and b, and store the result in k.
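 A sketch combining the conversion and AND mask helpers above (_cvtu32_mask16, _kand_mask16, _cvtmask16_u32) to test whether two 16-bit masks share any set bit. These entries are marked Experimental in this listing, so the sketch assumes a toolchain where they are available; the function name is hypothetical:
```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f")]
unsafe fn masks_share_no_bits(a: u32, b: u32) -> bool {
    use std::arch::x86_64::{_cvtmask16_u32, _cvtu32_mask16, _kand_mask16};
    // Move both bit patterns into mask registers, AND them, and move the
    // result back out; zero means no bit is set in both masks.
    let ka = _cvtu32_mask16(a);
    let kb = _cvtu32_mask16(b);
    _cvtmask16_u32(_kand_mask16(ka, kb)) == 0
}
```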
- _kandn_mask8 ⚠Experimental (x86 or x86-64) and avx512dqBitwise AND NOT of 8-bit masks a and b, and store the result in dst.
- _kandn_mask16 ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of 16-bit masks a and then AND with b, and store the result in k.
- _kandn_mask32 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise NOT of 32-bit masks a and then AND with b, and store the result in k.
- _kandn_mask64 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise NOT of 64-bit masks a and then AND with b, and store the result in k.
- _knot_mask8 ⚠Experimental (x86 or x86-64) and avx512dqBitwise NOT of 8-bit mask a, and store the result in dst.
- _knot_mask16 ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of 16-bit mask a, and store the result in k.
- _knot_mask32 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise NOT of 32-bit mask a, and store the result in k.
- _knot_mask64 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise NOT of 64-bit mask a, and store the result in k.
- _kor_mask8 ⚠Experimental (x86 or x86-64) and avx512dqBitwise OR of 8-bit masks a and b, and store the result in dst.
- _kor_mask16 ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of 16-bit masks a and b, and store the result in k.
- _kor_mask32 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 32-bit masks a and b, and store the result in k.
- _kor_mask64 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 64-bit masks a and b, and store the result in k.
- _kortest_mask8_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of 8-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortest_mask16_ ⚠u8 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of 16-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortest_mask32_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 32-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortest_mask64_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 64-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst. If the result is all ones, store 1 in all_ones, otherwise store 0 in all_ones.
- _kortestc_mask8_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of 8-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestc_mask16_ ⚠u8 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of 16-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestc_mask32_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 32-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestc_mask64_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 64-bit masks a and b. If the result is all ones, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask8_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of 8-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask16_ ⚠u8 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of 16-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask32_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 32-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kortestz_mask64_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise OR of 64-bit masks a and b. If the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kshiftli_mask8 ⚠Experimental (x86 or x86-64) and avx512dqShift 8-bit mask a left by count bits while shifting in zeros, and store the result in dst.
- _kshiftli_mask16 ⚠Experimental (x86 or x86-64) and avx512fShift 16-bit mask a left by count bits while shifting in zeros, and store the result in dst.
- _kshiftli_mask32 ⚠Experimental (x86 or x86-64) and avx512bwShift the bits of 32-bit mask a left by count while shifting in zeros, and store the least significant 32 bits of the result in k.
- _kshiftli_mask64 ⚠Experimental (x86 or x86-64) and avx512bwShift the bits of 64-bit mask a left by count while shifting in zeros, and store the least significant 64 bits of the result in k.
- _kshiftri_mask8 ⚠Experimental (x86 or x86-64) and avx512dqShift 8-bit mask a right by count bits while shifting in zeros, and store the result in dst.
- _kshiftri_mask16 ⚠Experimental (x86 or x86-64) and avx512fShift 16-bit mask a right by count bits while shifting in zeros, and store the result in dst.
- _kshiftri_mask32 ⚠Experimental (x86 or x86-64) and avx512bwShift the bits of 32-bit mask a right by count while shifting in zeros, and store the least significant 32 bits of the result in k.
- _kshiftri_mask64 ⚠Experimental (x86 or x86-64) and avx512bwShift the bits of 64-bit mask a right by count while shifting in zeros, and store the least significant 64 bits of the result in k.
- _ktest_mask8_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of 8-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktest_mask16_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of 16-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktest_mask32_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of 32-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktest_mask64_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of 64-bit masks a and b, and if the result is all zeros, store 1 in dst, otherwise store 0 in dst. Compute the bitwise NOT of a and then AND with b, if the result is all zeros, store 1 in and_not, otherwise store 0 in and_not.
- _ktestc_mask8_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of 8-bit mask a and then AND with 8-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestc_mask16_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of 16-bit mask a and then AND with 16-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestc_mask32_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise NOT of 32-bit mask a and then AND with 32-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestc_mask64_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise NOT of 64-bit mask a and then AND with 64-bit mask b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask8_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of 8-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask16_ ⚠u8 Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of 16-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask32_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of 32-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _ktestz_mask64_ ⚠u8 Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of 64-bit masks a and b, if the result is all zeros, store 1 in dst, otherwise store 0 in dst.
- _kxnor_mask8 ⚠Experimental (x86 or x86-64) and avx512dqBitwise XNOR of 8-bit masks a and b, and store the result in dst.
- _kxnor_mask16 ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise XNOR of 16-bit masks a and b, and store the result in k.
- _kxnor_mask32 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise XNOR of 32-bit masks a and b, and store the result in k.
- _kxnor_mask64 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise XNOR of 64-bit masks a and b, and store the result in k.
- _kxor_mask8 ⚠Experimental (x86 or x86-64) and avx512dqBitwise XOR of 8-bit masks a and b, and store the result in dst.
- _kxor_mask16 ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of 16-bit masks a and b, and store the result in k.
- _kxor_mask32 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise XOR of 32-bit masks a and b, and store the result in k.
- _kxor_mask64 ⚠Experimental (x86 or x86-64) and avx512bwCompute the bitwise XOR of 64-bit masks a and b, and store the result in k.
- _load_mask8 ⚠Experimental (x86 or x86-64) and avx512dqLoad 8-bit mask from memory
- _load_mask16 ⚠Experimental (x86 or x86-64) and avx512fLoad 16-bit mask from memory
- _load_mask32 ⚠Experimental (x86 or x86-64) and avx512bwLoad 32-bit mask from memory into k.
- _load_mask64 ⚠Experimental (x86 or x86-64) and avx512bwLoad 64-bit mask from memory into k.
- _mm256_abs_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst.
- _mm256_abs_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlFinds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm256_add_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlAdd packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_aesdec_ ⚠epi128 Experimental (x86 or x86-64) and vaesPerforms one round of an AES decryption flow on each 128-bit word (state) inausing the corresponding 128-bit word (key) inround_key.
- _mm256_aesdeclast_ ⚠epi128 Experimental (x86 or x86-64) and vaesPerforms the last round of an AES decryption flow on each 128-bit word (state) inausing the corresponding 128-bit word (key) inround_key.
- _mm256_aesenc_ ⚠epi128 Experimental (x86 or x86-64) and vaesPerforms one round of an AES encryption flow on each 128-bit word (state) inausing the corresponding 128-bit word (key) inround_key.
- _mm256_aesenclast_ ⚠epi128 Experimental (x86 or x86-64) and vaesPerforms the last round of an AES encryption flow on each 128-bit word (state) inausing the corresponding 128-bit word (key) inround_key.
- _mm256_alignr_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 64-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst.
- _mm256_alignr_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 64-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst.
- _mm256_bcstnebf16_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert scalar BF16 (16-bit) floating point element stored at memory locations starting at location a to single precision (32-bit) floating-point, broadcast it to packed single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_bcstnesh_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert scalar half-precision (16-bit) floating-point element stored at memory locations starting at location a to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_bitshuffle_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512bitalg,avx512vlConsiders the inputbas packed 64-bit integers andcas packed 8-bit integers. Then groups 8 8-bit values fromcas indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm256_broadcast_ ⚠f32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm256_broadcast_ ⚠f32x4 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm256_broadcast_ ⚠f64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst.
- _mm256_broadcast_ ⚠i32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst.
- _mm256_broadcast_ ⚠i32x4 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the 4 packed 32-bit integers from a to all elements of dst.
- _mm256_broadcast_ ⚠i64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the 2 packed 64-bit integers from a to all elements of dst.
- _mm256_broadcastmb_ ⚠epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlBroadcast the low 8-bits from input mask k to all 64-bit elements of dst.
- _mm256_broadcastmw_ ⚠epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlBroadcast the low 16-bits from input mask k to all 32-bit elements of dst.
- _mm256_castpd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256dto type__m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph128_ ⚠ph256 Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m128hto type__m256h. The upper 8 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate thevzeroupperinstruction, but most of the time it does not generate any instructions.
- _mm256_castph256_ ⚠ph128 Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256hto type__m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ ⚠pd Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256hto type__m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ ⚠ps Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256hto type__m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castph_ ⚠si256 Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256hto type__m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castps_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256to type__m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_castsi256_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type__m256ito type__m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm256_clmulepi64_ ⚠epi128 Experimental (x86 or x86-64) and vpclmulqdqPerforms a carry-less multiplication of two 64-bit polynomials over the finite field GF(2) - in each of the 2 128-bit lanes.
- _mm256_cmp_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmp_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpeq_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpge_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmpgt_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm256_cmple_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmple_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmplt_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm256_cmpneq_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k.
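 The comparison entries above all produce a bitmask with one bit per lane rather than a vector of all-ones/all-zeros lanes. A sketch counting how many 32-bit lanes of a exceed the matching lane of b; these intrinsics are marked Experimental in this listing, and the wrapper name is hypothetical:
```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn lanes_greater(a: std::arch::x86_64::__m256i, b: std::arch::x86_64::__m256i) -> u32 {
    // __mmask8: one bit per 32-bit lane of the 256-bit vectors.
    let k = std::arch::x86_64::_mm256_cmpgt_epi32_mask(a, b);
    k.count_ones()
}
```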
- _mm256_cmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_conflict_ ⚠epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_conflict_ ⚠epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_conj_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_cvtepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi16_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi32_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm256_cvtepi64_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepi64_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtepi64_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu16_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu32_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu32_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtepu64_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtepu64_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtne2ps_ ⚠pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in two 256-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 256-bit wide vector. Intel’s documentation
- _mm256_cvtneebf16_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneeph_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneobf16_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneoph_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneps_ ⚠avx_ pbh Experimental (x86 or x86-64) and avxneconvertConvert packed single precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm256_cvtneps_ ⚠pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. Intel’s documentation
- _mm256_cvtpbh_ ⚠ps Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtpd_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm256_cvtpd_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm256_cvtpd_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm256_cvtpd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm256_cvtph_ ⚠epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm256_cvtph_ ⚠epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm256_cvtph_ ⚠epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm256_cvtph_ ⚠epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm256_cvtph_ ⚠epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_ ⚠epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm256_cvtph_ ⚠pd Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm256_cvtps_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm256_cvtps_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm256_cvtps_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm256_cvtsepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst.
- _mm256_cvtsh_ ⚠h Experimental (x86 or x86-64) and avx512fp16Copy the lower half-precision (16-bit) floating-point element fromatodst.
- _mm256_cvttpd_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm256_cvttpd_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttpd_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm256_cvttph_ ⚠epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ ⚠epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ ⚠epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ ⚠epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm256_cvttph_ ⚠epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm256_cvttph_ ⚠epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm256_cvttps_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm256_cvttps_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm256_cvttps_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm256_cvtusepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtusepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst.
- _mm256_cvtxph_ ⚠ps Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm256_cvtxps_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
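 A sketch of the truncating down-conversions above using _mm256_cvtepi32_epi16: eight 32-bit integers are narrowed to 16 bits each, and values that do not fit keep only their low 16 bits. These intrinsics are marked Experimental in this listing; the function name is hypothetical:
```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn narrow_i32_to_i16() -> [i16; 8] {
    use std::arch::x86_64::{_mm256_cvtepi32_epi16, _mm256_setr_epi32, _mm_storeu_si128};
    // 70000 does not fit in i16; truncation keeps its low 16 bits (4464).
    let a = _mm256_setr_epi32(1, -2, 3, -4, 5, -6, 70000, 8);
    let narrowed = _mm256_cvtepi32_epi16(a);
    let mut out = [0i16; 8];
    _mm_storeu_si128(out.as_mut_ptr() as *mut _, narrowed);
    out
}
```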
- _mm256_dbsad_ ⚠epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm256_div_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlDivide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm256_dpbf16_ ⚠ps Experimental (x86 or x86-64) and avx512bf16,avx512vlCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. Intel’s documentation
- _mm256_dpbssd_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbssds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbsud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbsuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbusd_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbusd_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbusds_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbusds_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpbuud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpbuuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwssd_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwssd_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwssds_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwssds_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwsud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwsuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwusd_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwusds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm256_dpwuud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm256_dpwuuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
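 The dot-product entries above differ only in the signedness of the 8-bit or 16-bit operands and in whether the accumulation saturates. A sketch of one accumulation step with _mm256_dpwssd_epi32; these entries are marked Experimental in this listing, and the wrapper name is hypothetical:
```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512vnni,avx512vl")]
unsafe fn dot_step(
    acc: std::arch::x86_64::__m256i,
    a: std::arch::x86_64::__m256i,
    b: std::arch::x86_64::__m256i,
) -> std::arch::x86_64::__m256i {
    // Each 32-bit lane of the result is acc + a[2i]*b[2i] + a[2i+1]*b[2i+1],
    // with a and b treated as packed signed 16-bit integers.
    std::arch::x86_64::_mm256_dpwssd_epi32(acc, a, b)
}
```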
- _mm256_extractf32x4_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlExtract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
- _mm256_extractf64x2_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst.
- _mm256_extracti32x4_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlExtract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM1, and store the result in dst.
- _mm256_extracti64x2_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlExtracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst.
- _mm256_fcmadd_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fcmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_fixupimm_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm256_fixupimm_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm256_fmadd_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fmadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm256_fmaddsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm256_fmsubadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm256_fmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_fnmadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm256_fnmsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
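The fmaddsub/fmsubadd entries above alternate the sign applied to c across lanes: for fmaddsub, even-indexed lanes compute a*b - c and odd-indexed lanes a*b + c, and fmsubadd swaps the two. A hedged scalar sketch of that pattern (f32 stands in for the half-precision lanes; the helper name is illustrative):

```rust
// Scalar model of the fmaddsub pattern: even-indexed lanes subtract c,
// odd-indexed lanes add c. fmsubadd would swap the two branches.
fn fmaddsub(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    a.iter()
        .zip(b)
        .zip(c)
        .enumerate()
        .map(|(i, ((&a, &b), &c))| if i % 2 == 0 { a * b - c } else { a * b + c })
        .collect()
}

fn main() {
    let r = fmaddsub(&[1.0, 2.0, 3.0, 4.0], &[2.0; 4], &[1.0; 4]);
    assert_eq!(r, vec![1.0, 5.0, 5.0, 9.0]); // 2-1, 4+1, 6-1, 8+1
}
```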
- _mm256_fpclass_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_fpclass_ ⚠ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlTest packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_fpclass_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm256_getexp_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_getexp_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_getexp_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
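For the getexp entries above, the stored value is floor(log2(|x|)) as a floating-point number, i.e. the unbiased exponent. A scalar sketch of that calculation for normal, nonzero finite inputs (special values such as 0, infinities and NaN follow the intrinsic's own rules, which are not modelled here; the helper name is illustrative):

```rust
// Scalar model of getexp: the result is the floating-point value floor(log2(|x|)).
fn getexp(x: f64) -> f64 {
    x.abs().log2().floor()
}

fn main() {
    assert_eq!(getexp(1.0), 0.0);
    assert_eq!(getexp(6.0), 2.0);   // |6| is in [4, 8)
    assert_eq!(getexp(0.3), -2.0);  // |0.3| is in [0.25, 0.5)
    assert_eq!(getexp(-10.0), 3.0); // the sign of x is ignored
}
```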
- _mm256_getmant_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_getmant_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlNormalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_getmant_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
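The getmant entries above rescale each element by a power of two so its magnitude lands in the chosen interval, then apply the selected sign rule. A scalar sketch of the default _MM_MANT_NORM_1_2 / _MM_MANT_SIGN_src combination for nonzero finite inputs (the helper name is illustrative):

```rust
// Scalar model of getmant with interval [1, 2) and the source sign kept.
// Zero, infinities and NaN are not modelled here.
fn getmant_1_2(x: f64) -> f64 {
    let sign = if x.is_sign_negative() { -1.0 } else { 1.0 };
    let mut m = x.abs();
    while m >= 2.0 {
        m *= 0.5; // scaling by powers of two is exact for normal f64 values
    }
    while m < 1.0 {
        m *= 2.0;
    }
    sign * m
}

fn main() {
    assert_eq!(getmant_1_2(6.0), 1.5);    // 6 = 1.5 * 2^2
    assert_eq!(getmant_1_2(-0.75), -1.5); // -0.75 = -1.5 * 2^-1
}
```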
- _mm256_gf2p8affine_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and gfni,avxPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_gf2p8affineinv_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and gfni,avxPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_gf2p8mul_ ⚠epi8 Experimental (x86 or x86-64) and gfni,avxPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
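The gf2p8mul entry above multiplies byte lanes in GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1. A byte-wise sketch of that field multiplication (the vector intrinsic simply applies this to every byte lane; the helper name is illustrative):

```rust
// Byte-wise model of GF(2^8) multiplication with reduction polynomial 0x11B.
fn gf2p8mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            acc ^= a; // add (XOR) the current multiple of a
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B; // reduce modulo x^8 + x^4 + x^3 + x + 1
        }
        b >>= 1;
    }
    acc
}

fn main() {
    // A well-known pair from the AES field, which uses the same polynomial.
    assert_eq!(gf2p8mul_byte(0x57, 0x83), 0xC1);
    assert_eq!(gf2p8mul_byte(0x02, 0x80), 0x1B); // x * x^7 wraps around
}
```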
- _mm256_i32scatter_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 8 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm256_i32scatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlScatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm256_i32scatter_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm256_i32scatter_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 8 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm256_i64scatter_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
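The scatter entries above write element i of a to the address base_addr + vindex[i] * scale. A sketch of that addressing scheme against a plain slice, restricted to scale = 4 so that one index maps to one i32 slot (the helper name and argument order are illustrative):

```rust
// Model of the scatter addressing rule for 32-bit elements and 32-bit indices.
fn i32scatter_epi32(base: &mut [i32], vindex: &[i32; 8], scale: usize, a: &[i32; 8]) {
    assert!(scale == 4, "this sketch only models scale = 4 (one i32 per index)");
    for i in 0..8 {
        base[vindex[i] as usize] = a[i]; // element i lands at base + vindex[i]*scale bytes
    }
}

fn main() {
    let mut mem = [0i32; 16];
    i32scatter_epi32(&mut mem, &[0, 2, 4, 6, 8, 10, 12, 14], 4, &[1, 2, 3, 4, 5, 6, 7, 8]);
    assert_eq!(mem[4], 3); // a[2] landed at index 4
}
```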
- _mm256_insertf32x4 ⚠Experimental (x86 or x86-64) and avx512f,avx512vlCopy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8.
- _mm256_insertf64x2 ⚠Experimental (x86 or x86-64) and avx512dq,avx512vlCopy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by IMM8.
- _mm256_inserti32x4 ⚠Experimental (x86 or x86-64) and avx512f,avx512vlCopy a to dst, then insert 128 bits (composed of 4 packed 32-bit integers) from b into dst at the location specified by imm8.
- _mm256_inserti64x2 ⚠Experimental (x86 or x86-64) and avx512dq,avx512vlCopy a to dst, then insert 128 bits (composed of 2 packed 64-bit integers) from b into dst at the location specified by IMM8.
- _mm256_load_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 256-bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 256-bits (composed of 4 packed 64-bit integers) from memory into dst. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_load_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlLoad 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_loadu_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad 256-bits (composed of 32 packed 8-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad 256-bits (composed of 16 packed 16-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 256-bits (composed of 8 packed 32-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 256-bits (composed of 4 packed 64-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm256_loadu_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlLoad 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm256_lzcnt_ ⚠epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst.
- _mm256_lzcnt_ ⚠epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst.
- _mm256_madd52hi_ ⚠avx_ epu64 Experimental (x86 or x86-64) and avxifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm256_madd52hi_ ⚠epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm256_madd52lo_ ⚠avx_ epu64 Experimental (x86 or x86-64) and avxifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm256_madd52lo_ ⚠epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
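The madd52 entries above build a 104-bit product from the low 52 bits of b and c and then add either its low or its high 52 bits to a. A sketch of a single 64-bit lane using u128 for the wide product (helper names are illustrative):

```rust
// Scalar model of one 64-bit lane of madd52lo/madd52hi.
const MASK52: u64 = (1u64 << 52) - 1;

fn madd52lo(a: u64, b: u64, c: u64) -> u64 {
    let p = (b & MASK52) as u128 * (c & MASK52) as u128; // 52x52 -> 104-bit product
    a.wrapping_add((p as u64) & MASK52)                  // add the low 52 bits to a
}

fn madd52hi(a: u64, b: u64, c: u64) -> u64 {
    let p = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add(((p >> 52) as u64) & MASK52)          // add the high 52 bits to a
}

fn main() {
    assert_eq!(madd52lo(10, 3, 4), 22);                  // low 52 bits of 12, plus 10
    assert_eq!(madd52hi(0, 1 << 51, 1 << 51), 1 << 50);  // 2^102 >> 52 = 2^50
}
```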
- _mm256_mask2_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm256_mask2_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set)
- _mm256_mask2_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
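For the permutex2var entries above, each lane of idx selects one element out of the concatenation of a and b; in the 32-bit case the low three index bits pick the element and the next bit picks the source, and in the mask2 form unselected lanes keep the value from idx. A sketch of the 8-lane 32-bit case (the helper name is illustrative; other element widths follow the same pattern with more index bits):

```rust
// Model of mask2_permutex2var for 8 x 32-bit lanes.
fn mask2_permutex2var_epi32(a: [i32; 8], idx: [i32; 8], k: u8, b: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    for i in 0..8 {
        if k & (1 << i) != 0 {
            let sel = (idx[i] & 0x7) as usize;  // element index within the source
            let from_b = idx[i] & 0x8 != 0;     // bit 3 chooses between a and b
            dst[i] = if from_b { b[sel] } else { a[sel] };
        } else {
            dst[i] = idx[i]; // mask2 variant: merge from idx when the mask bit is not set
        }
    }
    dst
}

fn main() {
    let a = [0, 1, 2, 3, 4, 5, 6, 7];
    let b = [10, 11, 12, 13, 14, 15, 16, 17];
    let idx = [8, 0, 9, 1, 10, 2, 11, 3]; // alternate b[0], a[0], b[1], a[1], ...
    assert_eq!(
        mask2_permutex2var_epi32(a, idx, 0xFF, b),
        [10, 0, 11, 1, 12, 2, 13, 3]
    );
}
```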
- _mm256_mask3_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask3_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask3_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm256_mask3_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
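All of the mask3 entries above share one merging rule: lanes whose bit in k is set receive the arithmetic result, and the rest are copied from c. A sketch of that rule for a fused multiply-add over four f64 lanes (the helper name is illustrative):

```rust
// Model of the mask3 merge rule, using f64::mul_add for the fused multiply-add.
fn mask3_fmadd(a: &[f64; 4], b: &[f64; 4], c: &[f64; 4], k: u8) -> [f64; 4] {
    let mut dst = *c; // unselected lanes keep c
    for i in 0..4 {
        if k & (1 << i) != 0 {
            dst[i] = a[i].mul_add(b[i], c[i]); // a*b + c in one fused step
        }
    }
    dst
}

fn main() {
    let r = mask3_fmadd(&[1.0, 2.0, 3.0, 4.0], &[10.0; 4], &[1.0; 4], 0b0101);
    assert_eq!(r, [11.0, 1.0, 31.0, 1.0]); // lanes 1 and 3 fall back to c
}
```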
- _mm256_mask_ ⚠abs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠abs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠abs_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠abs_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ pd Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlAdd packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠add_ ps Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠adds_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠adds_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠adds_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠adds_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠alignr_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠alignr_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 64-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠alignr_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 64-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
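The alignr_epi32/epi64 entries above concatenate a (upper half) and b (lower half), shift the concatenation right by imm8 elements, and keep the low half; the writemask then merges against src as usual. A sketch of the unmasked 32-bit shift step (the helper name is illustrative, and this sketch reduces the shift count modulo the element count):

```rust
// Model of the alignr_epi32 concatenate-and-shift step for 8-element vectors.
fn alignr_epi32(a: [i32; 8], b: [i32; 8], imm8: usize) -> [i32; 8] {
    let shift = imm8 % 8;
    let mut concat = [0i32; 16];
    concat[..8].copy_from_slice(&b); // low half
    concat[8..].copy_from_slice(&a); // high half
    let mut dst = [0i32; 8];
    for i in 0..8 {
        dst[i] = concat[i + shift]; // shift right by `shift` elements, keep the low 8
    }
    dst
}

fn main() {
    let a = [8, 9, 10, 11, 12, 13, 14, 15];
    let b = [0, 1, 2, 3, 4, 5, 6, 7];
    assert_eq!(alignr_epi32(a, b, 2), [2, 3, 4, 5, 6, 7, 8, 9]);
}
```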
- _mm256_mask_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlPerforms element-by-element bitwise AND between packed 32-bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠and_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠and_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠andnot_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠andnot_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠andnot_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠andnot_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠avg_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠avg_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠bitshuffle_ epi64_ mask Experimental (x86 or x86-64) and avx512bitalg,avx512vlConsiders the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm256_mask_ ⚠blend_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBlend packed 8-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ ⚠blend_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBlend packed 16-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ ⚠blend_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed 32-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ ⚠blend_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed 64-bit integers from a and b using control mask k, and store the results in dst.
- _mm256_mask_ ⚠blend_ pd Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm256_mask_ ⚠blend_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlBlend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm256_mask_ ⚠blend_ ps Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and store the results in dst.
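The mask_blend entries above all follow the same rule: bit i of k selects b[i] when set and a[i] when clear. A sketch over eight 32-bit lanes (the helper name is illustrative):

```rust
// Model of the mask_blend rule for 8 x 32-bit lanes.
fn mask_blend_epi32(k: u8, a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut dst = [0i32; 8];
    for i in 0..8 {
        dst[i] = if k & (1 << i) != 0 { b[i] } else { a[i] };
    }
    dst
}

fn main() {
    let a = [0; 8];
    let b = [1; 8];
    assert_eq!(mask_blend_epi32(0b1010_1010, a, b), [0, 1, 0, 1, 0, 1, 0, 1]);
}
```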
- _mm256_mask_ ⚠broadcast_ f32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠broadcast_ f32x4 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcast_ f64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠broadcast_ i32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠broadcast_ i32x4 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the 4 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcast_ i64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the 2 packed 64-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠broadcastb_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 8-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcastd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 32-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcastq_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 64-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcastsd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcastss_ ps Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠broadcastw_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 16-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ pd_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmp_ ps_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpeq_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpge_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpgt_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmple_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmplt_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cmpneq_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
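The masked compare entries above produce one result bit per lane and then AND it with the zeromask k1, so lanes excluded by k1 always come out as 0. A sketch of that pattern for a greater-than compare over eight 32-bit lanes (the helper name is illustrative):

```rust
// Model of the masked-compare pattern: per-lane compare bits, zeroed by k1.
fn mask_cmpgt_epi32_mask(k1: u8, a: [i32; 8], b: [i32; 8]) -> u8 {
    let mut k = 0u8;
    for i in 0..8 {
        if a[i] > b[i] {
            k |= 1 << i; // lane comparison result
        }
    }
    k & k1 // lanes masked off by k1 are zeroed in the output
}

fn main() {
    let a = [5, -1, 5, -1, 5, -1, 5, -1];
    let b = [0; 8];
    assert_eq!(mask_cmpgt_epi32_mask(0xFF, a, b), 0b0101_0101);
    assert_eq!(mask_cmpgt_epi32_mask(0x0F, a, b), 0b0000_0101);
}
```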
- _mm256_mask_ ⚠cmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_ ⚠compress_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_ ⚠compress_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_ ⚠compress_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_ ⚠compress_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_ ⚠compress_ pd Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm256_mask_ ⚠compress_ ps Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
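The mask_compress entries above pack the selected lanes of a towards element 0 and leave the untouched positions holding the corresponding lanes of src. A sketch over eight 32-bit lanes (the helper name is illustrative); the compressstoreu entries that follow apply the same packing but write only the packed prefix to memory:

```rust
// Model of the mask_compress rule for 8 x 32-bit lanes.
fn mask_compress_epi32(src: [i32; 8], k: u8, a: [i32; 8]) -> [i32; 8] {
    let mut dst = src; // positions that receive no packed element keep src
    let mut out = 0;
    for i in 0..8 {
        if k & (1 << i) != 0 {
            dst[out] = a[i]; // next active element, stored contiguously
            out += 1;
        }
    }
    dst
}

fn main() {
    let src = [-1; 8];
    let a = [0, 1, 2, 3, 4, 5, 6, 7];
    assert_eq!(mask_compress_epi32(src, 0b1010_0110, a), [1, 2, 5, 7, -1, -1, -1, -1]);
}
```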
- _mm256_mask_ ⚠compressstoreu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠compressstoreu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠compressstoreu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠compressstoreu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠compressstoreu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠compressstoreu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠conflict_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_mask_ ⚠conflict_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_mask_ ⚠conj_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_ ⚠cvt_ roundps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm256_mask_ ⚠cvtepi8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi32_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
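 To illustrate the writemask/src convention shared by the masked conversion entries above, here is a small sketch using _mm256_mask_cvtepi32_ps, under the same toolchain and feature-detection assumptions as the earlier sketch; the mask and the values are arbitrary.

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
        let fallback = _mm256_set1_ps(-1.0); // `src`: supplies the masked-off lanes
        // k = 0b0101_0101: convert the even lanes, copy the odd lanes from `fallback`.
        let r = _mm256_mask_cvtepi32_ps(fallback, 0b0101_0101, a);
        let mut out = [0.0f32; 8];
        _mm256_storeu_ps(out.as_mut_ptr(), r);
        println!("{:?}", out); // expected: [1.0, -1.0, 3.0, -1.0, 5.0, -1.0, 7.0, -1.0]
    }
}
```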
- _mm256_mask_ ⚠cvtepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepi64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtepi64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_ ⚠cvtepi64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtepu8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtepu64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtepu64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_ ⚠cvtepu64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtne2ps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements and store the results in single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm256_mask_ ⚠cvtneps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm256_mask_ ⚠cvtpbh_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtpd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_mask_ ⚠cvtpd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ pd Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtph_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
 _MM_FROUND_TO_NEAREST_INT // round to nearest
 _MM_FROUND_TO_NEG_INF // round down
 _MM_FROUND_TO_POS_INF // round up
 _MM_FROUND_TO_ZERO // truncate
 _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
- _mm256_mask_ ⚠cvtsepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
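 The saturating narrowing conversions above clamp values to the destination range instead of truncating bits. A minimal sketch with _mm256_mask_cvtsepi16_epi8 follows, under the same feature-availability assumptions as the earlier sketches; input values are illustrative.

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        // 16-bit -> 8-bit narrowing with signed saturation: 300 clamps to 127, -300 to -128.
        let a = _mm256_setr_epi16(300, -300, 5, -5, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
        let src = _mm_setzero_si128();
        // All 16 lanes active, so every output byte comes from the conversion.
        let r = _mm256_mask_cvtsepi16_epi8(src, 0xFFFF, a);
        let mut out = [0i8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        println!("{:?}", out); // expected to start with [127, -128, 5, -5, 0, 1, ...]
    }
}
```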
- _mm256_mask_ ⚠cvtsepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtsepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtsepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtsepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtsepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtsepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtsepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtsepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtsepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtsepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtsepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvttpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvttpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvttph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvttps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvttps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠cvtusepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtusepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtusepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtusepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtusepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtusepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtusepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtusepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtusepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtusepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtusepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtusepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm256_mask_ ⚠cvtxph_ ps Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠cvtxps_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm256_mask_ ⚠dbsad_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm256_mask_ ⚠div_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠div_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlDivide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠div_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠dpbf16_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm256_mask_ ⚠dpbusd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠dpbusds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠dpwssd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠dpwssds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
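 The VNNI dot-product entries above fold four byte products (or two word products) into each 32-bit accumulator lane. Below is a minimal sketch of the masked u8×i8 variant, under the same toolchain assumptions as the earlier sketches; values and the expected output are illustrative.

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512vnni") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        // Each 32-bit lane accumulates 4 byte products (2 * 3 = 6, four times) into `acc`.
        let a = _mm256_set1_epi8(2);      // treated as unsigned bytes
        let b = _mm256_set1_epi8(3);      // treated as signed bytes
        let acc = _mm256_set1_epi32(100); // running accumulator (`src`)
        // Only the low four 32-bit lanes are updated; the rest keep the accumulator value.
        let r = _mm256_mask_dpbusd_epi32(acc, 0b0000_1111, a, b);
        let mut out = [0i32; 8];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
        println!("{:?}", out); // expected: [124, 124, 124, 124, 100, 100, 100, 100]
    }
}
```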
- _mm256_mask_ ⚠expand_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expand_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expand_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expand_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expand_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expand_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
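 The expand entries above take the low, contiguous elements of a and scatter them into the lanes whose mask bit is set, which is the register-to-register counterpart of the expandloadu entries that follow. A small sketch with _mm256_mask_expand_epi32, under the same assumptions as the earlier sketches:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm256_setr_epi32(10, 20, 30, 40, 50, 60, 70, 80);
        let src = _mm256_setzero_si256();
        // The first four elements of `a` are expanded into lanes 1, 3, 5, 7 (the set mask
        // bits); the remaining lanes are copied from `src`.
        let r = _mm256_mask_expand_epi32(src, 0b1010_1010, a);
        let mut out = [0i32; 8];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
        println!("{:?}", out); // expected: [0, 10, 0, 20, 0, 30, 0, 40]
    }
}
```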
- _mm256_mask_ ⚠expandloadu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expandloadu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expandloadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expandloadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expandloadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠expandloadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠extractf32x4_ ps Experimental (x86 or x86-64) and avx512f,avx512vlExtract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠extractf64x2_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠extracti32x4_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlExtract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM1, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠extracti64x2_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlExtracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_ ⚠fcmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_mask_ ⚠fixupimm_ pd Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_mask_ ⚠fixupimm_ ps Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_mask_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
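 For the masked FMA family above, the mask selects which lanes are recomputed; inactive lanes keep the value of a. A minimal sketch with _mm256_mask_fmadd_pd, under the same feature assumptions as the earlier sketches:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);
        let b = _mm256_set1_pd(10.0);
        let c = _mm256_set1_pd(0.5);
        // Lanes 0 and 2 compute a*b + c; lanes 1 and 3 keep the original value of `a`.
        let r = _mm256_mask_fmadd_pd(a, 0b0101, b, c);
        let mut out = [0.0f64; 4];
        _mm256_storeu_pd(out.as_mut_ptr(), r);
        println!("{:?}", out); // expected: [10.5, 2.0, 30.5, 4.0]
    }
}
```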
- _mm256_mask_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠fpclass_ pd_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of: 0x01 (QNaN), 0x02 (positive zero), 0x04 (negative zero), 0x08 (positive infinity), 0x10 (negative infinity), 0x20 (denormal), 0x40 (finite negative), 0x80 (SNaN).
- _mm256_mask_ ⚠fpclass_ ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlTest packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of the category flags listed above.
- _mm256_mask_ ⚠fpclass_ ps_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm8 can be a combination of the category flags listed above.
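 A minimal sketch of the fpclass entries above, classifying NaN lanes (imm8 = 0x81 selects QNaN | SNaN); same feature-availability assumptions as the earlier sketches, and the sample values are arbitrary.

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512dq") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm256_setr_pd(1.0, f64::NAN, f64::INFINITY, -0.0);
        // k1 = 0b1111 examines every lane; the result bit is set only where the lane
        // falls into one of the selected categories.
        let nan_mask = _mm256_mask_fpclass_pd_mask::<0x81>(0b1111, a);
        println!("{:04b}", nan_mask); // expected: 0010 (only lane 1 is a NaN)
    }
}
```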
- _mm256_mask_ ⚠getexp_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_mask_ ⚠getexp_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_mask_ ⚠getexp_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
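 The getexp entries above extract floor(log2(x)) as a float per lane. A short sketch with _mm256_mask_getexp_pd, under the same assumptions as the earlier sketches:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm256_setr_pd(8.0, 0.5, 3.0, 100.0);
        let src = _mm256_set1_pd(-0.0);
        // Lanes 0..=2 get floor(log2(x)); lane 3 is copied from `src` (its mask bit is clear).
        let r = _mm256_mask_getexp_pd(src, 0b0111, a);
        let mut out = [0.0f64; 4];
        _mm256_storeu_pd(out.as_mut_ptr(), r);
        println!("{:?}", out); // expected: [3.0, -1.0, 1.0, -0.0]
    }
}
```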
- _mm256_mask_ ⚠getmant_ pd Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_mask_ ⚠getmant_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlNormalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_mask_ ⚠getmant_ ps Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_mask_ ⚠gf2p8affine_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_mask_ ⚠gf2p8affineinv_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_mask_ ⚠gf2p8mul_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
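 The GF(2^8) multiply above uses the same reduction polynomial as the AES field, so the classic textbook product 0x57 * 0x83 = 0xC1 makes a handy check. A minimal sketch, assuming gfni/avx512bw/avx512vl are available as in the earlier sketches:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("gfni")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512vl"))
    {
        return;
    }
    unsafe {
        // 0x57 * 0x83 = 0xC1 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
        let a = _mm256_set1_epi8(0x57);
        let b = _mm256_set1_epi8(0x83u8 as i8);
        let src = _mm256_setzero_si256();
        // Only the lowest byte lane is computed; the other 31 lanes are copied from `src`.
        let r = _mm256_mask_gf2p8mul_epi8(src, 0x0000_0001, a, b);
        let mut out = [0u8; 32];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
        println!("{:#04x}", out[0]); // expected: 0xc1
    }
}
```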
- _mm256_mask_ ⚠i32scatter_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 8 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i32scatter_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i32scatter_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i32scatter_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 8 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i64scatter_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i64scatter_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i64scatter_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠i64scatter_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm256_mask_ ⚠insertf32x4 Experimental (x86 or x86-64) and avx512f,avx512vlCopy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠insertf64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlCopy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠inserti32x4 Experimental (x86 or x86-64) and avx512f,avx512vlCopy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠inserti64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlCopy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠load_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠load_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠load_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠load_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠loadu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 8-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠loadu_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 16-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠loadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠loadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠loadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠loadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
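The masked load entries above all share one shape: src supplies the lanes whose mask bit is clear, k selects the active lanes, and memory is only read for the active lanes. A minimal sketch in Rust (the helper name and values are illustrative; these intrinsics are marked Experimental here, so older toolchains may only expose them on nightly):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn load_low_half(src: __m256i, data: &[i32; 8]) -> __m256i {
    // Mask 0b0000_1111: only the low four lanes are loaded from `data`;
    // the high four lanes keep the values already in `src`.
    _mm256_mask_loadu_epi32(src, 0b0000_1111, data.as_ptr())
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let data = [10, 11, 12, 13, 14, 15, 16, 17];
        let mut out = [0i32; 8];
        unsafe {
            let src = _mm256_set1_epi32(-1);
            let v = load_low_half(src, &data);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [10, 11, 12, 13, -1, -1, -1, -1]
    }
}
```

The aligned _mm256_mask_load_* variants behave the same way but additionally require mem_addr to be 32-byte aligned, as noted above.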
- _mm256_mask_ ⚠lzcnt_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠lzcnt_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠madd52hi_ epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠madd52lo_ epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠madd_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠maddubs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_mask_ ⚠max_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_mask_ ⚠min_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
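The masked min/max entries follow the usual (src, k, a, b) pattern: active lanes receive the elementwise maximum or minimum, inactive lanes pass through from src. A hedged sketch using _mm256_mask_max_epi32 (the helper name and data are made up for illustration):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_max(src: __m256i, k: __mmask8, a: __m256i, b: __m256i) -> __m256i {
    // Active lanes get max(a, b); inactive lanes are passed through from `src`.
    _mm256_mask_max_epi32(src, k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0i32; 8];
        unsafe {
            let a = _mm256_setr_epi32(1, -2, 3, -4, 5, -6, 7, -8);
            let b = _mm256_setzero_si256();
            let src = _mm256_set1_epi32(99);
            // Odd lanes active (0b1010_1010); even lanes come from `src`.
            let v = masked_max(src, 0b1010_1010, a, b);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [99, 0, 99, 0, 99, 0, 99, 0]
    }
}
```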
- _mm256_mask_ ⚠mov_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 8-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mov_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 16-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mov_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 32-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mov_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 64-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mov_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMove packed double-precision (64-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mov_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMove packed single-precision (32-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
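A masked move is effectively a lane-wise blend between src and a. A small sketch with _mm256_mask_mov_epi32 (names and values are illustrative, not taken from the documentation above):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn blend_epi32(src: __m256i, k: __mmask8, a: __m256i) -> __m256i {
    // Lanes selected by `k` come from `a`; the remaining lanes are copied
    // from `src`.
    _mm256_mask_mov_epi32(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0i32; 8];
        unsafe {
            let src = _mm256_setzero_si256();
            let a = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
            let v = blend_epi32(src, 0b1100_0011, a);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [1, 2, 0, 0, 0, 0, 7, 8]
    }
}
```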
- _mm256_mask_ ⚠movedup_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠movehdup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠moveldup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mul_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mask_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mulhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mulhi_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mulhrs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mullo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mullo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠mullo_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using writemask k (elements are copied from src if the corresponding bit is not set).
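The masked multiply entries keep only part of each widened product (the low half for mullo, the high half for mulhi). A sketch of _mm256_mask_mullo_epi32 under the same assumptions as the earlier examples (helper name and values are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_mullo(src: __m256i, k: __mmask8, a: __m256i, b: __m256i) -> __m256i {
    // Active lanes keep the low 32 bits of each product; inactive lanes are
    // copied from `src`.
    _mm256_mask_mullo_epi32(src, k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0i32; 8];
        unsafe {
            let a = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
            let b = _mm256_set1_epi32(3);
            let src = _mm256_setzero_si256();
            let v = masked_mullo(src, 0b0000_1111, a, b);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [3, 6, 9, 12, 0, 0, 0, 0]
    }
}
```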
- _mm256_mask_ ⚠multishift_ epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠or_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠or_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠packs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠packs_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠packus_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠packus_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permute_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permute_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutevar_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutevar_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutex_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutexvar_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutexvar_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutexvar_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutexvar_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutexvar_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠permutexvar_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠popcnt_ epi8 Experimental (x86 or x86-64) and avx512bitalg,avx512vlCounts the number of logical 1 bits in each packed 8-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠popcnt_ epi16 Experimental (x86 or x86-64) and avx512bitalg,avx512vlCounts the number of logical 1 bits in each packed 16-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠popcnt_ epi32 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlCounts the number of logical 1 bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠popcnt_ epi64 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlCounts the number of logical 1 bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
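The masked popcnt entries map each active lane to its population count and pass inactive lanes through from src. A sketch assuming _mm256_mask_popcnt_epi32 follows the usual (src, k, a) masked signature (helper name and values are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
unsafe fn masked_popcnt(src: __m256i, k: __mmask8, a: __m256i) -> __m256i {
    // Active lanes receive the population count of the corresponding lane of
    // `a`; inactive lanes are copied from `src`.
    _mm256_mask_popcnt_epi32(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512vpopcntdq") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0i32; 8];
        unsafe {
            let a = _mm256_setr_epi32(0, 1, 3, 7, 15, 255, -1, 0x0F0F_0F0F);
            let src = _mm256_set1_epi32(-1);
            let v = masked_popcnt(src, 0b0011_1111, a);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [0, 1, 2, 3, 4, 8, -1, -1]
    }
}
```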
- _mm256_mask_ ⚠range_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_mask_ ⚠range_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_mask_ ⚠rcp14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_ ⚠rcp14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_ ⚠rcp_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_mask_ ⚠reduce_ add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm256_mask_ ⚠reduce_ add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm256_mask_ ⚠reduce_ and_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm256_mask_ ⚠reduce_ and_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm256_mask_ ⚠reduce_ max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_ ⚠reduce_ max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_ ⚠reduce_ max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_ ⚠reduce_ max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm256_mask_ ⚠reduce_ min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_ ⚠reduce_ min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_ ⚠reduce_ min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_ ⚠reduce_ min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm256_mask_ ⚠reduce_ mul_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm256_mask_ ⚠reduce_ mul_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm256_mask_ ⚠reduce_ or_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm256_mask_ ⚠reduce_ or_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
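Unlike the elementwise intrinsics, the masked reductions above collapse a whole vector to a single scalar, with only the active lanes participating. A sketch assuming _mm256_mask_reduce_add_epi16 takes the mask followed by the vector and returns an i16 (helper name and values are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn sum_active_lanes(k: __mmask16, a: __m256i) -> i16 {
    // Only lanes whose mask bit is set participate in the horizontal sum.
    _mm256_mask_reduce_add_epi16(k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_set1_epi16(2); // sixteen 16-bit lanes, all equal to 2
            // Eight of the sixteen lanes are active, so the sum is 8 * 2 = 16.
            let sum = sum_active_lanes(0b0000_0000_1111_1111, a);
            println!("{}", sum); // 16
        }
    }
}
```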
- _mm256_mask_ ⚠reduce_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION.
- _mm256_mask_ ⚠reduce_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlExtract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠reduce_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION.
- _mm256_mask_ ⚠rol_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠rol_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠rolv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠rolv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠ror_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠ror_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠rorv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠rorv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
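The masked rotate entries come in immediate (rol/ror) and per-lane variable (rolv/rorv) forms. A sketch of the variable form, assuming _mm256_mask_rolv_epi32 follows the usual (src, k, a, b) masked signature (helper name and values are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_rotate_left(src: __m256i, k: __mmask8, a: __m256i, amounts: __m256i) -> __m256i {
    // Each active lane of `a` is rotated left by the per-lane amount in
    // `amounts`; inactive lanes are copied from `src`.
    _mm256_mask_rolv_epi32(src, k, a, amounts)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0u32; 8];
        unsafe {
            let a = _mm256_set1_epi32(1);
            let amounts = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
            let src = _mm256_setzero_si256();
            let v = masked_rotate_left(src, 0b1111_1111, a, amounts);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [1, 2, 4, 8, 16, 32, 64, 128]
    }
}
```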
- _mm256_mask_ ⚠roundscale_ pd Experimental (x86 or x86-64) and avx512f,avx512vlRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION.
- _mm256_mask_ ⚠roundscale_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlRound packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠roundscale_ ps Experimental (x86 or x86-64) and avx512f,avx512vlRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter, which can be one of _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO, or _MM_FROUND_CUR_DIRECTION.
- _mm256_mask_ ⚠rsqrt14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_ ⚠rsqrt14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_mask_ ⚠rsqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_mask_ ⚠scalef_ pd Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠scalef_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlScale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠scalef_ ps Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠set1_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast 8-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠set1_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast 16-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠set1_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 32-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠set1_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 64-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shldi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shldi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shldi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shldv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shldv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shldv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shrdi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shrdi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shrdi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shrdv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shrdv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shrdv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ f32x4 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ f64x2 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ i32x4 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ i64x2 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shuffle_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shufflehi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠shufflelo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sll_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sll_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sll_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠slli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠slli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠slli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sllv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sllv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sllv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
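The masked shift entries mirror the rotates: sll/srl/sra shift by a scalar count, slli/srli/srai by an immediate, and sllv/srlv/srav by a per-lane count. A sketch of a masked per-lane left shift (helper name and values are illustrative; same toolchain assumptions as the earlier examples):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_shift_left(src: __m256i, k: __mmask8, a: __m256i, count: __m256i) -> __m256i {
    // Active lanes of `a` are shifted left by the per-lane count, shifting in
    // zeros; inactive lanes are copied from `src`.
    _mm256_mask_sllv_epi32(src, k, a, count)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0i32; 8];
        unsafe {
            let a = _mm256_set1_epi32(1);
            let count = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
            let src = _mm256_set1_epi32(-1);
            let v = masked_shift_left(src, 0b0000_1111, a, count);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        println!("{:?}", out); // [1, 2, 4, 8, -1, -1, -1, -1]
    }
}
```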
- _mm256_mask_ ⚠sqrt_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sqrt_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
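A masked square root only evaluates the active lanes; the rest are copied from src. A short sketch with _mm256_mask_sqrt_ps (helper name and values are illustrative):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_sqrt(src: __m256, k: __mmask8, a: __m256) -> __m256 {
    // The square root is computed only for the active lanes; the remaining
    // lanes are copied from `src`.
    _mm256_mask_sqrt_ps(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        let mut out = [0.0f32; 8];
        unsafe {
            let a = _mm256_setr_ps(1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0);
            let src = _mm256_setzero_ps();
            let v = masked_sqrt(src, 0b0000_1111, a);
            _mm256_storeu_ps(out.as_mut_ptr(), v);
        }
        println!("{:?}", out); // [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
    }
}
```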
- _mm256_mask_ ⚠sra_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sra_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sra_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srai_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srai_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srai_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srav_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srav_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srav_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srl_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srl_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srl_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srlv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srlv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠srlv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠store_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠store_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠store_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStore packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠store_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStore packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_mask_ ⚠storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlStore packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlStore packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStore packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm256_mask_ ⚠storeu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStore packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
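A sketch of a masked unaligned store in the spirit of the entries above; toolchain assumptions as in the previous sketch, and `store_even_lanes` is an invented name:

```rust
// Sketch only: nightly + AVX-512F/VL assumed, as in the earlier sketch.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn store_even_lanes(dst: &mut [i32; 8], a: __m256i) {
    // Only lanes 0, 2, 4 and 6 of `a` are written; the odd slots of `dst`
    // keep whatever they held before. The `storeu` form tolerates unaligned `dst`.
    unsafe { _mm256_mask_storeu_epi32(dst.as_mut_ptr(), 0b0101_0101, a) }
}
```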
- _mm256_mask_ ⚠sub_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sub_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sub_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sub_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlSubtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠sub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠subs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠subs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠subs_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠subs_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
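For the masked subtractions above, the writemask decides per lane whether the difference or the `src` value is kept. A small hypothetical wrapper (same nightly/target-feature assumptions as before):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn sub_where_selected(src: __m256i, k: __mmask8, a: __m256i, b: __m256i) -> __m256i {
    // dst[i] = a[i] - b[i] where bit i of `k` is set; otherwise dst[i] = src[i].
    unsafe { _mm256_mask_sub_epi32(src, k, a, b) }
}
```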
- _mm256_mask_ ⚠ternarylogic_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32-bit granularity (32-bit elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠ternarylogic_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64-bit granularity (64-bit elements are copied from src when the corresponding mask bit is not set).
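As a concrete reading of the ternary-logic entries above: the immediate is an 8-entry truth table indexed by the three operand bits. The sketch below uses 0xE8, which encodes the (symmetric) bitwise majority function, so the operand-to-index order does not matter; the helper name is invented and the usual nightly/AVX-512F/VL assumptions apply:

```rust
// Sketch only: nightly + AVX-512F/VL assumed. 0xE8 = 0b1110_1000 is the truth
// table of the three-input majority function.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_majority(src: __m256i, k: __mmask8, a: __m256i, b: __m256i) -> __m256i {
    // For lanes selected by `k`, each result bit is the majority vote of the
    // corresponding bits of `src`, `a` and `b`; unselected lanes keep `src`.
    unsafe { _mm256_mask_ternarylogic_epi32::<0xE8>(src, k, a, b) }
}
```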
- _mm256_mask_ ⚠test_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ ⚠test_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ ⚠test_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ ⚠test_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm256_mask_ ⚠testn_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm256_mask_ ⚠testn_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm256_mask_ ⚠testn_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm256_mask_ ⚠testn_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
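The test/testn entries above produce a mask rather than a vector; the incoming writemask simply pre-filters which lanes may set a result bit. A hedged sketch (invented helper name, same toolchain assumptions):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn lanes_with_flag(k: __mmask8, values: __m256i, flags: __m256i) -> __mmask8 {
    // Bit i of the result is set only if bit i of `k` is set *and*
    // `values[i] & flags[i]` is non-zero.
    unsafe { _mm256_mask_test_epi32_mask(k, values, flags) }
}
```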
- _mm256_mask_ ⚠unpackhi_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpackhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpackhi_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpackhi_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpackhi_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpackhi_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpacklo_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpacklo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpacklo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpacklo_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpacklo_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠unpacklo_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠xor_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠xor_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mask_ ⚠xor_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm256_mask_ ⚠xor_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
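The masked bitwise entries above follow the same merge pattern; a minimal sketch for the integer XOR form (invented helper name, nightly + AVX-512F/VL assumed):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn xor_selected(src: __m256i, k: __mmask8, a: __m256i, b: __m256i) -> __m256i {
    // Selected lanes become a[i] ^ b[i]; the rest are passed through from `src`.
    unsafe { _mm256_mask_xor_epi32(src, k, a, b) }
}
```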
- _mm256_maskz_ ⚠abs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠abs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠abs_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠abs_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ pd Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlAdd packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠add_ ps Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠adds_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠adds_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠adds_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠adds_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
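Unlike the `_mask_` forms earlier, the `_maskz_` entries that begin here take no `src` operand: unselected lanes are simply zeroed. A short sketch, under the same nightly/AVX-512F/VL assumptions, with an invented helper name:

```rust
// Sketch only: nightly + AVX-512F/VL assumed. `_maskz_` zeroes unselected lanes
// instead of merging from a `src` vector.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn add_low_half(a: __m256i, b: __m256i) -> __m256i {
    // Lanes 0..3 hold a[i] + b[i]; lanes 4..7 come out as zero.
    unsafe { _mm256_maskz_add_epi32(0b0000_1111, a, b) }
}
```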
- _mm256_maskz_ ⚠alignr_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠alignr_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 64-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 32 bytes (8 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠alignr_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 64-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 32 bytes (4 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠and_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠and_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠andnot_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠andnot_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠andnot_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠andnot_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠avg_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠avg_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcast_ f32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠broadcast_ f32x4 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcast_ f64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠broadcast_ i32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠broadcast_ i32x4 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the 4 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcast_ i64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the 2 packed 64-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠broadcastb_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 8-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcastd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 32-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcastq_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 64-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcastsd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcastss_ ps Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠broadcastw_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
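A sketch of a zero-masked broadcast from the entries above: a scalar is placed in an XMM register and splatted only into the selected lanes. The helper name is invented; nightly plus AVX-512F/VL assumed (`_mm_cvtsi32_si128` itself is baseline SSE2):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn splat_into_selected_lanes(value: i32, k: __mmask8) -> __m256i {
    unsafe {
        // Put `value` in the low 32 bits of an XMM register, then broadcast it
        // to the lanes selected by `k`; the remaining lanes are zeroed.
        let low = _mm_cvtsi32_si128(value);
        _mm256_maskz_broadcastd_epi32(k, low)
    }
}
```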
- _mm256_maskz_ ⚠cmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm256_maskz_ ⚠compress_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 8-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_ ⚠compress_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 16-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_ ⚠compress_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_ ⚠compress_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_ ⚠compress_ pd Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm256_maskz_ ⚠compress_ ps Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
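The compress entries above are the left-packing primitive behind many stream-filtering kernels. A minimal sketch (invented helper name, nightly + AVX-512F/VL assumed):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn pack_selected_to_front(k: __mmask8, a: __m256i) -> __m256i {
    // Lanes of `a` whose mask bit is set are packed contiguously into the low
    // lanes of the result; the remaining high lanes are zeroed.
    unsafe { _mm256_maskz_compress_epi32(k, a) }
}
```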
- _mm256_maskz_ ⚠conflict_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_maskz_ ⚠conflict_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm256_maskz_ ⚠conj_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number `complex = vec.fp16[0] + i * vec.fp16[1]`, or the complex conjugate `conjugate = vec.fp16[0] - i * vec.fp16[1]`.
- _mm256_maskz_ ⚠cvt_ roundps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of: `_MM_FROUND_TO_NEAREST_INT` (round to nearest), `_MM_FROUND_TO_NEG_INF` (round down), `_MM_FROUND_TO_POS_INF` (round up), `_MM_FROUND_TO_ZERO` (truncate), or `_MM_FROUND_CUR_DIRECTION` (use the rounding mode set by `_MM_SET_ROUNDING_MODE`).
- _mm256_maskz_ ⚠cvtepi8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi32_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepi64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtepi64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_ ⚠cvtepi64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtepu8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtepu64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtepu64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_ ⚠cvtepu64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtne2ps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm256_maskz_ ⚠cvtneps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm256_maskz_ ⚠cvtpbh_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
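For the widening conversions above, note that the narrow input typically arrives in a 128-bit vector even though the result is 256 bits wide. A hedged sketch for the unsigned byte-to-dword case (invented helper name, nightly + AVX-512F/VL assumed):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn widen_selected_bytes(bytes: __m128i, k: __mmask8) -> __m256i {
    // Zero-extends the low 8 bytes of `bytes` to eight 32-bit lanes; lanes whose
    // mask bit is clear are zeroed rather than holding a widened byte.
    unsafe { _mm256_maskz_cvtepu8_epi32(k, bytes) }
}
```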
- _mm256_maskz_ ⚠cvtpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtpd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm256_maskz_ ⚠cvtpd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ pd Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtph_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvtps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of: `_MM_FROUND_TO_NEAREST_INT` (round to nearest), `_MM_FROUND_TO_NEG_INF` (round down), `_MM_FROUND_TO_POS_INF` (round up), `_MM_FROUND_TO_ZERO` (truncate), or `_MM_FROUND_CUR_DIRECTION` (use the rounding mode set by `_MM_SET_ROUNDING_MODE`).
- _mm256_maskz_ ⚠cvtsepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtsepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtsepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtsepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtsepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtsepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvttpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvttph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠cvttps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvttps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
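The `cvtt*` entries above convert with truncation toward zero rather than using the current rounding mode. A small sketch for the float-to-int case (invented helper name, nightly + AVX-512F/VL assumed):

```rust
// Sketch only: nightly + AVX-512F/VL assumed.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn truncate_selected(k: __mmask8, a: __m256) -> __m256i {
    // Selected lanes are converted with truncation toward zero
    // (e.g. 2.9 -> 2, -2.9 -> -2); unselected lanes are zeroed.
    unsafe { _mm256_maskz_cvttps_epi32(k, a) }
}
```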
- _mm256_maskz_ ⚠cvtusepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtusepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtusepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtusepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtusepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtusepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtxph_ ps Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠cvtxps_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠dbsad_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm256_maskz_ ⚠div_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠div_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlDivide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠div_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠dpbf16_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm256_maskz_ ⚠dpbusd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠dpbusds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠dpwssd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠dpwssds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
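The VNNI dot-product entries above fold a multiply, a horizontal add of the partial products, and an accumulate into one masked operation. A hedged sketch of the unsigned-by-signed byte form; the argument order follows the description above (accumulator first), the helper name is invented, and nightly plus AVX512-VNNI/AVX-512VL support is assumed:

```rust
// Sketch only: nightly assumed, plus CPU support for AVX512-VNNI and AVX-512VL.
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vnni,avx512vl")]
unsafe fn masked_u8xi8_dot(k: __mmask8, acc: __m256i, a: __m256i, b: __m256i) -> __m256i {
    // Each 32-bit lane adds the dot product of four unsigned bytes of `a` with
    // four signed bytes of `b` to the matching lane of `acc`; unselected lanes
    // of the result are zeroed.
    unsafe { _mm256_maskz_dpbusd_epi32(k, acc, a, b) }
}
```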
- _mm256_maskz_ ⚠expand_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expand_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expand_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expand_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expand_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expand_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expandloadu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expandloadu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expandloadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expandloadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expandloadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠expandloadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
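The expand-load intrinsics above differ from plain masked loads: one element is read contiguously from memory per set mask bit, the loaded values are scattered into the active lanes of the result, and the inactive lanes are zeroed. Below is a minimal sketch of that behavior using _mm256_maskz_expandloadu_epi32; the surrounding scaffolding (runtime feature detection, the demo function name) is an assumption for illustration, not taken from this page, and on older toolchains these experimental intrinsics may still require a nightly compiler.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    // Only take the AVX-512 path if the CPU actually supports it.
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { expand_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn expand_demo() {
    use std::arch::x86_64::*;

    let src: [i32; 8] = [10, 20, 30, 40, 50, 60, 70, 80];
    // Mask 0b0000_1011: lanes 0, 1 and 3 are active.
    let k: __mmask8 = 0b0000_1011;
    // Reads the first three i32s from `src` (one per set mask bit) and
    // places them in lanes 0, 1 and 3; every other lane becomes 0.
    let v = _mm256_maskz_expandloadu_epi32(k, src.as_ptr());
    let out: [i32; 8] = std::mem::transmute(v);
    assert_eq!(out, [10, 20, 0, 30, 0, 0, 0, 0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```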
- _mm256_maskz_ ⚠extractf32x4_ ps Experimental (x86 or x86-64) and avx512f,avx512vlExtract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠extractf64x2_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠extracti32x4_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlExtract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM1, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠extracti64x2_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlExtracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_fcmadd_pch Experimental (x86 or x86-64) and avx512fp16,avx512vl Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_fcmul_pch Experimental (x86 or x86-64) and avx512fp16,avx512vl Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm256_maskz_ ⚠fixupimm_ pd Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_maskz_ ⚠fixupimm_ ps Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm256_maskz_fmadd_pch Experimental (x86 or x86-64) and avx512fp16,avx512vl Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_fmul_pch Experimental (x86 or x86-64) and avx512fp16,avx512vl Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
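The fused multiply-add family above all follows the same shape: compute a*b, combine it with c (add, subtract, negate, or alternate per lane), then apply the zeromask. A hedged sketch using _mm256_maskz_fmadd_pd is shown below; the detection scaffolding and function name are assumed boilerplate rather than anything from this listing.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { fma_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn fma_demo() {
    use std::arch::x86_64::*;

    let a = _mm256_set1_pd(2.0);
    let b = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);
    let c = _mm256_set1_pd(10.0);
    // Lanes 0 and 2 are active (mask 0b0101): they get a*b + c,
    // lanes 1 and 3 are zeroed by the zeromask.
    let r = _mm256_maskz_fmadd_pd(0b0101, a, b, c);
    let out: [f64; 4] = std::mem::transmute(r);
    assert_eq!(out, [12.0, 0.0, 16.0, 0.0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```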
- _mm256_maskz_ ⚠getexp_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_maskz_getexp_ph Experimental (x86 or x86-64) and avx512fp16,avx512vl Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm256_maskz_ ⚠getexp_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
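The getexp entries above return each element's unbiased binary exponent as a floating-point value of the same width, i.e. effectively floor(log2(|x|)). A small illustrative sketch with _mm256_maskz_getexp_pd follows; everything around the intrinsic call is assumed scaffolding.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { getexp_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn getexp_demo() {
    use std::arch::x86_64::*;

    let a = _mm256_setr_pd(1.0, 8.0, 0.5, 3.0);
    // Lanes 0..=2 receive floor(log2(|a|)); lane 3 is zeroed by the mask.
    let r = _mm256_maskz_getexp_pd(0b0111, a);
    let out: [f64; 4] = std::mem::transmute(r);
    assert_eq!(out, [0.0, 3.0, -1.0, 0.0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```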
- _mm256_maskz_ ⚠getmant_ pd Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_maskz_getmant_ph Experimental (x86 or x86-64) and avx512fp16,avx512vl Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm256_maskz_ ⚠getmant_ ps Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm256_maskz_ ⚠gf2p8affine_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the packed bytes in x. That is computes a*x+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_maskz_ ⚠gf2p8affineinv_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the inverted packed bytes in x. That is computes a*inv(x)+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm256_maskz_ ⚠gf2p8mul_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
- _mm256_maskz_ ⚠insertf32x4 Experimental (x86 or x86-64) and avx512f,avx512vlCopy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠insertf64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlCopy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠inserti32x4 Experimental (x86 or x86-64) and avx512f,avx512vlCopy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠inserti64x2 Experimental (x86 or x86-64) and avx512dq,avx512vlCopy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠load_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_ ⚠load_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_ ⚠load_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_ ⚠load_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_maskz_ ⚠loadu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 8-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_ ⚠loadu_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 16-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_ ⚠loadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_ ⚠loadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_ ⚠loadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm256_maskz_ ⚠loadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
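The masked loads above come in aligned (`load_`) and unaligned (`loadu_`) flavors; with the maskz form, lanes whose mask bit is clear simply come back as zero. Below is a small illustrative sketch with _mm256_maskz_loadu_epi32; the detection code and helper name are assumptions for the example, not part of this page.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { load_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn load_demo() {
    use std::arch::x86_64::*;

    let data: [i32; 8] = [1, 2, 3, 4, 5, 6, 7, 8];
    // Keep the even lanes (mask bits 0, 2, 4, 6); odd lanes are zeroed.
    let r = _mm256_maskz_loadu_epi32(0b0101_0101, data.as_ptr());
    let out: [i32; 8] = std::mem::transmute(r);
    assert_eq!(out, [1, 0, 3, 0, 5, 0, 7, 0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```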
- _mm256_maskz_ ⚠lzcnt_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠lzcnt_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_madd52hi_epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vl Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_madd52lo_epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vl Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠madd_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠maddubs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_maskz_ ⚠max_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_maskz_ ⚠min_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
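The masked max/min intrinsics above are plain element-wise compares; with the maskz form, lanes whose mask bit is clear are zeroed in the result. A hedged sketch with _mm256_maskz_max_epi32 (the scaffolding around the call is assumed):

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { max_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn max_demo() {
    use std::arch::x86_64::*;

    let a = _mm256_setr_epi32(1, 9, -3, 7, 5, 0, 2, 8);
    let b = _mm256_setr_epi32(4, 2, -1, 6, 5, 1, 3, -8);
    // Lower four mask bits set: lanes 0..=3 hold max(a, b), lanes 4..=7 are zeroed.
    let r = _mm256_maskz_max_epi32(0b0000_1111, a, b);
    let out: [i32; 8] = std::mem::transmute(r);
    assert_eq!(out, [4, 9, -1, 7, 0, 0, 0, 0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```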
- _mm256_maskz_ ⚠mov_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 8-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mov_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 16-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mov_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 32-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mov_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 64-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mov_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMove packed double-precision (64-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mov_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMove packed single-precision (32-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠movedup_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠movehdup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠moveldup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mul_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mul_pch Experimental (x86 or x86-64) and avx512fp16,avx512vl Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_maskz_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mulhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mulhi_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mulhrs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mullo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠mullo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_mullo_epi64 Experimental (x86 or x86-64) and avx512dq,avx512vl Multiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠multishift_ epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠or_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠or_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm256_maskz_ ⚠packs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠packs_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠packus_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠packus_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permute_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permute_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutevar_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutevar_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutex_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutexvar_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutexvar_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutexvar_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutexvar_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutexvar_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠permutexvar_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
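The permutexvar intrinsics above are full cross-lane shuffles: result lane i takes the source lane named by idx[i], and lanes masked off by k are zeroed. A small illustrative sketch with _mm256_maskz_permutexvar_epi32 follows; the scaffolding is an assumption for the example.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { permute_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn permute_demo() {
    use std::arch::x86_64::*;

    let a = _mm256_setr_epi32(10, 11, 12, 13, 14, 15, 16, 17);
    // Reverse the lane order: result lane i reads a[idx[i]].
    let idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    // All mask bits set except the top one, so lane 7 is zeroed.
    let r = _mm256_maskz_permutexvar_epi32(0b0111_1111, idx, a);
    let out: [i32; 8] = std::mem::transmute(r);
    assert_eq!(out, [17, 16, 15, 14, 13, 12, 11, 0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```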
- _mm256_maskz_ ⚠popcnt_ epi8 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 8-bit integer maps the value to the number of logical 1 bits.
- _mm256_maskz_ ⚠popcnt_ epi16 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 16-bit integer maps the value to the number of logical 1 bits.
- _mm256_maskz_ ⚠popcnt_ epi32 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 32-bit integer maps the value to the number of logical 1 bits.
- _mm256_maskz_ ⚠popcnt_ epi64 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 64-bit integer maps the value to the number of logical 1 bits.
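The popcnt entries above are per-element population counts with the zeromask applied to the result. A hedged sketch with _mm256_maskz_popcnt_epi32 (assumed scaffolding; note it needs avx512vpopcntdq in addition to avx512vl):

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512vpopcntdq")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { popcnt_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
unsafe fn popcnt_demo() {
    use std::arch::x86_64::*;

    let a = _mm256_setr_epi32(0, 1, 3, 7, 0xF, 0xFF, -1, 0x10);
    // All mask bits set except bit 7, so the last lane is zeroed.
    let r = _mm256_maskz_popcnt_epi32(0b0111_1111, a);
    let out: [i32; 8] = std::mem::transmute(r);
    assert_eq!(out, [0, 1, 2, 3, 4, 8, 32, 0]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```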
- _mm256_maskz_ ⚠range_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_maskz_ ⚠range_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_maskz_ ⚠rcp14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_ ⚠rcp14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_rcp_ph Experimental (x86 or x86-64) and avx512fp16,avx512vl Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_maskz_reduce_pd Experimental (x86 or x86-64) and avx512dq,avx512vl Extract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
 _MM_FROUND_TO_NEAREST_INT // round to nearest
 _MM_FROUND_TO_NEG_INF // round down
 _MM_FROUND_TO_POS_INF // round up
 _MM_FROUND_TO_ZERO // truncate
 _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
- _mm256_maskz_ ⚠reduce_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlExtract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_reduce_ps Experimental (x86 or x86-64) and avx512dq,avx512vl Extract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
 _MM_FROUND_TO_NEAREST_INT // round to nearest
 _MM_FROUND_TO_NEG_INF // round down
 _MM_FROUND_TO_POS_INF // round up
 _MM_FROUND_TO_ZERO // truncate
 _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
- _mm256_maskz_ ⚠rol_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠rol_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠rolv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠rolv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠ror_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠ror_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠rorv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠rorv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
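The rotate intrinsics above come in immediate (`rol`/`ror`) and per-lane variable (`rolv`/`rorv`) forms. The variable form avoids const generics, so it makes for the simplest sketch; the code below is an assumed illustration using _mm256_maskz_rolv_epi32, not something taken from this page.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512f")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { rotate_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn rotate_demo() {
    use std::arch::x86_64::*;

    let a = _mm256_set1_epi32(1);
    // Rotate lane i left by i bits: 1 << 0, 1 << 1, ..., 1 << 7.
    let counts = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    // Lane 0 is masked off and therefore zeroed.
    let r = _mm256_maskz_rolv_epi32(0b1111_1110, a, counts);
    let out: [i32; 8] = std::mem::transmute(r);
    assert_eq!(out, [0, 2, 4, 8, 16, 32, 64, 128]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```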
- _mm256_maskz_ ⚠roundscale_ pd Experimental (x86 or x86-64) and avx512f,avx512vlRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
 _MM_FROUND_TO_NEAREST_INT // round to nearest
 _MM_FROUND_TO_NEG_INF // round down
 _MM_FROUND_TO_POS_INF // round up
 _MM_FROUND_TO_ZERO // truncate
 _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
- _mm256_maskz_ ⚠roundscale_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlRound packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠roundscale_ ps Experimental (x86 or x86-64) and avx512f,avx512vlRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
 _MM_FROUND_TO_NEAREST_INT // round to nearest
 _MM_FROUND_TO_NEG_INF // round down
 _MM_FROUND_TO_POS_INF // round up
 _MM_FROUND_TO_ZERO // truncate
 _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
- _mm256_maskz_ ⚠rsqrt14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_ ⚠rsqrt14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm256_maskz_rsqrt_ph Experimental (x86 or x86-64) and avx512fp16,avx512vl Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_maskz_ ⚠scalef_ pd Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠scalef_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlScale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠scalef_ ps Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠set1_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast 8-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠set1_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠set1_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 32-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠set1_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 64-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shldi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shldi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shldi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shldv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shldv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shldv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shrdi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shrdi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shrdi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shrdv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shrdv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shrdv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
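The double-shift (concatenate-and-shift) intrinsics above form a 2N-bit value from a lane of a and the matching lane of b, shift it, and keep one half. The `v` variants take the shift counts from a third vector, which keeps the sketch below free of const generics; everything around the intrinsic call is assumed scaffolding rather than anything documented here.

```rust
#[cfg(target_arch = "x86_64")]
fn main() {
    if std::arch::is_x86_feature_detected!("avx512vbmi2")
        && std::arch::is_x86_feature_detected!("avx512vl")
    {
        unsafe { shldv_demo() };
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx512vbmi2,avx512vl")]
unsafe fn shldv_demo() {
    use std::arch::x86_64::*;

    // Per lane: form the 64-bit value (a << 32) | b, shift it left by the
    // count in `c`, and keep the upper 32 bits.
    let a = _mm256_set1_epi32(1);
    let b = _mm256_set1_epi32(i32::MIN); // 0x8000_0000: top bit of the low half
    let c = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    // Lane 0 is masked off and zeroed.
    let r = _mm256_maskz_shldv_epi32(0b1111_1110, a, b, c);
    let out: [i32; 8] = std::mem::transmute(r);
    assert_eq!(out, [0, 3, 6, 12, 24, 48, 96, 192]);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```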
- _mm256_maskz_ ⚠shuffle_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ f32x4 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ f64x2 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ i32x4 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ i64x2 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shuffle_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shufflehi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠shufflelo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
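As a concrete picture of the zero-masked shuffles, the sketch below uses _mm256_maskz_shuffle_epi8 to reverse the bytes inside each 128-bit lane and then drops the whole upper lane through the mask. It assumes experimental AVX-512BW/VL intrinsics are available; the wrapper is only illustrative.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn maskz_byte_shuffle(k: __mmask32, a: __m256i, control: __m256i) -> __m256i {
    // Per-128-bit-lane byte shuffle (pshufb semantics), then every byte whose
    // mask bit in k is clear is zeroed.
    _mm256_maskz_shuffle_epi8(k, a, control)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi8(
                0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
            );
            // Control that reverses the 16 bytes of each 128-bit lane.
            let ctrl = _mm256_setr_epi8(
                15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
                15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
            );
            // Mask keeps only the low 16 bytes; the upper lane is zeroed.
            let bytes: [i8; 32] = std::mem::transmute(maskz_byte_shuffle(0x0000_FFFF, a, ctrl));
            println!("{bytes:?}"); // [15, 14, ..., 0] followed by sixteen zeros
        }
    }
}
```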
- _mm256_maskz_ ⚠sll_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sll_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sll_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠slli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠slli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠slli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sllv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sllv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sllv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sqrt_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sqrt_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sra_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sra_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sra_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srai_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srai_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srai_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srav_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srav_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srav_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srl_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srl_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srl_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srlv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srlv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠srlv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
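The masked shift family above comes in three shapes: one count for all lanes taken from a vector (sll/srl/sra), an immediate count (slli/srli/srai), and a per-lane count (sllv/srlv/srav), each combined with a zeromask. A short sketch contrasting the immediate and per-lane forms; the wrapper and its mask values are ours, and the immediate is assumed to be a const generic.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_shifts(a: __m256i) -> (__m256i, __m256i) {
    // Immediate form: active lanes shifted left by 3, inactive lanes zeroed.
    let left = _mm256_maskz_slli_epi32::<3>(0b0000_1111, a);
    // Per-lane form: lane i shifted right arithmetically by counts[i].
    let counts = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    let right = _mm256_maskz_srav_epi32(0b1111_0000, a, counts);
    (left, right)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_set1_epi32(-64);
            let (l, r) = masked_shifts(a);
            let (l, r): ([i32; 8], [i32; 8]) = (std::mem::transmute(l), std::mem::transmute(r));
            // l: lanes 0..=3 hold -512, lanes 4..=7 are zeroed.
            // r: lanes 0..=3 are zeroed, lanes 4..=7 hold -4, -2, -1, -1.
            println!("{l:?} {r:?}");
        }
    }
}
```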
- _mm256_maskz_ ⚠sub_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sub_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sub_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sub_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlSubtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠sub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠subs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠subs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠subs_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠subs_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠ternarylogic_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠ternarylogic_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
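In the ternarylogic entries, imm8 is literally an 8-entry truth table indexed by the bit triple taken from a, b and c; 0x96, for instance, encodes a ^ b ^ c. A hedged sketch of the zero-masked 32-bit form follows, assuming the truth table is passed as a const generic and using an all-ones mask.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn xor3(a: __m256i, b: __m256i, c: __m256i) -> __m256i {
    // 0x96 = 0b1001_0110 is the truth table of a ^ b ^ c; mask 0xFF keeps all
    // eight 32-bit lanes (a zero bit would zero the corresponding lane).
    _mm256_maskz_ternarylogic_epi32::<0x96>(0xFF, a, b, c)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_set1_epi32(0b1100);
            let b = _mm256_set1_epi32(0b1010);
            let c = _mm256_set1_epi32(0b1001);
            let r: [i32; 8] = std::mem::transmute(xor3(a, b, c));
            println!("{r:?}"); // 0b1100 ^ 0b1010 ^ 0b1001 = 15 in every lane
        }
    }
}
```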
- _mm256_maskz_ ⚠unpackhi_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpackhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpackhi_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpackhi_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpackhi_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpackhi_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpacklo_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpacklo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpacklo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpacklo_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpacklo_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠unpacklo_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠xor_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠xor_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠xor_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_maskz_ ⚠xor_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm256_max_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst.
- _mm256_max_ ⚠epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst.
- _mm256_max_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm256_min_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst.
- _mm256_min_ ⚠epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst.
- _mm256_min_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm256_mmask_ ⚠i32gather_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 8 32-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i32gather_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i32gather_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i32gather_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoads 8 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i64gather_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 32-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i64gather_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 64-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i64gather_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_mmask_ ⚠i64gather_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm256_movepi8_ ⚠mask Experimental (x86 or x86-64) and avx512bw,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 8-bit integer in a.
- _mm256_movepi16_ ⚠mask Experimental (x86 or x86-64) and avx512bw,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 16-bit integer in a.
- _mm256_movepi32_ ⚠mask Experimental (x86 or x86-64) and avx512dq,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 32-bit integer in a.
- _mm256_movepi64_ ⚠mask Experimental (x86 or x86-64) and avx512dq,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 64-bit integer in a.
- _mm256_movm_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSet each packed 8-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_movm_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSet each packed 16-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_movm_ ⚠epi32 Experimental (x86 or x86-64) and avx512dq,avx512vlSet each packed 32-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm256_movm_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlSet each packed 64-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
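The movepi*_mask and movm_epi* entries above are rough inverses: one compresses the sign bit of every element into a mask register, the other expands a mask back into all-ones / all-zero elements. A small round-trip sketch (experimental AVX-512BW/VL support assumed, wrapper name ours):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn sign_bits_round_trip(a: __m256i) -> (__mmask32, __m256i) {
    // One mask bit per byte, set where the byte is negative...
    let k = _mm256_movepi8_mask(a);
    // ...then expanded back into a vector of 0x00 / 0xFF bytes.
    (k, _mm256_movm_epi8(k))
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi8(
                -1, 2, -3, 4, -5, 6, -7, 8, 1, 1, 1, 1, 1, 1, 1, 1,
                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            );
            let (k, expanded) = sign_bits_round_trip(a);
            println!("mask = {k:#b}"); // bits 0, 2, 4, 6 set: 0b1010101
            let bytes: [i8; 32] = std::mem::transmute(expanded);
            println!("{bytes:?}"); // -1 where the mask bit is set, 0 elsewhere
        }
    }
}
```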
- _mm256_mul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm256_mul_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm256_mullo_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst.
- _mm256_multishift_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst.
- _mm256_or_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst.
- _mm256_or_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 64-bit integers in a and b, and store the result in dst.
- _mm256_permutex2var_ ⚠epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlShuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex2var_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm256_permutex_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm256_permutex_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm256_permutexvar_ ⚠epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlShuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm256_permutexvar_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
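The permutexvar entries differ from the older in-lane shuffles in that the index vector may move any source element to any destination position, including across the 128-bit halves. A minimal cross-lane reversal sketch; the idx-first argument order follows the Intel signature and is assumed to carry over to std::arch.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn reverse_dwords(a: __m256i) -> __m256i {
    // Element i of the result is a[idx[i]]; indices may cross the 128-bit halves.
    let idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_permutexvar_epi32(idx, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
            let r: [i32; 8] = std::mem::transmute(reverse_dwords(a));
            println!("{r:?}"); // [7, 6, 5, 4, 3, 2, 1, 0]
        }
    }
}
```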
- _mm256_popcnt_ ⚠epi8 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 8-bit integer, map the value to the number of logical 1 bits.
- _mm256_popcnt_ ⚠epi16 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 16-bit integer, map the value to the number of logical 1 bits.
- _mm256_popcnt_ ⚠epi32 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 32-bit integer, map the value to the number of logical 1 bits.
- _mm256_popcnt_ ⚠epi64 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 64-bit integer, map the value to the number of logical 1 bits.
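A brief sketch of the per-element population count (the wrapper is illustrative; a CPU and toolchain with the experimental avx512vpopcntdq/avx512vl intrinsics are assumed):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
unsafe fn lane_popcounts(a: __m256i) -> __m256i {
    // Each 32-bit lane is replaced by the number of set bits it contained.
    _mm256_popcnt_epi32(a)
}

fn main() {
    if is_x86_feature_detected!("avx512vpopcntdq") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi32(0, 1, 3, 7, 0xFF, 0xFFFF, -1, 0x1010);
            let counts: [i32; 8] = std::mem::transmute(lane_popcounts(a));
            println!("{counts:?}"); // [0, 1, 2, 3, 8, 16, 32, 2]
        }
    }
}
```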
- _mm256_range_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_range_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm256_rcp14_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rcp14_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rcp_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_reduce_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm256_reduce_ ⚠and_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm256_reduce_ ⚠and_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm256_reduce_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm256_reduce_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm256_reduce_ ⚠mul_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm256_reduce_ ⚠mul_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm256_reduce_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm256_reduce_ ⚠or_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm256_reduce_ ⚠or_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
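Unlike the element-wise entries elsewhere in this list, these reduce_* functions collapse a whole vector into one scalar. A sketch combining three of the 16-bit reductions, assuming they return i16 and that the experimental AVX-512BW/VL intrinsics are available:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn summarize(a: __m256i) -> (i16, i16, i16) {
    // Horizontal reductions over all sixteen 16-bit lanes of a single vector.
    (
        _mm256_reduce_add_epi16(a),
        _mm256_reduce_min_epi16(a),
        _mm256_reduce_max_epi16(a),
    )
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
            let (sum, min, max) = summarize(a);
            println!("sum={sum} min={min} max={max}"); // sum=136 min=1 max=16
        }
    }
}
```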
- _mm256_reduce_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter.
- _mm256_reduce_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlExtract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm256_reduce_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter.
- _mm256_rol_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm256_rol_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm256_rolv_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_rolv_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_ror_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm256_ror_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm256_rorv_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm256_rorv_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
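The rotate entries mirror the shifts: rol/ror take an immediate count, rolv/rorv take a per-lane count from a second vector, and unlike shifts no bits are lost. A sketch of both forms, with the immediate assumed to be a const generic and the wrapper name ours:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn rotations(a: __m256i) -> (__m256i, __m256i) {
    // Fixed rotate: every 32-bit lane rotated left by 8 bits.
    let fixed = _mm256_rol_epi32::<8>(a);
    // Variable rotate: lane i rotated right by counts[i] bits.
    let counts = _mm256_setr_epi32(0, 4, 8, 12, 16, 20, 24, 28);
    (fixed, _mm256_rorv_epi32(a, counts))
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_set1_epi32(0x1234_5678);
            let (f, v) = rotations(a);
            let f: [u32; 8] = std::mem::transmute(f);
            let v: [u32; 8] = std::mem::transmute(v);
            // f holds 0x34567812 in every lane; v rotates 0x12345678 right by
            // 0, 4, 8, ... bits lane by lane.
            println!("{f:x?} {v:x?}");
        }
    }
}
```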
- _mm256_roundscale_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter.
- _mm256_roundscale_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlRound packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm256_roundscale_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter.
- _mm256_rsqrt14_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rsqrt14_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm256_rsqrt_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm256_scalef_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_scalef_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlScale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_scalef_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm256_set1_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm256_set_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm256_setr_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm256_setzero_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReturn vector of type __m256h with all elements set to zero.
- _mm256_shldi_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst.
- _mm256_shldi_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst.
- _mm256_shldi_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst.
- _mm256_shldv_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst.
- _mm256_shldv_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst.
- _mm256_shldv_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst.
- _mm256_shrdi_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst.
- _mm256_shrdi_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst.
- _mm256_shrdi_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst.
- _mm256_shrdv_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst.
- _mm256_shrdv_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
- _mm256_shrdv_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst.
- _mm256_shuffle_ ⚠f32x4 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ ⚠f64x2 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ ⚠i32x4 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm256_shuffle_ ⚠i64x2 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm256_sllv_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm256_sqrt_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm256_sra_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm256_srai_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm256_srav_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm256_srav_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm256_srlv_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm256_store_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore 256-bits (composed of 8 packed 32-bit integers) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore 256-bits (composed of 4 packed 64-bit integers) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.
- _mm256_store_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlStore 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 32 bytes or a general-protection exception may be generated.
- _mm256_storeu_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlStore 256-bits (composed of 32 packed 8-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlStore 256-bits (composed of 16 packed 16-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore 256-bits (composed of 8 packed 32-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore 256-bits (composed of 4 packed 64-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm256_storeu_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlStore 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
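The store_*/storeu_* pairs above differ only in their alignment contract: store_* requires a 32-byte-aligned address, while storeu_* accepts any address. A hedged sketch of the unaligned 32-bit store; it assumes the destination parameter is a typed *mut i32, and the wrapper is illustrative.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn spill(a: __m256i, out: &mut [i32; 8]) {
    // Unaligned store of eight packed 32-bit integers; no alignment requirement
    // on `out`, unlike _mm256_store_epi32.
    _mm256_storeu_epi32(out.as_mut_ptr(), a);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi32(10, 20, 30, 40, 50, 60, 70, 80);
            let mut out = [0i32; 8];
            spill(a, &mut out);
            println!("{out:?}"); // [10, 20, 30, 40, 50, 60, 70, 80]
        }
    }
}
```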
- _mm256_sub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlSubtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm256_ternarylogic_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm256_ternarylogic_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm256_test_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_test_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_test_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_test_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm256_testn_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testn_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testn_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm256_testn_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
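test_*_mask and testn_*_mask compute the same per-lane AND but set the mask bit under opposite conditions (non-zero versus zero), so for any lane exactly one of the two result bits is set. A small sketch, assuming experimental AVX-512F/VL support and an illustrative wrapper:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn bit_tests(a: __m256i, b: __m256i) -> (__mmask8, __mmask8) {
    // test:  bit i set when (a & b) is non-zero in 32-bit lane i.
    // testn: bit i set when (a & b) is zero in 32-bit lane i.
    (_mm256_test_epi32_mask(a, b), _mm256_testn_epi32_mask(a, b))
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm256_setr_epi32(1, 2, 4, 8, 1, 2, 4, 8);
            let b = _mm256_setr_epi32(1, 1, 4, 4, 0, 0, 0, 0);
            let (t, tn) = bit_tests(a, b);
            // a & b is non-zero only in lanes 0 and 2, so t = 0b00000101 and
            // tn is its complement over the eight lanes, 0b11111010.
            println!("{t:#010b} {tn:#010b}");
        }
    }
}
```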
- _mm256_undefined_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReturn vector of type __m256h with undefined elements. In practice, this returns the all-zero vector.
- _mm256_xor_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst.
- _mm256_xor_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst.
- _mm256_zextph128_ ⚠ph256 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128h to type __m256h. The upper 8 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_abs_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst.
- _mm512_abs_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst.
- _mm512_abs_ ⚠epi32 Experimental (x86 or x86-64) and avx512fComputes the absolute values of packed 32-bit integers in a.
- _mm512_abs_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst.
- _mm512_abs_ ⚠pd Experimental (x86 or x86-64) and avx512fFinds the absolute value of each packed double-precision (64-bit) floating-point element in v2, storing the results in dst.
- _mm512_abs_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Finds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the result in dst.
- _mm512_abs_ ⚠ps Experimental (x86 or x86-64) and avx512fFinds the absolute value of each packed single-precision (32-bit) floating-point element in v2, storing the results in dst.
- _mm512_add_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwAdd packed 8-bit integers in a and b, and store the results in dst.
- _mm512_add_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwAdd packed 16-bit integers in a and b, and store the results in dst.
- _mm512_add_ ⚠epi32 Experimental (x86 or x86-64) and avx512fAdd packed 32-bit integers in a and b, and store the results in dst.
- _mm512_add_ ⚠epi64 Experimental (x86 or x86-64) and avx512fAdd packed 64-bit integers in a and b, and store the results in dst.
- _mm512_add_ ⚠pd Experimental (x86 or x86-64) and avx512fAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_ ⚠ps Experimental (x86 or x86-64) and avx512fAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_add_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter.
- _mm512_add_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter.
- _mm512_add_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter.
- _mm512_adds_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst.
- _mm512_adds_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst.
- _mm512_adds_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst.
- _mm512_adds_ ⚠epu16 Experimental (x86 or x86-64) and avx512bwAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst.
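At 512-bit width the add entries keep the familiar wrapping versus saturating split. The sketch below adds the same two registers once as sixteen wrapping 32-bit lanes and once as sixty-four saturating unsigned bytes; experimental AVX-512F/BW support is assumed and the wrapper name is ours.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512bw")]
unsafe fn wide_adds(a: __m512i, b: __m512i) -> (__m512i, __m512i) {
    // The same 512 bits viewed as sixteen wrapping i32 lanes and as
    // sixty-four saturating u8 lanes.
    (_mm512_add_epi32(a, b), _mm512_adds_epu8(a, b))
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512bw") {
        unsafe {
            let a = _mm512_set1_epi8(-56); // 0xC8, i.e. 200 as an unsigned byte
            let b = _mm512_set1_epi8(100);
            let (_wrapping, saturating) = wide_adds(a, b);
            let bytes: [u8; 64] = std::mem::transmute(saturating);
            assert!(bytes.iter().all(|&x| x == 255)); // 200 + 100 saturates to 255
        }
    }
}
```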
- _mm512_aesdec_ ⚠epi128 Experimental (x86 or x86-64) and vaes,avx512fPerforms one round of an AES decryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_aesdeclast_ ⚠epi128 Experimental (x86 or x86-64) and vaes,avx512fPerforms the last round of an AES decryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_aesenc_ ⚠epi128 Experimental (x86 or x86-64) and vaes,avx512fPerforms one round of an AES encryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_aesenclast_ ⚠epi128 Experimental (x86 or x86-64) and vaes,avx512fPerforms the last round of an AES encryption flow on each 128-bit word (state) in a using the corresponding 128-bit word (key) in round_key.
- _mm512_alignr_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst. Unlike the _mm_alignr_epi8 and _mm256_alignr_epi8 functions, where the entire input vectors are concatenated to form the temporary result, this concatenation happens in 4 steps, where each step builds a 32-byte temporary result.
- _mm512_alignr_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConcatenate a and b into a 128-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 64 bytes (16 elements) in dst.
- _mm512_alignr_ ⚠epi64 Experimental (x86 or x86-64) and avx512fConcatenate a and b into a 128-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 64 bytes (8 elements) in dst.
- _mm512_and_ ⚠epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst.
- _mm512_and_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of 512 bits (composed of packed 64-bit integers) in a and b, and store the results in dst.
- _mm512_and_ ⚠pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst.
- _mm512_and_ ⚠ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst.
- _mm512_and_ ⚠si512 Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of 512 bits (representing integer data) in a and b, and store the result in dst.
- _mm512_andnot_ ⚠epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst.
- _mm512_andnot_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of 512 bits (composed of packed 64-bit integers) in a and then AND with b, and store the results in dst.
- _mm512_andnot_ ⚠pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst.
- _mm512_andnot_ ⚠ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst.
- _mm512_andnot_ ⚠si512 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of 512 bits (representing integer data) in a and then AND with b, and store the result in dst.
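A small sketch of the bitwise entries above, chiefly to show that the andnot family computes (NOT a) AND b rather than NOT (a AND b); the helper name and the bit patterns are invented for illustration.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn bitwise_demo() -> (i32, i32) {
    let a = _mm512_set1_epi32(0b1100);
    let b = _mm512_set1_epi32(0b1010);
    let and = _mm512_and_si512(a, b);       // 0b1000 in every 32-bit lane
    let andnot = _mm512_andnot_si512(a, b); // (!a) & b = 0b0010 in every lane
    // Read back the lowest 32-bit lane of each result.
    (_mm512_cvtsi512_si32(and), _mm512_cvtsi512_si32(andnot))
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        assert_eq!(unsafe { bitwise_demo() }, (0b1000, 0b0010));
    }
}
```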
- _mm512_avg_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwAverage packed unsigned 8-bit integers in a and b, and store the results in dst.
- _mm512_avg_ ⚠epu16 Experimental (x86 or x86-64) and avx512bwAverage packed unsigned 16-bit integers in a and b, and store the results in dst.
- _mm512_bitshuffle_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512bitalgConsiders the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm512_broadcast_ ⚠f32x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ ⚠f32x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ ⚠f32x8 Experimental (x86 or x86-64) and avx512dqBroadcasts the 8 packed single-precision (32-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ ⚠f64x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ ⚠f64x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst.
- _mm512_broadcast_ ⚠i32x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst.
- _mm512_broadcast_ ⚠i32x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed 32-bit integers from a to all elements of dst.
- _mm512_broadcast_ ⚠i32x8 Experimental (x86 or x86-64) and avx512dqBroadcasts the 8 packed 32-bit integers from a to all elements of dst.
- _mm512_broadcast_ ⚠i64x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the 2 packed 64-bit integers from a to all elements of dst.
- _mm512_broadcast_ ⚠i64x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed 64-bit integers from a to all elements of dst.
- _mm512_broadcastb_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 8-bit integer from a to all elements of dst.
- _mm512_broadcastd_ ⚠epi32 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 32-bit integer from a to all elements of dst.
- _mm512_broadcastmb_ ⚠epi64 Experimental (x86 or x86-64) and avx512cdBroadcast the low 8 bits from input mask k to all 64-bit elements of dst.
- _mm512_broadcastmw_ ⚠epi32 Experimental (x86 or x86-64) and avx512cdBroadcast the low 16 bits from input mask k to all 32-bit elements of dst.
- _mm512_broadcastq_ ⚠epi64 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 64-bit integer from a to all elements of dst.
- _mm512_broadcastsd_ ⚠pd Experimental (x86 or x86-64) and avx512fBroadcast the low double-precision (64-bit) floating-point element from a to all elements of dst.
- _mm512_broadcastss_ ⚠ps Experimental (x86 or x86-64) and avx512fBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst.
- _mm512_broadcastw_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 16-bit integer from a to all elements of dst.
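The broadcast entries above all follow the same pattern: a narrow source vector whose low element (or low block of elements) is replicated across the 512-bit destination. A hedged sketch with _mm512_broadcastd_epi32, where the helper name and values are invented:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn broadcast_demo() -> i32 {
    // The low 32-bit element (7) of the 128-bit source is replicated into all 16 lanes.
    let src = _mm_setr_epi32(7, 8, 9, 10);
    let v = _mm512_broadcastd_epi32(src);
    _mm512_reduce_add_epi32(v) // 16 * 7 = 112
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        assert_eq!(unsafe { broadcast_demo() }, 112);
    }
}
```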
- _mm512_bslli_ ⚠epi128 Experimental (x86 or x86-64) and avx512bwShift 128-bit lanes in a left by imm8 bytes while shifting in zeros, and store the results in dst.
- _mm512_bsrli_ ⚠epi128 Experimental (x86 or x86-64) and avx512bwShift 128-bit lanes in a right by imm8 bytes while shifting in zeros, and store the results in dst.
- _mm512_castpd128_ ⚠pd512 Experimental (x86 or x86-64) and avx512fCast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd256_ ⚠pd512 Experimental (x86 or x86-64) and avx512fCast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd512_ ⚠pd128 Experimental (x86 or x86-64) and avx512fCast vector of type __m512d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd512_ ⚠pd256 Experimental (x86 or x86-64) and avx512fCast vector of type __m512d to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512d to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd_ ⚠ps Experimental (x86 or x86-64) and avx512fCast vector of type __m512d to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castpd_ ⚠si512 Experimental (x86 or x86-64) and avx512fCast vector of type __m512d to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph128_ ⚠ph512 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_castph256_ ⚠ph512 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are undefined. In practice, the upper elements are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_castph512_ ⚠ph128 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512h to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph512_ ⚠ph256 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512h to type __m256h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_ ⚠pd Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512h to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_ ⚠ps Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512h to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castph_ ⚠si512 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512h to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps128_ ⚠ps512 Experimental (x86 or x86-64) and avx512fCast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps256_ ⚠ps512 Experimental (x86 or x86-64) and avx512fCast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps512_ ⚠ps128 Experimental (x86 or x86-64) and avx512fCast vector of type __m512 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps512_ ⚠ps256 Experimental (x86 or x86-64) and avx512fCast vector of type __m512 to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ ⚠pd Experimental (x86 or x86-64) and avx512fCast vector of type __m512 to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512 to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castps_ ⚠si512 Experimental (x86 or x86-64) and avx512fCast vector of type __m512 to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi128_ ⚠si512 Experimental (x86 or x86-64) and avx512fCast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi256_ ⚠si512 Experimental (x86 or x86-64) and avx512fCast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ ⚠pd Experimental (x86 or x86-64) and avx512fCast vector of type __m512i to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m512i to type __m512h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ ⚠ps Experimental (x86 or x86-64) and avx512fCast vector of type __m512i to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ ⚠si128 Experimental (x86 or x86-64) and avx512fCast vector of type __m512i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_castsi512_ ⚠si256 Experimental (x86 or x86-64) and avx512fCast vector of type __m512i to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
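Because the casts above are pure reinterpretations, round-tripping through them is free; only the bits that were defined going in may be relied on coming out. A minimal sketch, with the helper name and value invented for illustration:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn cast_demo() -> i32 {
    let small = _mm_set1_epi32(42);
    // Reinterpret as a 512-bit vector: no instruction is emitted and the
    // upper 384 bits are undefined, so only the low 128 bits may be relied on.
    let wide: __m512i = _mm512_castsi128_si512(small);
    // Cast back down and read the low 32-bit element.
    _mm_cvtsi128_si32(_mm512_castsi512_si128(wide))
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        assert_eq!(unsafe { cast_demo() }, 42);
    }
}
```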
- _mm512_clmulepi64_ ⚠epi128 Experimental (x86 or x86-64) and vpclmulqdq,avx512fPerforms a carry-less multiplication of two 64-bit polynomials over the finite field GF(2) - in each of the 4 128-bit lanes.
- _mm512_cmp_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠ph_ mask Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠round_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cmp_ ⚠round_ ph_ mask Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm512_cmp_ ⚠round_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cmpeq_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for equality, and store the results in mask vector k.
- _mm512_cmpeq_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpge_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmpgt_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm512_cmple_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmple_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for less-than, and store the results in mask vector k.
- _mm512_cmplt_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for less-than, and store the results in mask vector k.
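All of the comparison entries above produce a mask register value (one bit per lane) rather than a vector of all-ones/all-zeros lanes. A hedged sketch with _mm512_cmplt_epi32_mask; the helper name and lane values are invented:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn cmplt_mask_demo() -> __mmask16 {
    // Lanes 0..8 hold 0..8, lanes 8..16 hold 100..108.
    let a = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 100, 101, 102, 103, 104, 105, 106, 107);
    let b = _mm512_set1_epi32(50);
    // One mask bit per 32-bit lane: bit i is set when a[i] < b[i].
    _mm512_cmplt_epi32_mask(a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        // Only the low eight lanes are below 50, so the low eight mask bits are set.
        assert_eq!(unsafe { cmplt_mask_demo() }, 0x00ff);
    }
}
```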
- _mm512_cmpneq_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpneq_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k.
- _mm512_cmpnle_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k.
- _mm512_cmpnle_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k.
- _mm512_cmpnlt_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k.
- _mm512_cmpnlt_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k.
- _mm512_cmpord_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k.
- _mm512_cmpord_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k.
- _mm512_cmpunord_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k.
- _mm512_cmpunord_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k.
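The ordered/unordered comparisons above are the NaN tests: cmpord sets a mask bit where neither input lane is NaN, cmpunord where at least one is. A small sketch, with the helper name and data invented:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn nan_mask_demo() -> (__mmask16, __mmask16) {
    let mut vals = [1.0f32; 16];
    vals[3] = f32::NAN; // poison a single lane
    let a = _mm512_loadu_ps(vals.as_ptr());
    let b = _mm512_set1_ps(1.0);
    (_mm512_cmpord_ps_mask(a, b), _mm512_cmpunord_ps_mask(a, b))
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        let (ord, unord) = unsafe { nan_mask_demo() };
        assert_eq!(unord, 1 << 3); // only lane 3 involves a NaN
        assert_eq!(ord, 0xfff7);   // every other lane is ordered
    }
}
```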
- _mm512_cmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_cmul_ ⚠round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_conflict_ ⚠epi32 Experimental (x86 or x86-64) and avx512cdTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_conflict_ ⚠epi64 Experimental (x86 or x86-64) and avx512cdTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_conj_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Compute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_cvt_ ⚠roundepi16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepi32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepi32_ ps Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepi64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundepi64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepi64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundepu16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepu32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepu32_ ps Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepu64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundepu64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundepu64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundpd_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundpd_ ps Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ pd Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvt_ ⚠roundph_ ps Experimental (x86 or x86-64) and avx512fConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvt_ ⚠roundps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvt_ ⚠roundps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst. Rounding is done according to the ROUNDING parameter, which can be one of the _MM_FROUND_TO_* direction constants combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION.
- _mm512_cvt_ ⚠roundps_ pd Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvt_ ⚠roundps_ ph Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
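For the _round variants above, current stdarch appears to pass the rounding/sae control as a const generic; the sketch below shows that shape with _mm512_cvt_roundps_epi32, but the const parameter form, the helper name, and the values are assumptions to verify against your toolchain.

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn round_mode_demo() -> (i32, i32) {
    let a = _mm512_set1_ps(2.5);
    // The rounding mode is an _MM_FROUND_TO_* direction OR'd with
    // _MM_FROUND_NO_EXC to suppress exceptions, as the entries above describe.
    let down = _mm512_cvt_roundps_epi32::<{ _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC }>(a);
    let up = _mm512_cvt_roundps_epi32::<{ _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC }>(a);
    (_mm512_reduce_add_epi32(down), _mm512_reduce_add_epi32(up))
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        // 16 lanes of 2.5 rounded down (2) and up (3).
        assert_eq!(unsafe { round_mode_demo() }, (32, 48));
    }
}
```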
- _mm512_cvtepi8_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtepi8_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepi8_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi16_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepi16_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepi16_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi32_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepi32_ ⚠pd Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32_ ⚠ps Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi32lo_ ⚠pd Experimental (x86 or x86-64) and avx512fPerforms element-by-element conversion of the lower half of packed 32-bit integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst.
- _mm512_cvtepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvtepi64_ ⚠pd Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi64_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepi64_ ⚠ps Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu8_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtepu8_ ⚠epi32 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepu8_ ⚠epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepu16_ ⚠epi32 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtepu16_ ⚠epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepu16_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32_ ⚠epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtepu32_ ⚠pd Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32_ ⚠ps Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu32lo_ ⚠pd Experimental (x86 or x86-64) and avx512fPerforms element-by-element conversion of the lower half of packed 32-bit unsigned integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst.
- _mm512_cvtepu64_ ⚠pd Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu64_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtepu64_ ⚠ps Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
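The sign-extending (cvtepi*) and zero-extending (cvtepu*) widening conversions above differ only in how they fill the new high bits, which matters for values with the top bit set. A hedged sketch, with the helper name and values invented:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn extend_demo() -> (i64, i64) {
    let bytes = _mm_set1_epi8(-1); // every byte is 0xFF
    // Sign extension keeps -1 in each 64-bit lane; zero extension yields 255.
    let signed = _mm512_cvtepi8_epi64(bytes);
    let unsigned = _mm512_cvtepu8_epi64(bytes);
    (_mm512_reduce_add_epi64(signed), _mm512_reduce_add_epi64(unsigned))
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        // Eight 64-bit lanes: 8 * -1 = -8 and 8 * 255 = 2040.
        assert_eq!(unsafe { extend_demo() }, (-8, 2040));
    }
}
```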
- _mm512_cvtne2ps_ ⚠pbh Experimental (x86 or x86-64) and avx512bf16,avx512fConvert packed single-precision (32-bit) floating-point elements in two 512-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 512-bit wide vector. See Intel’s documentation.
- _mm512_cvtneps_ ⚠pbh Experimental (x86 or x86-64) and avx512bf16,avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst. See Intel’s documentation.
- _mm512_cvtpbh_ ⚠ps Experimental (x86 or x86-64) and avx512bf16,avx512fConverts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and stores the results in dst.
- _mm512_cvtpd_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtpd_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm512_cvtpd_ ⚠epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvtpd_ ⚠epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm512_cvtpd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtpd_ ⚠ps Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtpd_ ⚠pslo Experimental (x86 or x86-64) and avx512fPerforms an element-by-element conversion of packed double-precision (64-bit) floating-point elements in v2 to single-precision (32-bit) floating-point elements and stores them in dst. The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.
- _mm512_cvtph_ ⚠epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm512_cvtph_ ⚠epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtph_ ⚠epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm512_cvtph_ ⚠epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm512_cvtph_ ⚠epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst.
- _mm512_cvtph_ ⚠epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm512_cvtph_ ⚠pd Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtph_ ⚠ps Experimental (x86 or x86-64) and avx512fConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtps_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm512_cvtps_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm512_cvtps_ ⚠epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm512_cvtps_ ⚠epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm512_cvtps_ ⚠pd Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm512_cvtps_ ⚠ph Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtpslo_ ⚠pd Experimental (x86 or x86-64) and avx512fPerforms element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst.
- _mm512_cvtsd_ ⚠f64 Experimental (x86 or x86-64) and avx512fCopy the lower double-precision (64-bit) floating-point element of a to dst.
- _mm512_cvtsepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm512_cvtsepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst.
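The cvtsepi* narrowing conversions above clamp to the signed range of the destination element rather than discarding high bits. A minimal sketch, with the helper name and value invented:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn narrow_saturate_demo() -> i32 {
    // 1000 does not fit in an i8, so every lane saturates to 127 (0x7f).
    let a = _mm512_set1_epi32(1000);
    let narrowed: __m128i = _mm512_cvtsepi32_epi8(a);
    _mm_cvtsi128_si32(narrowed) // low four saturated bytes: 0x7f7f7f7f
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        assert_eq!(unsafe { narrow_saturate_demo() }, 0x7f7f7f7f);
    }
}
```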
- _mm512_cvtsh_ ⚠h Experimental (x86 or x86-64) and avx512fp16Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm512_cvtsi512_ ⚠si32 Experimental (x86 or x86-64) and avx512fCopy the lower 32-bit integer in a to dst.
- _mm512_cvtss_ ⚠f32 Experimental (x86 or x86-64) and avx512fCopy the lower single-precision (32-bit) floating-point element of a to dst.
- _mm512_cvtt_ ⚠roundpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ ⚠roundpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvtt_ ⚠roundpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ ⚠roundpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvtt_ ⚠roundph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_ ⚠roundph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_ ⚠roundph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_ ⚠roundph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_cvtt_ ⚠roundph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvtt_ ⚠roundph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvtt_ ⚠roundps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ ⚠roundps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvtt_ ⚠roundps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_cvtt_ ⚠roundps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_cvttpd_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttpd_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm512_cvttpd_ ⚠epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttpd_ ⚠epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm512_cvttph_ ⚠epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_ ⚠epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_ ⚠epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_ ⚠epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm512_cvttph_ ⚠epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvttph_ ⚠epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm512_cvttps_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttps_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm512_cvttps_ ⚠epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm512_cvttps_ ⚠epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm512_cvtusepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtusepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst.
- _mm512_cvtx_ ⚠roundph_ ps Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtx_ ⚠roundps_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_cvtxph_ ⚠ps Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm512_cvtxps_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm512_dbsad_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm512_div_ ⚠pd Experimental (x86 or x86-64) and avx512fDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_div_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm512_div_ ⚠ps Experimental (x86 or x86-64) and avx512fDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_div_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_div_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_div_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst.
- _mm512_dpbf16_ ⚠ps Experimental (x86 or x86-64) and avx512bf16,avx512fCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst.
- _mm512_dpbusd_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm512_dpbusds_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm512_dpwssd_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm512_dpwssds_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
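A minimal sketch of the VNNI dot-product-accumulate shape (nightly-only while these intrinsics are experimental; runtime support should be checked, e.g. with is_x86_feature_detected!):

```rust
use core::arch::x86_64::*;

// acc[i] += sum of the four u8 x i8 products in 32-bit lane i of a and b.
#[target_feature(enable = "avx512f,avx512vnni")]
unsafe fn dot_accumulate(acc: __m512i, a: __m512i, b: __m512i) -> __m512i {
    _mm512_dpbusd_epi32(acc, a, b)
}
```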
- _mm512_extractf32x4_ ⚠ps Experimental (x86 or x86-64) and avx512fExtract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
- _mm512_extractf32x8_ ⚠ps Experimental (x86 or x86-64) and avx512dqExtracts 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst.
- _mm512_extractf64x2_ ⚠pd Experimental (x86 or x86-64) and avx512dqExtracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst.
- _mm512_extractf64x4_ ⚠pd Experimental (x86 or x86-64) and avx512fExtract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the result in dst.
- _mm512_extracti32x4_ ⚠epi32 Experimental (x86 or x86-64) and avx512fExtract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM2, and store the result in dst.
- _mm512_extracti32x8_ ⚠epi32 Experimental (x86 or x86-64) and avx512dqExtracts 256 bits (composed of 8 packed 32-bit integers) from a, selected with IMM8, and stores the result in dst.
- _mm512_extracti64x2_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqExtracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst.
- _mm512_extracti64x4_ ⚠epi64 Experimental (x86 or x86-64) and avx512fExtract 256 bits (composed of 4 packed 64-bit integers) from a, selected with IMM1, and store the result in dst.
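The 128-bit lane to extract is selected with a const generic. A sketch, under the same experimental-toolchain assumption:

```rust
use core::arch::x86_64::*;

// Extract the highest 128-bit lane (index 3) of a 512-bit integer vector.
#[target_feature(enable = "avx512f")]
unsafe fn top_lane(a: __m512i) -> __m128i {
    _mm512_extracti32x4_epi32::<3>(a)
}
```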
- _mm512_fcmadd_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmadd_ ⚠round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fcmul_ ⚠round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_fixupimm_ ⚠pd Experimental (x86 or x86-64) and avx512fFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fixupimm_ ⚠ps Experimental (x86 or x86-64) and avx512fFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fixupimm_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fixupimm_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm512_fmadd_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmadd_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ ⚠round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmadd_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmadd_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm512_fmaddsub_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmaddsub_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
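In the fmaddsub family, even-indexed lanes compute a*b - c and odd-indexed lanes compute a*b + c, the building block for interleaved complex arithmetic. A sketch (experimental toolchain assumed):

```rust
use core::arch::x86_64::*;

// Even lanes: a*b - c; odd lanes: a*b + c.
#[target_feature(enable = "avx512f")]
unsafe fn mul_addsub(a: __m512, b: __m512, c: __m512) -> __m512 {
    _mm512_fmaddsub_ps(a, b, c)
}
```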
- _mm512_fmsub_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsub_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm512_fmsubadd_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst.
- _mm512_fmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_fmul_ ⚠round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_fnmadd_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_fnmadd_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmadd_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmadd_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm512_fnmadd_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst.
- _mm512_fnmsub_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fnmsub_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm512_fpclass_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512dqTest packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm512_fpclass_ ⚠ph_ mask Experimental (x86 or x86-64) and avx512fp16Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm512_fpclass_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512dqTest packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm512_getexp_ ⚠pd Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_getexp_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_getexp_ ⚠ps Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_getexp_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getexp_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getexp_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
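A sketch of the single-precision variant, which yields floor(log2(x)) of each lane as a float (e.g. 10.0 maps to 3.0):

```rust
use core::arch::x86_64::*;

// Per-lane exponent extraction: result lane i holds floor(log2(a[i])) as an f32.
#[target_feature(enable = "avx512f")]
unsafe fn exponents(a: __m512) -> __m512 {
    _mm512_getexp_ps(a)
}
```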
- _mm512_getmant_ ⚠pd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_getmant_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_getmant_ ⚠ps Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_getmant_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getmant_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_getmant_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_gf2p8affine_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512fPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_gf2p8affineinv_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512fPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_gf2p8mul_ ⚠epi8 Experimental (x86 or x86-64) and gfni,avx512fPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
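A sketch of the GF(2^8) byte multiply (gfni plus AVX-512, experimental toolchain assumed):

```rust
use core::arch::x86_64::*;

// Multiply 64 byte pairs in GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1.
#[target_feature(enable = "gfni,avx512f")]
unsafe fn gf_mul_bytes(a: __m512i, b: __m512i) -> __m512i {
    _mm512_gf2p8mul_epi8(a, b)
}
```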
- _mm512_i32gather_ ⚠epi32 Experimental (x86 or x86-64) and avx512fGather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32gather_ ⚠epi64 Experimental (x86 or x86-64) and avx512fGather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32gather_ ⚠pd Experimental (x86 or x86-64) and avx512fGather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32gather_ ⚠ps Experimental (x86 or x86-64) and avx512fGather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i32logather_ ⚠epi64 Experimental (x86 or x86-64) and avx512fLoads 8 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst.
- _mm512_i32logather_ ⚠pd Experimental (x86 or x86-64) and avx512fLoads 8 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst.
- _mm512_i32loscatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512fStores 8 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale.
- _mm512_i32loscatter_ ⚠pd Experimental (x86 or x86-64) and avx512fStores 8 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale.
- _mm512_i32scatter_ ⚠epi32 Experimental (x86 or x86-64) and avx512fScatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i32scatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512fScatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i32scatter_ ⚠pd Experimental (x86 or x86-64) and avx512fScatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i32scatter_ ⚠ps Experimental (x86 or x86-64) and avx512fScatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠epi32 Experimental (x86 or x86-64) and avx512fGather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠epi64 Experimental (x86 or x86-64) and avx512fGather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠pd Experimental (x86 or x86-64) and avx512fGather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64gather_ ⚠ps Experimental (x86 or x86-64) and avx512fGather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst. scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠epi32 Experimental (x86 or x86-64) and avx512fScatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512fScatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠pd Experimental (x86 or x86-64) and avx512fScatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_i64scatter_ ⚠ps Experimental (x86 or x86-64) and avx512fScatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). scale should be 1, 2, 4 or 8.
- _mm512_insertf32x4 ⚠Experimental (x86 or x86-64) and avx512fCopy a to dst, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by imm8.
- _mm512_insertf32x8 ⚠Experimental (x86 or x86-64) and avx512dqCopy a to dst, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into dst at the location specified by IMM8.
- _mm512_insertf64x2 ⚠Experimental (x86 or x86-64) and avx512dqCopy a to dst, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by IMM8.
- _mm512_insertf64x4 ⚠Experimental (x86 or x86-64) and avx512fCopy a to dst, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into dst at the location specified by imm8.
- _mm512_inserti32x4 ⚠Experimental (x86 or x86-64) and avx512fCopy a to dst, then insert 128 bits (composed of 4 packed 32-bit integers) from b into dst at the location specified by imm8.
- _mm512_inserti32x8 ⚠Experimental (x86 or x86-64) and avx512dqCopy a to dst, then insert 256 bits (composed of 8 packed 32-bit integers) from b into dst at the location specified by IMM8.
- _mm512_inserti64x2 ⚠Experimental (x86 or x86-64) and avx512dqCopy a to dst, then insert 128 bits (composed of 2 packed 64-bit integers) from b into dst at the location specified by IMM8.
- _mm512_inserti64x4 ⚠Experimental (x86 or x86-64) and avx512fCopy a to dst, then insert 256 bits (composed of 4 packed 64-bit integers) from b into dst at the location specified by imm8.
- _mm512_int2mask ⚠Experimental (x86 or x86-64) and avx512fConverts integer mask into bitmask, storing the result in dst.
- _mm512_kand ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of 16-bit masks a and b, and store the result in k.
- _mm512_kandn ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of 16-bit masks a and then AND with b, and store the result in k.
- _mm512_kmov ⚠Experimental (x86 or x86-64) and avx512fCopy 16-bit mask a to k.
- _mm512_knot ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of 16-bit mask a, and store the result in k.
- _mm512_kor ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of 16-bit masks a and b, and store the result in k.
- _mm512_kortestc ⚠Experimental (x86 or x86-64) and avx512fPerforms bitwise OR between k1 and k2, storing the result in dst. CF flag is set if dst consists of all 1’s.
- _mm512_kortestz ⚠Experimental (x86 or x86-64) and avx512fPerforms bitwise OR between k1 and k2, storing the result in dst. ZF flag is set if dst is 0.
- _mm512_kunpackb ⚠Experimental (x86 or x86-64) and avx512fUnpack and interleave 8 bits from masks a and b, and store the 16-bit result in k.
- _mm512_kunpackd ⚠Experimental (x86 or x86-64) and avx512bwUnpack and interleave 32 bits from masks a and b, and store the 64-bit result in k.
- _mm512_kunpackw ⚠Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16 bits from masks a and b, and store the 32-bit result in k.
- _mm512_kxnor ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise XNOR of 16-bit masks a and b, and store the result in k.
- _mm512_kxor ⚠Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of 16-bit masks a and b, and store the result in k.
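The k* intrinsics operate on mask values; __mmask16 is a plain 16-bit integer on the Rust side. A sketch combining them (experimental toolchain assumed):

```rust
use core::arch::x86_64::*;

// Build a writemask selecting lanes set in both k1 and k2, plus lanes clear in k1.
#[target_feature(enable = "avx512f")]
unsafe fn combine_masks(k1: __mmask16, k2: __mmask16) -> __mmask16 {
    _mm512_kor(_mm512_kand(k1, k2), _mm512_knot(k1))
}
```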
- _mm512_load_ ⚠epi32 Experimental (x86 or x86-64) and avx512fLoad 512-bits (composed of 16 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠epi64 Experimental (x86 or x86-64) and avx512fLoad 512-bits (composed of 8 packed 64-bit integers) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠pd Experimental (x86 or x86-64) and avx512fLoad 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_load_ ⚠ps Experimental (x86 or x86-64) and avx512fLoad 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_load_ ⚠si512 Experimental (x86 or x86-64) and avx512fLoad 512-bits of integer data from memory into dst. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_loadu_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwLoad 512-bits (composed of 64 packed 8-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwLoad 512-bits (composed of 32 packed 16-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠epi32 Experimental (x86 or x86-64) and avx512fLoad 512-bits (composed of 16 packed 32-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠epi64 Experimental (x86 or x86-64) and avx512fLoad 512-bits (composed of 8 packed 64-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠pd Experimental (x86 or x86-64) and avx512fLoads 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm512_loadu_ ⚠ps Experimental (x86 or x86-64) and avx512fLoads 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into result. mem_addr does not need to be aligned on any particular boundary.
- _mm512_loadu_ ⚠si512 Experimental (x86 or x86-64) and avx512fLoad 512-bits of integer data from memory into dst. mem_addr does not need to be aligned on any particular boundary.
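The load/loadu pairs differ only in their alignment contract: _mm512_load_* requires a 64-byte-aligned address, while _mm512_loadu_* accepts any alignment. A sketch:

```rust
use core::arch::x86_64::*;

// Safe default: the unaligned load works for any &[f32; 16], aligned or not.
#[target_feature(enable = "avx512f")]
unsafe fn load16(data: &[f32; 16]) -> __m512 {
    _mm512_loadu_ps(data.as_ptr())
}
```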
- _mm512_lzcnt_ ⚠epi32 Experimental (x86 or x86-64) and avx512cdCounts the number of leading zero bits in each packed 32-bit integer in a, and stores the results in dst.
- _mm512_lzcnt_ ⚠epi64 Experimental (x86 or x86-64) and avx512cdCounts the number of leading zero bits in each packed 64-bit integer in a, and stores the results in dst.
- _mm512_madd52hi_ ⚠epu64 Experimental (x86 or x86-64) and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm512_madd52lo_ ⚠epu64 Experimental (x86 or x86-64) and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
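A sketch of the IFMA 52-bit multiply-accumulate, a common primitive in big-integer and modular arithmetic (operands are assumed to hold values below 2^52 per 64-bit lane):

```rust
use core::arch::x86_64::*;

// Per 64-bit lane: acc + low 52 bits of (b * c), where b and c hold 52-bit values.
#[target_feature(enable = "avx512ifma")]
unsafe fn mul52_lo_accumulate(acc: __m512i, b: __m512i, c: __m512i) -> __m512i {
    _mm512_madd52lo_epu64(acc, b, c)
}
```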
- _mm512_madd_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst.
- _mm512_maddubs_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwVertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst.
- _mm512_mask2_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set)
- _mm512_mask2_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm512_mask2int ⚠Experimental (x86 or x86-64) and avx512fConverts bit mask k1 into an integer value, storing the results in dst.
- _mm512_mask3_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask3_ ⚠fcmadd_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using writemask k (the element is copied from c when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask3_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask3_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmadd_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask3_ ⚠fmadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
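For the mask3 variants the writemask is the last argument, and masked-off lanes keep the value from c. A sketch (experimental toolchain assumed):

```rust
use core::arch::x86_64::*;

// Lane i becomes a[i]*b[i] + c[i] where bit i of k is set, and stays c[i] elsewhere.
#[target_feature(enable = "avx512f")]
unsafe fn fma_into_c(a: __m512, b: __m512, c: __m512, k: __mmask16) -> __m512 {
    _mm512_mask3_fmadd_ps(a, b, c, k)
}
```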
- _mm512_mask3_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmaddsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmaddsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmaddsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsubadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsubadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fmsubadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm512_mask3_ ⚠fnmsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
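  The mask3 fused-multiply entries above all follow one calling convention: the writemask is the last argument and masked-off lanes are taken from c. Below is a minimal illustrative sketch of that pattern with _mm512_mask3_fmsub_ps; the crate-level feature gate and the chosen mask/values are assumptions (these intrinsics are marked Experimental, so a nightly toolchain is assumed).

```rust
#![feature(stdarch_x86_avx512)] // assumed gate for the Experimental AVX-512 intrinsics
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask3_fmsub_demo() -> [f32; 16] {
    let a = _mm512_set1_ps(2.0);
    let b = _mm512_set1_ps(3.0);
    let c = _mm512_set1_ps(1.0);
    // Even lanes (mask bit set) get a * b - c = 5.0; odd lanes are copied from c.
    let r = _mm512_mask3_fmsub_ps(a, b, c, 0b0101_0101_0101_0101);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask3_fmsub_demo() });
    }
}
```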
- _mm512_mask_ ⚠abs_ epi8 Experimental (x86 or x86-64) and avx512bwCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠abs_ epi16 Experimental (x86 or x86-64) and avx512bwCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠abs_ epi32 Experimental (x86 or x86-64) and avx512fCompute the absolute value of packed 32-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠abs_ epi64 Experimental (x86 or x86-64) and avx512fCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠abs_ pd Experimental (x86 or x86-64) and avx512fFinds the absolute value of each packed double-precision (64-bit) floating-point element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠abs_ ps Experimental (x86 or x86-64) and avx512fFinds the absolute value of each packed single-precision (32-bit) floating-point element in v2, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
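  For the masked abs entries, src supplies every lane whose mask bit is clear. A small illustrative sketch (feature gate and values assumed, as above):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_abs_demo() -> [i32; 16] {
    let src = _mm512_set1_epi32(-1);
    let a = _mm512_set1_epi32(-7);
    // Low 8 lanes take |a| = 7; high 8 lanes are passed through from src (-1).
    let r = _mm512_mask_abs_epi32(src, 0x00FF, a);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_abs_demo() });
    }
}
```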
- _mm512_mask_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bwAdd packed 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bwAdd packed 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512fAdd packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512fAdd packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ pd Experimental (x86 or x86-64) and avx512fAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ ps Experimental (x86 or x86-64) and avx512fAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ round_ pd Experimental (x86 or x86-64) and avx512fAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠add_ round_ ph Experimental (x86 or x86-64) and avx512fp16Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_mask_ ⚠add_ round_ ps Experimental (x86 or x86-64) and avx512fAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
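  The masked add family follows the same src/k/a/b shape. An illustrative sketch with _mm512_mask_add_epi32 (assumptions as in the earlier sketches):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_add_demo() -> [i32; 16] {
    let src = _mm512_setzero_si512();
    let a = _mm512_set1_epi32(10);
    let b = _mm512_set1_epi32(3);
    // Lanes 0..8 get 10 + 3 = 13; lanes 8..16 keep the value from src (0).
    let r = _mm512_mask_add_epi32(src, 0x00FF, a, b);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_add_demo() });
    }
}
```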
- _mm512_mask_ ⚠adds_ epi8 Experimental (x86 or x86-64) and avx512bwAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠adds_ epi16 Experimental (x86 or x86-64) and avx512bwAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠adds_ epu8 Experimental (x86 or x86-64) and avx512bwAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠adds_ epu16 Experimental (x86 or x86-64) and avx512bwAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
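  The adds entries saturate instead of wrapping. A sketch with _mm512_mask_adds_epu8, where the 64-lane writemask is a __mmask64 (illustrative; assumptions as above):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw")]
unsafe fn mask_adds_demo() -> [u8; 64] {
    let src = _mm512_setzero_si512();
    let a = _mm512_set1_epi8(200u8 as i8);
    let b = _mm512_set1_epi8(100u8 as i8);
    // With unsigned saturation, 200 + 100 clamps to 255 in every selected lane.
    let r = _mm512_mask_adds_epu8(src, u64::MAX, a, b);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") {
        println!("{:?}", unsafe { mask_adds_demo() });
    }
}
```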
- _mm512_mask_ ⚠alignr_ epi8 Experimental (x86 or x86-64) and avx512bwConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠alignr_ epi32 Experimental (x86 or x86-64) and avx512fConcatenate a and b into a 128-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 64 bytes (16 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠alignr_ epi64 Experimental (x86 or x86-64) and avx512fConcatenate a and b into a 128-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 64 bytes (8 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
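  For the alignr entries the shift count is an immediate; in this crate immediates are passed as const generics, so a call looks roughly like the sketch below (illustrative only; the src/k/a/b argument order mirrors the Intel prototype and is an assumption here):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_alignr_demo() -> [i32; 16] {
    let src = _mm512_setzero_si512();
    let a = _mm512_set1_epi32(7);
    let b = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    // Shift the concatenation a:b right by two 32-bit elements and keep the low
    // 16 lanes; only lanes selected by the mask are written, the rest come from src.
    let r = _mm512_mask_alignr_epi32::<2>(src, 0xFFFF, a, b);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_alignr_demo() });
    }
}
```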
- _mm512_mask_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512fPerforms element-by-element bitwise AND between packed 32-bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠and_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠and_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠andnot_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠andnot_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠andnot_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠andnot_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
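  The masked bitwise entries differ only in which Boolean function they apply; andnot complements a before the AND. An illustrative sketch (assumptions as before):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_bitwise_demo() -> ([i32; 16], [i32; 16]) {
    let src = _mm512_setzero_si512();
    let a = _mm512_set1_epi32(0b1100);
    let b = _mm512_set1_epi32(0b1010);
    // Selected lanes get a AND b = 0b1000; andnot computes (NOT a) AND b = 0b0010.
    let and = _mm512_mask_and_epi32(src, 0xFFFF, a, b);
    let andnot = _mm512_mask_andnot_epi32(src, 0xFFFF, a, b);
    (core::mem::transmute(and), core::mem::transmute(andnot))
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_bitwise_demo() });
    }
}
```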
- _mm512_mask_ ⚠avg_ epu8 Experimental (x86 or x86-64) and avx512bwAverage packed unsigned 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠avg_ epu16 Experimental (x86 or x86-64) and avx512bwAverage packed unsigned 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠bitshuffle_ epi64_ mask Experimental (x86 or x86-64) and avx512bitalgConsiders the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm512_mask_ ⚠blend_ epi8 Experimental (x86 or x86-64) and avx512bwBlend packed 8-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ ⚠blend_ epi16 Experimental (x86 or x86-64) and avx512bwBlend packed 16-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ ⚠blend_ epi32 Experimental (x86 or x86-64) and avx512fBlend packed 32-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ ⚠blend_ epi64 Experimental (x86 or x86-64) and avx512fBlend packed 64-bit integers from a and b using control mask k, and store the results in dst.
- _mm512_mask_ ⚠blend_ pd Experimental (x86 or x86-64) and avx512fBlend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm512_mask_ ⚠blend_ ph Experimental (x86 or x86-64) and avx512fp16Blend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm512_mask_ ⚠blend_ ps Experimental (x86 or x86-64) and avx512fBlend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and store the results in dst.
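  The blend entries take no src operand: the mask alone chooses between a and b per lane. A sketch (assumptions as before):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_blend_demo() -> [i32; 16] {
    let a = _mm512_set1_epi32(1);
    let b = _mm512_set1_epi32(2);
    // Lanes whose mask bit is set come from b, the others from a.
    let r = _mm512_mask_blend_epi32(0b1111_1111_0000_0000, a, b);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_blend_demo() });
    }
}
```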
- _mm512_mask_ ⚠broadcast_ f32x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠broadcast_ f32x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcast_ f32x8 Experimental (x86 or x86-64) and avx512dqBroadcasts the 8 packed single-precision (32-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠broadcast_ f64x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠broadcast_ f64x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcast_ i32x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠broadcast_ i32x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcast_ i32x8 Experimental (x86 or x86-64) and avx512dqBroadcasts the 8 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠broadcast_ i64x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the 2 packed 64-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠broadcast_ i64x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed 64-bit integers from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcastb_ epi8 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 8-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcastd_ epi32 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 32-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcastq_ epi64 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 64-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcastsd_ pd Experimental (x86 or x86-64) and avx512fBroadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcastss_ ps Experimental (x86 or x86-64) and avx512fBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠broadcastw_ epi16 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 16-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
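  The broadcast entries replicate the low element(s) of a narrower vector into the selected lanes. A sketch with _mm512_mask_broadcastd_epi32 (assumptions as before):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_broadcast_demo() -> [i32; 16] {
    let src = _mm512_set1_epi32(-1);
    let a = _mm_set1_epi32(42); // the low 32-bit lane of a 128-bit vector
    // The low element of a is broadcast to the selected lanes; masked-off lanes keep src.
    let r = _mm512_mask_broadcastd_epi32(src, 0x0F0F, a);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_broadcast_demo() });
    }
}
```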
- _mm512_mask_ ⚠cmp_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ ph_ mask Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ round_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cmp_ round_ ph_ mask Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmp_ round_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
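  For the cmp entries the predicate is the imm8 immediate, and the mask is the return value rather than a writemask; k1 pre-filters which lanes may report a match. A sketch using the _CMP_LT_OQ constant listed earlier in this module (feature gate and values assumed):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_cmp_demo() -> u16 {
    let a = _mm512_setr_ps(
        0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
        8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0,
    );
    let b = _mm512_set1_ps(8.0);
    // The comparison predicate is the const immediate; lanes outside k1 are
    // reported as 0 regardless of the compare result.
    _mm512_mask_cmp_ps_mask::<_CMP_LT_OQ>(0x00FF, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:#018b}", unsafe { mask_cmp_demo() });
    }
}
```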
- _mm512_mask_ ⚠cmpeq_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpeq_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpge_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpgt_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmple_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmplt_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epu8_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epu16_ mask Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epu32_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ epu64_ mask Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpneq_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpnle_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpnle_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpnlt_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpnlt_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b for not-less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpord_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpord_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpunord_ pd_ mask Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cmpunord_ ps_ mask Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
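  The cmpeq/cmpge/cmpgt/cmple/cmplt/cmpneq/cmpord/cmpunord entries are the same operation with a fixed predicate baked into the name. A sketch with _mm512_mask_cmplt_epi32_mask (assumptions as before):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_cmplt_demo() -> u16 {
    let a = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    let b = _mm512_set1_epi32(4);
    // Bits are set only where a < b AND the corresponding bit of k1 is set.
    _mm512_mask_cmplt_epi32_mask(0b0000_0000_0000_0110, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        // a < 4 holds for lanes 0..4; intersected with k1 this leaves bits 1 and 2.
        println!("{:#06b}", unsafe { mask_cmplt_demo() });
    }
}
```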
- _mm512_mask_ ⚠cmul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ ⚠cmul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ ⚠compress_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ ⚠compress_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ ⚠compress_ epi32 Experimental (x86 or x86-64) and avx512fContiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ ⚠compress_ epi64 Experimental (x86 or x86-64) and avx512fContiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ ⚠compress_ pd Experimental (x86 or x86-64) and avx512fContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm512_mask_ ⚠compress_ ps Experimental (x86 or x86-64) and avx512fContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
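  compress packs the selected lanes down to the low end of the vector and fills the remaining lanes from src (the compressstoreu entries that follow write the packed lanes to memory instead). A sketch (assumptions as before):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn mask_compress_demo() -> [i32; 16] {
    let src = _mm512_set1_epi32(-1);
    let a = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    // The four selected elements (lanes 1, 3, 5, 7) are packed into the low lanes
    // of the result; the remaining lanes are filled from src.
    let r = _mm512_mask_compress_epi32(src, 0b0000_0000_1010_1010, a);
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { mask_compress_demo() });
    }
}
```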
- _mm512_mask_ ⚠compressstoreu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Contiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Contiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ epi32 Experimental (x86 or x86-64) and avx512fContiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ epi64 Experimental (x86 or x86-64) and avx512fContiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ pd Experimental (x86 or x86-64) and avx512fContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠compressstoreu_ ps Experimental (x86 or x86-64) and avx512fContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠conflict_ epi32 Experimental (x86 or x86-64) and avx512cdTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_mask_ ⚠conflict_ epi64 Experimental (x86 or x86-64) and avx512cdTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_mask_ ⚠conj_ pch Experimental (x86 or x86-64) and avx512fp16Compute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
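  To make the complex-pair layout used by the cmul and conj entries concrete, here is a plain scalar reference of the per-pair arithmetic their descriptions spell out, using f32 in place of the hardware's 16-bit elements (this is not the intrinsic itself, only the formula restated):

```rust
// Scalar reference for one complex pair; the real intrinsics apply this to every
// pair of adjacent lanes, gated by the writemask.
fn cmul_conjugate(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    // a * conj(b), with a = a.0 + i*a.1 and b = b.0 + i*b.1
    (a.0 * b.0 + a.1 * b.1, a.1 * b.0 - a.0 * b.1)
}

fn conj(a: (f32, f32)) -> (f32, f32) {
    (a.0, -a.1)
}

fn main() {
    println!("{:?}", cmul_conjugate((1.0, 2.0), (3.0, 4.0))); // (11.0, 2.0)
    println!("{:?}", conj((1.0, 2.0))); // (1.0, -2.0)
}
```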
- _mm512_mask_ ⚠cvt_ roundepi16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepi32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepi32_ ps Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepi64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundepi64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepi64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundepu16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepu32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepu32_ ps Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepu64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundepu64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundepu64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundpd_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundpd_ ps Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ pd Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundph_ ps Experimental (x86 or x86-64) and avx512fConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvt_ roundps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvt_ roundps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter, which can be one of:
- _mm512_mask_ ⚠cvt_ roundps_ pd Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvt_ roundps_ ph Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
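  For the cvt_round entries, the rounding and exception behaviour is an immediate, passed as a const generic in this crate. A sketch with _mm512_mask_cvt_roundps_epi32 (illustrative; feature gate and values are assumptions):

```rust
#![feature(stdarch_x86_avx512)] // assumed gate; these intrinsics are marked Experimental
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn cvt_round_demo() -> [i32; 16] {
    let src = _mm512_setzero_si512();
    let a = _mm512_set1_ps(2.5);
    // The rounding/exception behaviour is selected by the const parameter;
    // round-to-nearest-even turns 2.5 into 2, and _MM_FROUND_NO_EXC suppresses exceptions.
    let r = _mm512_mask_cvt_roundps_epi32::<{ _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC }>(
        src, 0xFFFF, a,
    );
    core::mem::transmute(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        println!("{:?}", unsafe { cvt_round_demo() });
    }
}
```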
- _mm512_mask_ ⚠cvtepi8_ epi16 Experimental (x86 or x86-64) and avx512bwSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi8_ epi32 Experimental (x86 or x86-64) and avx512fSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi8_ epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi16_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi16_ epi32 Experimental (x86 or x86-64) and avx512fSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi16_ epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi32_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi32_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi32_ epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi32_ pd Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi32_ ps Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
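 A minimal sketch of a masked integer-to-float conversion with _mm512_mask_cvtepi32_ps, assuming the experimental intrinsics are available on the toolchain; the mask and inputs are illustrative only.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn to_f32_masked(src: __m512, a: __m512i) -> __m512 {
    // Convert only the lower eight lanes; the upper eight keep `src`.
    let k: __mmask16 = 0x00FF;
    _mm512_mask_cvtepi32_ps(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            let src = _mm512_set1_ps(-1.0);
            let a = _mm512_set1_epi32(7);
            let r = to_f32_masked(src, a);
            let mut out = [0.0f32; 16];
            _mm512_storeu_ps(out.as_mut_ptr(), r);
            println!("{:?}", out); // 7.0 in lanes 0..8, -1.0 in lanes 8..16
        }
    }
}
```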
- _mm512_mask_ ⚠cvtepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi32lo_ pd Experimental (x86 or x86-64) and avx512fPerforms element-by-element conversion of the lower half of packed 32-bit integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi64_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi64_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi64_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtepi64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepi64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtepu8_ epi16 Experimental (x86 or x86-64) and avx512bwZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu8_ epi32 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu8_ epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu16_ epi32 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu16_ epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu32_ epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu32_ pd Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
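 The point of the epu32 form is that the source lanes are read as unsigned; a minimal sketch (toolchain availability assumed, values illustrative) follows.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn u32_to_f64_masked(src: __m512d, a: __m256i) -> __m512d {
    let k: __mmask8 = 0b0000_1111; // convert lanes 0..4, keep `src` in lanes 4..8
    _mm512_mask_cvtepu32_pd(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            // 3_000_000_000 does not fit in i32, but as an *unsigned* 32-bit
            // value it converts to 3e9 rather than a negative number.
            let a = _mm256_set1_epi32(3_000_000_000u32 as i32);
            let src = _mm512_set1_pd(-1.0);
            let r = u32_to_f64_masked(src, a);
            let mut out = [0.0f64; 8];
            _mm512_storeu_pd(out.as_mut_ptr(), r);
            println!("{:?}", out); // 3e9 in lanes 0..4, -1.0 in lanes 4..8
        }
    }
}
```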
- _mm512_mask_ ⚠cvtepu32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu32_ ps Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu32lo_ pd Experimental (x86 or x86-64) and avx512fPerforms element-by-element conversion of the lower half of 32-bit unsigned integer elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtepu64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtepu64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtne2ps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512fConvert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm512_mask_ ⚠cvtneps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm512_mask_ ⚠cvtpbh_ ps Experimental (x86 or x86-64) and avx512bf16,avx512fConverts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtpd_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtpd_ ps Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtpd_ pslo Experimental (x86 or x86-64) and avx512fPerforms an element-by-element conversion of packed double-precision (64-bit) floating-point elements in v2 to single-precision (32-bit) floating-point elements and stores them in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.
- _mm512_mask_ ⚠cvtph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ pd Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtph_ ps Experimental (x86 or x86-64) and avx512fConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
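 For _mm512_mask_cvtph_ps the half-precision inputs travel as raw 16-bit patterns inside a __m256i; the sketch below assumes the usual (src, k, a) masked signature and an available toolchain, and uses 0x3C00 (the f16 encoding of 1.0) purely as an example value.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn f16_bits_to_f32_masked(src: __m512, a: __m256i) -> __m512 {
    let k: __mmask16 = 0xFFFF; // convert every lane
    _mm512_mask_cvtph_ps(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            // 0x3C00 is the IEEE half-precision bit pattern for 1.0.
            let a = _mm256_set1_epi16(0x3C00);
            let r = f16_bits_to_f32_masked(_mm512_setzero_ps(), a);
            let mut out = [0.0f32; 16];
            _mm512_storeu_ps(out.as_mut_ptr(), r);
            println!("{:?}", out); // sixteen 1.0 values
        }
    }
}
```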
- _mm512_mask_ ⚠cvtps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtps_ pd Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtps_ ph Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvtpslo_ pd Experimental (x86 or x86-64) and avx512fPerforms element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in v2 to packed double-precision (64-bit) floating-point elements, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi16_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi32_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi32_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
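 A small sketch of the saturating narrow from 32-bit to 16-bit lanes, assuming these experimental intrinsics compile on the toolchain; the out-of-range input is chosen only to show the clamp to i16::MAX.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn narrow_saturating(src: __m256i, a: __m512i) -> __m256i {
    let k: __mmask16 = 0xFFFF; // narrow every lane
    _mm512_mask_cvtsepi32_epi16(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            // 100_000 is out of i16 range, so every active lane saturates to 32767.
            let a = _mm512_set1_epi32(100_000);
            let r = narrow_saturating(_mm256_set1_epi16(0), a);
            let lanes: [i16; 16] = std::mem::transmute(r);
            println!("{:?}", lanes);
        }
    }
}
```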
- _mm512_mask_ ⚠cvtsepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi64_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi64_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi64_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtsepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtsepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtt_ roundpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtt_ roundph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtt_ roundph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtt_ roundph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtt_ roundph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtt_ roundph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtt_ roundps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠cvtt_ roundps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_mask_ ⚠cvttpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvttpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvttph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
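 A minimal sketch of the truncating (round-toward-zero) conversion under a writemask, assuming toolchain availability; mask and values are illustrative.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn truncate_masked(src: __m512i, a: __m512) -> __m512i {
    let k: __mmask16 = 0x0F0F; // truncate lanes 0..4 and 8..12, keep `src` elsewhere
    _mm512_mask_cvttps_epi32(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            let a = _mm512_set1_ps(2.9); // truncation gives 2, not 3
            let src = _mm512_set1_epi32(-1);
            let r = truncate_masked(src, a);
            let lanes: [i32; 16] = std::mem::transmute(r);
            println!("{:?}", lanes);
        }
    }
}
```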
- _mm512_mask_ ⚠cvttps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvttps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvttps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠cvtusepi16_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi32_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi32_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi64_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi64_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi64_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtusepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtusepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm512_mask_ ⚠cvtx_ roundph_ ps Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtx_ roundps_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtxph_ ps Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠cvtxps_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm512_mask_ ⚠dbsad_ epu8 Experimental (x86 or x86-64) and avx512bwCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm512_mask_ ⚠div_ pd Experimental (x86 or x86-64) and avx512fDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠div_ ph Experimental (x86 or x86-64) and avx512fp16Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠div_ ps Experimental (x86 or x86-64) and avx512fDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
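 A minimal sketch of a masked division, assuming the experimental intrinsics are available; the mask here is arbitrary, but the pattern is the usual one for skipping lanes you do not want to divide.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn div_masked(src: __m512, a: __m512, b: __m512) -> __m512 {
    // Divide only where the mask is set; other lanes pass `src` through,
    // so a lane that would produce infinity or NaN can simply be skipped.
    let k: __mmask16 = 0b0000_0000_1111_1111;
    _mm512_mask_div_ps(src, k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            let a = _mm512_set1_ps(10.0);
            let b = _mm512_set1_ps(4.0);
            let r = div_masked(_mm512_set1_ps(0.0), a, b);
            let mut out = [0.0f32; 16];
            _mm512_storeu_ps(out.as_mut_ptr(), r);
            println!("{:?}", out); // 2.5 in the low eight lanes, 0.0 elsewhere
        }
    }
}
```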
- _mm512_mask_ ⚠div_ round_ pd Experimental (x86 or x86-64) and avx512fDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠div_ round_ ph Experimental (x86 or x86-64) and avx512fp16Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_mask_ ⚠div_ round_ ps Experimental (x86 or x86-64) and avx512fDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠dpbf16_ ps Experimental (x86 or x86-64) and avx512bf16,avx512fCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm512_mask_ ⚠dpbusd_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
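 A minimal sketch of the masked VNNI dot-product-accumulate, assuming avx512vnni is available on the toolchain and CPU; the constants are chosen only to make the per-lane arithmetic easy to follow.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vnni")]
unsafe fn dot_accumulate(src: __m512i, a: __m512i, b: __m512i) -> __m512i {
    let k: __mmask16 = 0xFFFF; // accumulate into every 32-bit lane
    _mm512_mask_dpbusd_epi32(src, k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vnni") {
        unsafe {
            let a = _mm512_set1_epi8(2); // treated as unsigned 8-bit values
            let b = _mm512_set1_epi8(3); // treated as signed 8-bit values
            let src = _mm512_set1_epi32(100);
            // Each 32-bit lane becomes 100 + 4 * (2 * 3) = 124.
            let r = dot_accumulate(src, a, b);
            let lanes: [i32; 16] = std::mem::transmute(r);
            println!("{:?}", lanes);
        }
    }
}
```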
- _mm512_mask_ ⚠dpbusds_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠dpwssd_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠dpwssds_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expand_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expand_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expand_ epi32 Experimental (x86 or x86-64) and avx512fLoad contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
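 The expand family is easiest to see with a concrete mask: the lowest elements of a are scattered, in order, into the lanes whose mask bit is set. A minimal sketch, assuming toolchain availability, follows.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn expand_masked(src: __m512i, a: __m512i) -> __m512i {
    // Contiguous low elements of `a` fill the set lanes in order;
    // the remaining lanes keep the value from `src`.
    let k: __mmask16 = 0b0000_0000_1010_1010;
    _mm512_mask_expand_epi32(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            let a = _mm512_setr_epi32(10, 11, 12, 13, 14, 15, 16, 17,
                                      18, 19, 20, 21, 22, 23, 24, 25);
            let src = _mm512_set1_epi32(0);
            let r = expand_masked(src, a);
            let lanes: [i32; 16] = std::mem::transmute(r);
            // Lanes 1, 3, 5, 7 receive 10, 11, 12, 13; all other lanes stay 0.
            println!("{:?}", lanes);
        }
    }
}
```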
- _mm512_mask_ ⚠expand_ epi64 Experimental (x86 or x86-64) and avx512fLoad contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expand_ pd Experimental (x86 or x86-64) and avx512fLoad contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expand_ ps Experimental (x86 or x86-64) and avx512fLoad contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi32 Experimental (x86 or x86-64) and avx512fLoad contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ epi64 Experimental (x86 or x86-64) and avx512fLoad contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ pd Experimental (x86 or x86-64) and avx512fLoad contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠expandloadu_ ps Experimental (x86 or x86-64) and avx512fLoad contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠extractf32x4_ ps Experimental (x86 or x86-64) and avx512fExtract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
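 A sketch of extracting one 128-bit chunk under a writemask; it assumes the selector is passed as a const generic (as is usual for stdarch immediates) and that the toolchain exposes these experimental intrinsics. The chunk index and mask are illustrative.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn extract_chunk2_masked(src: __m128, a: __m512) -> __m128 {
    // Pull out the third 128-bit chunk (elements 8..12); within that chunk,
    // only the lanes whose mask bit is set are taken, the rest come from `src`.
    let k: __mmask8 = 0b0000_0101;
    _mm512_mask_extractf32x4_ps::<2>(src, k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            let a = _mm512_setr_ps(0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
                                   8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0);
            let src = _mm_set1_ps(-1.0);
            let r = extract_chunk2_masked(src, a);
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), r);
            println!("{:?}", out); // [8.0, -1.0, 10.0, -1.0]
        }
    }
}
```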
- _mm512_mask_ ⚠extractf32x8_ ps Experimental (x86 or x86-64) and avx512dqExtracts 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠extractf64x2_ pd Experimental (x86 or x86-64) and avx512dqExtracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠extractf64x4_ pd Experimental (x86 or x86-64) and avx512fExtract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠extracti32x4_ epi32 Experimental (x86 or x86-64) and avx512fExtract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM2, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠extracti32x8_ epi32 Experimental (x86 or x86-64) and avx512dqExtracts 256 bits (composed of 8 packed 32-bit integers) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠extracti64x2_ epi64 Experimental (x86 or x86-64) and avx512dqExtracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠extracti64x4_ epi64 Experimental (x86 or x86-64) and avx512fExtract 256 bits (composed of 4 packed 64-bit integers) from a, selected with IMM1, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number as complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ ⚠fcmadd_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number as complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ ⚠fcmul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number as complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ ⚠fcmul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number as complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate as conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_mask_ ⚠fixupimm_ pd Experimental (x86 or x86-64) and avx512fFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ ⚠fixupimm_ ps Experimental (x86 or x86-64) and avx512fFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ ⚠fixupimm_ round_ pd Experimental (x86 or x86-64) and avx512fFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ ⚠fixupimm_ round_ ps Experimental (x86 or x86-64) and avx512fFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_mask_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
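 A minimal sketch of the masked fused multiply-add; note that for this form the masked-out lanes fall back to a itself rather than a separate src operand. Toolchain availability is assumed and the mask is illustrative.

```rust
// Illustrative sketch; may additionally need #![feature(stdarch_x86_avx512)].
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn fma_masked(a: __m512, b: __m512, c: __m512) -> __m512 {
    // Masked-out lanes keep the value of `a` (there is no separate `src`
    // operand in this variant).
    let k: __mmask16 = 0x00FF;
    _mm512_mask_fmadd_ps(a, k, b, c)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe {
            let a = _mm512_set1_ps(2.0);
            let b = _mm512_set1_ps(3.0);
            let c = _mm512_set1_ps(1.0);
            let r = fma_masked(a, b, c);
            let mut out = [0.0f32; 16];
            _mm512_storeu_ps(out.as_mut_ptr(), r);
            println!("{:?}", out); // 7.0 in lanes 0..8, 2.0 (copied from a) above
        }
    }
}
```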
- _mm512_mask_ ⚠fmadd_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number as complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ ⚠fmadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmaddsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmaddsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmaddsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsubadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsubadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmsubadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fmul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ ⚠fmul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_mask_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠fnmsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
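The sign conventions of the fnmadd/fnmsub entries are easy to mix up, so here is a hedged scalar model of a single masked lane (plain Rust, no intrinsics; the real instructions fuse the multiply and add/subtract with a single rounding). The function name fnm_lane is illustrative only.

```rust
/// Scalar model of one lane of the masked fnmadd/fnmsub variants listed above.
/// Returns (fnmadd, fnmsub); masked-off lanes are copied from `a`.
fn fnm_lane(a: f64, b: f64, c: f64, mask_bit_set: bool) -> (f64, f64) {
    if !mask_bit_set {
        return (a, a); // mask bit clear: element copied from `a`
    }
    let p = a * b;
    let fnmadd = -p + c; // "add the negated intermediate result to c", i.e. c - a*b
    let fnmsub = -p - c; // "subtract c from the negated intermediate result"
    (fnmadd, fnmsub)
}
```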
- _mm512_mask_ ⚠fpclass_ pd_ mask Experimental (x86 or x86-64) and avx512dqTest packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm512_mask_ ⚠fpclass_ ph_ mask Experimental (x86 or x86-64) and avx512fp16Test packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm512_mask_ ⚠fpclass_ ps_ mask Experimental (x86 or x86-64) and avx512dqTest packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm512_mask_ ⚠getexp_ pd Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_mask_ ⚠getexp_ ph Experimental (x86 or x86-64) and avx512fp16Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_mask_ ⚠getexp_ ps Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_mask_ ⚠getexp_ round_ pd Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠getexp_ round_ ph Experimental (x86 or x86-64) and avx512fp16Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠getexp_ round_ ps Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
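For the getexp family, the floor(log2(x)) wording can be restated as a small scalar model. This is only a sketch: it ignores the IEEE special cases (zero, infinities, NaN, denormals) that the hardware handles per the usual rules, and the helper name is mine.

```rust
/// Rough scalar model of one masked getexp lane: the integer exponent of |x|,
/// returned as a float, with masked-off lanes taken from `src`.
fn getexp_lane(x: f64, src: f64, mask_bit_set: bool) -> f64 {
    if !mask_bit_set {
        return src;
    }
    x.abs().log2().floor()
}
```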
- _mm512_mask_ ⚠getmant_ pd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_mask_ ⚠getmant_ ph Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_mask_ ⚠getmant_ ps Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_mask_ ⚠getmant_ round_ pd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠getmant_ round_ ph Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠getmant_ round_ ps Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
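The getmant entries above are easier to follow with a scalar sketch of the _MM_MANT_NORM_1_2 case plus the sign control. This is not how the hardware computes it (it rebuilds the value from the raw significand, and the other three intervals choose the power-of-two shift from the leading significand bits), so treat the helper below as an approximation for finite, nonzero inputs; the name is mine.

```rust
/// Approximate scalar model of getmant with interv = _MM_MANT_NORM_1_2.
fn getmant_1_2_lane(x: f64, sign_ctl: &str) -> f64 {
    let mut m = x.abs();
    if m != 0.0 && m.is_finite() {
        // Scale by powers of two until the significand lies in [1, 2).
        while m >= 2.0 { m *= 0.5; }
        while m < 1.0 { m *= 2.0; }
    }
    match sign_ctl {
        "_MM_MANT_SIGN_zero" => m,                               // force sign = 0
        "_MM_MANT_SIGN_nan" if x.is_sign_negative() => f64::NAN, // NaN when sign(src) = 1
        _ => if x.is_sign_negative() { -m } else { m },          // _MM_MANT_SIGN_src
    }
}
```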
- _mm512_mask_ ⚠gf2p8affine_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512fPerforms an affine transformation on the packed bytes in x. That is computes a*x+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_mask_ ⚠gf2p8affineinv_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512fPerforms an affine transformation on the inverted packed bytes in x. That is computes a*inv(x)+b over the Galois Field 2^8 for each packed byte with a being a 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_mask_ ⚠gf2p8mul_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512fPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
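The gf2p8mul entry reduces, per byte, to the familiar shift-and-XOR multiplication over GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1. The scalar model below is a conventional way to write that byte operation; it is a reference sketch, not the SIMD implementation.

```rust
/// Scalar model of the per-byte GF(2^8) multiplication behind gf2p8mul:
/// carry-less multiply, reducing modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
fn gf2p8_mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut acc = 0u8;
    for _ in 0..8 {
        if b & 1 != 0 {
            acc ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B; // subtract (XOR) the low byte of the reduction polynomial
        }
        b >>= 1;
    }
    acc
}
```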
- _mm512_mask_ ⚠i32gather_ epi32 Experimental (x86 or x86-64) and avx512fGather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32gather_ epi64 Experimental (x86 or x86-64) and avx512fGather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32gather_ pd Experimental (x86 or x86-64) and avx512fGather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32gather_ ps Experimental (x86 or x86-64) and avx512fGather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32logather_ epi64 Experimental (x86 or x86-64) and avx512fLoads 8 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠i32logather_ pd Experimental (x86 or x86-64) and avx512fLoads 8 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale and stores them in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠i32loscatter_ epi64 Experimental (x86 or x86-64) and avx512fStores 8 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm512_mask_ ⚠i32loscatter_ pd Experimental (x86 or x86-64) and avx512fStores 8 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in the lower half of vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm512_mask_ ⚠i32scatter_ epi32 Experimental (x86 or x86-64) and avx512fScatter 32-bit integers from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32scatter_ epi64 Experimental (x86 or x86-64) and avx512fScatter 64-bit integers from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32scatter_ pd Experimental (x86 or x86-64) and avx512fScatter double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i32scatter_ ps Experimental (x86 or x86-64) and avx512fScatter single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ epi32 Experimental (x86 or x86-64) and avx512fGather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ epi64 Experimental (x86 or x86-64) and avx512fGather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ pd Experimental (x86 or x86-64) and avx512fGather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64gather_ ps Experimental (x86 or x86-64) and avx512fGather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ epi32 Experimental (x86 or x86-64) and avx512fScatter 32-bit integers from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ epi64 Experimental (x86 or x86-64) and avx512fScatter 64-bit integers from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ pd Experimental (x86 or x86-64) and avx512fScatter double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
- _mm512_mask_ ⚠i64scatter_ ps Experimental (x86 or x86-64) and avx512fScatter single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k (elements are not stored when the corresponding mask bit is not set). scale should be 1, 2, 4 or 8.
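All of the gather/scatter entries above share one addressing rule: each lane reads or writes base_addr + vindex[i] * scale, and lanes with a clear mask bit either keep src (gathers) or are skipped (scatters). The sketch below is a scalar model of a masked 32-bit gather under the assumptions that indices are non-negative and in bounds; the helper name is mine and nothing here calls the real intrinsics.

```rust
/// Scalar model of a masked 32-bit gather: dst[i] = *(base + vindex[i] * scale)
/// where the mask bit is set, otherwise dst[i] = src[i]. scale must be 1, 2, 4 or 8.
fn masked_i32gather_model(src: [i32; 16], k: u16, vindex: [i32; 16], base: &[u8], scale: usize) -> [i32; 16] {
    let mut dst = src;
    for i in 0..16 {
        if k & (1 << i) != 0 {
            // Assumes non-negative, in-bounds indices for brevity.
            let off = vindex[i] as usize * scale;
            dst[i] = i32::from_le_bytes([base[off], base[off + 1], base[off + 2], base[off + 3]]);
        }
    }
    dst
}
```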
- _mm512_mask_ ⚠insertf32x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠insertf32x8 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠insertf64x2 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠insertf64x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠inserti32x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠inserti32x8 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠inserti64x2 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠inserti64x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 256 bits (composed of 4 packed 64-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠load_ epi32 Experimental (x86 or x86-64) and avx512fLoad packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠load_ epi64 Experimental (x86 or x86-64) and avx512fLoad packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠load_ pd Experimental (x86 or x86-64) and avx512fLoad packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠load_ ps Experimental (x86 or x86-64) and avx512fLoad packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠loadu_ epi8 Experimental (x86 or x86-64) and avx512bwLoad packed 8-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ epi16 Experimental (x86 or x86-64) and avx512bwLoad packed 16-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ epi32 Experimental (x86 or x86-64) and avx512fLoad packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ epi64 Experimental (x86 or x86-64) and avx512fLoad packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ pd Experimental (x86 or x86-64) and avx512fLoad packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠loadu_ ps Experimental (x86 or x86-64) and avx512fLoad packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
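A common use of the masked unaligned loads above is reading a short tail of a slice without touching memory past its end, since masked-off elements are not loaded. The sketch below assumes the stdarch signature _mm512_mask_loadu_epi32(src, k, mem_addr) and that these intrinsics are available on your toolchain (they may require nightly); the helper name load_tail is mine.

```rust
use core::arch::x86_64::*;

/// Load up to 16 i32s from a short slice, zero-filling the remaining lanes.
/// Lanes whose mask bit is clear are never read from memory.
#[target_feature(enable = "avx512f")]
unsafe fn load_tail(tail: &[i32]) -> __m512i {
    debug_assert!(tail.len() <= 16);
    // Set the low `tail.len()` mask bits.
    let k = (1u32 << tail.len()).wrapping_sub(1) as __mmask16;
    _mm512_mask_loadu_epi32(_mm512_setzero_si512(), k, tail.as_ptr())
}
```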
- _mm512_mask_ ⚠lzcnt_ epi32 Experimental (x86 or x86-64) and avx512cdCounts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠lzcnt_ epi64 Experimental (x86 or x86-64) and avx512cdCounts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠madd52hi_ epu64 Experimental (x86 or x86-64) and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠madd52lo_ epu64 Experimental (x86 or x86-64) and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
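The 52-bit multiply-add entries are worth restating as plain integer arithmetic. The scalar model below follows the wording above (the low 52 bits of b and c multiplied to a 104-bit product, then either half of that product added to a); the helper name is mine.

```rust
/// Scalar model of one 64-bit lane of madd52lo/madd52hi.
/// Returns (madd52lo result, madd52hi result).
fn madd52_lane(a: u64, b: u64, c: u64) -> (u64, u64) {
    const MASK52: u64 = (1 << 52) - 1;
    let prod = (b & MASK52) as u128 * (c & MASK52) as u128; // up to 104 bits
    let lo = a.wrapping_add((prod & MASK52 as u128) as u64);         // bits [51:0] of the product
    let hi = a.wrapping_add(((prod >> 52) & MASK52 as u128) as u64); // bits [103:52] of the product
    (lo, hi)
}
```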
- _mm512_mask_ ⚠madd_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠maddubs_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_mask_ ⚠max_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠max_ round_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠max_ round_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_mask_ ⚠max_ round_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
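For the masked integer and floating-point min/max entries, the main thing to keep straight is the argument order: the pass-through vector src comes first, then the mask, then the operands. A minimal sketch (helper name mine; same toolchain caveats as the earlier examples):

```rust
use core::arch::x86_64::*;

/// Per-lane signed 32-bit max of `a` and `b` where the mask bit is set;
/// all other lanes are copied from `src`. Note the (src, k, a, b) order.
#[target_feature(enable = "avx512f")]
unsafe fn blend_max(src: __m512i, k: __mmask16, a: __m512i, b: __m512i) -> __m512i {
    _mm512_mask_max_epi32(src, k, a, b)
}
```

Most two-operand _mm512_mask_ entries in this listing follow the same (src, k, a, b) ordering; the fused multiply family above instead reuses a as the pass-through operand.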
- _mm512_mask_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_mask_ ⚠min_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠min_ round_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠min_ round_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_mask_ ⚠min_ round_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠mov_ epi8 Experimental (x86 or x86-64) and avx512bwMove packed 8-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mov_ epi16 Experimental (x86 or x86-64) and avx512bwMove packed 16-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mov_ epi32 Experimental (x86 or x86-64) and avx512fMove packed 32-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mov_ epi64 Experimental (x86 or x86-64) and avx512fMove packed 64-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mov_ pd Experimental (x86 or x86-64) and avx512fMove packed double-precision (64-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mov_ ps Experimental (x86 or x86-64) and avx512fMove packed single-precision (32-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠movedup_ pd Experimental (x86 or x86-64) and avx512fDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠movehdup_ ps Experimental (x86 or x86-64) and avx512fDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠moveldup_ ps Experimental (x86 or x86-64) and avx512fDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512fMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ epu32 Experimental (x86 or x86-64) and avx512fMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply the packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mask_ ⚠mul_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mul_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_mask_ ⚠mul_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mulhi_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mulhi_ epu16 Experimental (x86 or x86-64) and avx512bwMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mulhrs_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
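The mulhrs rounding rule reads more clearly as arithmetic. A scalar model of one 16-bit lane (illustrative only; the helper name is mine):

```rust
/// Scalar model of one lane of mulhrs: form the 32-bit product, shift right by 14,
/// add 1 to round, shift right by 1, and keep the low 16 bits.
fn mulhrs_lane(a: i16, b: i16) -> i16 {
    let prod = a as i32 * b as i32;
    (((prod >> 14) + 1) >> 1) as i16
}
```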
- _mm512_mask_ ⚠mullo_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mullo_ epi32 Experimental (x86 or x86-64) and avx512fMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠mullo_ epi64 Experimental (x86 or x86-64) and avx512dqMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠mullox_ epi64 Experimental (x86 or x86-64) and avx512fMultiplies elements in packed 64-bit integer vectors a and b together, storing the lower 64 bits of the result in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠multishift_ epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmiFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠or_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠or_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠packs_ epi16 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠packs_ epi32 Experimental (x86 or x86-64) and avx512bwConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠packus_ epi16 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠packus_ epi32 Experimental (x86 or x86-64) and avx512bwConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permute_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permute_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutevar_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the permutevar name. This intrinsic is identical to _mm512_mask_permutexvar_epi32, and it is recommended that you use that intrinsic name.
- _mm512_mask_ ⚠permutevar_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutevar_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutex_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutexvar_ epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutexvar_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutexvar_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutexvar_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutexvar_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠permutexvar_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠popcnt_ epi8 Experimental (x86 or x86-64) and avx512bitalgFor each packed 8-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠popcnt_ epi16 Experimental (x86 or x86-64) and avx512bitalgFor each packed 16-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠popcnt_ epi32 Experimental (x86 or x86-64) and avx512vpopcntdqFor each packed 32-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠popcnt_ epi64 Experimental (x86 or x86-64) and avx512vpopcntdqFor each packed 64-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠range_ pd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_mask_ ⚠range_ ps Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_mask_ ⚠range_ round_ pd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠range_ round_ ps Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_mask_ ⚠rcp14_ pd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ ⚠rcp14_ ps Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ ⚠rcp_ ph Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_mask_ ⚠reduce_ add_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ ⚠reduce_ add_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ ⚠reduce_ add_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ ⚠reduce_ add_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by addition using mask k. Returns the sum of all active elements in a.
- _mm512_mask_ ⚠reduce_ and_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm512_mask_ ⚠reduce_ and_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm512_mask_ ⚠reduce_ max_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed signed 32-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ ⚠reduce_ max_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed signed 64-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ ⚠reduce_ max_ epu32 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 32-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ ⚠reduce_ max_ epu64 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 64-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ ⚠reduce_ max_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ ⚠reduce_ max_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm512_mask_ ⚠reduce_ min_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed signed 32-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ ⚠reduce_ min_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed signed 64-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ ⚠reduce_ min_ epu32 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 32-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ ⚠reduce_ min_ epu64 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 64-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ ⚠reduce_ min_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ ⚠reduce_ min_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm512_mask_ ⚠reduce_ mul_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ ⚠reduce_ mul_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ ⚠reduce_ mul_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ ⚠reduce_ mul_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm512_mask_ ⚠reduce_ or_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm512_mask_ ⚠reduce_ or_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
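The masked reduction entries above (`_mm512_mask_reduce_add_epi32`, `_mm512_mask_reduce_and_epi64`, and so on) collapse a whole vector to a single scalar, with inactive lanes simply left out of the reduction. A minimal sketch of that behavior follows; it assumes a CPU with avx512f and a toolchain on which these experimental intrinsics are enabled (a nightly feature gate may be required), and the helper names are illustrative only.

```rust
use core::arch::x86_64::*;

// Sum only the even-indexed lanes of a 16 x i32 vector, and AND together
// only the low four lanes of an 8 x i64 vector.
#[target_feature(enable = "avx512f")]
unsafe fn masked_reductions() -> (i32, i64) {
    // a = [1, 2, 3, ..., 16]
    let a = _mm512_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    // Mask bit j controls element j: keep lanes 0, 2, 4, ..., 14.
    let sum = _mm512_mask_reduce_add_epi32(0b0101_0101_0101_0101, a);

    let b = _mm512_set1_epi64(0b1111);
    // Only lanes 0..=3 participate in the AND reduction.
    let and = _mm512_mask_reduce_and_epi64(0b0000_1111, b);
    (sum, and)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        let (sum, and) = unsafe { masked_reductions() };
        assert_eq!(sum, 1 + 3 + 5 + 7 + 9 + 11 + 13 + 15); // only active lanes contribute
        assert_eq!(and, 0b1111);
        println!("masked sum = {sum}, masked AND = {and:#b}");
    } else {
        println!("AVX-512F not available on this CPU");
    }
}
```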
- _mm512_mask_ ⚠reduce_ pd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_mask_ ⚠reduce_ ph Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠reduce_ ps Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_mask_ ⚠reduce_ round_ pd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_mask_ ⚠reduce_ round_ ph Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠reduce_ round_ ps Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_mask_ ⚠rol_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠rol_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠rolv_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠rolv_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠ror_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠ror_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠rorv_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠rorv_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
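For the rotate entries above (`_mm512_mask_rol_epi32` and relatives), the writemask decides which lanes receive the rotated value and which keep the value from src. A small sketch, under the same availability assumptions as the previous example, and assuming the const-generic immediate form (`::<IMM8>`) used by the stdarch API:

```rust
use core::arch::x86_64::*;
use core::mem::transmute;

#[target_feature(enable = "avx512f")]
unsafe fn rotate_low_half() -> [u32; 16] {
    let src = _mm512_set1_epi32(-1);                  // value kept by inactive lanes
    let a = _mm512_set1_epi32(0x8000_0001u32 as i32); // MSB and LSB set in every lane
    // Rotate left by 1 in the low 8 lanes only; the high 8 lanes copy src.
    let r = _mm512_mask_rol_epi32::<1>(src, 0x00FF, a);
    transmute::<__m512i, [u32; 16]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        let lanes = unsafe { rotate_low_half() };
        assert_eq!(lanes[0], 0x0000_0003); // 0x8000_0001 rotated left by one bit
        assert_eq!(lanes[15], u32::MAX);   // inactive lane copied from src
        println!("{lanes:08x?}");
    }
}
```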
- _mm512_mask_ ⚠roundscale_ pd Experimental (x86 or x86-64) and avx512fRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
  Rounding is done according to the imm8[2:0] parameter.
- _mm512_mask_ ⚠roundscale_ ph Experimental (x86 or x86-64) and avx512fp16Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠roundscale_ ps Experimental (x86 or x86-64) and avx512fRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
  Rounding is done according to the imm8[2:0] parameter.
- _mm512_mask_ ⚠roundscale_ round_ pd Experimental (x86 or x86-64) and avx512fRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
  Rounding is done according to the imm8[2:0] parameter.
- _mm512_mask_ ⚠roundscale_ round_ ph Experimental (x86 or x86-64) and avx512fp16Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
- _mm512_mask_ ⚠roundscale_ round_ ps Experimental (x86 or x86-64) and avx512fRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
  Rounding is done according to the imm8[2:0] parameter.
- _mm512_mask_ ⚠rsqrt14_ pd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ ⚠rsqrt14_ ps Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_mask_ ⚠rsqrt_ ph Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_mask_ ⚠scalef_ pd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠scalef_ ph Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠scalef_ ps Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠scalef_ round_ pd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠scalef_ round_ ph Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠scalef_ round_ ps Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠set1_ epi8 Experimental (x86 or x86-64) and avx512bwBroadcast 8-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠set1_ epi16 Experimental (x86 or x86-64) and avx512bwBroadcast 16-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠set1_ epi32 Experimental (x86 or x86-64) and avx512fBroadcast 32-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠set1_ epi64 Experimental (x86 or x86-64) and avx512fBroadcast 64-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shldi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shldi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shldi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shldv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shldv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shldv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shrdi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shrdi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shrdi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shrdv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shrdv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shrdv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
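The shldi/shrdi/shldv/shrdv entries above describe funnel shifts: each lane of a is concatenated with the matching lane of b into a double-width value, that value is shifted, and one half is kept. A sketch of the left variant follows, assuming avx512vbmi2 hardware in addition to the toolchain caveats above; the const-generic signature assumed here is `_mm512_mask_shldi_epi32::<IMM8>(src, k, a, b)`.

```rust
use core::arch::x86_64::*;
use core::mem::transmute;

// Per lane: take the 64-bit value (a:b), shift it left by IMM8, keep the
// upper 32 bits; in other words, shift a left and pull IMM8 bits of b in behind it.
#[target_feature(enable = "avx512f,avx512vbmi2")]
unsafe fn funnel_shift_left_by_8() -> [u32; 16] {
    let a = _mm512_set1_epi32(0x0000_00FF);
    let b = _mm512_set1_epi32(0xF000_0000u32 as i32);
    let src = _mm512_setzero_si512();
    // All 16 lanes active, so src never shows through.
    let r = _mm512_mask_shldi_epi32::<8>(src, 0xFFFF, a, b);
    transmute::<__m512i, [u32; 16]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512vbmi2") {
        let lanes = unsafe { funnel_shift_left_by_8() };
        // upper 32 bits of (0x0000_00FF_F000_0000 << 8) == 0x0000_FFF0
        assert!(lanes.iter().all(|&x| x == 0x0000_FFF0));
        println!("{lanes:x?}");
    }
}
```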
- _mm512_mask_ ⚠shuffle_ epi8 Experimental (x86 or x86-64) and avx512bwShuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ f32x4 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ f64x2 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ i32x4 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ i64x2 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shuffle_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shufflehi_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠shufflelo_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sll_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sll_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sll_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠slli_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠slli_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠slli_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sllv_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sllv_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sllv_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sqrt_ pd Experimental (x86 or x86-64) and avx512fCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sqrt_ ph Experimental (x86 or x86-64) and avx512fp16Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sqrt_ ps Experimental (x86 or x86-64) and avx512fCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sqrt_ round_ pd Experimental (x86 or x86-64) and avx512fCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sqrt_ round_ ph Experimental (x86 or x86-64) and avx512fp16Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_mask_ ⚠sqrt_ round_ ps Experimental (x86 or x86-64) and avx512fCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sra_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sra_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sra_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srai_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srai_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srai_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srav_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srav_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srav_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srl_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srl_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srl_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srli_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srli_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srli_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srlv_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srlv_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠srlv_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠store_ epi32 Experimental (x86 or x86-64) and avx512fStore packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠store_ epi64 Experimental (x86 or x86-64) and avx512fStore packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠store_ pd Experimental (x86 or x86-64) and avx512fStore packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠store_ ps Experimental (x86 or x86-64) and avx512fStore packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_mask_ ⚠storeu_ epi8 Experimental (x86 or x86-64) and avx512bwStore packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ epi16 Experimental (x86 or x86-64) and avx512bwStore packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ epi32 Experimental (x86 or x86-64) and avx512fStore packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ epi64 Experimental (x86 or x86-64) and avx512fStore packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ pd Experimental (x86 or x86-64) and avx512fStore packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm512_mask_ ⚠storeu_ ps Experimental (x86 or x86-64) and avx512fStore packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
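The masked store entries above write only the active lanes to memory and leave the rest of the buffer untouched, which is what makes them useful for loop tails. A minimal sketch under the same availability assumptions as the earlier examples; the argument order assumed here is mem_addr, mask, then the vector.

```rust
use core::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn store_low_half(buf: &mut [i32; 16]) {
    let v = _mm512_set1_epi32(7);
    // Only the low 8 lanes are written; the rest of `buf` keeps its old contents.
    _mm512_mask_storeu_epi32(buf.as_mut_ptr(), 0x00FF, v);
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        let mut buf = [-1i32; 16];
        unsafe { store_low_half(&mut buf) };
        assert!(buf[..8].iter().all(|&x| x == 7));  // written lanes
        assert!(buf[8..].iter().all(|&x| x == -1)); // untouched memory
        println!("{buf:?}");
    }
}
```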
- _mm512_mask_ ⚠sub_ epi8 Experimental (x86 or x86-64) and avx512bwSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ epi16 Experimental (x86 or x86-64) and avx512bwSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ epi32 Experimental (x86 or x86-64) and avx512fSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ epi64 Experimental (x86 or x86-64) and avx512fSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ pd Experimental (x86 or x86-64) and avx512fSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ ph Experimental (x86 or x86-64) and avx512fp16Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ ps Experimental (x86 or x86-64) and avx512fSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ round_ pd Experimental (x86 or x86-64) and avx512fSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠sub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_mask_ ⚠sub_ round_ ps Experimental (x86 or x86-64) and avx512fSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠subs_ epi8 Experimental (x86 or x86-64) and avx512bwSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠subs_ epi16 Experimental (x86 or x86-64) and avx512bwSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠subs_ epu8 Experimental (x86 or x86-64) and avx512bwSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠subs_ epu16 Experimental (x86 or x86-64) and avx512bwSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠ternarylogic_ epi32 Experimental (x86 or x86-64) and avx512fBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from src, a, and b are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32-bit granularity (32-bit elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠ternarylogic_ epi64 Experimental (x86 or x86-64) and avx512fBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from src, a, and b are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64-bit granularity (64-bit elements are copied from src when the corresponding mask bit is not set).
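The ternarylogic entries above take an 8-bit truth table in imm8: for every bit position, the corresponding bits of src, a, and b form a 3-bit index that selects one of the eight table entries. A worked sketch using 0x96, the table of a three-way XOR (XOR is symmetric, so the exact operand ordering in the index does not matter for this particular table); availability assumptions are the same as in the earlier examples, and the const-generic immediate position is assumed.

```rust
use core::arch::x86_64::*;
use core::mem::transmute;

#[target_feature(enable = "avx512f")]
unsafe fn three_way_xor() -> [u32; 16] {
    let src = _mm512_set1_epi32(0b1100);
    let a = _mm512_set1_epi32(0b1010);
    let b = _mm512_set1_epi32(0b0110);
    // imm8 = 0x96 encodes dst_bit = src_bit ^ a_bit ^ b_bit for every bit.
    let r = _mm512_mask_ternarylogic_epi32::<0x96>(src, 0xFFFF, a, b);
    transmute::<__m512i, [u32; 16]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        let lanes = unsafe { three_way_xor() };
        assert!(lanes.iter().all(|&x| x == (0b1100 ^ 0b1010 ^ 0b0110)));
        println!("{lanes:?}"); // all lanes are 0 for these inputs
    }
}
```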
- _mm512_mask_ ⚠test_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ ⚠test_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ ⚠test_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ ⚠test_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm512_mask_ ⚠testn_ epi8_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise NAND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ ⚠testn_ epi16_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise NAND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ ⚠testn_ epi32_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise NAND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ ⚠testn_ epi64_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise NAND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm512_mask_ ⚠unpackhi_ epi8 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpackhi_ epi16 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpackhi_ epi32 Experimental (x86 or x86-64) and avx512fUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpackhi_ epi64 Experimental (x86 or x86-64) and avx512fUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpackhi_ pd Experimental (x86 or x86-64) and avx512fUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpackhi_ ps Experimental (x86 or x86-64) and avx512fUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpacklo_ epi8 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpacklo_ epi16 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpacklo_ epi32 Experimental (x86 or x86-64) and avx512fUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpacklo_ epi64 Experimental (x86 or x86-64) and avx512fUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpacklo_ pd Experimental (x86 or x86-64) and avx512fUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠unpacklo_ ps Experimental (x86 or x86-64) and avx512fUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠xor_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠xor_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm512_mask_ ⚠xor_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_mask_ ⚠xor_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm512_maskz_ ⚠abs_ epi8 Experimental (x86 or x86-64) and avx512bwCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠abs_ epi16 Experimental (x86 or x86-64) and avx512bwCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠abs_ epi32 Experimental (x86 or x86-64) and avx512fCompute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠abs_ epi64 Experimental (x86 or x86-64) and avx512fCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bwAdd packed 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bwAdd packed 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512fAdd packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512fAdd packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ pd Experimental (x86 or x86-64) and avx512fAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ ps Experimental (x86 or x86-64) and avx512fAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ round_ pd Experimental (x86 or x86-64) and avx512fAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠add_ round_ ph Experimental (x86 or x86-64) and avx512fp16Add packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠add_ round_ ps Experimental (x86 or x86-64) and avx512fAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
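The `_mm512_maskz_*` entries that begin here differ from the `_mm512_mask_*` entries earlier in this listing only in what happens to inactive lanes: a writemask copies them from src, a zeromask zeroes them. A short sketch contrasting the two, using `_mm512_mask_sub_epi32` from above and `_mm512_maskz_add_epi32`, under the same availability assumptions as the earlier examples:

```rust
use core::arch::x86_64::*;
use core::mem::transmute;

#[target_feature(enable = "avx512f")]
unsafe fn mask_vs_maskz() -> ([i32; 16], [i32; 16]) {
    let src = _mm512_set1_epi32(-1);
    let a = _mm512_set1_epi32(10);
    let b = _mm512_set1_epi32(3);
    // Writemask: inactive lanes are copied from `src`.
    let merged = _mm512_mask_sub_epi32(src, 0x00FF, a, b);
    // Zeromask: inactive lanes are zeroed out.
    let zeroed = _mm512_maskz_add_epi32(0x00FF, a, b);
    (
        transmute::<__m512i, [i32; 16]>(merged),
        transmute::<__m512i, [i32; 16]>(zeroed),
    )
}

fn main() {
    if is_x86_feature_detected!("avx512f") {
        let (merged, zeroed) = unsafe { mask_vs_maskz() };
        assert_eq!((merged[0], merged[15]), (10 - 3, -1)); // active: a - b, inactive: src
        assert_eq!((zeroed[0], zeroed[15]), (10 + 3, 0));  // active: a + b, inactive: 0
    }
}
```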
- _mm512_maskz_ ⚠adds_ epi8 Experimental (x86 or x86-64) and avx512bwAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠adds_ epi16 Experimental (x86 or x86-64) and avx512bwAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠adds_ epu8 Experimental (x86 or x86-64) and avx512bwAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠adds_ epu16 Experimental (x86 or x86-64) and avx512bwAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠alignr_ epi8 Experimental (x86 or x86-64) and avx512bwConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠alignr_ epi32 Experimental (x86 or x86-64) and avx512fConcatenate a and b into a 128-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 64 bytes (16 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠alignr_ epi64 Experimental (x86 or x86-64) and avx512fConcatenate a and b into a 128-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 64 bytes (8 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠and_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠and_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠andnot_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠andnot_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠andnot_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠andnot_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
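One detail worth calling out for the andnot entries above: the NOT applies to a, not to b, so each lane of the result is (!a) & b. A hedged sketch under the same assumptions as the previous example:

```rust
// Hedged sketch: `_mm512_maskz_andnot_epi32` computes (!a) & b per 32-bit lane,
// then zeroes lanes whose mask bit is clear. Requires AVX-512F at runtime.
fn main() {
    if !is_x86_feature_detected!("avx512f") {
        return;
    }
    let lanes = unsafe { demo() };
    assert_eq!(lanes[0], !0b1100 & 0b1010); // lane 0 kept: (!a) & b
    assert_eq!(lanes[15], 0);               // lane 15 masked out -> zero
}

#[target_feature(enable = "avx512f")]
unsafe fn demo() -> [i32; 16] {
    use std::arch::x86_64::*;
    unsafe {
        let a = _mm512_set1_epi32(0b1100);
        let b = _mm512_set1_epi32(0b1010);
        // Keep the low 8 lanes, zero the high 8.
        std::mem::transmute(_mm512_maskz_andnot_epi32(0x00ff, a, b))
    }
}
```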
- _mm512_maskz_ ⚠avg_ epu8 Experimental (x86 or x86-64) and avx512bwAverage packed unsigned 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠avg_ epu16 Experimental (x86 or x86-64) and avx512bwAverage packed unsigned 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcast_ f32x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the lower 2 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠broadcast_ f32x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcast_ f32x8 Experimental (x86 or x86-64) and avx512dqBroadcasts the 8 packed single-precision (32-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠broadcast_ f64x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the 2 packed double-precision (64-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠broadcast_ f64x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed double-precision (64-bit) floating-point elements from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcast_ i32x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠broadcast_ i32x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcast_ i32x8 Experimental (x86 or x86-64) and avx512dqBroadcasts the 8 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠broadcast_ i64x2 Experimental (x86 or x86-64) and avx512dqBroadcasts the 2 packed 64-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠broadcast_ i64x4 Experimental (x86 or x86-64) and avx512fBroadcast the 4 packed 64-bit integers from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcastb_ epi8 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 8-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcastd_ epi32 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 32-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcastq_ epi64 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 64-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcastsd_ pd Experimental (x86 or x86-64) and avx512fBroadcast the low double-precision (64-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcastss_ ps Experimental (x86 or x86-64) and avx512fBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠broadcastw_ epi16 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
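The broadcast entries above all replicate the low element(s) of a narrower vector across the 512-bit result before the zeromask is applied. A hedged sketch using _mm512_maskz_broadcastd_epi32, under the same assumptions as the earlier examples:

```rust
// Hedged sketch: `_mm512_maskz_broadcastd_epi32` replicates the low 32-bit lane of a
// 128-bit vector into all 16 lanes of a 512-bit vector, then applies the zeromask.
fn main() {
    if !is_x86_feature_detected!("avx512f") {
        return;
    }
    let lanes = unsafe { demo() };
    assert_eq!(lanes[3], 7); // broadcast value in a selected lane
    assert_eq!(lanes[8], 0); // lane 8 is not selected by the mask
}

#[target_feature(enable = "avx512f")]
unsafe fn demo() -> [i32; 16] {
    use std::arch::x86_64::*;
    unsafe {
        let src = _mm_set1_epi32(7); // low element is 7
        // Only the low 8 lanes receive the broadcast; the rest are zeroed.
        std::mem::transmute(_mm512_maskz_broadcastd_epi32(0x00ff, src))
    }
}
```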
- _mm512_maskz_ ⚠cmul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ ⚠cmul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ ⚠compress_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Contiguously store the active 8-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ ⚠compress_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Contiguously store the active 16-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ ⚠compress_ epi32 Experimental (x86 or x86-64) and avx512fContiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ ⚠compress_ epi64 Experimental (x86 or x86-64) and avx512fContiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ ⚠compress_ pd Experimental (x86 or x86-64) and avx512fContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm512_maskz_ ⚠compress_ ps Experimental (x86 or x86-64) and avx512fContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
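The compress entries above pack the selected lanes towards the low end of the destination rather than leaving them in place. A hedged sketch with _mm512_maskz_compress_epi32:

```rust
// Hedged sketch: `_mm512_maskz_compress_epi32` packs the selected lanes of `a`
// towards the low end of the result and zero-fills the remainder.
fn main() {
    if !is_x86_feature_detected!("avx512f") {
        return;
    }
    let lanes = unsafe { demo() };
    // Lanes 1, 3, 5, 7 of the input (values 1, 3, 5, 7) are packed into lanes 0..4.
    assert_eq!(&lanes[..4], &[1, 3, 5, 7]);
    assert!(lanes[4..].iter().all(|&x| x == 0));
}

#[target_feature(enable = "avx512f")]
unsafe fn demo() -> [i32; 16] {
    use std::arch::x86_64::*;
    unsafe {
        let a = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        std::mem::transmute(_mm512_maskz_compress_epi32(0b0000_0000_1010_1010, a))
    }
}
```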
- _mm512_maskz_ ⚠conflict_ epi32 Experimental (x86 or x86-64) and avx512cdTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm512_maskz_ ⚠conflict_ epi64 Experimental (x86 or x86-64) and avx512cdTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
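The conflict entries above report, for every lane, a bitmask of the lower-indexed lanes that hold an equal value, which is the usual building block for handling duplicate indices in gather/scatter code. A hedged sketch with _mm512_maskz_conflict_epi32 (requires AVX-512CD):

```rust
// Hedged sketch: for each 32-bit lane, `_mm512_maskz_conflict_epi32` sets bit j of the
// result when lane j (j < current lane) holds the same value.
fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512cd")) {
        return;
    }
    let lanes = unsafe { demo() };
    assert_eq!(lanes[0], 0);     // first occurrence of 42: no earlier duplicates
    assert_eq!(lanes[2], 0b001); // lane 2 repeats the value held in lane 0
}

#[target_feature(enable = "avx512f,avx512cd")]
unsafe fn demo() -> [i32; 16] {
    use std::arch::x86_64::*;
    unsafe {
        // Lanes 0 and 2 both hold 42; every other lane holds a distinct value.
        let a = _mm512_setr_epi32(42, 1, 42, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        std::mem::transmute(_mm512_maskz_conflict_epi32(0xffff, a))
    }
}
```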
- _mm512_maskz_ ⚠conj_ pch Experimental (x86 or x86-64) and avx512fp16Compute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ ⚠cvt_ roundepi16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundepi32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundepi32_ ps Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundepi64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundepi64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundepi64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundepu16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundepu32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundepu32_ ps Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundepu64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundepu64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundepu64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundpd_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundpd_ ps Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ pd Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvt_ roundph_ ps Experimental (x86 or x86-64) and avx512fConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvt_ roundps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠cvt_ roundps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Rounding is done according to the ROUNDING parameter.
- _mm512_maskz_ ⚠cvt_ roundps_ pd Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvt_ roundps_ ph Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
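The _round_ entries above take their rounding mode as a compile-time constant built from the _MM_FROUND_* values listed earlier in this module. The sketch below shows one plausible way to call such an intrinsic from Rust; the const-generic calling convention is an assumption about the current signatures, not text from this reference:

```rust
// Hedged sketch: passing an explicit rounding mode to `_mm512_maskz_cvt_roundps_epi32`.
// Assumes the Rust signature takes the rounding mode as a const generic parameter.
fn main() {
    if !is_x86_feature_detected!("avx512f") {
        return;
    }
    let lanes = unsafe { demo() };
    assert_eq!(lanes[0], 2);  // 2.7 rounded toward negative infinity -> 2
    assert_eq!(lanes[15], 0); // masked-out lane
}

#[target_feature(enable = "avx512f")]
unsafe fn demo() -> [i32; 16] {
    use std::arch::x86_64::*;
    unsafe {
        let a = _mm512_set1_ps(2.7);
        // Round down and suppress exceptions; only the low 8 lanes are selected.
        let r = _mm512_maskz_cvt_roundps_epi32::<{ _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC }>(
            0x00ff, a,
        );
        std::mem::transmute(r)
    }
}
```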
- _mm512_maskz_ ⚠cvtepi8_ epi16 Experimental (x86 or x86-64) and avx512bwSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi8_ epi32 Experimental (x86 or x86-64) and avx512fSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi8_ epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi16_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi16_ epi32 Experimental (x86 or x86-64) and avx512fSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi16_ epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi32_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi32_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi32_ epi64 Experimental (x86 or x86-64) and avx512fSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi32_ pd Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi32_ ps Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi64_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi64_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi64_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtepi64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepi64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtepu8_ epi16 Experimental (x86 or x86-64) and avx512bwZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu8_ epi32 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu8_ epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 8-bit integers in the low 8 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu16_ epi32 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu16_ epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu16_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu32_ epi64 Experimental (x86 or x86-64) and avx512fZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu32_ pd Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu32_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu32_ ps Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu64_ pd Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtepu64_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtepu64_ ps Experimental (x86 or x86-64) and avx512dqConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtne2ps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512fConvert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm512_maskz_ ⚠cvtneps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm512_maskz_ ⚠cvtpbh_ ps Experimental (x86 or x86-64) and avx512bf16,avx512fConverts packed BF16 (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtpd_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtpd_ ps Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ pd Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtph_ ps Experimental (x86 or x86-64) and avx512fConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtps_ pd Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtps_ ph Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvtsepi16_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtsepi32_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtsepi32_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtsepi64_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtsepi64_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtsepi64_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
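The cvtsepi entries above differ from the plain cvtepi downconversions in that out-of-range values clamp to the narrower type's limits instead of being truncated. A hedged sketch with _mm512_maskz_cvtsepi32_epi16:

```rust
// Hedged sketch: signed-saturating downconversion with `_mm512_maskz_cvtsepi32_epi16`.
// Values outside the i16 range clamp to i16::MAX / i16::MIN instead of being truncated.
fn main() {
    if !is_x86_feature_detected!("avx512f") {
        return;
    }
    let lanes = unsafe { demo() };
    assert_eq!(lanes[0], i16::MAX); // 100_000 saturates to 32767
    assert_eq!(lanes[15], 0);       // masked-out lane
}

#[target_feature(enable = "avx512f")]
unsafe fn demo() -> [i16; 16] {
    use std::arch::x86_64::*;
    unsafe {
        let a = _mm512_set1_epi32(100_000); // does not fit in an i16
        // Keep the low 8 lanes only.
        std::mem::transmute(_mm512_maskz_cvtsepi32_epi16(0x00ff, a))
    }
}
```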
- _mm512_maskz_ ⚠cvtt_ roundpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtt_ roundph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtt_ roundph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtt_ roundph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtt_ roundph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtt_ roundph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtt_ roundps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠cvtt_ roundps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC to the sae parameter.
- _mm512_maskz_ ⚠cvttpd_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttpd_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvttpd_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttpd_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvttph_ epi16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttph_ epi32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttph_ epi64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttph_ epu16 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttph_ epu32 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttph_ epu64 Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttps_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttps_ epi64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvttps_ epu32 Experimental (x86 or x86-64) and avx512fConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvttps_ epu64 Experimental (x86 or x86-64) and avx512dqConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠cvtusepi16_ epi8 Experimental (x86 or x86-64) and avx512bwConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtusepi32_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtusepi32_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtusepi64_ epi8 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtusepi64_ epi16 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtusepi64_ epi32 Experimental (x86 or x86-64) and avx512fConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtx_ roundph_ ps Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtx_ roundps_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtxph_ ps Experimental (x86 or x86-64) and avx512fp16Convert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠cvtxps_ ph Experimental (x86 or x86-64) and avx512fp16Convert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠dbsad_ epu8 Experimental (x86 or x86-64) and avx512bwCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm512_maskz_ ⚠div_ pd Experimental (x86 or x86-64) and avx512fDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠div_ ph Experimental (x86 or x86-64) and avx512fp16Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠div_ ps Experimental (x86 or x86-64) and avx512fDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠div_ round_ pd Experimental (x86 or x86-64) and avx512fDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠div_ round_ ph Experimental (x86 or x86-64) and avx512fp16Divide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠div_ round_ ps Experimental (x86 or x86-64) and avx512fDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠dpbf16_ ps Experimental (x86 or x86-64) and avx512bf16,avx512fCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm512_maskz_ ⚠dpbusd_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠dpbusds_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠dpwssd_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠dpwssds_ epi32 Experimental (x86 or x86-64) and avx512vnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
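The dp* entries above implement the VNNI dot-product step: narrow products are summed within each 32-bit lane and accumulated into src before the zeromask is applied. A hedged sketch with _mm512_maskz_dpbusd_epi32 (the argument order follows the Intel prototype and is an assumption here):

```rust
// Hedged sketch: `_mm512_maskz_dpbusd_epi32` multiplies groups of four unsigned bytes
// from `a` with four signed bytes from `b`, sums the four products, and adds the sum to
// the matching 32-bit lane of `src`. Requires AVX-512VNNI.
fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vnni")) {
        return;
    }
    let lanes = unsafe { demo() };
    // Each selected 32-bit lane: 1000 + 4 * (2 * -3) = 976; unselected lanes are zero.
    assert_eq!(lanes[0], 976);
    assert_eq!(lanes[15], 0);
}

#[target_feature(enable = "avx512f,avx512vnni")]
unsafe fn demo() -> [i32; 16] {
    use std::arch::x86_64::*;
    unsafe {
        let src = _mm512_set1_epi32(1000);
        let a = _mm512_set1_epi8(2);  // treated as unsigned bytes
        let b = _mm512_set1_epi8(-3); // treated as signed bytes
        std::mem::transmute(_mm512_maskz_dpbusd_epi32(0x7fff, src, a, b))
    }
}
```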
- _mm512_maskz_ ⚠expand_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expand_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expand_ epi32 Experimental (x86 or x86-64) and avx512fLoad contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expand_ epi64 Experimental (x86 or x86-64) and avx512fLoad contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expand_ pd Experimental (x86 or x86-64) and avx512fLoad contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expand_ ps Experimental (x86 or x86-64) and avx512fLoad contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Load contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi32 Experimental (x86 or x86-64) and avx512fLoad contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ epi64 Experimental (x86 or x86-64) and avx512fLoad contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ pd Experimental (x86 or x86-64) and avx512fLoad contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠expandloadu_ ps Experimental (x86 or x86-64) and avx512fLoad contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
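The expand and expandloadu entries above all consume source elements contiguously and scatter them to the lanes whose mask bits are set, zeroing the rest. A scalar sketch of that behaviour, using a plain slice in place of the 512-bit register (illustrative helper only):

```rust
// Scalar model of the maskz expand operation: source elements are consumed
// contiguously, but written only to lanes whose mask bit is set; all other
// lanes are zeroed (zeromask semantics).
fn maskz_expand(src: &[i32], k: u16, lanes: usize) -> Vec<i32> {
    let mut out = vec![0; lanes];
    let mut next = 0; // index of the next contiguous source element
    for lane in 0..lanes {
        if (k >> lane) & 1 == 1 {
            out[lane] = src[next];
            next += 1;
        }
    }
    out
}

fn main() {
    // Mask 0b1010: lanes 1 and 3 receive src[0] and src[1]; everything else is zeroed.
    assert_eq!(maskz_expand(&[7, 9], 0b1010, 8), vec![0, 7, 0, 9, 0, 0, 0, 0]);
}
```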
- _mm512_maskz_ ⚠extractf32x4_ ps Experimental (x86 or x86-64) and avx512fExtract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠extractf32x8_ ps Experimental (x86 or x86-64) and avx512dqExtracts 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠extractf64x2_ pd Experimental (x86 or x86-64) and avx512dqExtracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠extractf64x4_ pd Experimental (x86 or x86-64) and avx512fExtract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a, selected with imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠extracti32x4_ epi32 Experimental (x86 or x86-64) and avx512fExtract 128 bits (composed of 4 packed 32-bit integers) from a, selected with IMM2, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠extracti32x8_ epi32 Experimental (x86 or x86-64) and avx512dqExtracts 256 bits (composed of 8 packed 32-bit integers) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠extracti64x2_ epi64 Experimental (x86 or x86-64) and avx512dqExtracts 128 bits (composed of 2 packed 64-bit integers) from a, selected with IMM8, and stores the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠extracti64x4_ epi64 Experimental (x86 or x86-64) and avx512fExtract 256 bits (composed of 4 packed 64-bit integers) from a, selected with IMM1, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ ⚠fcmadd_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ ⚠fcmul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm512_maskz_ ⚠fcmul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
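For the pch entries here (and the fmul/fmadd pch entries further down), each pair of adjacent half-precision values is treated as one complex value re + i*im. A scalar f32 sketch of the two multiply flavours, plain multiply versus multiply-by-conjugate (f16 arithmetic and masking omitted; the helper names are illustrative):

```rust
// Scalar model of one complex lane: re = fp16[0], im = fp16[1] (modelled as f32 here).
fn fmul(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    // complex multiply: (a.re + i*a.im) * (b.re + i*b.im)
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

fn fcmul(a: (f32, f32), b: (f32, f32)) -> (f32, f32) {
    // multiply by the complex conjugate of b: (b.re - i*b.im)
    (a.0 * b.0 + a.1 * b.1, a.1 * b.0 - a.0 * b.1)
}

fn main() {
    let a = (1.0, 2.0);
    let b = (3.0, 4.0);
    assert_eq!(fmul(a, b), (-5.0, 10.0)); // (1+2i)(3+4i) = -5 + 10i
    assert_eq!(fcmul(a, b), (11.0, 2.0)); // (1+2i)(3-4i) = 11 + 2i
}
```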
- _mm512_maskz_ ⚠fixupimm_ pd Experimental (x86 or x86-64) and avx512fFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ ⚠fixupimm_ ps Experimental (x86 or x86-64) and avx512fFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ ⚠fixupimm_ round_ pd Experimental (x86 or x86-64) and avx512fFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ ⚠fixupimm_ round_ ps Experimental (x86 or x86-64) and avx512fFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm512_maskz_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmadd_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ ⚠fmadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmaddsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmaddsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmaddsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsubadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsubadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmsubadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fmul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ ⚠fmul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1]. Rounding is done according to the rounding parameter, which can be one of:
- _mm512_maskz_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmadd_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmadd_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmadd_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmsub_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmsub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠fnmsub_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
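The fmadd/fmsub/fnmadd/fnmsub and fmaddsub/fmsubadd entries differ only in the signs applied per lane. A scalar sketch of those sign patterns (the single-rounding fusion of the real instructions is ignored here, and the even/odd assignment follows the usual x86 convention, stated as an assumption):

```rust
// Per-lane sign patterns of the 512-bit FMA family (scalar f64 model, no fused rounding).
fn fmadd(a: f64, b: f64, c: f64) -> f64 { a * b + c }
fn fmsub(a: f64, b: f64, c: f64) -> f64 { a * b - c }
fn fnmadd(a: f64, b: f64, c: f64) -> f64 { -(a * b) + c }
fn fnmsub(a: f64, b: f64, c: f64) -> f64 { -(a * b) - c }

// fmaddsub subtracts c in even lanes and adds it in odd lanes;
// fmsubadd is the opposite (adds in even lanes, subtracts in odd lanes).
fn fmaddsub_lane(lane: usize, a: f64, b: f64, c: f64) -> f64 {
    if lane % 2 == 0 { a * b - c } else { a * b + c }
}

fn main() {
    assert_eq!(fmadd(2.0, 3.0, 1.0), 7.0);
    assert_eq!(fmsub(2.0, 3.0, 1.0), 5.0);
    assert_eq!(fnmadd(2.0, 3.0, 1.0), -5.0);
    assert_eq!(fnmsub(2.0, 3.0, 1.0), -7.0);
    assert_eq!(fmaddsub_lane(0, 2.0, 3.0, 1.0), 5.0); // even lane: a*b - c
    assert_eq!(fmaddsub_lane(1, 2.0, 3.0, 1.0), 7.0); // odd lane:  a*b + c
}
```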
- _mm512_maskz_ ⚠getexp_ pd Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_maskz_ ⚠getexp_ ph Experimental (x86 or x86-64) and avx512fp16Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_maskz_ ⚠getexp_ ps Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm512_maskz_ ⚠getexp_ round_ pd Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠getexp_ round_ ph Experimental (x86 or x86-64) and avx512fp16Convert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠getexp_ round_ ps Experimental (x86 or x86-64) and avx512fConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
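For the getexp entries, the stored value is essentially floor(log2(|x|)) of each element. A scalar f64 sketch for normal, finite, non-zero inputs (special values are handled differently by the instruction and are not modelled):

```rust
// Scalar model of getexp: the unbiased exponent of a normal f64, i.e. floor(log2(|x|)).
fn getexp(x: f64) -> f64 {
    x.abs().log2().floor()
}

fn main() {
    assert_eq!(getexp(10.0), 3.0);   // 10 is in [2^3, 2^4)
    assert_eq!(getexp(-0.75), -1.0); // 0.75 is in [0.5, 1), so the exponent is -1
}
```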
- _mm512_maskz_ ⚠getmant_ pd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_maskz_ ⚠getmant_ ph Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm512_maskz_ ⚠getmant_ ps Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm512_maskz_ ⚠getmant_ round_ pd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠getmant_ round_ ph Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠getmant_ round_ ps Experimental (x86 or x86-64) and avx512fNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
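The getmant entries factor each element as sign * mantissa * 2^k and return the mantissa rescaled into the requested interval. A scalar sketch of the _MM_MANT_NORM_1_2 / _MM_MANT_SIGN_src combination for normal, finite, non-zero inputs (the other interval and sign options are not modelled):

```rust
// Scalar model of getmant with interv = _MM_MANT_NORM_1_2 and sc = _MM_MANT_SIGN_src:
// the magnitude is rescaled into [1, 2) and the original sign is kept.
fn getmant_1_2_src(x: f64) -> f64 {
    let k = x.abs().log2().floor(); // exponent of x, as in getexp
    x / 2f64.powi(k as i32)         // divide out 2^k to land in [1, 2)
}

fn main() {
    assert_eq!(getmant_1_2_src(10.0), 1.25);  // 10    =  1.25 * 2^3
    assert_eq!(getmant_1_2_src(-0.75), -1.5); // -0.75 = -1.5  * 2^-1
}
```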
- _mm512_maskz_ ⚠gf2p8affine_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512fPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_maskz_ ⚠gf2p8affineinv_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512fPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm512_maskz_ ⚠gf2p8mul_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512fPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
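The gf2p8mul entry multiplies bytes in GF(2^8) reduced by x^8 + x^4 + x^3 + x + 1. A scalar sketch of one byte multiply using the standard shift-and-reduce loop (illustrative helper, not part of this module):

```rust
// Scalar model of one byte of gf2p8mul: carry-less multiply, reduced
// modulo the polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
fn gf2p8_mul_byte(mut a: u8, mut b: u8) -> u8 {
    let mut acc: u8 = 0;
    for _ in 0..8 {
        if b & 1 != 0 {
            acc ^= a; // addition in GF(2^8) is XOR
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1B; // reduce: x^8 ≡ x^4 + x^3 + x + 1
        }
        b >>= 1;
    }
    acc
}

fn main() {
    assert_eq!(gf2p8_mul_byte(0x53, 0xCA), 0x01); // 0x53 and 0xCA are inverses in this field
    assert_eq!(gf2p8_mul_byte(0x57, 0x02), 0xAE); // multiplying by x is a conditional shift
}
```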
- _mm512_maskz_ ⚠insertf32x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠insertf32x8 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠insertf64x2 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠insertf64x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠inserti32x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 128 bits (composed of 4 packed 32-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠inserti32x8 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 256 bits (composed of 8 packed 32-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠inserti64x2 Experimental (x86 or x86-64) and avx512dqCopy a to tmp, then insert 128 bits (composed of 2 packed 64-bit integers) from b into tmp at the location specified by IMM8, and copy tmp to dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠inserti64x4 Experimental (x86 or x86-64) and avx512fCopy a to tmp, then insert 256 bits (composed of 4 packed 64-bit integers) from b into tmp at the location specified by imm8. Store tmp to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠load_ epi32 Experimental (x86 or x86-64) and avx512fLoad packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠load_ epi64 Experimental (x86 or x86-64) and avx512fLoad packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠load_ pd Experimental (x86 or x86-64) and avx512fLoad packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠load_ ps Experimental (x86 or x86-64) and avx512fLoad packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_maskz_ ⚠loadu_ epi8 Experimental (x86 or x86-64) and avx512bwLoad packed 8-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ epi16 Experimental (x86 or x86-64) and avx512bwLoad packed 16-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ epi32 Experimental (x86 or x86-64) and avx512fLoad packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ epi64 Experimental (x86 or x86-64) and avx512fLoad packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ pd Experimental (x86 or x86-64) and avx512fLoad packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠loadu_ ps Experimental (x86 or x86-64) and avx512fLoad packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm512_maskz_ ⚠lzcnt_ epi32 Experimental (x86 or x86-64) and avx512cdCount the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠lzcnt_ epi64 Experimental (x86 or x86-64) and avx512cdCount the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠madd52hi_ epu64 Experimental (x86 or x86-64) and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠madd52lo_ epu64 Experimental (x86 or x86-64) and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
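The madd52 entries form a 104-bit product from the low 52 bits of each 64-bit lane of b and c, then add either the low or the high 52 bits of that product to a. A scalar u128 sketch of one lane (masking omitted, helper names illustrative):

```rust
// Scalar model of one 64-bit lane of madd52lo / madd52hi (IFMA).
const MASK52: u64 = (1 << 52) - 1;

fn madd52lo(a: u64, b: u64, c: u64) -> u64 {
    let prod = (b & MASK52) as u128 * (c & MASK52) as u128; // 104-bit product
    a.wrapping_add((prod as u64) & MASK52) // add the low 52 bits of the product
}

fn madd52hi(a: u64, b: u64, c: u64) -> u64 {
    let prod = (b & MASK52) as u128 * (c & MASK52) as u128;
    a.wrapping_add((prod >> 52) as u64) // add the high 52 bits of the product
}

fn main() {
    // (2^26) * (2^26) = 2^52: the low 52 bits are 0 and the high 52 bits are 1.
    let b = 1u64 << 26;
    assert_eq!(madd52lo(7, b, b), 7);
    assert_eq!(madd52hi(7, b, b), 8);
}
```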
- _mm512_maskz_ ⚠madd_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠maddubs_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
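madd_epi16 and maddubs_epi16 both multiply pairwise and add horizontally; the maddubs form mixes unsigned and signed bytes and saturates. A scalar sketch of one 16-bit maddubs result lane:

```rust
// Scalar model of one 16-bit result lane of maddubs_epi16:
// unsigned bytes from a times signed bytes from b, pairwise, saturated to i16.
fn maddubs_lane(a: [u8; 2], b: [i8; 2]) -> i16 {
    let p0 = a[0] as i32 * b[0] as i32;
    let p1 = a[1] as i32 * b[1] as i32;
    (p0 + p1).clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

fn main() {
    assert_eq!(maddubs_lane([10, 20], [3, -1]), 10);            // 30 - 20
    assert_eq!(maddubs_lane([255, 255], [127, 127]), i16::MAX); // sum saturates
}
```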
- _mm512_maskz_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ ⚠max_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠max_ round_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠max_ round_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ ⚠max_ round_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ ⚠min_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠min_ round_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠min_ round_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_maskz_ ⚠min_ round_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
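The non-IEEE note on the max/min entries reflects the underlying compare-and-select: the second operand wins whenever the comparison is false, which is what happens for NaN inputs and for ±0. A scalar sketch of that rule for max (the exact lane rule is stated here as an assumption based on the long-standing x86 behaviour):

```rust
// Scalar model of the per-lane selection assumed for the packed max entries:
// the comparison result picks the lane, so NaN and ±0 behave non-IEEE.
fn x86_max(a: f32, b: f32) -> f32 {
    if a > b { a } else { b } // when either input is NaN, b is returned
}

fn main() {
    assert_eq!(x86_max(1.0, 2.0), 2.0);
    assert!(x86_max(1.0, f32::NAN).is_nan());       // NaN in b propagates
    assert_eq!(x86_max(f32::NAN, 1.0), 1.0);        // NaN in a is silently dropped
    assert!(x86_max(0.0, -0.0).is_sign_negative()); // +0 vs -0 returns b (-0)
}
```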
- _mm512_maskz_ ⚠mov_ epi8 Experimental (x86 or x86-64) and avx512bwMove packed 8-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mov_ epi16 Experimental (x86 or x86-64) and avx512bwMove packed 16-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mov_ epi32 Experimental (x86 or x86-64) and avx512fMove packed 32-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mov_ epi64 Experimental (x86 or x86-64) and avx512fMove packed 64-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mov_ pd Experimental (x86 or x86-64) and avx512fMove packed double-precision (64-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mov_ ps Experimental (x86 or x86-64) and avx512fMove packed single-precision (32-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠movedup_ pd Experimental (x86 or x86-64) and avx512fDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠movehdup_ ps Experimental (x86 or x86-64) and avx512fDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠moveldup_ ps Experimental (x86 or x86-64) and avx512fDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512fMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ epu32 Experimental (x86 or x86-64) and avx512fMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply the packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_maskz_ ⚠mul_ round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mul_ round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm512_maskz_ ⚠mul_ round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mulhi_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mulhi_ epu16 Experimental (x86 or x86-64) and avx512bwMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mulhrs_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
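The mulhrs description amounts to the usual fixed-point rounding formula ((a*b >> 14) + 1) >> 1 on the 32-bit product. A scalar sketch of one lane:

```rust
// Scalar model of one 16-bit lane of mulhrs_epi16: round the Q15 product.
fn mulhrs_lane(a: i16, b: i16) -> i16 {
    let prod = a as i32 * b as i32;
    (((prod >> 14) + 1) >> 1) as i16
}

fn main() {
    // 0x4000 is 0.5 in Q15 fixed point, so multiplying by it halves with rounding.
    assert_eq!(mulhrs_lane(1000, 0x4000), 500);
    assert_eq!(mulhrs_lane(1001, 0x4000), 501); // 500.5 rounds up to 501
}
```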
- _mm512_maskz_ ⚠mullo_ epi16 Experimental (x86 or x86-64) and avx512bwMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mullo_ epi32 Experimental (x86 or x86-64) and avx512fMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠mullo_ epi64 Experimental (x86 or x86-64) and avx512dqMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠multishift_ epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmiFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠or_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠or_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠packs_ epi16 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠packs_ epi32 Experimental (x86 or x86-64) and avx512bwConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠packus_ epi16 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠packus_ epi32 Experimental (x86 or x86-64) and avx512bwConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
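The packs/packus entries narrow each element with signed or unsigned saturation (the real instructions also interleave a and b per 128-bit lane, which is not modelled here). A scalar sketch of the per-element saturation:

```rust
// Scalar model of the per-element saturation used by packs / packus:
// packs narrows with signed saturation, packus with unsigned saturation.
fn packs_i16_to_i8(x: i16) -> i8 {
    x.clamp(i8::MIN as i16, i8::MAX as i16) as i8
}

fn packus_i16_to_u8(x: i16) -> u8 {
    x.clamp(0, u8::MAX as i16) as u8
}

fn main() {
    assert_eq!(packs_i16_to_i8(300), 127);
    assert_eq!(packs_i16_to_i8(-300), -128);
    assert_eq!(packus_i16_to_u8(300), 255);
    assert_eq!(packus_i16_to_u8(-5), 0);
}
```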
- _mm512_maskz_ ⚠permute_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permute_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutevar_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutevar_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutex_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutexvar_ epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutexvar_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutexvar_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutexvar_ epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutexvar_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠permutexvar_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
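The `permutexvar`/`permutex2var` entries above differ in how the index is read: `permutexvar` picks lanes from a single source, while `permutex2var` uses one extra index bit to choose between the two sources. A scalar sketch for the 16-lane 32-bit case; the index layout (low 4 bits select the lane, the next bit selects the table) is assumed from the two-source description, and the real intrinsics of course operate on `__m512i`:

```rust
/// Scalar model of permutexvar_epi32: dst[i] = a[idx[i] % 16].
fn permutexvar_epi32(idx: [u32; 16], a: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        dst[i] = a[(idx[i] & 0xF) as usize];
    }
    dst
}

/// Scalar model of permutex2var_epi32: the low 4 index bits select the lane,
/// the next bit selects between the two sources a and b.
fn permutex2var_epi32(a: [u32; 16], idx: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        let lane = (idx[i] & 0xF) as usize;
        dst[i] = if idx[i] & 0x10 != 0 { b[lane] } else { a[lane] };
    }
    dst
}
```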
- _mm512_maskz_ ⚠popcnt_ epi8 Experimental (x86 or x86-64) and avx512bitalgFor each packed 8-bit integer maps the value to the number of logical 1 bits.
- _mm512_maskz_ ⚠popcnt_ epi16 Experimental (x86 or x86-64) and avx512bitalgFor each packed 16-bit integer maps the value to the number of logical 1 bits.
- _mm512_maskz_ ⚠popcnt_ epi32 Experimental (x86 or x86-64) and avx512vpopcntdqFor each packed 32-bit integer maps the value to the number of logical 1 bits.
- _mm512_maskz_ ⚠popcnt_ epi64 Experimental (x86 or x86-64) and avx512vpopcntdqFor each packed 64-bit integer maps the value to the number of logical 1 bits.
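The `popcnt` entries reduce to Rust's own `count_ones` applied per lane, combined with the usual zero-masking. A scalar sketch of the 32-bit case:

```rust
/// Scalar model of maskz_popcnt_epi32: per-lane population count, zero-masked.
fn maskz_popcnt_epi32(k: u16, a: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].count_ones();
        }
    }
    dst
}
```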
- _mm512_maskz_ ⚠range_ pd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_maskz_ ⚠range_ ps Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_maskz_ ⚠range_ round_ pd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠range_ round_ ps Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
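The `range_*` entries pack two independent controls into `imm8`; decoding them into named values makes the description above easier to follow. A sketch of just that decode, with illustrative enum names that are not part of the API:

```rust
/// Illustrative decode of the range_* imm8 control described above.
#[derive(Debug, PartialEq)]
enum RangeOp { Min, Max, AbsMin, AbsMax }      // imm8[1:0]
#[derive(Debug, PartialEq)]
enum SignCtl { FromA, FromResult, Clear, Set } // imm8[3:2]

fn decode_range_imm8(imm8: u8) -> (RangeOp, SignCtl) {
    let op = match imm8 & 0b11 {
        0b00 => RangeOp::Min,
        0b01 => RangeOp::Max,
        0b10 => RangeOp::AbsMin,
        _ => RangeOp::AbsMax,
    };
    let sign = match (imm8 >> 2) & 0b11 {
        0b00 => SignCtl::FromA,
        0b01 => SignCtl::FromResult,
        0b10 => SignCtl::Clear,
        _ => SignCtl::Set,
    };
    (op, sign)
}

fn main() {
    // 0b0111: absolute max, sign taken from the compare result.
    assert_eq!(decode_range_imm8(0b0111), (RangeOp::AbsMax, SignCtl::FromResult));
}
```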
- _mm512_maskz_ ⚠rcp14_ pd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ ⚠rcp14_ ps Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ ⚠rcp_ ph Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_maskz_ ⚠reduce_ pd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_maskz_ ⚠reduce_ ph Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠reduce_ ps Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_maskz_ ⚠reduce_ round_ pd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_maskz_ ⚠reduce_ round_ ph Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠reduce_ round_ ps Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter.
- _mm512_maskz_ ⚠rol_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠rol_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠rolv_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠rolv_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠ror_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠ror_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠rorv_ epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠rorv_ epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
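The rotate entries (`rol`/`ror` with an immediate count, `rolv`/`rorv` with per-lane counts) map directly onto Rust's `rotate_left`/`rotate_right`. A scalar, zero-masked sketch of the variable-count 32-bit left rotate:

```rust
/// Scalar model of maskz_rolv_epi32: per-lane left rotate by b[i], zero-masked.
fn maskz_rolv_epi32(k: u16, a: [u32; 16], b: [u32; 16]) -> [u32; 16] {
    let mut dst = [0u32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = a[i].rotate_left(b[i]);
        }
    }
    dst
}
```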
- _mm512_maskz_ ⚠roundscale_ pd Experimental (x86 or x86-64) and avx512fRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ ⚠roundscale_ ph Experimental (x86 or x86-64) and avx512fp16Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠roundscale_ ps Experimental (x86 or x86-64) and avx512fRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ ⚠roundscale_ round_ pd Experimental (x86 or x86-64) and avx512fRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ ⚠roundscale_ round_ ph Experimental (x86 or x86-64) and avx512fp16Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_maskz_ ⚠roundscale_ round_ ps Experimental (x86 or x86-64) and avx512fRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter.
- _mm512_maskz_ ⚠rsqrt14_ pd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ ⚠rsqrt14_ ps Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm512_maskz_ ⚠rsqrt_ ph Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_maskz_ ⚠scalef_ pd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠scalef_ ph Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠scalef_ ps Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠scalef_ round_ pd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠scalef_ round_ ph Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠scalef_ round_ ps Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠set1_ epi8 Experimental (x86 or x86-64) and avx512bwBroadcast 8-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠set1_ epi16 Experimental (x86 or x86-64) and avx512bwBroadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠set1_ epi32 Experimental (x86 or x86-64) and avx512fBroadcast 32-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠set1_ epi64 Experimental (x86 or x86-64) and avx512fBroadcast 64-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shldi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shldi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shldi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shldv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shldv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shldv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shrdi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shrdi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shrdi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shrdv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shrdv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shrdv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
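The `shldi`/`shrdi` entries (and their `shldv`/`shrdv` variable-count forms) are double-width funnel shifts: two lanes are concatenated, the concatenation is shifted, and one half is kept. A scalar sketch of the 16-bit left case, assuming the shift amount is reduced modulo the element width, as is conventional for this instruction family:

```rust
/// Scalar model of shldi_epi16: concatenate a (high) and b (low) into 32 bits,
/// shift left, keep the upper 16 bits. Shift count assumed to be taken mod 16.
fn shldi_epi16(a: u16, b: u16, imm8: u32) -> u16 {
    let concat = ((a as u32) << 16) | (b as u32);
    ((concat << (imm8 % 16)) >> 16) as u16
}

fn main() {
    // Shifting by 4 pulls the top 4 bits of b in behind a's low bits.
    assert_eq!(shldi_epi16(0x1234, 0xABCD, 4), 0x234A);
}
```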
- _mm512_maskz_ ⚠shuffle_ epi8 Experimental (x86 or x86-64) and avx512bwShuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ f32x4 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ f64x2 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ i32x4 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ i64x2 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shuffle_ ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shufflehi_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠shufflelo_ epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sll_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sll_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sll_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠slli_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠slli_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠slli_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sllv_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sllv_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sllv_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sqrt_ pd Experimental (x86 or x86-64) and avx512fCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sqrt_ ph Experimental (x86 or x86-64) and avx512fp16Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sqrt_ ps Experimental (x86 or x86-64) and avx512fCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sqrt_ round_ pd Experimental (x86 or x86-64) and avx512fCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sqrt_ round_ ph Experimental (x86 or x86-64) and avx512fp16Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠sqrt_ round_ ps Experimental (x86 or x86-64) and avx512fCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sra_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sra_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sra_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srai_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srai_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srai_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srav_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srav_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srav_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srl_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srl_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srl_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srli_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srli_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srli_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srlv_ epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srlv_ epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠srlv_ epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
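The shift entries above split along two axes: how the count is supplied (`sll`/`srl`/`sra` take a count vector applied to all lanes, `slli`/`srli`/`srai` take an immediate, `sllv`/`srlv`/`srav` take a per-lane count) and what is shifted in (zeros for logical shifts, copies of the sign bit for arithmetic ones). A scalar sketch of the per-lane 32-bit variants, assuming counts of the element width or more produce 0 (logical) or all sign bits (arithmetic), as the variable-shift instructions do:

```rust
/// Scalar model of one lane of sllv_epi32: logical left shift; counts >= 32 give 0.
fn sllv_epi32_lane(a: u32, count: u32) -> u32 {
    if count < 32 { a << count } else { 0 }
}

/// Scalar model of one lane of srav_epi32: arithmetic right shift; counts >= 32
/// leave only copies of the sign bit.
fn srav_epi32_lane(a: i32, count: u32) -> i32 {
    a >> count.min(31)
}

fn main() {
    assert_eq!(sllv_epi32_lane(1, 40), 0);
    assert_eq!(srav_epi32_lane(-8, 40), -1);
}
```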
- _mm512_maskz_ ⚠sub_ epi8 Experimental (x86 or x86-64) and avx512bwSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ epi16 Experimental (x86 or x86-64) and avx512bwSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ epi32 Experimental (x86 or x86-64) and avx512fSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ epi64 Experimental (x86 or x86-64) and avx512fSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ pd Experimental (x86 or x86-64) and avx512fSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ ph Experimental (x86 or x86-64) and avx512fp16Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ ps Experimental (x86 or x86-64) and avx512fSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ round_ pd Experimental (x86 or x86-64) and avx512fSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠sub_ round_ ph Experimental (x86 or x86-64) and avx512fp16Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Rounding is done according to the rounding parameter.
- _mm512_maskz_ ⚠sub_ round_ ps Experimental (x86 or x86-64) and avx512fSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠subs_ epi8 Experimental (x86 or x86-64) and avx512bwSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠subs_ epi16 Experimental (x86 or x86-64) and avx512bwSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠subs_ epu8 Experimental (x86 or x86-64) and avx512bwSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠subs_ epu16 Experimental (x86 or x86-64) and avx512bwSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠ternarylogic_ epi32 Experimental (x86 or x86-64) and avx512fBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠ternarylogic_ epi64 Experimental (x86 or x86-64) and avx512fBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
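The `ternarylogic` entries describe a per-bit three-input truth table: the three source bits form an index into `imm8`, and the indexed bit of `imm8` becomes the result bit. A scalar sketch of exactly that rule for one 32-bit lane:

```rust
/// Scalar model of ternarylogic_epi32 for a single lane: for every bit
/// position, the bits from a, b and c form a 3-bit index into imm8, and the
/// selected imm8 bit becomes the output bit.
fn ternarylogic_u32(a: u32, b: u32, c: u32, imm8: u8) -> u32 {
    let mut dst = 0u32;
    for p in 0..32 {
        let idx = (((a >> p) & 1) << 2) | (((b >> p) & 1) << 1) | ((c >> p) & 1);
        dst |= ((imm8 as u32 >> idx) & 1) << p;
    }
    dst
}

fn main() {
    // imm8 = 0x96 is the truth table of a ^ b ^ c.
    assert_eq!(
        ternarylogic_u32(0b1100, 0b1010, 0b0110, 0x96),
        0b1100 ^ 0b1010 ^ 0b0110
    );
}
```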
- _mm512_maskz_ ⚠unpackhi_ epi8 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpackhi_ epi16 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpackhi_ epi32 Experimental (x86 or x86-64) and avx512fUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpackhi_ epi64 Experimental (x86 or x86-64) and avx512fUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpackhi_ pd Experimental (x86 or x86-64) and avx512fUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpackhi_ ps Experimental (x86 or x86-64) and avx512fUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpacklo_ epi8 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpacklo_ epi16 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpacklo_ epi32 Experimental (x86 or x86-64) and avx512fUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpacklo_ epi64 Experimental (x86 or x86-64) and avx512fUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpacklo_ pd Experimental (x86 or x86-64) and avx512fUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠unpacklo_ ps Experimental (x86 or x86-64) and avx512fUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
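The `unpackhi`/`unpacklo` entries interleave elements from matching halves of each 128-bit lane of the two sources. A scalar sketch for one 128-bit lane of 32-bit integers; the 512-bit intrinsics repeat this per lane and then apply the mask:

```rust
/// Scalar model of unpacklo_epi32 for a single 128-bit lane:
/// interleave the low two elements of a and b.
fn unpacklo_epi32_lane(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    [a[0], b[0], a[1], b[1]]
}

/// Scalar model of unpackhi_epi32 for a single 128-bit lane:
/// interleave the high two elements of a and b.
fn unpackhi_epi32_lane(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    [a[2], b[2], a[3], b[3]]
}
```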
- _mm512_maskz_ ⚠xor_ epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠xor_ epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm512_maskz_ ⚠xor_ pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_maskz_ ⚠xor_ ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm512_max_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epi32 Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epu16 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epu32 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠epu64 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_max_ ⚠ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst.
- _mm512_max_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_max_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm512_max_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_min_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epi32 Experimental (x86 or x86-64) and avx512fCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epu16 Experimental (x86 or x86-64) and avx512bwCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epu32 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠epu64 Experimental (x86 or x86-64) and avx512fCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_min_ ⚠ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst.
- _mm512_min_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_min_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Compare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm512_min_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_movedup_ ⚠pd Experimental (x86 or x86-64) and avx512fDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst.
- _mm512_movehdup_ ⚠ps Experimental (x86 or x86-64) and avx512fDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
- _mm512_moveldup_ ⚠ps Experimental (x86 or x86-64) and avx512fDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst.
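The `movedup`/`moveldup`/`movehdup` entries each copy one element of every adjacent pair into both slots of that pair (the even-indexed element for `movedup`/`moveldup`, the odd-indexed one for `movehdup`). A scalar sketch of the single-precision pair rule:

```rust
/// Scalar model of moveldup_ps: every even-indexed element is duplicated
/// into the following odd slot.
fn moveldup_ps(a: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0f32; 16];
    for i in (0..16).step_by(2) {
        dst[i] = a[i];
        dst[i + 1] = a[i];
    }
    dst
}

/// Scalar model of movehdup_ps: every odd-indexed element is duplicated
/// into the preceding even slot.
fn movehdup_ps(a: [f32; 16]) -> [f32; 16] {
    let mut dst = [0.0f32; 16];
    for i in (0..16).step_by(2) {
        dst[i] = a[i + 1];
        dst[i + 1] = a[i + 1];
    }
    dst
}
```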
- _mm512_movepi8_ ⚠mask Experimental (x86 or x86-64) and avx512bwSet each bit of mask register k based on the most significant bit of the corresponding packed 8-bit integer in a.
- _mm512_movepi16_ ⚠mask Experimental (x86 or x86-64) and avx512bwSet each bit of mask register k based on the most significant bit of the corresponding packed 16-bit integer in a.
- _mm512_movepi32_ ⚠mask Experimental (x86 or x86-64) and avx512dqSet each bit of mask register k based on the most significant bit of the corresponding packed 32-bit integer in a.
- _mm512_movepi64_ ⚠mask Experimental (x86 or x86-64) and avx512dqSet each bit of mask register k based on the most significant bit of the corresponding packed 64-bit integer in a.
- _mm512_movm_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwSet each packed 8-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm512_movm_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwSet each packed 16-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm512_movm_ ⚠epi32 Experimental (x86 or x86-64) and avx512dqSet each packed 32-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm512_movm_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqSet each packed 64-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
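The `movepi*_mask` and `movm_epi*` entries are inverses of a sort: one collapses the sign bits of a vector into a mask register, the other expands a mask back into all-ones/all-zeros lanes. A scalar sketch for the 32-bit case:

```rust
/// Scalar model of movepi32_mask: bit i of the mask is the sign (MSB) of lane i.
fn movepi32_mask(a: [i32; 16]) -> u16 {
    let mut k = 0u16;
    for i in 0..16 {
        if a[i] < 0 {
            k |= 1 << i;
        }
    }
    k
}

/// Scalar model of movm_epi32: lane i is all ones when mask bit i is set.
fn movm_epi32(k: u16) -> [i32; 16] {
    let mut dst = [0i32; 16];
    for i in 0..16 {
        if (k >> i) & 1 == 1 {
            dst[i] = -1; // all bits set
        }
    }
    dst
}
```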
- _mm512_mul_ ⚠epi32 Experimental (x86 or x86-64) and avx512fMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst.
- _mm512_mul_ ⚠epu32 Experimental (x86 or x86-64) and avx512fMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst.
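The widening multiplies above (`mul_epi32`/`mul_epu32`) read only the low 32 bits of each 64-bit lane and produce a full 64-bit product. A scalar sketch of one lane of each:

```rust
/// Scalar model of one lane of mul_epi32: multiply the low signed 32-bit
/// halves of two 64-bit lanes, producing a signed 64-bit result.
fn mul_epi32_lane(a: u64, b: u64) -> i64 {
    (a as u32 as i32 as i64) * (b as u32 as i32 as i64)
}

/// Scalar model of one lane of mul_epu32: the same, but unsigned.
fn mul_epu32_lane(a: u64, b: u64) -> u64 {
    (a as u32 as u64) * (b as u32 as u64)
}
```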
- _mm512_mul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16Multiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mul_ ⚠pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ ⚠ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ ⚠round_ pch Experimental (x86 or x86-64) and avx512fp16Multiply the packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm512_mul_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mul_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Multiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst. Rounding is done according to the rounding parameter.
- _mm512_mul_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst.
- _mm512_mulhi_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
- _mm512_mulhi_ ⚠epu16 Experimental (x86 or x86-64) and avx512bwMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst.
- _mm512_mulhrs_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
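The `mulhrs_epi16` description above is easier to read as arithmetic: form the full 32-bit product, drop the low 14 bits, add 1 to round, then drop the new low bit, leaving a rounded high half in the style of Q15 fixed point. A scalar sketch of one lane:

```rust
/// Scalar model of one lane of mulhrs_epi16: 32-bit product, >> 14, +1, >> 1,
/// i.e. bits [16:1] of the product, rounded.
fn mulhrs_epi16_lane(a: i16, b: i16) -> i16 {
    let product = (a as i32) * (b as i32);
    (((product >> 14) + 1) >> 1) as i16
}

fn main() {
    // 0.5 * 0.5 in Q15 rounds to 0.25 in Q15.
    assert_eq!(mulhrs_epi16_lane(16384, 16384), 8192);
}
```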
- _mm512_mullo_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst.
- _mm512_mullo_ ⚠epi32 Experimental (x86 or x86-64) and avx512fMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
- _mm512_mullo_ ⚠epi64 Experimental (x86 or x86-64) and avx512dqMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst.
- _mm512_mullox_ ⚠epi64 Experimental (x86 or x86-64) and avx512fMultiplies elements in packed 64-bit integer vectors a and b together, storing the lower 64 bits of the result in dst.
- _mm512_multishift_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmiFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst.
- _mm512_or_ ⚠epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst.
- _mm512_or_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst.
- _mm512_or_ ⚠pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst.
- _mm512_or_ ⚠ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst.
- _mm512_or_ ⚠si512 Experimental (x86 or x86-64) and avx512fCompute the bitwise OR of 512 bits (representing integer data) in a and b, and store the result in dst.
- _mm512_packs_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst.
- _mm512_packs_ ⚠epi32 Experimental (x86 or x86-64) and avx512bwConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst.
- _mm512_packus_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst.
- _mm512_packus_ ⚠epi32 Experimental (x86 or x86-64) and avx512bwConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst.
- _mm512_permute_ ⚠pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permute_ ⚠ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permutevar_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst. Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the permutevar name. This intrinsic is identical to _mm512_permutexvar_epi32, and it is recommended that you use that intrinsic name.
- _mm512_permutevar_ ⚠pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
- _mm512_permutevar_ ⚠ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst.
- _mm512_permutex2var_ ⚠epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ⚠pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Shuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex2var_ ⚠ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm512_permutex_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permutex_ ⚠pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a within 256-bit lanes using the control in imm8, and store the results in dst.
- _mm512_permutexvar_ ⚠epi8 Experimental (x86 or x86-64) and avx512vbmiShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShuffle 64-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ⚠pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Shuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm512_permutexvar_ ⚠ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx, and store the results in dst.
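A sketch of a full cross-lane permute with `_mm512_permutexvar_epi32`; note that the index vector is the first argument. The same toolchain assumption applies, and the reversal pattern is just an example:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { permute_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn permute_demo() {
    // 0, 1, 2, ..., 15 in the sixteen 32-bit lanes (lowest lane first).
    let data = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    // Indices 15, 14, ..., 0 reverse the vector across all 128-bit lanes.
    let idx = _mm512_setr_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
    let reversed = _mm512_permutexvar_epi32(idx, data);

    let mut out = [0i32; 16];
    _mm512_storeu_epi32(out.as_mut_ptr(), reversed);
    assert_eq!(out, [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]);
}
```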
- _mm512_popcnt_ ⚠epi8 Experimental (x86 or x86-64) and avx512bitalgMap each packed 8-bit integer in a to the number of logical 1 bits it contains, and store the results in dst.
- _mm512_popcnt_ ⚠epi16 Experimental (x86 or x86-64) and avx512bitalgMap each packed 16-bit integer in a to the number of logical 1 bits it contains, and store the results in dst.
- _mm512_popcnt_ ⚠epi32 Experimental (x86 or x86-64) and avx512vpopcntdqMap each packed 32-bit integer in a to the number of logical 1 bits it contains, and store the results in dst.
- _mm512_popcnt_ ⚠epi64 Experimental (x86 or x86-64) and avx512vpopcntdqMap each packed 64-bit integer in a to the number of logical 1 bits it contains, and store the results in dst.
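A hedged sketch of the population-count intrinsics, assuming both the avx512f and avx512vpopcntdq features are available at runtime; `popcnt_demo` and its inputs are illustrative:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vpopcntdq") {
        unsafe { popcnt_demo() }
    }
}

#[target_feature(enable = "avx512f,avx512vpopcntdq")]
unsafe fn popcnt_demo() {
    // Each 32-bit lane holds a value whose population count we want.
    let v = _mm512_setr_epi32(0, 1, 3, 7, 15, 31, 63, 127, 255, -1, 0, 0, 0, 0, 0, 0);
    let counts = _mm512_popcnt_epi32(v);

    let mut out = [0i32; 16];
    _mm512_storeu_epi32(out.as_mut_ptr(), counts);
    assert_eq!(out, [0, 1, 2, 3, 4, 5, 6, 7, 8, 32, 0, 0, 0, 0, 0, 0]);
}
```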
- _mm512_range_ ⚠pd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. The lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_range_ ⚠ps Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. The lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm512_range_ ⚠round_ pd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. The lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_range_ ⚠round_ ps Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. The lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_rcp14_ ⚠pd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_rcp14_ ⚠ps Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_rcp_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal of packed 16-bit floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_reduce_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ ⚠add_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16Reduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm512_reduce_ ⚠add_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
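A minimal sketch of the horizontal reductions, under the same assumption that the experimental AVX-512 intrinsics are available; the values are arbitrary:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { reduce_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn reduce_demo() {
    // Sixteen 32-bit lanes, each set to 3: the horizontal sum is 48.
    let v = _mm512_set1_epi32(3);
    let sum: i32 = _mm512_reduce_add_epi32(v);
    assert_eq!(sum, 48);

    // The float variant works the same way and returns a scalar f32.
    let f = _mm512_set1_ps(0.5);
    assert_eq!(_mm512_reduce_add_ps(f), 8.0);
}
```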
- _mm512_reduce_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm512_reduce_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm512_reduce_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed signed 32-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed signed 64-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 32-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 64-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠max_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16Reduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠max_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm512_reduce_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed signed 32-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed signed 64-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 32-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512fReduce the packed unsigned 64-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠min_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16Reduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠min_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm512_reduce_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ ⚠mul_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512fReduce the packed double-precision (64-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16Reduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512fReduce the packed single-precision (32-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm512_reduce_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512fReduce the packed 32-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm512_reduce_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512fReduce the packed 64-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm512_reduce_ ⚠pd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_reduce_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_reduce_ ⚠ps Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_reduce_ ⚠round_ pd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_reduce_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm512_reduce_ ⚠round_ ps Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_rol_ ⚠epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm512_rol_ ⚠epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm512_rolv_ ⚠epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_rolv_ ⚠epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_ror_ ⚠epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm512_ror_ ⚠epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm512_rorv_ ⚠epi32 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm512_rorv_ ⚠epi64 Experimental (x86 or x86-64) and avx512fRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
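A short sketch of the variable-count rotates, again assuming the experimental intrinsics are exposed; `rotate_demo` is an invented helper:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { rotate_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn rotate_demo() {
    // Rotate each 32-bit lane left by a per-lane amount taken from `amounts`.
    let v = _mm512_set1_epi32(1);
    let amounts = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
    let rotated = _mm512_rolv_epi32(v, amounts);

    let mut out = [0i32; 16];
    _mm512_storeu_epi32(out.as_mut_ptr(), rotated);
    // Lane i now holds 1 << i.
    for (i, &x) in out.iter().enumerate() {
        assert_eq!(x, 1i32 << i);
    }
}
```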
- _mm512_roundscale_ ⚠pd Experimental (x86 or x86-64) and avx512fRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_roundscale_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm512_roundscale_ ⚠ps Experimental (x86 or x86-64) and avx512fRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_roundscale_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_roundscale_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Round packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm512_roundscale_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst. Rounding is done according to the imm8[2:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm512_rsqrt14_ ⚠pd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_rsqrt14_ ⚠ps Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm512_rsqrt_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm512_sad_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwCompute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in dst.
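A hedged sketch of the sum-of-absolute-differences operation, assuming avx512f and avx512bw are both available; the byte values are arbitrary:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512bw") {
        unsafe { sad_demo() }
    }
}

#[target_feature(enable = "avx512f,avx512bw")]
unsafe fn sad_demo() {
    // 64 bytes of 10s against 64 bytes of 7s: each byte differs by 3, so every
    // group of 8 bytes sums to 24 in the low 16 bits of its 64-bit slot.
    let a = _mm512_set1_epi8(10);
    let b = _mm512_set1_epi8(7);
    let sad = _mm512_sad_epu8(a, b);

    let mut out = [0i64; 8];
    _mm512_storeu_epi64(out.as_mut_ptr(), sad);
    assert!(out.iter().all(|&x| x == 24));
}
```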
- _mm512_scalef_ ⚠pd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ ⚠ps Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_scalef_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm512_set1_ ⚠epi8 Experimental (x86 or x86-64) and avx512fBroadcast 8-bit integer a to all elements of dst.
- _mm512_set1_ ⚠epi16 Experimental (x86 or x86-64) and avx512fBroadcast the low packed 16-bit integer from a to all elements of dst.
- _mm512_set1_ ⚠epi32 Experimental (x86 or x86-64) and avx512fBroadcast 32-bit integer a to all elements of dst.
- _mm512_set1_ ⚠epi64 Experimental (x86 or x86-64) and avx512fBroadcast 64-bit integer a to all elements of dst.
- _mm512_set1_ ⚠pd Experimental (x86 or x86-64) and avx512fBroadcast 64-bit float a to all elements of dst.
- _mm512_set1_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm512_set1_ ⚠ps Experimental (x86 or x86-64) and avx512fBroadcast 32-bit float a to all elements of dst.
- _mm512_set4_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSet packed 32-bit integers in dst with the repeated 4 element sequence.
- _mm512_set4_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSet packed 64-bit integers in dst with the repeated 4 element sequence.
- _mm512_set4_ ⚠pd Experimental (x86 or x86-64) and avx512fSet packed double-precision (64-bit) floating-point elements in dst with the repeated 4 element sequence.
- _mm512_set4_ ⚠ps Experimental (x86 or x86-64) and avx512fSet packed single-precision (32-bit) floating-point elements in dst with the repeated 4 element sequence.
- _mm512_set_ ⚠epi8 Experimental (x86 or x86-64) and avx512fSet packed 8-bit integers in dst with the supplied values.
- _mm512_set_ ⚠epi16 Experimental (x86 or x86-64) and avx512fSet packed 16-bit integers in dst with the supplied values.
- _mm512_set_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSet packed 32-bit integers in dst with the supplied values.
- _mm512_set_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSet packed 64-bit integers in dst with the supplied values.
- _mm512_set_ ⚠pd Experimental (x86 or x86-64) and avx512fSet packed double-precision (64-bit) floating-point elements in dst with the supplied values.
- _mm512_set_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm512_set_ ⚠ps Experimental (x86 or x86-64) and avx512fSet packed single-precision (32-bit) floating-point elements in dst with the supplied values.
- _mm512_setr4_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSet packed 32-bit integers in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr4_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSet packed 64-bit integers in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr4_ ⚠pd Experimental (x86 or x86-64) and avx512fSet packed double-precision (64-bit) floating-point elements in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr4_ ⚠ps Experimental (x86 or x86-64) and avx512fSet packed single-precision (32-bit) floating-point elements in dst with the repeated 4 element sequence in reverse order.
- _mm512_setr_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSet packed 32-bit integers in dst with the supplied values in reverse order.
- _mm512_setr_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSet packed 64-bit integers in dst with the supplied values in reverse order.
- _mm512_setr_ ⚠pd Experimental (x86 or x86-64) and avx512fSet packed double-precision (64-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm512_setr_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm512_setr_ ⚠ps Experimental (x86 or x86-64) and avx512fSet packed single-precision (32-bit) floating-point elements in dst with the supplied values in reverse order.
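The set and setr constructors differ only in argument order, which the following sketch demonstrates (same toolchain assumption as above; the helper name is invented):

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { set_order_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn set_order_demo() {
    // `_mm512_set_epi32` lists lanes from the highest element down to element 0,
    // while `_mm512_setr_epi32` lists them in memory order (element 0 first).
    let hi_first = _mm512_set_epi32(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
    let lo_first = _mm512_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);

    let mut a = [0i32; 16];
    let mut b = [0i32; 16];
    _mm512_storeu_epi32(a.as_mut_ptr(), hi_first);
    _mm512_storeu_epi32(b.as_mut_ptr(), lo_first);
    // Both constructions produce the same vector: 0, 1, ..., 15 in memory order.
    assert_eq!(a, b);
}
```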
- _mm512_setzero ⚠Experimental (x86 or x86-64) and avx512fReturn vector of type __m512 with all elements set to zero.
- _mm512_setzero_ ⚠epi32 Experimental (x86 or x86-64) and avx512fReturn vector of type __m512i with all elements set to zero.
- _mm512_setzero_ ⚠pd Experimental (x86 or x86-64) and avx512fReturn vector of type __m512d with all elements set to zero.
- _mm512_setzero_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Return vector of type __m512h with all elements set to zero.
- _mm512_setzero_ ⚠ps Experimental (x86 or x86-64) and avx512fReturn vector of type __m512 with all elements set to zero.
- _mm512_setzero_ ⚠si512 Experimental (x86 or x86-64) and avx512fReturn vector of type __m512i with all elements set to zero.
- _mm512_shldi_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst.
- _mm512_shldi_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst.
- _mm512_shldi_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst.
- _mm512_shldv_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst.
- _mm512_shldv_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst.
- _mm512_shldv_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst.
- _mm512_shrdi_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst.
- _mm512_shrdi_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst.
- _mm512_shrdi_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst.
- _mm512_shrdv_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst.
- _mm512_shrdv_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
- _mm512_shrdv_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2Concatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst.
- _mm512_shuffle_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwShuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst.
- _mm512_shuffle_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_shuffle_ ⚠f32x4 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ ⚠f64x2 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ ⚠i32x4 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 4 32-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ ⚠i64x2 Experimental (x86 or x86-64) and avx512fShuffle 128-bits (composed of 2 64-bit integers) selected by imm8 from a and b, and store the results in dst.
- _mm512_shuffle_ ⚠pd Experimental (x86 or x86-64) and avx512fShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_shuffle_ ⚠ps Experimental (x86 or x86-64) and avx512fShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst.
- _mm512_shufflehi_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst.
- _mm512_shufflelo_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst.
- _mm512_sll_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst.
- _mm512_sll_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst.
- _mm512_sll_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst.
- _mm512_slli_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
- _mm512_slli_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
- _mm512_slli_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst.
- _mm512_sllv_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_sllv_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_sllv_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_sqrt_ ⚠pd Experimental (x86 or x86-64) and avx512fCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ ⚠ps Experimental (x86 or x86-64) and avx512fCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sqrt_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Compute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of the _MM_FROUND_* rounding modes (optionally combined with _MM_FROUND_NO_EXC).
- _mm512_sqrt_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_sra_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm512_sra_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm512_sra_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm512_srai_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm512_srai_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm512_srai_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm512_srav_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm512_srav_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm512_srav_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm512_srl_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst.
- _mm512_srl_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst.
- _mm512_srl_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst.
- _mm512_srli_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
- _mm512_srli_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
- _mm512_srli_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst.
- _mm512_srlv_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_srlv_ ⚠epi32 Experimental (x86 or x86-64) and avx512fShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_srlv_ ⚠epi64 Experimental (x86 or x86-64) and avx512fShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm512_store_ ⚠epi32 Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 16 packed 32-bit integers) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠epi64 Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 8 packed 64-bit integers) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠pd Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 64 bytes or a general-protection exception may be generated.
- _mm512_store_ ⚠ps Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_store_ ⚠si512 Experimental (x86 or x86-64) and avx512fStore 512-bits of integer data from a into memory. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_storeu_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwStore 512-bits (composed of 64 packed 8-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwStore 512-bits (composed of 32 packed 16-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠epi32 Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 16 packed 32-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠epi64 Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 8 packed 64-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠pd Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm512_storeu_ ⚠ps Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm512_storeu_ ⚠si512 Experimental (x86 or x86-64) and avx512fStore 512-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
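A sketch contrasting the aligned and unaligned stores; the `Aligned` wrapper is an invented type used only to guarantee the 64-byte alignment that the aligned store requires:

```rust
use std::arch::x86_64::*;

// A 64-byte-aligned buffer so the aligned store is legal.
#[repr(align(64))]
struct Aligned([i32; 16]);

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { store_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn store_demo() {
    let v = _mm512_set1_epi32(7);

    // `_mm512_store_epi32` requires a 64-byte-aligned address; `Aligned` guarantees it.
    let mut aligned = Aligned([0; 16]);
    _mm512_store_epi32(aligned.0.as_mut_ptr(), v);

    // `_mm512_storeu_epi32` works on any address, e.g. a plain array.
    let mut unaligned = [0i32; 16];
    _mm512_storeu_epi32(unaligned.as_mut_ptr(), v);

    assert_eq!(aligned.0, unaligned);
}
```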
- _mm512_stream_ ⚠load_ si512 Experimental (x86 or x86-64) and avx512fLoad 512-bits of integer data from memory into dst using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon).
- _mm512_stream_ ⚠pd Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_stream_ ⚠ps Experimental (x86 or x86-64) and avx512fStore 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_stream_ ⚠si512 Experimental (x86 or x86-64) and avx512fStore 512-bits of integer data from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 64-byte boundary or a general-protection exception may be generated.
- _mm512_sub_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst.
- _mm512_sub_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst.
- _mm512_sub_ ⚠epi32 Experimental (x86 or x86-64) and avx512fSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst.
- _mm512_sub_ ⚠epi64 Experimental (x86 or x86-64) and avx512fSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst.
- _mm512_sub_ ⚠pd Experimental (x86 or x86-64) and avx512fSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm512_sub_ ⚠ps Experimental (x86 or x86-64) and avx512fSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_sub_ ⚠round_ pd Experimental (x86 or x86-64) and avx512fSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst.
- _mm512_sub_ ⚠round_ ph Experimental (x86 or x86-64) and avx512fp16Subtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst. Rounding is done according to the rounding parameter, which can be one of the _MM_FROUND_* rounding modes (optionally combined with _MM_FROUND_NO_EXC).
- _mm512_sub_ ⚠round_ ps Experimental (x86 or x86-64) and avx512fSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst.
- _mm512_subs_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst.
- _mm512_subs_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst.
- _mm512_subs_ ⚠epu8 Experimental (x86 or x86-64) and avx512bwSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst.
- _mm512_subs_ ⚠epu16 Experimental (x86 or x86-64) and avx512bwSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst.
- _mm512_ternarylogic_ ⚠epi32 Experimental (x86 or x86-64) and avx512fBitwise ternary logic that provides the capability to implement any three-operand boolean function; the specific function is selected by the value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from a, b, and c form a 3-bit index into imm8, and the value of imm8 at that bit position is written to the corresponding bit in dst.
- _mm512_ternarylogic_ ⚠epi64 Experimental (x86 or x86-64) and avx512fBitwise ternary logic that provides the capability to implement any three-operand boolean function; the specific function is selected by the value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from a, b, and c form a 3-bit index into imm8, and the value of imm8 at that bit position is written to the corresponding bit in dst.
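The imm8 truth-table encoding may be easier to see in scalar form: the bits of a, b, and c at one position form a 3-bit index, and that bit of imm8 is the output. The sketch below models the lookup in plain Rust (it does not call the intrinsic) and derives the control byte for a ^ b ^ c:

```rust
// Scalar model of the ternary-logic lookup: for one bit position, the inputs
// a, b, c (each 0 or 1) form a 3-bit index, and that bit of imm8 is the result.
fn ternary_bit(imm8: u8, a: u8, b: u8, c: u8) -> u8 {
    let index = (a << 2) | (b << 1) | c;
    (imm8 >> index) & 1
}

fn main() {
    // Build the imm8 that encodes a ^ b ^ c by filling in its truth table.
    let mut imm8 = 0u8;
    for index in 0..8u8 {
        let (a, b, c) = ((index >> 2) & 1, (index >> 1) & 1, index & 1);
        imm8 |= (a ^ b ^ c) << index;
    }
    assert_eq!(imm8, 0x96); // the familiar XOR3 control byte

    // Check the model against the definition for every input combination.
    for a in 0..2u8 {
        for b in 0..2u8 {
            for c in 0..2u8 {
                assert_eq!(ternary_bit(imm8, a, b, c), a ^ b ^ c);
            }
        }
    }
}
```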
- _mm512_test_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_test_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_test_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_test_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm512_testn_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise NAND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_testn_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bwCompute the bitwise NAND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_testn_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise NAND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm512_testn_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512fCompute the bitwise NAND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
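A small sketch of the mask-producing test, assuming the experimental intrinsics are available; the lane values are chosen so the resulting masks are easy to predict:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { test_mask_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn test_mask_demo() {
    // Lane i of `v` has only bit i set.
    let v = _mm512_setr_epi32(
        1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768,
    );
    // AND against all-ones is non-zero in every lane, so every mask bit is set.
    let ones = _mm512_set1_epi32(-1);
    let k: __mmask16 = _mm512_test_epi32_mask(v, ones);
    assert_eq!(k, 0xFFFF);

    // AND against a vector with only bit 0 set is non-zero only in lane 0.
    let bit0 = _mm512_set1_epi32(1);
    assert_eq!(_mm512_test_epi32_mask(v, bit0), 0b1);
}
```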
- _mm512_undefined ⚠Experimental (x86 or x86-64) and avx512fReturn vector of type __m512 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
- _mm512_undefined_ ⚠epi32 Experimental (x86 or x86-64) and avx512fReturn vector of type __m512i with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
- _mm512_undefined_ ⚠pd Experimental (x86 or x86-64) and avx512fReturn vector of type __m512d with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
- _mm512_undefined_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Return vector of type __m512h with undefined elements. In practice, this returns the all-zero vector.
- _mm512_undefined_ ⚠ps Experimental (x86 or x86-64) and avx512fReturn vector of type __m512 with indeterminate elements. Despite being “undefined”, this is some valid value and not equivalent to mem::MaybeUninit. In practice, this is equivalent to mem::zeroed.
- _mm512_unpackhi_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ ⚠epi32 Experimental (x86 or x86-64) and avx512fUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ ⚠epi64 Experimental (x86 or x86-64) and avx512fUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ ⚠pd Experimental (x86 or x86-64) and avx512fUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpackhi_ ⚠ps Experimental (x86 or x86-64) and avx512fUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ⚠epi8 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ⚠epi16 Experimental (x86 or x86-64) and avx512bwUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ⚠epi32 Experimental (x86 or x86-64) and avx512fUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ⚠epi64 Experimental (x86 or x86-64) and avx512fUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ⚠pd Experimental (x86 or x86-64) and avx512fUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_unpacklo_ ⚠ps Experimental (x86 or x86-64) and avx512fUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst.
- _mm512_xor_ ⚠epi32 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst.
- _mm512_xor_ ⚠epi64 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst.
- _mm512_xor_ ⚠pd Experimental (x86 or x86-64) and avx512dqCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst.
- _mm512_xor_ ⚠ps Experimental (x86 or x86-64) and avx512dqCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst.
- _mm512_xor_ ⚠si512 Experimental (x86 or x86-64) and avx512fCompute the bitwise XOR of 512 bits (representing integer data) in a and b, and store the result in dst.
- _mm512_zextpd128_ ⚠pd512 Experimental (x86 or x86-64) and avx512fCast vector of type __m128d to type __m512d; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextpd256_ ⚠pd512 Experimental (x86 or x86-64) and avx512fCast vector of type __m256d to type __m512d; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextph128_ ⚠ph512 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128h to type __m512h. The upper 24 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_zextph256_ ⚠ph512 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m256h to type __m512h. The upper 16 elements of the result are zeroed. This intrinsic can generate the vzeroupper instruction, but most of the time it does not generate any instructions.
- _mm512_zextps128_ ⚠ps512 Experimental (x86 or x86-64) and avx512fCast vector of type __m128 to type __m512; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextps256_ ⚠ps512 Experimental (x86 or x86-64) and avx512fCast vector of type __m256 to type __m512; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextsi128_ ⚠si512 Experimental (x86 or x86-64) and avx512fCast vector of type __m128i to type __m512i; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm512_zextsi256_ ⚠si512 Experimental (x86 or x86-64) and avx512fCast vector of type __m256i to type __m512i; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
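A brief sketch of the zero-extending casts, under the same toolchain assumption; the 128-bit source values are arbitrary:

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") {
        unsafe { zext_demo() }
    }
}

#[target_feature(enable = "avx512f")]
unsafe fn zext_demo() {
    // A 128-bit vector holding 1, 2, 3, 4 in its four 32-bit lanes.
    let small = _mm_setr_epi32(1, 2, 3, 4);
    // Widen to 512 bits: the low 128 bits are kept, the upper 384 bits become zero.
    let wide = _mm512_zextsi128_si512(small);

    let mut out = [0i32; 16];
    _mm512_storeu_epi32(out.as_mut_ptr(), wide);
    assert_eq!(out, [1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]);
}
```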
- _mm_abs_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst.
- _mm_abs_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlFinds the absolute value of each packed half-precision (16-bit) floating-point element in v2, storing the results in dst.
- _mm_add_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlAdd packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_add_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fAdd the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_add_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of the _MM_FROUND_* rounding modes (optionally combined with _MM_FROUND_NO_EXC).
- _mm_add_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fAdd the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_add_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_alignr_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 32-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 16 bytes (4 elements) in dst.
- _mm_alignr_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 32-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 16 bytes (2 elements) in dst.
- _mm_bcstnebf16_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert scalar BF16 (16-bit) floating point element stored at memory locations starting at location a to single precision (32-bit) floating-point, broadcast it to packed single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_bcstnesh_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert scalar half-precision (16-bit) floating-point element stored at memory locations starting at location a to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_bitshuffle_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512bitalg,avx512vlConsiders the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm_broadcast_ ⚠i32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst.
- _mm_broadcastmb_ ⚠epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlBroadcast the low 8-bits from input mask k to all 64-bit elements of dst.
- _mm_broadcastmw_ ⚠epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlBroadcast the low 16-bits from input mask k to all 32-bit elements of dst.
- _mm_castpd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128d to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ ⚠pd Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128h to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ ⚠ps Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128h to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castph_ ⚠si128 Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128h to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castps_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128 to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_castsi128_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Cast vector of type __m128i to type __m128h. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
- _mm_cmp_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k.
- _mm_cmp_ ⚠round_ sd_ mask Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_ ⚠round_ sh_ mask Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_ ⚠round_ ss_ mask Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cmp_ ⚠sd_ mask Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmp_ ⚠sh_ mask Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmp_ ⚠ss_ mask Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k.
- _mm_cmpeq_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpeq_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k.
- _mm_cmpge_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpge_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmpgt_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k.
- _mm_cmple_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmple_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k.
- _mm_cmplt_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmplt_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k.
- _mm_cmpneq_ ⚠epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k.
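The cmp*_mask family above differs from the older SSE/AVX comparisons in that it returns a compact bitmask (one bit per lane) rather than a vector of all-ones/all-zeros lanes. A minimal sketch follows; mask_compare_demo is a hypothetical helper name, and it must only be called when avx512f and avx512vl are detected at run time (the usual caveats about these intrinsics being experimental apply):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn mask_compare_demo() {
    let a = _mm_setr_epi32(1, 5, 3, 9);
    let b = _mm_setr_epi32(2, 5, 1, 9);
    // Each comparison yields one bit per 32-bit lane in a __mmask8;
    // only the low 4 bits are meaningful for a 128-bit vector.
    let lt: __mmask8 = _mm_cmplt_epi32_mask(a, b);
    let eq: __mmask8 = _mm_cmpeq_epi32_mask(a, b);
    assert_eq!(lt, 0b0001); // only lane 0 has a < b
    assert_eq!(eq, 0b1010); // lanes 1 and 3 are equal
}
```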
- _mm_cmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_cmul_ ⚠round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_cmul_ ⚠sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_comi_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_comi_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1). Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_comi_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_comi_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and return the boolean result (0 or 1).
- _mm_comieq_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1).
- _mm_comige_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
- _mm_comigt_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1).
- _mm_comile_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1).
- _mm_comilt_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1).
- _mm_comineq_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1).
- _mm_conflict_ ⚠epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
- _mm_conflict_ ⚠epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit. Each element’s comparison forms a zero extended bit vector in dst.
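To make the “compare against all less-significant elements” wording of _mm_conflict_epi32 concrete, a small sketch follows (conflict_demo is a hypothetical helper; the same caveats as above apply regarding experimental availability and run-time detection of avx512cd and avx512vl):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512cd,avx512vl")]
unsafe fn conflict_demo() {
    // Lanes: [7, 3, 7, 7] (element 0 listed first).
    let a = _mm_setr_epi32(7, 3, 7, 7);
    // In each result lane, bit j is set when the less-significant lane j
    // holds the same value as this lane.
    let r = _mm_conflict_epi32(a);
    let mut out = [0i32; 4];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    assert_eq!(out, [0b000, 0b000, 0b001, 0b101]);
}
```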
- _mm_conj_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the complex conjugates of complex numbers in a, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_cvt_ ⚠roundi32_ sh Experimental (x86 or x86-64) and avx512fp16Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundi32_ ss Experimental (x86 or x86-64) and avx512fConvert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundi64_ sd Experimental avx512fConvert the signed 64-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundi64_ sh Experimental avx512fp16Convert the signed 64-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundi64_ ss Experimental avx512fConvert the signed 64-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ i32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ i64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundsd_ si32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ si64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ ss Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ u32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsd_ u64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 64-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsh_ i32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvt_ ⚠roundsh_ i64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
- _mm_cvt_ ⚠roundsh_ sd Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvt_ ⚠roundsh_ ss Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundsh_ u32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_cvt_ ⚠roundsh_ u64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit unsigned integer, and store the result in dst.
- _mm_cvt_ ⚠roundsi32_ ss Experimental (x86 or x86-64) and avx512fConvert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundsi64_ sd Experimental avx512fConvert the signed 64-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundsi64_ ss Experimental avx512fConvert the signed 64-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundss_ i32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundss_ i64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundss_ sd Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvt_ ⚠roundss_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundss_ si32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundss_ si64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundss_ u32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundss_ u64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 64-bit integer, and store the result in dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundu32_ sh Experimental (x86 or x86-64) and avx512fp16Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundu32_ ss Experimental (x86 or x86-64) and avx512fConvert the unsigned 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundu64_ sd Experimental avx512fConvert the unsigned 64-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_cvt_ ⚠roundu64_ sh Experimental avx512fp16Convert the unsigned 64-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvt_ ⚠roundu64_ ss Experimental avx512fConvert the unsigned 64-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of the _MM_FROUND_* rounding modes.
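For the _mm_cvt_round* entries above, the rounding parameter is built from the _MM_FROUND_* constants: _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF or _MM_FROUND_TO_ZERO, each combined with _MM_FROUND_NO_EXC, or _MM_FROUND_CUR_DIRECTION to use the mode currently set in MXCSR. A minimal sketch using _mm_cvt_roundss_si32 follows; round_convert_demo is a hypothetical helper, the const-generic immediate form is assumed as in current stdarch, and the intrinsic is experimental:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn round_convert_demo() {
    let x = _mm_set_ss(2.5);
    // The ROUNDING immediate picks the mode; _MM_FROUND_NO_EXC suppresses exceptions.
    let up = _mm_cvt_roundss_si32::<{ _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC }>(x);
    let down = _mm_cvt_roundss_si32::<{ _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC }>(x);
    let nearest = _mm_cvt_roundss_si32::<{ _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC }>(x);
    // 2.5 rounds to 2 under round-to-nearest-even.
    assert_eq!((up, down, nearest), (3, 2, 2));
}
```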
- _mm_cvtepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm_cvtepi16_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm_cvtepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvtepi32_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_cvtepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst.
- _mm_cvtepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvtepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm_cvtepi64_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtepi64_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtepi64_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu16_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu32_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu32_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 64 bits of dst are zeroed out.
- _mm_cvtepu64_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtepu64_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtepu64_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvti32_ ⚠sd Experimental (x86 or x86-64) and avx512fConvert the signed 32-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvti32_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Convert the signed 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvti32_ ⚠ss Experimental (x86 or x86-64) and avx512fConvert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvti64_ ⚠sd Experimental avx512fConvert the signed 64-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvti64_ ⚠sh Experimental avx512fp16Convert the signed 64-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvti64_ ⚠ss Experimental avx512fConvert the signed 64-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtne2ps_ ⚠pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in two 128-bit vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a 128-bit wide vector. Intel’s documentation
- _mm_cvtneebf16_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneeph_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneobf16_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneoph_ ⚠ps Experimental (x86 or x86-64) and avxneconvertConvert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location a to single precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtneps_ ⚠avx_ pbh Experimental (x86 or x86-64) and avxneconvertConvert packed single precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtneps_ ⚠pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.
- _mm_cvtness_ ⚠sbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts a single-precision (32-bit) floating-point element in a to a BF16 (16-bit) floating-point element, and store the result in dst.
- _mm_cvtpbh_ ⚠ps Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed BF16 (16-bit) floating-point elements in a to single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtpd_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm_cvtpd_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm_cvtpd_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm_cvtpd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst. The upper 96 bits of dst are zeroed out.
- _mm_cvtph_ ⚠epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst.
- _mm_cvtph_ ⚠epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
- _mm_cvtph_ ⚠epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst.
- _mm_cvtph_ ⚠epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst.
- _mm_cvtph_ ⚠epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm_cvtph_ ⚠epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst.
- _mm_cvtph_ ⚠pd Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst.
- _mm_cvtps_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst.
- _mm_cvtps_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst.
- _mm_cvtps_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst.
- _mm_cvtsbh_ ⚠ss Experimental (x86 or x86-64) and avx512bf16,avx512fConverts a single BF16 (16-bit) floating-point element in a to a single-precision (32-bit) floating-point element, and store the result in dst.
- _mm_cvtsd_ ⚠i32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtsd_ ⚠i64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
- _mm_cvtsd_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtsd_ ⚠u32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
- _mm_cvtsd_ ⚠u64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 64-bit integer, and store the result in dst.
- _mm_cvtsepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst.
- _mm_cvtsepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst.
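The difference between the plain _mm_cvtepi32_epi8 listed earlier (truncation, i.e. keep only the low bits) and the _mm_cvtsepi32_epi8 signed-saturation variant above can be sketched as follows. narrow_demo is a hypothetical helper, only the low 4 bytes of each result are inspected, and the usual experimental/run-time-detection caveats apply:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn narrow_demo() {
    let a = _mm_setr_epi32(300, -300, 100, -100);
    let trunc = _mm_cvtepi32_epi8(a); // keep the low 8 bits of each lane
    let sat = _mm_cvtsepi32_epi8(a);  // clamp to the i8 range instead
    let (mut t, mut s) = ([0i8; 16], [0i8; 16]);
    _mm_storeu_si128(t.as_mut_ptr() as *mut __m128i, trunc);
    _mm_storeu_si128(s.as_mut_ptr() as *mut __m128i, sat);
    assert_eq!(&t[..4], &[44, -44, 100, -100]);   // 300 wraps to 44, -300 to -44
    assert_eq!(&s[..4], &[127, -128, 100, -100]); // out-of-range lanes saturate
}
```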
- _mm_cvtsh_ ⚠h Experimental (x86 or x86-64) and avx512fp16Copy the lower half-precision (16-bit) floating-point element from a to dst.
- _mm_cvtsh_ ⚠i32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtsh_ ⚠i64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
- _mm_cvtsh_ ⚠sd Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvtsh_ ⚠ss Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtsh_ ⚠u32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer, and store the result in dst.
- _mm_cvtsh_ ⚠u64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit unsigned integer, and store the result in dst.
- _mm_cvtsi16_ ⚠si128 Experimental (x86 or x86-64) and avx512fp16Copy 16-bit integer a to the lower elements of dst, and zero the upper elements of dst.
- _mm_cvtsi128_ ⚠si16 Experimental (x86 or x86-64) and avx512fp16Copy the lower 16-bit integer in a to dst.
- _mm_cvtss_ ⚠i32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer, and store the result in dst.
- _mm_cvtss_ ⚠i64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer, and store the result in dst.
- _mm_cvtss_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point elements, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtss_ ⚠u32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer, and store the result in dst.
- _mm_cvtss_ ⚠u64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 64-bit integer, and store the result in dst.
- _mm_cvtt_ ⚠roundsd_ i32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundsd_ i64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundsd_ si32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundsd_ si64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundsd_ u32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundsd_ u64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 64-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundsh_ i32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvtt_ ⚠roundsh_ i64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
- _mm_cvtt_ ⚠roundsh_ u32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
- _mm_cvtt_ ⚠roundsh_ u64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit unsigned integer with truncation, and store the result in dst.
- _mm_cvtt_ ⚠roundss_ i32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundss_ i64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundss_ si32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundss_ si64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundss_ u32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvtt_ ⚠roundss_ u64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 64-bit integer with truncation, and store the result in dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_cvttpd_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm_cvttpd_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm_cvttpd_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm_cvttph_ ⚠epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ ⚠epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ ⚠epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ ⚠epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst.
- _mm_cvttph_ ⚠epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst.
- _mm_cvttph_ ⚠epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst.
- _mm_cvttps_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst.
- _mm_cvttps_ ⚠epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst.
- _mm_cvttps_ ⚠epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst.
- _mm_cvttsd_ ⚠i32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttsd_ ⚠i64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
- _mm_cvttsd_ ⚠u32 Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
- _mm_cvttsd_ ⚠u64 Experimental avx512fConvert the lower double-precision (64-bit) floating-point element in a to an unsigned 64-bit integer with truncation, and store the result in dst.
- _mm_cvttsh_ ⚠i32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttsh_ ⚠i64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
- _mm_cvttsh_ ⚠u32 Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 32-bit unsigned integer with truncation, and store the result in dst.
- _mm_cvttsh_ ⚠u64 Experimental avx512fp16Convert the lower half-precision (16-bit) floating-point element in a to a 64-bit unsigned integer with truncation, and store the result in dst.
- _mm_cvttss_ ⚠i32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation, and store the result in dst.
- _mm_cvttss_ ⚠i64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer with truncation, and store the result in dst.
- _mm_cvttss_ ⚠u32 Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 32-bit integer with truncation, and store the result in dst.
- _mm_cvttss_ ⚠u64 Experimental avx512fConvert the lower single-precision (32-bit) floating-point element in a to an unsigned 64-bit integer with truncation, and store the result in dst.
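The cvt vs. cvtt pairs above differ only in how the fractional part is handled: _mm_cvtsd_i32 and friends round according to the current MXCSR rounding mode (round-to-nearest-even by default), while the cvtt variants always truncate toward zero. A minimal sketch (scalar_cvt_demo is a hypothetical helper; the experimental-intrinsic caveats above apply):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f")]
unsafe fn scalar_cvt_demo() {
    let x = _mm_set_sd(-1.7);
    let rounded = _mm_cvtsd_i32(x);    // MXCSR rounding, default nearest: -2
    let truncated = _mm_cvttsd_i32(x); // truncation toward zero: -1
    assert_eq!((rounded, truncated), (-2, -1));

    // Unsigned destinations behave the same way, just with a u32/u64 result.
    let y = _mm_set_sd(3.9);
    assert_eq!(_mm_cvttsd_u32(y), 3);
}
```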
- _mm_cvtu32_ ⚠sd Experimental (x86 or x86-64) and avx512fConvert the unsigned 32-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvtu32_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Convert the unsigned 32-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtu32_ ⚠ss Experimental (x86 or x86-64) and avx512fConvert the unsigned 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtu64_ ⚠sd Experimental avx512fConvert the unsigned 64-bit integer b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_cvtu64_ ⚠sh Experimental avx512fp16Convert the unsigned 64-bit integer b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_cvtu64_ ⚠ss Experimental avx512fConvert the unsigned 64-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_cvtusepi16_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi32_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi32_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi64_ ⚠epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi64_ ⚠epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtusepi64_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst.
- _mm_cvtxph_ ⚠ps Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst.
- _mm_cvtxps_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst.
- _mm_dbsad_ ⚠epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst. Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm_div_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlDivide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst.
- _mm_div_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fDivide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_div_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of the _MM_FROUND_* rounding modes.
- _mm_div_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fDivide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_div_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_dpbf16_ ⚠ps Experimental (x86 or x86-64) and avx512bf16,avx512vlCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst. See Intel’s documentation for further details.
- _mm_dpbssd_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbssds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbsud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbsuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbusd_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbusd_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
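 Illustration (not part of the upstream listing): a minimal sketch of the 8-bit VNNI dot-product pattern shared by this family, using _mm_dpbusd_epi32. Assumes a CPU with AVX512VNNI + AVX512VL and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: each 32-bit lane accumulates four u8*i8 products into src.
  #[cfg(target_arch = "x86_64")]
  fn dot_step() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512vnni") || !is_x86_feature_detected!("avx512vl") {
          return;
      }
      unsafe {
          let src = _mm_setzero_si128();         // 4 x i32 accumulators, all zero
          let a = _mm_set1_epi8(2);              // 16 unsigned 8-bit values, all 2
          let b = _mm_set1_epi8(3);              // 16 signed 8-bit values, all 3
          let dst = _mm_dpbusd_epi32(src, a, b); // each i32 lane = 0 + 4 * (2*3) = 24
          let mut out = [0i32; 4];
          _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, dst);
          assert_eq!(out, [24; 4]);
      }
  }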
- _mm_dpbusds_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbusds_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpbuud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpbuuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint8Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwssd_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwssd_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwssds_ ⚠avx_ epi32 Experimental (x86 or x86-64) and avxvnniMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwssds_ ⚠epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwsud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwsuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwusd_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwusds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding signed 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_dpwuud_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.
- _mm_dpwuuds_ ⚠epi32 Experimental (x86 or x86-64) and avxvnniint16Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in a with corresponding unsigned 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src with signed saturation, and store the packed 32-bit results in dst.
- _mm_fcmadd_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmadd_ ⚠round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmadd_ ⚠sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmul_ ⚠round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fcmul_ ⚠sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_fixupimm_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm_fixupimm_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst. imm8 is used to set the required flags reporting.
- _mm_fixupimm_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fFix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_fixupimm_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fFix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_fixupimm_ ⚠sd Experimental (x86 or x86-64) and avx512fFix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
- _mm_fixupimm_ ⚠ss Experimental (x86 or x86-64) and avx512fFix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
- _mm_fmadd_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst.
- _mm_fmadd_ ⚠round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmadd_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
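 Illustration (not part of the upstream listing): a minimal sketch of passing an explicit rounding mode to one of the _round_ scalar intrinsics, here _mm_fmadd_round_sd. It assumes AVX512F, a nightly toolchain where the Experimental intrinsic is available, and that the rounding mode is supplied as a const generic (the usual stdarch convention for immediate arguments).
  // Sketch only: fused multiply-add on lane 0 with round-to-nearest and
  // exceptions suppressed; other _MM_FROUND_* modes combine the same way.
  #[cfg(target_arch = "x86_64")]
  fn fma_with_explicit_rounding() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512f") {
          return;
      }
      unsafe {
          let a = _mm_set_pd(9.0, 2.0); // lane 0 = 2.0, lane 1 = 9.0
          let b = _mm_set_pd(8.0, 3.0); // lane 0 = 3.0
          let c = _mm_set_pd(7.0, 1.0); // lane 0 = 1.0
          let r = _mm_fmadd_round_sd::<{ _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC }>(a, b, c);
          let mut out = [0.0f64; 2];
          _mm_storeu_pd(out.as_mut_ptr(), r);
          assert_eq!(out, [7.0, 9.0]); // lane 0 = 2*3 + 1, lane 1 copied from a
      }
  }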
- _mm_fmadd_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmadd_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fmadd_ ⚠sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmadd_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmaddsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst.
- _mm_fmsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst.
- _mm_fmsub_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fmsub_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmsub_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fmsub_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fmsubadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst.
- _mm_fmul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmul_ ⚠round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fmul_ ⚠sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_fnmadd_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst.
- _mm_fnmadd_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fnmadd_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmadd_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fnmadd_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst.
- _mm_fnmsub_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_fnmsub_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, subtract the lower element in c from the negated intermediate result, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_fnmsub_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_fpclass_ ⚠pd_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ ⚠ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlTest packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ ⚠ps_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ ⚠sd_ mask Experimental (x86 or x86-64) and avx512dqTest the lower double-precision (64-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_fpclass_ ⚠sh_ mask Experimental (x86 or x86-64) and avx512fp16Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k. imm can be a combination of:
- _mm_fpclass_ ⚠ss_ mask Experimental (x86 or x86-64) and avx512dqTest the lower single-precision (32-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k. imm can be a combination of:
- _mm_getexp_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
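 Illustration (not part of the upstream listing): a minimal sketch of the floor(log2(x)) behaviour of _mm_getexp_pd, assuming AVX512F + AVX512VL and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: extracts the unbiased exponent of each f64 lane.
  #[cfg(target_arch = "x86_64")]
  fn exponents() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512f") || !is_x86_feature_detected!("avx512vl") {
          return;
      }
      unsafe {
          let a = _mm_set_pd(0.5, 10.0); // lanes: [10.0, 0.5]
          let e = _mm_getexp_pd(a);
          let mut out = [0.0f64; 2];
          _mm_storeu_pd(out.as_mut_ptr(), e);
          assert_eq!(out, [3.0, -1.0]); // floor(log2(10)) = 3, floor(log2(0.5)) = -1
      }
  }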
- _mm_getexp_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculatesfloor(log2(x))for each element.
- _mm_getexp_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst. This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_getexp_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getexp_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculatesfloor(log2(x))for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
- _mm_getexp_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getexp_ ⚠sd Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_getexp_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculatesfloor(log2(x))for the lower element.
- _mm_getexp_ ⚠ss Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_getmant_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_getmant_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlNormalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_getmant_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_getmant_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter
- _mm_getmant_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ ⚠sd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_getmant_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_getmant_ ⚠ss Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_gf2p8affine_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and gfniPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_gf2p8affineinv_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and gfniPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_gf2p8mul_ ⚠epi8 Experimental (x86 or x86-64) and gfniPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
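 Illustration (not part of the upstream listing): a minimal sketch of a GF(2^8) byte multiplication with _mm_gf2p8mul_epi8, using the AES reduction polynomial x^8 + x^4 + x^3 + x + 1. Assumes a CPU with GFNI and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: multiplies every byte pair in the Galois field GF(2^8).
  #[cfg(target_arch = "x86_64")]
  fn gf_mul() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("gfni") {
          return;
      }
      unsafe {
          let a = _mm_set1_epi8(0x02);
          let b = _mm_set1_epi8(0x87u8 as i8);
          let prod = _mm_gf2p8mul_epi8(a, b);
          let mut out = [0u8; 16];
          _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, prod);
          // 0x02 * 0x87 = 0x15 in GF(2^8): x*(x^7+x^2+x+1) = x^8+x^3+x^2+x,
          // and x^8 reduces to x^4+x^3+x+1, giving x^4+x^2+1 = 0x15.
          assert!(out.iter().all(|&byte| byte == 0x15));
      }
  }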
- _mm_i32scatter_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i32scatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i32scatter_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i32scatter_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_i64scatter_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale
- _mm_load_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 128-bits (composed of 4 packed 32-bit integers) from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_load_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 128-bits (composed of 2 packed 64-bit integers) from memory into dst. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_load_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlLoad 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_load_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector, and zero the upper elements
- _mm_loadu_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad 128-bits (composed of 16 packed 8-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad 128-bits (composed of 8 packed 16-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 128-bits (composed of 4 packed 32-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
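 Illustration (not part of the upstream listing): a minimal sketch of an unaligned load with _mm_loadu_epi32 (unlike _mm_load_epi32, no 16-byte alignment is required). Assumes AVX512F + AVX512VL and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: loads 4 i32s from a pointer that is generally not 16-byte aligned.
  #[cfg(target_arch = "x86_64")]
  fn unaligned_load_demo() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512f") || !is_x86_feature_detected!("avx512vl") {
          return;
      }
      let data: [i32; 9] = [0, 10, 20, 30, 40, 50, 60, 70, 80];
      unsafe {
          // Starting one element in, the address is typically misaligned;
          // _mm_loadu_epi32 has no alignment requirement.
          let v = _mm_loadu_epi32(data.as_ptr().add(1));
          let mut out = [0i32; 4];
          _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, v);
          assert_eq!(out, [10, 20, 30, 40]);
      }
  }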
- _mm_loadu_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad 128-bits (composed of 2 packed 64-bit integers) from memory into dst. mem_addr does not need to be aligned on any particular boundary.
- _mm_loadu_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlLoad 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into a new vector. The address does not need to be aligned to any particular boundary.
- _mm_lzcnt_ ⚠epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst.
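 Illustration (not part of the upstream listing): a minimal sketch of per-lane leading-zero counting with _mm_lzcnt_epi32, assuming AVX512CD + AVX512VL and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: counts leading zero bits in each 32-bit lane.
  #[cfg(target_arch = "x86_64")]
  fn leading_zeros() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512cd") || !is_x86_feature_detected!("avx512vl") {
          return;
      }
      unsafe {
          let a = _mm_set_epi32(0, 1, 256, -1); // lanes: [-1, 256, 1, 0]
          let counts = _mm_lzcnt_epi32(a);
          let mut out = [0i32; 4];
          _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, counts);
          // lzcnt(-1) = 0, lzcnt(256) = 23, lzcnt(1) = 31, lzcnt(0) = 32
          assert_eq!(out, [0, 23, 31, 32]);
      }
  }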
- _mm_lzcnt_ ⚠epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst.
- _mm_madd52hi_ ⚠avx_ epu64 Experimental (x86 or x86-64) and avxifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd52hi_ ⚠epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd52lo_ ⚠avx_ epu64 Experimental (x86 or x86-64) and avxifmaMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
- _mm_madd52lo_ ⚠epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst.
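 Illustration (not part of the upstream listing): a minimal sketch of one 52-bit multiply-accumulate limb with _mm_madd52lo_epu64, the basic building block of IFMA-based big-integer code. Assumes AVX512IFMA + AVX512VL and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: each 64-bit lane becomes a + low52(b * c).
  #[cfg(target_arch = "x86_64")]
  fn madd52_low() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512ifma") || !is_x86_feature_detected!("avx512vl") {
          return;
      }
      unsafe {
          let acc = _mm_set1_epi64x(1); // running 64-bit accumulators
          let b = _mm_set1_epi64x(3);   // only the low 52 bits of b and c are used
          let c = _mm_set1_epi64x(5);
          let dst = _mm_madd52lo_epu64(acc, b, c); // each lane: 1 + low52(3 * 5) = 16
          let mut out = [0i64; 2];
          _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, dst);
          assert_eq!(out, [16, 16]);
      }
  }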
- _mm_mask2_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask2_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set)
- _mm_mask2_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from idx when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask3_ ⚠fcmadd_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask3_ ⚠fcmadd_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from c when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask3_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask3_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
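 Illustration (not part of the upstream listing): a minimal sketch of the writemask behaviour shared by the _mm_mask3_ family, using _mm_mask3_fmadd_pd. Assumes AVX512F + AVX512VL and a nightly toolchain where this Experimental intrinsic is available.
  // Sketch only: lanes whose mask bit is clear are copied from c instead of computed.
  #[cfg(target_arch = "x86_64")]
  fn masked_fma() {
      use std::arch::x86_64::*;
      if !is_x86_feature_detected!("avx512f") || !is_x86_feature_detected!("avx512vl") {
          return;
      }
      unsafe {
          let a = _mm_set_pd(10.0, 2.0); // lane 0 = 2.0, lane 1 = 10.0
          let b = _mm_set_pd(10.0, 3.0);
          let c = _mm_set_pd(5.0, 1.0);
          // Writemask 0b01: only lane 0 is computed; lane 1 is copied from c.
          let r = _mm_mask3_fmadd_pd(a, b, c, 0b01);
          let mut out = [0.0f64; 2];
          _mm_storeu_pd(out.as_mut_ptr(), r);
          assert_eq!(out, [7.0, 5.0]); // lane 0 = 2*3 + 1, lane 1 = c's lane 1
      }
  }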
- _mm_mask3_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmadd_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask3_ ⚠fmadd_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fmadd_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmadd_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmadd_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from c when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask3_ ⚠fmadd_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fmadd_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmadd_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmsub_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fmsub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmsub_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmsub_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fmsub_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract packed elements in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmsub_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmadd_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fnmadd_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmadd_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmadd_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fnmadd_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmadd_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from c when the corresponding mask bit is not set).
- _mm_mask3_ ⚠fnmsub_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fnmsub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmsub_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmsub_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper element from c to the upper element of dst.
- _mm_mask3_ ⚠fnmsub_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when the mask bit 0 is not set), and copy the upper 7 packed elements from c to the upper elements of dst.
- _mm_mask3_ ⚠fnmsub_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from c when mask bit 0 is not set), and copy the upper 3 packed elements from c to the upper elements of dst.
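The `_mm_mask3_` FMA entries above all share one calling pattern: the product and accumulator operands come first and the writemask comes last, with masked-out lanes taking their values from c. Below is a minimal sketch using `_mm_mask3_fnmsub_ps`; the helper name and lane values are purely illustrative, and it assumes a toolchain where these experimental AVX-512 intrinsics are usable (on a nightly compiler this may additionally require the `stdarch_x86_avx512` feature gate, which is an assumption about the gate name).

```rust
// Sketch only: _mm_mask3_fnmsub_ps computes -(a * b) - c per lane where the
// corresponding bit of k is set, and copies the lane from c where it is not.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn mask3_fnmsub_demo() -> [f32; 4] {
    let a = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
    let b = _mm_set1_ps(10.0);
    let c = _mm_set1_ps(5.0);
    // k = 0b0011: lanes 0 and 1 are computed, lanes 2 and 3 are copied from c.
    let dst = _mm_mask3_fnmsub_ps(a, b, c, 0b0011);
    std::mem::transmute::<__m128, [f32; 4]>(dst) // expected: [-15.0, -25.0, 5.0, 5.0]
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were verified at runtime above.
        println!("{:?}", unsafe { mask3_fnmsub_demo() });
    }
}
```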
- _mm_mask_ ⚠abs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠abs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠abs_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠abs_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
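The masked integer adds above all take the same operand order: src first, then the writemask, then the two inputs, and lanes whose mask bit is clear are copied from src. A hedged sketch using `_mm_mask_add_epi32` follows (same experimental-intrinsics assumptions as the earlier example; the helper name and values are illustrative).

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_add_demo() -> [i32; 4] {
    let src = _mm_set1_epi32(-1);           // fallback values for masked-out lanes
    let a = _mm_setr_epi32(10, 20, 30, 40);
    let b = _mm_setr_epi32(1, 2, 3, 4);
    // k = 0b0101: lanes 0 and 2 hold a + b, lanes 1 and 3 are copied from src.
    let dst = _mm_mask_add_epi32(src, 0b0101, a, b);
    std::mem::transmute::<__m128i, [i32; 4]>(dst) // expected: [11, -1, 33, -1]
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were verified at runtime above.
        println!("{:?}", unsafe { masked_add_demo() });
    }
}
```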
- _mm_mask_ ⚠add_ pd Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlAdd packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ ps Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠add_ round_ sd Experimental (x86 or x86-64) and avx512fAdd the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠add_ round_ sh Experimental (x86 or x86-64) and avx512fp16Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_mask_ ⚠add_ round_ ss Experimental (x86 or x86-64) and avx512fAdd the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠add_ sd Experimental (x86 or x86-64) and avx512fAdd the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠add_ sh Experimental (x86 or x86-64) and avx512fp16Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_mask_ ⚠add_ ss Experimental (x86 or x86-64) and avx512fAdd the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠adds_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠adds_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠adds_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠adds_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠alignr_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠alignr_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 32-byte intermediate result, shift the result right by imm8 32-bit elements, and store the low 16 bytes (4 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠alignr_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 32-byte intermediate result, shift the result right by imm8 64-bit elements, and store the low 16 bytes (2 elements) in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlPerforms element-by-element bitwise AND between packed 32-bit integer elements of a and b, storing the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠and_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠and_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠andnot_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠andnot_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠andnot_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠andnot_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠avg_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 8-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠avg_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 16-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠bitshuffle_ epi64_ mask Experimental (x86 or x86-64) and avx512bitalg,avx512vlConsiders the input b as packed 64-bit integers and c as packed 8-bit integers. Then groups 8 8-bit values from c as indices into the bits of the corresponding 64-bit integer. It then selects these bits and packs them into the output.
- _mm_mask_ ⚠blend_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBlend packed 8-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ ⚠blend_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBlend packed 16-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ ⚠blend_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed 32-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ ⚠blend_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed 64-bit integers from a and b using control mask k, and store the results in dst.
- _mm_mask_ ⚠blend_ pd Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed double-precision (64-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm_mask_ ⚠blend_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlBlend packed half-precision (16-bit) floating-point elements from a and b using control mask k, and store the results in dst.
- _mm_mask_ ⚠blend_ ps Experimental (x86 or x86-64) and avx512f,avx512vlBlend packed single-precision (32-bit) floating-point elements from a and b using control mask k, and store the results in dst.
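The blend entries above differ from most masked intrinsics in that there is no separate src operand: the mask simply selects between a (bit clear) and b (bit set) per lane. A small sketch with `_mm_mask_blend_epi32`, under the same assumptions as the earlier examples:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_blend_demo() -> [i32; 4] {
    let a = _mm_setr_epi32(0, 1, 2, 3);
    let b = _mm_setr_epi32(100, 101, 102, 103);
    // k = 0b1010: lanes 1 and 3 come from b, lanes 0 and 2 come from a.
    let dst = _mm_mask_blend_epi32(0b1010, a, b);
    std::mem::transmute::<__m128i, [i32; 4]>(dst) // expected: [0, 101, 2, 103]
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were verified at runtime above.
        println!("{:?}", unsafe { masked_blend_demo() });
    }
}
```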
- _mm_mask_ ⚠broadcast_ i32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠broadcastb_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 8-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠broadcastd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 32-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠broadcastq_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 64-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠broadcastss_ ps Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠broadcastw_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 16-bit integer from a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ pd_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ ps_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmp_ round_ sd_ mask Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠cmp_ round_ sh_ mask Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠cmp_ round_ ss_ mask Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠cmp_ sd_ mask Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
- _mm_mask_ ⚠cmp_ sh_ mask Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1.
- _mm_mask_ ⚠cmp_ ss_ mask Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point element in a and b based on the comparison operand specified by imm8, and store the result in mask vector k using zeromask k1 (the element is zeroed out when mask bit 0 is not set).
- _mm_mask_ ⚠cmpeq_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpeq_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for equality, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpge_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpgt_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for greater-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmple_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than-or-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmplt_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for less-than, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
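These masked comparisons return a mask register value rather than a vector: each result bit is the per-lane comparison outcome ANDed with the corresponding bit of the zeromask k1. A sketch with `_mm_mask_cmplt_epi32_mask`, again under the same experimental-intrinsics assumptions and with illustrative values:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_cmplt_demo() -> __mmask8 {
    let a = _mm_setr_epi32(1, 2, 3, 4);
    let b = _mm_setr_epi32(4, 3, 2, 1);
    // a < b holds in lanes 0 and 1 (raw mask 0b0011); the zeromask 0b1110
    // zeroes out lane 0, so the final mask is 0b0010.
    _mm_mask_cmplt_epi32_mask(0b1110, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were verified at runtime above.
        println!("{:#06b}", unsafe { masked_cmplt_demo() }); // expected: 0b0010
    }
}
```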
- _mm_mask_ ⚠cmpneq_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epu8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epu16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epu32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmpneq_ epu64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b for not-equal, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set).
- _mm_mask_ ⚠cmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠cmul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠cmul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠compress_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ ⚠compress_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ ⚠compress_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ ⚠compress_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ ⚠compress_ pd Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ ⚠compress_ ps Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.
- _mm_mask_ ⚠compressstoreu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 8-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 16-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 32-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 64-bit integers in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠compressstoreu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in writemask k) to unaligned memory at base_addr.
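The compress entries pack the selected lanes toward the low end of the destination; in the register form the remaining upper lanes are passed through from src, while the compressstoreu forms write only the selected lanes to memory. A sketch of the register form, `_mm_mask_compress_epi32`, under the same assumptions as the earlier examples:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_compress_demo() -> [i32; 4] {
    let src = _mm_set1_epi32(-1);
    let a = _mm_setr_epi32(10, 20, 30, 40);
    // k = 0b1010: lanes 1 and 3 of a are packed into lanes 0 and 1 of dst;
    // the remaining lanes are passed through from src.
    let dst = _mm_mask_compress_epi32(src, 0b1010, a);
    std::mem::transmute::<__m128i, [i32; 4]>(dst) // expected: [20, 40, -1, -1]
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were verified at runtime above.
        println!("{:?}", unsafe { masked_compress_demo() });
    }
}
```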
- _mm_mask_ ⚠conflict_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm_mask_ ⚠conflict_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit using writemask k (elements are copied from src when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm_mask_ ⚠conj_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the complex conjugates of complex numbers in a, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugate conjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠cvt_ roundps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠cvt_ roundsd_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvt_ roundsd_ ss Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the rounding[3:0] parameter, which can be one of:
- _mm_mask_ ⚠cvt_ roundsh_ sd Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠cvt_ roundsh_ ss Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvt_ roundss_ sd Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠cvt_ roundss_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvtepi8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_ ⚠cvtepi32_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
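The masked conversions above follow the same src/k/a pattern as the arithmetic intrinsics; for widening or narrowing conversions, only as many source lanes are consumed as fit in the destination. A sketch with `_mm_mask_cvtepi32_pd`, which converts the two low 32-bit integers of a to doubles, again under the same experimental-intrinsics assumptions:

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_cvt_demo() -> [f64; 2] {
    let src = _mm_set1_pd(-1.0);
    let a = _mm_setr_epi32(7, 9, 0, 0); // only the low two 32-bit lanes are converted
    // k = 0b01: lane 0 becomes 7.0, lane 1 is copied from src.
    let dst = _mm_mask_cvtepi32_pd(src, 0b01, a);
    std::mem::transmute::<__m128d, [f64; 2]>(dst) // expected: [7.0, -1.0]
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were verified at runtime above.
        println!("{:?}", unsafe { masked_cvt_demo() });
    }
}
```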
- _mm_mask_ ⚠cvtepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepi64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtepi64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_ ⚠cvtepi64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtepu8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtepu32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_ ⚠cvtepu64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtepu64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_ ⚠cvtepu64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtne2ps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in a single vector dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtneps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtpbh_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed BF16 (16-bit) floating-point elements in a to single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtpd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_mask_ ⚠cvtpd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ pd Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtph_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠cvtsd_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvtsd_ ss Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvtsepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
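The `cvtsepi`/`cvtusepi` entries differ from a plain narrowing in that out-of-range values clamp to the limits of the destination type rather than wrapping. A small sketch with `_mm_mask_cvtsepi32_epi16`, under the same nightly-gate and signature assumptions as the sketch above (demo name illustrative):

```rust
use std::arch::x86_64::*;

// Assumes #![feature(stdarch_x86_avx512)] at the crate root and a runtime
// feature check before calling, as in the earlier sketch.
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn saturating_narrow_demo() {
    // Two of the 32-bit inputs do not fit in an i16.
    let a = _mm_set_epi32(-70_000, 70_000, -5, 5); // lanes: [5, -5, 70_000, -70_000]
    let src = _mm_set1_epi16(0);
    let dst = _mm_mask_cvtsepi32_epi16(src, 0b1111, a); // all four lanes active

    let mut out = [0i16; 8];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, dst);
    // Signed saturation clamps instead of wrapping.
    assert_eq!(&out[..4], &[5, -5, i16::MAX, i16::MIN]);
}
```

The matching `_storeu_` entries write only the active narrowed elements to unaligned memory instead of returning a vector.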
- _mm_mask_ ⚠cvtsepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtsepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtsh_ sd Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠cvtsh_ ss Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src to dst when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvtss_ sd Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠cvtss_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠cvttpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvttpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvttph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
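The `cvtt*` entries always truncate toward zero, while their `cvt*` counterparts round (by default to nearest even). A sketch with `_mm_mask_cvttps_epi32`, same assumptions as above:

```rust
use std::arch::x86_64::*;

// Same nightly-gate and runtime-detection assumptions as the earlier sketches.
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn truncating_convert_demo() {
    let a = _mm_set_ps(-2.5, 2.5, -1.7, 1.7); // lanes: [1.7, -1.7, 2.5, -2.5]
    let src = _mm_setzero_si128();
    // cvtt* truncates toward zero; _mm_mask_cvtps_epi32 would round instead.
    let dst = _mm_mask_cvttps_epi32(src, 0b1111, a);

    let mut out = [0i32; 4];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, dst);
    assert_eq!(out, [1, -1, 2, -2]);
}
```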
- _mm_mask_ ⚠cvttps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvttps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvttps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠cvtusepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi16_ storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi32_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi32_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtusepi64_ storeu_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi64_ storeu_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtusepi64_ storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask k) to unaligned memory at base_addr.
- _mm_mask_ ⚠cvtxph_ ps Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set).
- _mm_mask_ ⚠cvtxps_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using writemask k (elements are copied from src to dst when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_mask_ ⚠dbsad_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm_mask_ ⚠div_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠div_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlDivide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠div_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠div_ round_ sd Experimental (x86 or x86-64) and avx512fDivide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠div_ round_ sh Experimental (x86 or x86-64) and avx512fp16Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_mask_ ⚠div_ round_ ss Experimental (x86 or x86-64) and avx512fDivide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠div_ sd Experimental (x86 or x86-64) and avx512fDivide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠div_ sh Experimental (x86 or x86-64) and avx512fp16Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_mask_ ⚠div_ ss Experimental (x86 or x86-64) and avx512fDivide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
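For the scalar `sd`/`ss`/`sh` variants only mask bit 0 matters, and the upper lanes of the result always come from `a`. A sketch with `_mm_mask_div_sd`, same assumptions as above:

```rust
use std::arch::x86_64::*;

// Same nightly-gate assumptions; this one needs only avx512f at runtime.
#[target_feature(enable = "avx512f")]
unsafe fn masked_scalar_div_demo() {
    let a = _mm_set_pd(7.0, 6.0); // lanes: [6.0, 7.0]
    let b = _mm_set_pd(1.0, 3.0); // only b[0] takes part in the division
    let src = _mm_set1_pd(-1.0);

    let active = _mm_mask_div_sd(src, 0b1, a, b);
    let inactive = _mm_mask_div_sd(src, 0b0, a, b);

    assert_eq!(_mm_cvtsd_f64(active), 2.0);    // 6.0 / 3.0
    assert_eq!(_mm_cvtsd_f64(inactive), -1.0); // mask bit 0 clear: copied from src
    let mut out = [0.0f64; 2];
    _mm_storeu_pd(out.as_mut_ptr(), active);
    assert_eq!(out[1], 7.0); // upper lane always copied from a
}
```

The `_round_` variants of these scalar operations additionally take a compile-time rounding/SAE constant (for example `_MM_FROUND_NO_EXC`), as the entries above note.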
- _mm_mask_ ⚠dpbf16_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Intel’s documentation
- _mm_mask_ ⚠dpbusd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠dpbusds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠dpwssd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠dpwssds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
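The `dpbusd`/`dpwssd` entries fuse a widening multiply, a pairwise horizontal add and an accumulation with `src` into one step. A sketch of the 16-bit form, assuming `avx512vnni` support and the same nightly gate as above (demo name illustrative):

```rust
use std::arch::x86_64::*;

// Same nightly-gate assumptions; needs avx512vnni + avx512vl at runtime.
#[target_feature(enable = "avx512vnni,avx512vl")]
unsafe fn vnni_dot_demo() {
    let a = _mm_set1_epi16(2);
    let b = _mm_set1_epi16(3);
    let src = _mm_set1_epi32(100);
    // Each 32-bit lane j becomes src[j] + a[2j]*b[2j] + a[2j+1]*b[2j+1] = 100 + 6 + 6,
    // but only where the writemask bit is set; the other lanes keep src.
    let dst = _mm_mask_dpwssd_epi32(src, 0b0101, a, b);

    let mut out = [0i32; 4];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, dst);
    assert_eq!(out, [112, 100, 112, 100]);
}
```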
- _mm_mask_ ⚠expand_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expand_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expand_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expand_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expand_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expand_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
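`expand` is the inverse of a compress: the contiguous low elements of `a` are distributed, in order, into the active lanes of the result, and inactive lanes take `src`. A sketch with `_mm_mask_expand_epi32`, same assumptions as above:

```rust
use std::arch::x86_64::*;

// Same nightly-gate assumptions; needs avx512f + avx512vl at runtime.
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn expand_demo() {
    let a = _mm_set_epi32(40, 30, 20, 10); // lanes: [10, 20, 30, 40]
    let src = _mm_set1_epi32(-1);
    // k = 0b1010: lanes 1 and 3 are active and receive a[0] and a[1] in order.
    let dst = _mm_mask_expand_epi32(src, 0b1010, a);

    let mut out = [0i32; 4];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, dst);
    assert_eq!(out, [-1, 10, -1, 20]);
}
```

The `expandloadu` entries behave the same way but read the contiguous source elements from unaligned memory.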
- _mm_mask_ ⚠expandloadu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠expandloadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠fcmadd_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠fcmadd_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (the element is copied from a when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠fcmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠fcmul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠fcmul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_mask_ ⚠fixupimm_ pd Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_mask_ ⚠fixupimm_ ps Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_mask_ ⚠fixupimm_ round_ sd Experimental (x86 or x86-64) and avx512fFix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠fixupimm_ round_ ss Experimental (x86 or x86-64) and avx512fFix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠fixupimm_ sd Experimental (x86 or x86-64) and avx512fFix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
- _mm_mask_ ⚠fixupimm_ ss Experimental (x86 or x86-64) and avx512fFix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
- _mm_mask_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
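For the FMA family, note which operand feeds the masked-off lanes: the `_mm_mask_` forms keep `a`, while the `_mm_mask3_` and `_mm_maskz_` forms of the same operations (listed elsewhere in this module) keep `c` or zero the lane. A sketch with `_mm_mask_fmadd_pd`, same assumptions as above:

```rust
use std::arch::x86_64::*;

// Same nightly-gate assumptions; needs avx512f + avx512vl at runtime.
#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_fmadd_demo() {
    let a = _mm_set1_pd(2.0);
    let b = _mm_set1_pd(3.0);
    let c = _mm_set1_pd(4.0);
    // k = 0b01: lane 0 gets a*b + c, lane 1 is passed through from a (not from c).
    let dst = _mm_mask_fmadd_pd(a, 0b01, b, c);

    let mut out = [0.0f64; 2];
    _mm_storeu_pd(out.as_mut_ptr(), dst);
    assert_eq!(out, [10.0, 2.0]);
}
```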
- _mm_mask_ ⚠fmadd_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠fmadd_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fmadd_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmadd_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmadd_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using writemask k (elements are copied from a when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠fmadd_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fmadd_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmadd_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmsub_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fmsub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmsub_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmsub_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fmsub_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmsub_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠fmul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠fmul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fnmadd_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fnmadd_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmadd_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmadd_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fnmadd_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmadd_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (the element is copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠fnmsub_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fnmsub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmsub_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmsub_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠fnmsub_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fnmsub_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using writemask k (the element is copied from a when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠fpclass_ pd_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed double-precision (64-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ ⚠fpclass_ ph_ mask Experimental (x86 or x86-64) and avx512fp16,avx512vlTest packed half-precision (16-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ ⚠fpclass_ ps_ mask Experimental (x86 or x86-64) and avx512dq,avx512vlTest packed single-precision (32-bit) floating-point elements in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ ⚠fpclass_ sd_ mask Experimental (x86 or x86-64) and avx512dqTest the lower double-precision (64-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ ⚠fpclass_ sh_ mask Experimental (x86 or x86-64) and avx512fp16Test the lower half-precision (16-bit) floating-point element in a for special categories specified by imm8, and store the result in mask vector k using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ ⚠fpclass_ ss_ mask Experimental (x86 or x86-64) and avx512dqTest the lower single-precision (32-bit) floating-point element in a for special categories specified by imm8, and store the results in mask vector k using zeromask k1 (elements are zeroed out when the corresponding mask bit is not set). imm can be a combination of:
- _mm_mask_ ⚠getexp_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_mask_ ⚠getexp_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculatesfloor(log2(x))for each element.
- _mm_mask_ ⚠getexp_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
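The getexp entries above all share the same merge-masking pattern: active lanes receive floor(log2(x)) encoded as a floating-point value, and inactive lanes are copied from src. The following is only a minimal sketch of that behavior for _mm_mask_getexp_pd, assuming the Intel-style (src, k, a) argument order; the sample values and runtime feature check are illustrative, and an older nightly toolchain may additionally need a crate-level #![feature(stdarch_x86_avx512)] since these intrinsics are marked Experimental.

```rust
use std::arch::x86_64::*;

fn main() {
    // Runtime check: the masked 128-bit getexp intrinsics need avx512f + avx512vl.
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_pd(-1.0); // fallback value for masked-off lanes
            let a = _mm_set_pd(10.0, 0.25); // lane 1 = 10.0, lane 0 = 0.25
            // Mask 0b01: only lane 0 is computed; lane 1 is copied from src.
            let r = _mm_mask_getexp_pd(src, 0b01, a);
            let mut out = [0.0f64; 2];
            _mm_storeu_pd(out.as_mut_ptr(), r);
            assert_eq!(out[0], -2.0); // floor(log2(0.25)) = -2
            assert_eq!(out[1], -1.0); // lane 1 was masked off, kept from src
        }
    }
}
```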
- _mm_mask_ ⚠getexp_ round_ sd Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getexp_ round_ sh Experimental (x86 or x86-64) and avx512fp16Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getexp_ round_ ss Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getexp_ sd Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_mask_ ⚠getexp_ sh Experimental (x86 or x86-64) and avx512fp16Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_mask_ ⚠getexp_ ss Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_mask_ ⚠getmant_ pd Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_mask_ ⚠getmant_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlNormalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_mask_ ⚠getmant_ ps Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_mask_ ⚠getmant_ round_ sd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getmant_ round_ sh Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getmant_ round_ ss Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getmant_ sd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠getmant_ sh Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_mask_ ⚠getmant_ ss Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠gf2p8affine_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_mask_ ⚠gf2p8affineinv_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_mask_ ⚠gf2p8mul_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
- _mm_mask_ ⚠i32scatter_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 32-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i32scatter_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 64-bit integer elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i32scatter_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i32scatter_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 4 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 32-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 64-bit integer elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 double-precision (64-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠i64scatter_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStores 2 single-precision (32-bit) floating-point elements from a to memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements whose corresponding mask bit is not set are not written to memory).
- _mm_mask_ ⚠load_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ sd Experimental (x86 or x86-64) and avx512fLoad a double-precision (64-bit) floating-point element from memory into the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and set the upper element of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠load_ sh Experimental (x86 or x86-64) and avx512fp16Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using writemask k (the element is copied from src when mask bit 0 is not set), and zero the upper elements.
- _mm_mask_ ⚠load_ ss Experimental (x86 or x86-64) and avx512fLoad a single-precision (32-bit) floating-point element from memory into the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and set the upper 3 packed elements of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠loadu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 8-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 16-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠loadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using writemask k (elements are copied from src when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
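As a hedged sketch of the masked-load family above (the load and loadu entries differ only in the alignment requirement), here is _mm_mask_loadu_epi32 with the assumed Intel-style (src, k, mem_addr) argument order; the data and mask values are examples only, and a nightly feature gate such as #![feature(stdarch_x86_avx512)] may be needed while these are still Experimental.

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_epi32(-1); // value kept in masked-off lanes
            let data: [i32; 4] = [10, 20, 30, 40];
            // Mask 0b0101: load lanes 0 and 2 from memory; lanes 1 and 3 keep src.
            // No alignment requirement, unlike _mm_mask_load_epi32.
            let r = _mm_mask_loadu_epi32(src, 0b0101, data.as_ptr());
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            assert_eq!(out, [10, -1, 30, -1]);
        }
    }
}
```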
- _mm_mask_ ⚠lzcnt_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠lzcnt_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlCounts the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
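A small illustrative sketch of the masked leading-zero count described by the lzcnt entries above, assuming _mm_mask_lzcnt_epi32 follows the same (src, k, a) merge-masking convention as the rest of this list; the inputs below are arbitrary examples.

```rust
use std::arch::x86_64::*;

fn main() {
    // These variants need avx512cd in addition to avx512vl.
    if is_x86_feature_detected!("avx512cd") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_epi32(-1);
            let a = _mm_setr_epi32(0, 1, 256, i32::MIN);
            // Mask 0b0111: count leading zeros in lanes 0..=2; lane 3 keeps src.
            let r = _mm_mask_lzcnt_epi32(src, 0b0111, a);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            // lzcnt(0) = 32, lzcnt(1) = 31, lzcnt(256) = 23; lane 3 was masked off.
            assert_eq!(out, [32, 31, 23, -1]);
        }
    }
}
```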
- _mm_mask_ ⚠madd52hi_ epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠madd52lo_ epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠madd_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠maddubs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
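To make the masked integer max entries above concrete, here is a minimal sketch using _mm_mask_max_epi32 with the assumed (src, k, a, b) argument order; the values are arbitrary and the same pattern applies to the other element widths and to the unsigned variants.

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_epi32(0);
            let a = _mm_setr_epi32(1, -5, 7, 3);
            let b = _mm_setr_epi32(2, -9, 4, 8);
            // Mask 0b0111: signed per-lane maximum in lanes 0..=2; lane 3 keeps src.
            let r = _mm_mask_max_epi32(src, 0b0111, a, b);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            assert_eq!(out, [2, -5, 7, 0]);
        }
    }
}
```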
- _mm_mask_ ⚠max_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_ ⚠max_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠max_ round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠max_ round_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_ ⚠max_ round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠max_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠max_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_mask_ ⚠max_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
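A hedged sketch of the unsigned byte minimum from this group, _mm_mask_min_epu8, assuming the (src, k, a, b) order and a 16-bit mask type for the 16 byte lanes; the sample values only illustrate that the comparison is unsigned and that masked-off lanes keep src.

```rust
use std::arch::x86_64::*;

fn main() {
    // The 8-bit and 16-bit masked min/max variants need avx512bw + avx512vl.
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_epi8(0x7f);
            let a = _mm_set1_epi8(200u8 as i8); // 200 when read as unsigned
            let b = _mm_set1_epi8(100);
            // Mask 0x00ff: unsigned minimum in the low 8 byte lanes; high 8 keep src.
            let r = _mm_mask_min_epu8(src, 0x00ff, a, b);
            let mut out = [0u8; 16];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            assert_eq!(&out[..8], &[100u8; 8]); // min(200, 100) as unsigned bytes
            assert_eq!(&out[8..], &[0x7fu8; 8]); // masked off, kept from src
        }
    }
}
```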
- _mm_mask_ ⚠min_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_ ⚠min_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠min_ round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠min_ round_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_ ⚠min_ round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠min_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠min_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mask_ ⚠min_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠mov_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 8-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mov_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 16-bit integers from a into dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mov_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 32-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mov_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 64-bit integers from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mov_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMove packed double-precision (64-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mov_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMove packed single-precision (32-bit) floating-point elements from a to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
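The masked mov entries above act as a per-lane merge between src and a. A minimal sketch with _mm_mask_mov_epi32, assuming the (src, k, a) argument order; the mask and values are illustrative only.

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_setr_epi32(100, 200, 300, 400);
            let a = _mm_setr_epi32(1, 2, 3, 4);
            // Mask 0b1010: lanes 1 and 3 come from a; lanes 0 and 2 stay from src.
            let r = _mm_mask_mov_epi32(src, 0b1010, a);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            assert_eq!(out, [100, 2, 300, 4]);
        }
    }
}
```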
- _mm_mask_ ⚠move_ sd Experimental (x86 or x86-64) and avx512fMove the lower double-precision (64-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠move_ sh Experimental (x86 or x86-64) and avx512fp16Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠move_ ss Experimental (x86 or x86-64) and avx512fMove the lower single-precision (32-bit) floating-point element from b to the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠movedup_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠movehdup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠moveldup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mul_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst using writemask k (the element is copied from src when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠mul_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠mul_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_mask_ ⚠mul_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠mul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mask_ ⚠mul_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠mul_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_mask_ ⚠mul_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠mulhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mulhi_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mulhrs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mullo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mullo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠mullo_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using writemask k (elements are copied from src if the corresponding bit is not set).
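A short sketch of the masked low-half multiply from this group, using _mm_mask_mullo_epi32 with the assumed (src, k, a, b) order; the same merge-masking pattern applies to the 16-bit and 64-bit variants.

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_epi32(-1);
            let a = _mm_setr_epi32(3, 4, 5, 6);
            let b = _mm_set1_epi32(10);
            // Mask 0b0011: keep the low 32 bits of the products in lanes 0 and 1.
            let r = _mm_mask_mullo_epi32(src, 0b0011, a, b);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            assert_eq!(out, [30, 40, -1, -1]); // lanes 2 and 3 kept from src
        }
    }
}
```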
- _mm_mask_ ⚠multishift_ epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠or_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠or_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠packs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠packs_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠packus_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠packus_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠permute_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠permute_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutevar_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutevar_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
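The permutex2var entries above pick each destination lane from the concatenation of a and b according to idx, and (for the _mask_ variants) fall back to a for masked-off lanes. A hedged sketch with _mm_mask_permutex2var_epi32, assuming the (a, k, idx, b) argument order and that index values 0..=3 select from a while 4..=7 select from b.

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_setr_epi32(10, 11, 12, 13);
            let b = _mm_setr_epi32(20, 21, 22, 23);
            // Index values 0..=3 select from a, 4..=7 select from b.
            let idx = _mm_setr_epi32(0, 7, 2, 5);
            // Mask 0b0111: lane 3 is masked off and therefore copied from a.
            let r = _mm_mask_permutex2var_epi32(a, 0b0111, idx, b);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
            assert_eq!(out, [10, 23, 12, 13]);
        }
    }
}
```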
- _mm_mask_ ⚠permutexvar_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠permutexvar_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠popcnt_ epi8 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 8-bit integer in a, map the value to the number of logical 1 bits.
- _mm_mask_ ⚠popcnt_ epi16 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 16-bit integer in a, map the value to the number of logical 1 bits.
- _mm_mask_ ⚠popcnt_ epi32 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 32-bit integer in a, map the value to the number of logical 1 bits.
- _mm_mask_ ⚠popcnt_ epi64 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 64-bit integer in a, map the value to the number of logical 1 bits.
- _mm_mask_ ⚠range_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ ⚠range_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ ⚠range_ round_ sd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠range_ round_ ss Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_mask_ ⚠range_ sd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ ⚠range_ ss Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Lower 2 bits of IMM8 specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_mask_ ⚠rcp14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rcp14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rcp14_ sd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rcp14_ ss Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
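A minimal sketch of the rcp14 family above using _mm_mask_rcp14_ps with the assumed (src, k, a) order; the loose tolerance in the asserts reflects the documented 2^-14 relative-error bound rather than an exact result.

```rust
use std::arch::x86_64::*;

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_set1_ps(0.0);
            let a = _mm_setr_ps(2.0, 4.0, 8.0, 16.0);
            // Mask 0b0011: approximate 1/x in lanes 0 and 1; lanes 2 and 3 keep src.
            let r = _mm_mask_rcp14_ps(src, 0b0011, a);
            let mut out = [0.0f32; 4];
            _mm_storeu_ps(out.as_mut_ptr(), r);
            // The approximation error is below 2^-14, so a loose tolerance suffices.
            assert!((out[0] - 0.5).abs() < 1e-3);
            assert!((out[1] - 0.25).abs() < 1e-3);
            assert_eq!(out[2], 0.0);
            assert_eq!(out[3], 0.0);
        }
    }
}
```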
- _mm_mask_ ⚠rcp_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ ⚠rcp_ sh Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ ⚠reduce_ add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm_mask_ ⚠reduce_ add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by addition using mask k. Returns the sum of all active elements in a.
- _mm_mask_ ⚠reduce_ and_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm_mask_ ⚠reduce_ and_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise AND using mask k. Returns the bitwise AND of all active elements in a.
- _mm_mask_ ⚠reduce_ max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ ⚠reduce_ max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ ⚠reduce_ max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ ⚠reduce_ max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by maximum using mask k. Returns the maximum of all active elements in a.
- _mm_mask_ ⚠reduce_ min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ ⚠reduce_ min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ ⚠reduce_ min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ ⚠reduce_ min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by minimum using mask k. Returns the minimum of all active elements in a.
- _mm_mask_ ⚠reduce_ mul_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm_mask_ ⚠reduce_ mul_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by multiplication using mask k. Returns the product of all active elements in a.
- _mm_mask_ ⚠reduce_ or_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
- _mm_mask_ ⚠reduce_ or_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise OR using mask k. Returns the bitwise OR of all active elements in a.
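Unlike the masked element-wise intrinsics, the `_mm_mask_reduce_*` entries above collapse a vector to a single scalar over the active lanes only. A hedged sketch, assuming `_mm_mask_reduce_add_epi16` takes `(k, a)` and returns an `i16` as described (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked horizontal reduction (_mm_mask_reduce_add_epi16), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn sum_active(a: __m128i, k: __mmask8) -> i16 {
    // Only elements whose mask bit is set contribute to the sum.
    _mm_mask_reduce_add_epi16(k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
            // Mask 0b0000_1111 selects the four low elements: 1 + 2 + 3 + 4.
            let s = sum_active(a, 0b0000_1111);
            println!("{}", s); // 10
        }
    }
}
```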
- _mm_mask_ ⚠reduce_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ ⚠reduce_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlExtract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠reduce_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using writemask k (elements are copied from src to dst if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ ⚠reduce_ round_ sd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ ⚠reduce_ round_ sh Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠reduce_ round_ ss Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ ⚠reduce_ sd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ ⚠reduce_ sh Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠reduce_ ss Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_mask_ ⚠rol_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠rol_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠rolv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠rolv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠ror_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠ror_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠rorv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠rorv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
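A sketch of the variable-count rotate `_mm_mask_rolv_epi32`, which takes per-lane counts from a vector instead of the const-generic immediate used by `_mm_mask_rol_epi32` (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked variable rotate-left (_mm_mask_rolv_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_rolv(src: __m128i, a: __m128i, counts: __m128i, k: __mmask8) -> __m128i {
    // Active lanes of `a` are rotated left by the matching lane of `counts`;
    // inactive lanes are copied from `src`.
    _mm_mask_rolv_epi32(src, k, a, counts)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_set1_epi32(1);
            let counts = _mm_setr_epi32(0, 1, 4, 31);
            let src = _mm_set1_epi32(-1);
            // Mask 0b1110: lane 0 keeps `src` (-1); lanes 1..3 are rotated.
            let r = masked_rolv(src, a, counts, 0b1110);
            let out: [i32; 4] = core::mem::transmute(r);
            println!("{:?}", out); // [-1, 2, 16, -2147483648]
        }
    }
}
```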
- _mm_mask_ ⚠roundscale_ pd Experimental (x86 or x86-64) and avx512f,avx512vlRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠roundscale_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlRound packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠roundscale_ ps Experimental (x86 or x86-64) and avx512f,avx512vlRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠roundscale_ round_ sd Experimental (x86 or x86-64) and avx512fRound the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠roundscale_ round_ sh Experimental (x86 or x86-64) and avx512fp16Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠roundscale_ round_ ss Experimental (x86 or x86-64) and avx512fRound the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠roundscale_ sd Experimental (x86 or x86-64) and avx512fRound the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠roundscale_ sh Experimental (x86 or x86-64) and avx512fp16Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠roundscale_ ss Experimental (x86 or x86-64) and avx512fRound the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_mask_ ⚠rsqrt14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rsqrt14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rsqrt14_ sd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rsqrt14_ ss Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_mask_ ⚠rsqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ ⚠rsqrt_ sh Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_mask_ ⚠scalef_ pd Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠scalef_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlScale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠scalef_ ps Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠scalef_ round_ sd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠scalef_ round_ sh Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠scalef_ round_ ss Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠scalef_ sd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠scalef_ sh Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠scalef_ ss Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠set1_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast 8-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠set1_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast 16-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠set1_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 32-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠set1_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 64-bit integer a to all elements of dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
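A sketch of the masked broadcast `_mm_mask_set1_epi32`: the scalar lands only in lanes selected by `k`, which is a handy way to patch a few lanes of an existing vector (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked broadcast (_mm_mask_set1_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn splat_some(src: __m128i, value: i32, k: __mmask8) -> __m128i {
    // `value` is broadcast into active lanes; inactive lanes keep `src`.
    _mm_mask_set1_epi32(src, k, value)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let src = _mm_setr_epi32(10, 20, 30, 40);
            let r = splat_some(src, 7, 0b1010); // lanes 1 and 3 become 7
            let out: [i32; 4] = core::mem::transmute(r);
            println!("{:?}", out); // [10, 7, 30, 7]
        }
    }
}
```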
- _mm_mask_ ⚠shldi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shldi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shldi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shldv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠shldv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠shldv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠shrdi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shrdi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shrdi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shrdv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠shrdv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
- _mm_mask_ ⚠shrdv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using writemask k (elements are copied from a when the corresponding mask bit is not set).
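The `shld*`/`shrd*` entries above are funnel shifts: each lane of one operand is concatenated with the matching lane of the other into a double-width value before shifting. A sketch with `_mm_mask_shldv_epi32`; the `(a, k, b, c)` parameter order follows the Intel prototype and is an assumption here (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked funnel shift left (_mm_mask_shldv_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vbmi2,avx512vl")]
unsafe fn funnel_shl(a: __m128i, b: __m128i, c: __m128i, k: __mmask8) -> __m128i {
    // Per lane: ((a:b) << c) keeps its upper 32 bits; inactive lanes keep `a`.
    _mm_mask_shldv_epi32(a, k, b, c)
}

fn main() {
    if is_x86_feature_detected!("avx512vbmi2") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_set1_epi32(0x1111_1111);
            let b = _mm_set1_epi32(0xFFFF_0000u32 as i32);
            let c = _mm_set1_epi32(8);
            let r = funnel_shl(a, b, c, 0b0001); // only lane 0 is shifted
            let out: [u32; 4] = core::mem::transmute(r);
            println!("{:x?}", out); // [111111ff, 11111111, 11111111, 11111111]
        }
    }
}
```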
- _mm_mask_ ⚠shuffle_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 8-bit integers in a within 128-bit lanes using the control in the corresponding 8-bit element of b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shuffle_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shuffle_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shuffle_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a using the control in imm8, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shufflehi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠shufflelo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sll_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sll_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sll_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠slli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠slli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠slli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sllv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sllv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sllv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
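A sketch of the masked per-lane shift `_mm_mask_sllv_epi32`, where the count comes from a vector rather than an immediate (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked variable shift-left (_mm_mask_sllv_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn masked_sllv(src: __m128i, a: __m128i, count: __m128i, k: __mmask8) -> __m128i {
    // Active lanes of `a` are shifted left by the matching lane of `count`,
    // shifting in zeros; inactive lanes are copied from `src`.
    _mm_mask_sllv_epi32(src, k, a, count)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_set1_epi32(3);
            let count = _mm_setr_epi32(0, 1, 2, 3);
            let src = _mm_set1_epi32(0);
            let r = masked_sllv(src, a, count, 0b0111); // lane 3 falls back to `src`
            let out: [i32; 4] = core::mem::transmute(r);
            println!("{:?}", out); // [3, 6, 12, 0]
        }
    }
}
```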
- _mm_mask_ ⚠sqrt_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sqrt_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sqrt_ round_ sd Experimental (x86 or x86-64) and avx512fCompute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠sqrt_ round_ sh Experimental (x86 or x86-64) and avx512fp16Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_mask_ ⚠sqrt_ round_ ss Experimental (x86 or x86-64) and avx512fCompute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠sqrt_ sd Experimental (x86 or x86-64) and avx512fCompute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠sqrt_ sh Experimental (x86 or x86-64) and avx512fp16Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠sqrt_ ss Experimental (x86 or x86-64) and avx512fCompute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠sra_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sra_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sra_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srai_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srai_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srai_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srav_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srav_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srav_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srl_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srl_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srl_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srlv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srlv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠srlv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠store_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 32-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 64-bit integers from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStore packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStore packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ sd Experimental (x86 or x86-64) and avx512fStore a double-precision (64-bit) floating-point element from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠store_ sh Experimental (x86 or x86-64) and avx512fp16Store the lower half-precision (16-bit) floating-point element from a into memory using writemask k.
- _mm_mask_ ⚠store_ ss Experimental (x86 or x86-64) and avx512fStore a single-precision (32-bit) floating-point element from a into memory using writemask k. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_mask_ ⚠storeu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlStore packed 8-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlStore packed 16-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 32-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore packed 64-bit integers from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlStore packed double-precision (64-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
- _mm_mask_ ⚠storeu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlStore packed single-precision (32-bit) floating-point elements from a into memory using writemask k. mem_addr does not need to be aligned on any particular boundary.
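A sketch of a masked unaligned store with `_mm_mask_storeu_epi32`, which writes only the selected lanes and leaves the rest of the destination memory untouched. The `*mut i32` pointer type is an assumption based on the unmasked `storeu` intrinsics (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked unaligned store (_mm_mask_storeu_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn store_low_half(dst: &mut [i32; 4], a: __m128i) {
    // Mask 0b0011 writes lanes 0 and 1 only; dst[2] and dst[3] are untouched.
    _mm_mask_storeu_epi32(dst.as_mut_ptr(), 0b0011, a);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let mut dst = [9, 9, 9, 9];
            let a = _mm_setr_epi32(1, 2, 3, 4);
            store_low_half(&mut dst, a);
            println!("{:?}", dst); // [1, 2, 9, 9]
        }
    }
}
```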
- _mm_mask_ ⚠sub_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlSubtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠sub_ round_ sd Experimental (x86 or x86-64) and avx512fSubtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠sub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_mask_ ⚠sub_ round_ ss Experimental (x86 or x86-64) and avx512fSubtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠sub_ sd Experimental (x86 or x86-64) and avx512fSubtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_mask_ ⚠sub_ sh Experimental (x86 or x86-64) and avx512fp16Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using writemask k (the element is copied from src when mask bit 0 is not set).
- _mm_mask_ ⚠sub_ ss Experimental (x86 or x86-64) and avx512fSubtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using writemask k (the element is copied from src when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mask_ ⚠subs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠subs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠subs_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠subs_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
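A sketch of masked saturating arithmetic with `_mm_mask_subs_epu8`: unsigned saturation clamps underflow to 0 instead of wrapping, and inactive byte lanes keep `src` (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked saturating subtract (_mm_mask_subs_epu8), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn masked_subs(src: __m128i, a: __m128i, b: __m128i, k: __mmask16) -> __m128i {
    // Active byte lanes compute saturating a[i] - b[i]; the rest keep `src`.
    _mm_mask_subs_epu8(src, k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_set1_epi8(10);
            let b = _mm_set1_epi8(30);
            let src = _mm_set1_epi8(-1); // 0xFF
            // 10 - 30 saturates to 0 in the active (even) byte lanes.
            let r = masked_subs(src, a, b, 0b0101_0101_0101_0101);
            let out: [u8; 16] = core::mem::transmute(r);
            println!("{:?}", out); // alternating 0 and 255
        }
    }
}
```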
- _mm_mask_ ⚠ternarylogic_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 32-bit granularity (32-bit elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠ternarylogic_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from src, a, and b are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using writemask k at 64-bit granularity (64-bit elements are copied from src when the corresponding mask bit is not set).
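The `ternarylogic` entries above encode an arbitrary three-input bitwise function as an 8-entry truth table in `imm8`, indexed by the bits of `(src, a, b)`. For example, `0xCA` is the table of the bitwise select `src ? a : b`. A sketch, assuming the immediate is a const generic as in the other stdarch AVX-512 intrinsics (helper name illustrative):

```rust
// Sketch: masked ternary logic (_mm_mask_ternarylogic_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn bitwise_select(src: __m128i, a: __m128i, b: __m128i, k: __mmask8) -> __m128i {
    // IMM8 = 0xCA: for each bit, take a's bit where src's bit is 1, else b's bit.
    _mm_mask_ternarylogic_epi32::<0xCA>(src, k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let sel = _mm_set1_epi32(0x0000_FFFF); // per-bit selector (src operand)
            let a = _mm_set1_epi32(0x1234_5678);
            let b = _mm_set1_epi32(0x0BAD_F00D);
            let r = bitwise_select(sel, a, b, 0b0111); // lane 3 keeps `sel`
            let out: [u32; 4] = core::mem::transmute(r);
            println!("{:x?}", out); // [bad5678, bad5678, bad5678, ffff]
        }
    }
}
```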
- _mm_mask_ ⚠test_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ ⚠test_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ ⚠test_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ ⚠test_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is non-zero.
- _mm_mask_ ⚠testn_ epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ ⚠testn_ epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ ⚠testn_ epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
- _mm_mask_ ⚠testn_ epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k (subject to writemask k) if the intermediate value is zero.
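A sketch of the mask-producing test `_mm_mask_test_epi32_mask`: the result is a `__mmask8` whose bit i is set only when lane i is active in the input mask and `a[i] & b[i]` is non-zero (experimental intrinsic, helper name illustrative):

```rust
// Sketch: masked test producing a mask (_mm_mask_test_epi32_mask), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn nonzero_and(a: __m128i, b: __m128i, k: __mmask8) -> __mmask8 {
    _mm_mask_test_epi32_mask(k, a, b)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_setr_epi32(1, 2, 0, 8);
            let b = _mm_setr_epi32(1, 1, 7, 8);
            // a & b per lane: [1, 0, 0, 8] -> non-zero in lanes 0 and 3.
            let k = nonzero_and(a, b, 0b1111);
            println!("{:04b}", k); // 1001
        }
    }
}
```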
- _mm_mask_ ⚠unpackhi_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpackhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpackhi_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpackhi_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpackhi_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpackhi_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpacklo_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpacklo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpacklo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpacklo_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpacklo_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠unpacklo_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠xor_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠xor_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mask_ ⚠xor_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_mask_ ⚠xor_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using writemask k (elements are copied from src if the corresponding bit is not set).
- _mm_maskz_ ⚠abs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 8-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠abs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the absolute value of packed signed 16-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠abs_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 32-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠abs_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the absolute value of packed signed 64-bit integers in a, and store the unsigned results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
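The `_mm_maskz_*` entries that start here differ from the `_mm_mask_*` forms only in how inactive lanes are handled: they are zeroed instead of copied from a `src` operand. A sketch with `_mm_maskz_abs_epi32` (experimental intrinsic, helper name illustrative):

```rust
// Sketch: zeromask semantics (_mm_maskz_abs_epi32), x86_64 only.
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn abs_zeroing(a: __m128i, k: __mmask8) -> __m128i {
    // Active lanes receive |a[i]|; inactive lanes are zeroed
    // (the writemask form _mm_mask_abs_epi32 would copy them from `src` instead).
    _mm_maskz_abs_epi32(k, a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe {
            let a = _mm_setr_epi32(-1, -2, 3, -4);
            let r = abs_zeroing(a, 0b0101); // lanes 1 and 3 are zeroed
            let out: [i32; 4] = core::mem::transmute(r);
            println!("{:?}", out); // [1, 0, 3, 0]
        }
    }
}
```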
- _mm_maskz_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ pd Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlAdd packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ ps Experimental (x86 or x86-64) and avx512f,avx512vlAdd packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠add_ round_ sd Experimental (x86 or x86-64) and avx512fAdd the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠add_ round_ sh Experimental (x86 or x86-64) and avx512fp16Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_maskz_ ⚠add_ round_ ss Experimental (x86 or x86-64) and avx512fAdd the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠add_ sd Experimental (x86 or x86-64) and avx512fAdd the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠add_ sh Experimental (x86 or x86-64) and avx512fp16Add the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
- _mm_maskz_ ⚠add_ ss Experimental (x86 or x86-64) and avx512fAdd the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠adds_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠adds_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed signed 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠adds_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 8-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠adds_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAdd packed unsigned 16-bit integers in a and b using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
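The `_mm_maskz_add_*` and `_mm_maskz_adds_*` entries above all share the same zero-masking behaviour, so a short sketch may be clearer than prose: lanes whose mask bit is clear come out as zero rather than being passed through. This is a minimal illustration only; the nightly feature-gate names, the runtime checks, and the `transmute`-based lane inspection are assumptions of the sketch, not part of the documented API.

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates; adjust to your toolchain
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512bw,avx512vl")]
unsafe fn maskz_add_demo() {
    // Zero-masked add: only lanes 0 and 2 (mask 0b0101) receive a + b, the rest become 0.
    let a = _mm_setr_epi32(10, 20, 30, 40);
    let b = _mm_setr_epi32(1, 2, 3, 4);
    let sum: [i32; 4] = std::mem::transmute(_mm_maskz_add_epi32(0b0101, a, b));
    assert_eq!(sum, [11, 0, 33, 0]);

    // Saturating unsigned byte add: 200 + 100 clamps to 255 in the two kept lanes.
    let x = _mm_set1_epi8(200u8 as i8);
    let y = _mm_set1_epi8(100u8 as i8);
    let sat: [u8; 16] = std::mem::transmute(_mm_maskz_adds_epu8(0b11, x, y));
    assert_eq!(&sat[..4], &[255, 255, 0, 0]);
}

fn main() {
    if is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512vl")
    {
        unsafe { maskz_add_demo() }
    }
}
```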
- _mm_maskz_ ⚠alignr_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConcatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and store the low 16 bytes in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠alignr_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 32-byte immediate result, shift the result right by imm8 32-bit elements, and store the low 16 bytes (4 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠alignr_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlConcatenate a and b into a 32-byte immediate result, shift the result right by imm8 64-bit elements, and store the low 16 bytes (2 elements) in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠and_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠and_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠and_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠and_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise AND of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠andnot_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 32-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠andnot_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NOT of packed 64-bit integers in a and then AND with b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠andnot_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed double-precision (64-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠andnot_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise NOT of packed single-precision (32-bit) floating point numbers in a and then bitwise AND with b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
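As a companion to the `_mm_maskz_and_*` / `_mm_maskz_andnot_*` entries above, the sketch below (same assumed nightly scaffold as the previous example) shows that `andnot` computes `(!a) & b` per lane before the zero-mask is applied.

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_bitwise_demo() {
    let a = _mm_set1_epi32(0b1100);
    let b = _mm_set1_epi32(0b1010);
    // Lanes 0 and 1 kept (mask 0b0011); lanes 2 and 3 zeroed.
    let and: [i32; 4] = std::mem::transmute(_mm_maskz_and_epi32(0b0011, a, b));
    let andnot: [i32; 4] = std::mem::transmute(_mm_maskz_andnot_epi32(0b0011, a, b));
    assert_eq!(and, [0b1000, 0b1000, 0, 0]);    // a & b
    assert_eq!(andnot, [0b0010, 0b0010, 0, 0]); // (!a) & b
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_bitwise_demo() }
    }
}
```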
- _mm_maskz_ ⚠avg_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 8-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠avg_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlAverage packed unsigned 16-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠broadcast_ i32x2 Experimental (x86 or x86-64) and avx512dq,avx512vlBroadcasts the lower 2 packed 32-bit integers from a to all elements of dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠broadcastb_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 8-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠broadcastd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 32-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠broadcastq_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low packed 64-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠broadcastss_ ps Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast the low single-precision (32-bit) floating-point element from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠broadcastw_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
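For the `_mm_maskz_broadcast*` entries above, the mask selects which destination lanes actually receive the broadcast element; the rest are zeroed. A hedged sketch under the same nightly assumptions:

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_broadcast_demo() {
    let a = _mm_setr_epi32(7, 8, 9, 10);
    // The low element (7) is broadcast, but lane 0 is masked off, so it stays 0.
    let bc: [i32; 4] = std::mem::transmute(_mm_maskz_broadcastd_epi32(0b1110, a));
    assert_eq!(bc, [0, 7, 7, 7]);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_broadcast_demo() }
    }
}
```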
- _mm_maskz_ ⚠cmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠cmul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠cmul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1],
- _mm_maskz_ ⚠compress_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 8-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ ⚠compress_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlContiguously store the active 16-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ ⚠compress_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 32-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ ⚠compress_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active 64-bit integers in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ ⚠compress_ pd Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active double-precision (64-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
- _mm_maskz_ ⚠compress_ ps Experimental (x86 or x86-64) and avx512f,avx512vlContiguously store the active single-precision (32-bit) floating-point elements in a (those with their respective bit set in zeromask k) to dst, and set the remaining elements to zero.
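The `_mm_maskz_compress_*` entries pack the mask-selected elements toward the low end of the vector and zero the tail. A small sketch of `_mm_maskz_compress_epi32`, under the same nightly assumptions:

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_compress_demo() {
    let a = _mm_setr_epi32(1, 2, 3, 4);
    // Mask 0b1010 selects lanes 1 and 3 (values 2 and 4); they are packed to the front.
    let c: [i32; 4] = std::mem::transmute(_mm_maskz_compress_epi32(0b1010, a));
    assert_eq!(c, [2, 4, 0, 0]);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_compress_demo() }
    }
}
```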
- _mm_maskz_ ⚠conflict_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 32-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
- _mm_maskz_ ⚠conflict_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlTest each 64-bit element of a for equality with all other elements in a closer to the least significant bit using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Each element’s comparison forms a zero extended bit vector in dst.
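The conflict-detection entries above are easier to read with concrete values: each lane of the result is a bitmap of the lower-indexed lanes holding the same value. A hedged sketch (nightly assumptions as before, plus a CPU with avx512cd):

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512cd,avx512vl")]
unsafe fn maskz_conflict_demo() {
    let a = _mm_setr_epi32(3, 3, 7, 3);
    let c: [i32; 4] = std::mem::transmute(_mm_maskz_conflict_epi32(0b1111, a));
    // Lane 1 matches lane 0 (bit 0); lane 3 matches lanes 0 and 1 (bits 0 and 1).
    assert_eq!(c, [0, 0b001, 0, 0b011]);
}

fn main() {
    if is_x86_feature_detected!("avx512cd") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_conflict_demo() }
    }
}
```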
- _mm_maskz_ ⚠conj_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the complex conjugates of complex numbers in a, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠cvt_ roundps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
  Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠cvt_ roundsd_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvt_ roundsd_ ss Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
  Rounding is done according to the rounding[3:0] parameter, which can be one of:
- _mm_maskz_ ⚠cvt_ roundsh_ sd Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠cvt_ roundsh_ ss Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvt_ roundss_ sd Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠cvt_ roundss_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvtepi8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSign extend packed 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed 16-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 16-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 32-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSign extend packed 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_ ⚠cvtepi32_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 8-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed 64-bit integers in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepi64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtepi64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed signed 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_ ⚠cvtepi64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed signed 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtepu8_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlZero extend packed unsigned 8-bit integers in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu8_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 4 bytes of a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu8_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 8-bit integers in the low 2 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu16_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu16_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 16-bit integers in the low 4 bytes of a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu16_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 16-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu32_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlZero extend packed unsigned 32-bit integers in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu32_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtepu32_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 32-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_ ⚠cvtepu64_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtepu64_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed unsigned 64-bit integers in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_ ⚠cvtepu64_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed unsigned 64-bit integers in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
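The long run of `_mm_maskz_cvtep*` entries above covers widening integer extensions and integer-to-float conversions; two representative cases are sketched below under the same nightly assumptions as the earlier examples.

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_cvt_widen_demo() {
    // Zero-extend the low 4 bytes to 32-bit lanes; lane 3 is masked off.
    let bytes = _mm_setr_epi8(10, 20, 30, 40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    let widened: [i32; 4] = std::mem::transmute(_mm_maskz_cvtepu8_epi32(0b0111, bytes));
    assert_eq!(widened, [10, 20, 30, 0]);

    // Convert the low two 32-bit integers to f64; only lane 0 is kept.
    let ints = _mm_setr_epi32(5, -6, 0, 0);
    let doubles: [f64; 2] = std::mem::transmute(_mm_maskz_cvtepi32_pd(0b01, ints));
    assert_eq!(doubles, [5.0, 0.0]);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_cvt_widen_demo() }
    }
}
```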
- _mm_maskz_ ⚠cvtne2ps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConvert packed single-precision (32-bit) floating-point elements in two vectors a and b to packed BF16 (16-bit) floating-point elements, and store the results in single vector dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm_maskz_ ⚠cvtneps_ pbh Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtpbh_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlConverts packed BF16 (16-bit) floating-point elements in a to single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtpd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of dst are zeroed out.
- _mm_maskz_ ⚠cvtpd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ pd Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtph_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers, and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtps_ ph Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
  Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠cvtsd_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower double-precision (64-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvtsd_ ss Experimental (x86 or x86-64) and avx512fConvert the lower double-precision (64-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvtsepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtsepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtsepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 32-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtsepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 8-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtsepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 16-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtsepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed signed 64-bit integers in a to packed 32-bit integers with signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtsh_ sd Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠cvtsh_ ss Experimental (x86 or x86-64) and avx512fp16Convert the lower half-precision (16-bit) floating-point element in b to a single-precision (32-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvtss_ sd Experimental (x86 or x86-64) and avx512fConvert the lower single-precision (32-bit) floating-point element in b to a double-precision (64-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠cvtss_ sh Experimental (x86 or x86-64) and avx512fp16Convert the lower single-precision (32-bit) floating-point element in b to a half-precision (16-bit) floating-point element, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠cvttpd_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttpd_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvttpd_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttpd_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed double-precision (64-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvttph_ epi16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttph_ epi32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttph_ epi64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttph_ epu16 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed unsigned 16-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttph_ epu32 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 32-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttph_ epu64 Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed 64-bit unsigned integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttps_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttps_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed signed 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvttps_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 32-bit integers with truncation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvttps_ epu64 Experimental (x86 or x86-64) and avx512dq,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed unsigned 64-bit integers with truncation, and store the result in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠cvtusepi16_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed unsigned 16-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtusepi32_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtusepi32_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 32-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtusepi64_ epi8 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 8-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtusepi64_ epi16 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 16-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtusepi64_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlConvert packed unsigned 64-bit integers in a to packed unsigned 32-bit integers with unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
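In the other direction, the `_mm_maskz_cvtt*` entries above truncate toward zero and the `_mm_maskz_cvtsepi*` / `_mm_maskz_cvtusepi*` entries narrow with saturation. A sketch of both, under the usual nightly assumptions:

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_cvt_narrow_demo() {
    // Truncation toward zero; lane 2 is masked off.
    let f = _mm_setr_ps(1.9, -2.9, 300.5, 4.0);
    let t: [i32; 4] = std::mem::transmute(_mm_maskz_cvttps_epi32(0b1011, f));
    assert_eq!(t, [1, -2, 0, 4]);

    // Signed saturation into the low 4 bytes of the result; the remaining bytes are zero.
    let wide = _mm_setr_epi32(300, -7, 1, -200);
    let narrow: [i8; 16] = std::mem::transmute(_mm_maskz_cvtsepi32_epi8(0b1111, wide));
    assert_eq!(&narrow[..4], &[127, -7, 1, -128]);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_cvt_narrow_demo() }
    }
}
```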
- _mm_maskz_ ⚠cvtxph_ ps Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed half-precision (16-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠cvtxps_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert packed single-precision (32-bit) floating-point elements in a to packed half-precision (16-bit) floating-point elements, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of dst are zeroed out.
- _mm_maskz_ ⚠dbsad_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in a compared to those in b, and store the 16-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from a, and the last two SADs use the upper 8-bit quadruplet of the lane from a. Quadruplets from b are selected from within 128-bit lanes according to the control in imm8, and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
- _mm_maskz_ ⚠div_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠div_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlDivide packed half-precision (16-bit) floating-point elements in a by b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠div_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDivide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠div_ round_ sd Experimental (x86 or x86-64) and avx512fDivide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠div_ round_ sh Experimental (x86 or x86-64) and avx512fp16Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_maskz_ ⚠div_ round_ ss Experimental (x86 or x86-64) and avx512fDivide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠div_ sd Experimental (x86 or x86-64) and avx512fDivide the lower double-precision (64-bit) floating-point element in a by the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠div_ sh Experimental (x86 or x86-64) and avx512fp16Divide the lower half-precision (16-bit) floating-point elements in a by b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
- _mm_maskz_ ⚠div_ ss Experimental (x86 or x86-64) and avx512fDivide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
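The `_mm_maskz_div_*` entries above follow the same pattern as the masked adds; one packed single-precision case is enough to show it (same nightly assumptions as the earlier sketches).

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_div_demo() {
    let a = _mm_setr_ps(8.0, 9.0, 10.0, 12.0);
    let b = _mm_setr_ps(2.0, 3.0, 5.0, 4.0);
    // Only lanes 1 and 2 hold a / b; lanes 0 and 3 are simply zeroed.
    let q: [f32; 4] = std::mem::transmute(_mm_maskz_div_ps(0b0110, a, b));
    assert_eq!(q, [0.0, 3.0, 2.0, 0.0]);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_div_demo() }
    }
}
```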
- _mm_maskz_ ⚠dpbf16_ ps Experimental (x86 or x86-64) and avx512bf16,avx512vlCompute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Intel’s documentation
- _mm_maskz_ ⚠dpbusd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠dpbusds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠dpwssd_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠dpwssds_ epi32 Experimental (x86 or x86-64) and avx512vnni,avx512vlMultiply groups of 2 adjacent pairs of signed 16-bit integers in a with corresponding 16-bit integers in b, producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in src using signed saturation, and store the packed 32-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
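The VNNI dot-product entries above (`dpwssd`, `dpbusd`, and their saturating forms) accumulate pairwise products into 32-bit lanes. The sketch below assumes the argument order of the corresponding Intel intrinsic, `(k, src, a, b)`, plus the same nightly scaffold as before and a CPU with avx512vnni; treat it as illustrative rather than normative.

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vnni,avx512vl")]
unsafe fn maskz_dpwssd_demo() {
    let src = _mm_set1_epi32(1000); // accumulator
    let a = _mm_set1_epi16(2);      // two 16-bit values per 32-bit lane
    let b = _mm_set1_epi16(3);
    // Each kept lane becomes src + 2*3 + 2*3 = 1012; masked lanes are zeroed.
    let d: [i32; 4] = std::mem::transmute(_mm_maskz_dpwssd_epi32(0b0001, src, a, b));
    assert_eq!(d, [1012, 0, 0, 0]);
}

fn main() {
    if is_x86_feature_detected!("avx512vnni") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_dpwssd_demo() }
    }
}
```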
- _mm_maskz_ ⚠expand_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expand_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expand_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expand_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expand_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expand_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from a (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi8 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 8-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlLoad contiguous active 16-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 32-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active 64-bit integers from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active double-precision (64-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠expandloadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad contiguous active single-precision (32-bit) floating-point elements from unaligned memory at mem_addr (those with their respective bit set in mask k), and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
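`expand` is the inverse of `compress`: source elements are consumed in order and scattered into the destination lanes whose mask bits are set. A sketch of the register form under the same nightly assumptions (the `expandloadu` entries behave the same way but read the contiguous elements from memory):

```rust
#![feature(stdarch_x86_avx512, avx512_target_feature)] // assumed nightly gates
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_expand_demo() {
    let a = _mm_setr_epi32(1, 2, 3, 4);
    // Mask 0b1010: a[0] lands in lane 1 and a[1] in lane 3; unselected lanes are zero.
    let e: [i32; 4] = std::mem::transmute(_mm_maskz_expand_epi32(0b1010, a));
    assert_eq!(e, [0, 1, 0, 2]);
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        unsafe { maskz_expand_demo() }
    }
}
```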
- _mm_maskz_ ⚠fcmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠fcmadd_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠fcmadd_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex number in a by the complex conjugate of the lower complex number in b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠fcmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a by the complex conjugates of packed complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠fcmul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠fcmul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a by the complex conjugates of the lower complex numbers in b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1], or the complex conjugateconjugate = vec.fp16[0] - i * vec.fp16[1].
- _mm_maskz_ ⚠fixupimm_ pd Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed double-precision (64-bit) floating-point elements in a and b using packed 64-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_maskz_ ⚠fixupimm_ ps Experimental (x86 or x86-64) and avx512f,avx512vlFix up packed single-precision (32-bit) floating-point elements in a and b using packed 32-bit integers in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). imm8 is used to set the required flags reporting.
- _mm_maskz_ ⚠fixupimm_ round_ sd Experimental (x86 or x86-64) and avx512fFix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠fixupimm_ round_ ss Experimental (x86 or x86-64) and avx512fFix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠fixupimm_ sd Experimental (x86 or x86-64) and avx512fFix up the lower double-precision (64-bit) floating-point elements in a and b using the lower 64-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. imm8 is used to set the required flags reporting.
- _mm_maskz_ ⚠fixupimm_ ss Experimental (x86 or x86-64) and avx512fFix up the lower single-precision (32-bit) floating-point elements in a and b using the lower 32-bit integer in c, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. imm8 is used to set the required flags reporting.
- _mm_maskz_ ⚠fmadd_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, accumulate to the corresponding complex numbers in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠fmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmadd_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex numbercomplex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠fmadd_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fmadd_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmadd_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmadd_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, accumulate to the lower complex number in c, and store the result in the lower elements of dst using zeromask k (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠fmadd_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fmadd_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmadd_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
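The zeromask convention shared by the masked FMA entries above is easiest to see in a tiny example. This is a minimal sketch, assuming an AVX-512F/VL-capable CPU and a toolchain that exposes these intrinsics (they are marked Experimental here, so a nightly feature gate such as stdarch_x86_avx512 may be needed); the values are purely illustrative.

```rust
use std::arch::x86_64::*;

fn main() {
    // Runtime check: the intrinsics below require AVX-512F and AVX-512VL.
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
        let b = _mm_set1_ps(10.0);
        let c = _mm_set1_ps(0.5);
        // Mask 0b0101 keeps lanes 0 and 2; lanes 1 and 3 are zeroed out.
        let r = _mm_maskz_fmadd_ps(0b0101, a, b, c);
        let out: [f32; 4] = core::mem::transmute(r);
        assert_eq!(out, [10.5, 0.0, 30.5, 0.0]); // a*b + c only in the selected lanes
    }
}
```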
- _mm_maskz_ ⚠fmaddsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmaddsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmaddsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively add and subtract packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmsub_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fmsub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmsub_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmsub_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fmsub_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmsub_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fmsubadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmsubadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c to/from the intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fmsubadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, alternatively subtract and add packed elements in c from/to the intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
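To make the "alternatively add and subtract" wording concrete, the sketch below contrasts fmaddsub and fmsubadd on constant inputs. Same toolchain assumptions as the earlier sketch; the lane parity shown in the comments (even lanes subtract for fmaddsub, even lanes add for fmsubadd) is my reading of the usual addsub convention, not something stated in the entries above.

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_set1_ps(2.0);
        let b = _mm_set1_ps(3.0);
        let c = _mm_set1_ps(1.0);
        // fmaddsub: even lanes a*b - c, odd lanes a*b + c (assumed lane parity)
        let r1: [f32; 4] = core::mem::transmute(_mm_maskz_fmaddsub_ps(0b1111, a, b, c));
        // fmsubadd: even lanes a*b + c, odd lanes a*b - c
        let r2: [f32; 4] = core::mem::transmute(_mm_maskz_fmsubadd_ps(0b1111, a, b, c));
        assert_eq!(r1, [5.0, 7.0, 5.0, 7.0]);
        assert_eq!(r2, [7.0, 5.0, 7.0, 5.0]);
    }
}
```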
- _mm_maskz_ ⚠fmul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠fmul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠fmul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠fnmadd_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fnmadd_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract the intermediate result from packed elements in c, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fnmadd_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fnmadd_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fnmadd_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmadd_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmadd_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fnmadd_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the intermediate result from the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when the mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmadd_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and add the negated intermediate result to the lower element in c. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmsub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fnmsub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (the element is zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fnmsub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠fnmsub_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fnmsub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmsub_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmsub_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠fnmsub_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠fnmsub_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point elements in a and b, and subtract the lower element in c from the negated intermediate result. Store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
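The fnmadd/fnmsub entries differ only in what happens to c after the product is negated; a short sketch (same CPU and toolchain assumptions as the earlier examples) makes the two sign conventions explicit.

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_set1_ps(2.0);
        let b = _mm_set1_ps(3.0);
        let c = _mm_set1_ps(10.0);
        // fnmadd: -(a*b) + c = -6 + 10 = 4
        let r1: [f32; 4] = core::mem::transmute(_mm_maskz_fnmadd_ps(0b1111, a, b, c));
        // fnmsub: -(a*b) - c = -6 - 10 = -16
        let r2: [f32; 4] = core::mem::transmute(_mm_maskz_fnmsub_ps(0b1111, a, b, c));
        assert_eq!(r1, [4.0; 4]);
        assert_eq!(r2, [-16.0; 4]);
    }
}
```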
- _mm_maskz_ ⚠getexp_ pd Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed double-precision (64-bit) floating-point element in a to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_maskz_ ⚠getexp_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlConvert the exponent of each packed half-precision (16-bit) floating-point element in a to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_maskz_ ⚠getexp_ ps Experimental (x86 or x86-64) and avx512f,avx512vlConvert the exponent of each packed single-precision (32-bit) floating-point element in a to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates floor(log2(x)) for each element.
- _mm_maskz_ ⚠getexp_ round_ sd Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getexp_ round_ sh Experimental (x86 or x86-64) and avx512fp16Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getexp_ round_ ss Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getexp_ sd Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower double-precision (64-bit) floating-point element in b to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_maskz_ ⚠getexp_ sh Experimental (x86 or x86-64) and avx512fp16Convert the exponent of the lower half-precision (16-bit) floating-point element in b to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
- _mm_maskz_ ⚠getexp_ ss Experimental (x86 or x86-64) and avx512fConvert the exponent of the lower single-precision (32-bit) floating-point element in b to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates floor(log2(x)) for the lower element.
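Since every getexp entry boils down to floor(log2(|x|)) returned as a floating-point value, a small sketch (same assumptions as the earlier examples) shows the packed single-precision form:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_setr_ps(1.0, 8.0, 0.25, 12.0);
        // getexp computes floor(log2(|x|)) per lane: [0, 3, -2, 3], returned as f32.
        let r: [f32; 4] = core::mem::transmute(_mm_maskz_getexp_ps(0b1111, a));
        assert_eq!(r, [0.0, 3.0, -2.0, 3.0]);
    }
}
```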
- _mm_maskz_ ⚠getmant_ pd Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
- _mm_maskz_ ⚠getmant_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlNormalize the mantissas of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_maskz_ ⚠getmant_ ps Experimental (x86 or x86-64) and avx512f,avx512vlNormalize the mantissas of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
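To make the ±(2^k)*|x.significand| description concrete without touching the intrinsic's constant parameters, here is a plain scalar sketch of what the _MM_MANT_NORM_1_2 / _MM_MANT_SIGN_src combination computes for finite non-zero inputs; the helper name is made up for illustration and is not part of the API.

```rust
// Scalar reference sketch: write |x| as m * 2^k with m in [1, 2) and keep the
// source sign (_MM_MANT_SIGN_src). Special cases are left to the hardware spec.
fn getmant_norm_1_2(x: f64) -> f64 {
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let k = x.abs().log2().floor();
    let m = x.abs() / 2f64.powf(k); // mantissa normalized into [1, 2)
    m.copysign(x)
}

fn main() {
    assert_eq!(getmant_norm_1_2(12.0), 1.5); // 12 = 1.5 * 2^3
    assert_eq!(getmant_norm_1_2(-0.375), -1.5); // -0.375 = -1.5 * 2^-2
}
```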
- _mm_maskz_ ⚠getmant_ round_ sd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getmant_ round_ sh Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getmant_ round_ ss Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getmant_ sd Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠getmant_ sh Experimental (x86 or x86-64) and avx512fp16Normalize the mantissas of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by norm and the sign depends on sign and the source sign.
- _mm_maskz_ ⚠getmant_ ss Experimental (x86 or x86-64) and avx512fNormalize the mantissas of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interv and the sign depends on sc and the source sign.
 The mantissa is normalized to the interval specified by interv, which can take the following values:
 _MM_MANT_NORM_1_2 // interval [1, 2)
 _MM_MANT_NORM_p5_2 // interval [0.5, 2)
 _MM_MANT_NORM_p5_1 // interval [0.5, 1)
 _MM_MANT_NORM_p75_1p5 // interval [0.75, 1.5)
 The sign is determined by sc which can take the following values:
 _MM_MANT_SIGN_src // sign = sign(src)
 _MM_MANT_SIGN_zero // sign = 0
 _MM_MANT_SIGN_nan // dst = NaN if sign(src) = 1
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠gf2p8affine_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the packed bytes in x. That is, it computes a*x+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_maskz_ ⚠gf2p8affineinv_ epi64_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms an affine transformation on the inverted packed bytes in x. That is, it computes a*inv(x)+b over the Galois Field 2^8 for each packed byte, with a being an 8x8 bit matrix and b being a constant 8-bit immediate value. The inverse of a byte is defined with respect to the reduction polynomial x^8+x^4+x^3+x+1. The inverse of 0 is 0. Each pack of 8 bytes in x is paired with the 64-bit word at the same position in a.
- _mm_maskz_ ⚠gf2p8mul_ epi8 Experimental (x86 or x86-64) and gfni,avx512bw,avx512vlPerforms a multiplication in GF(2^8) on the packed bytes. The field is in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
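The GF(2^8) multiply above uses the same reduction polynomial as AES, so the classic xtime identity is a convenient sanity check. A sketch assuming GFNI plus AVX-512BW/VL support and the same toolchain caveats as before:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("gfni")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512vl"))
    {
        return;
    }
    unsafe {
        let a = _mm_set1_epi8(0x80u8 as i8);
        let b = _mm_set1_epi8(0x02);
        // 0x80 * 0x02: the shifted value 0x100 is reduced by x^8+x^4+x^3+x+1
        // (0x11B), giving 0x1B, the familiar AES xtime reduction.
        let r: [u8; 16] = core::mem::transmute(_mm_maskz_gf2p8mul_epi8(0xFFFF, a, b));
        assert!(r.iter().all(|&byte| byte == 0x1B));
    }
}
```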
- _mm_maskz_ ⚠load_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_ ⚠load_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_ ⚠load_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_ ⚠load_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_ ⚠load_ sd Experimental (x86 or x86-64) and avx512fLoad a double-precision (64-bit) floating-point element from memory into the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and set the upper element of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_ ⚠load_ sh Experimental (x86 or x86-64) and avx512fp16Load a half-precision (16-bit) floating-point element from memory into the lower element of a new vector using zeromask k (the element is zeroed out when mask bit 0 is not set), and zero the upper elements.
- _mm_maskz_ ⚠load_ ss Experimental (x86 or x86-64) and avx512fLoad a single-precision (32-bit) floating-point element from memory into the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and set the upper 3 packed elements of dst to zero. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_maskz_ ⚠loadu_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 8-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_ ⚠loadu_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlLoad packed 16-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_ ⚠loadu_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 32-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_ ⚠loadu_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed 64-bit integers from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_ ⚠loadu_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed double-precision (64-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
- _mm_maskz_ ⚠loadu_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoad packed single-precision (32-bit) floating-point elements from memory into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). mem_addr does not need to be aligned on any particular boundary.
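A common use of the masked unaligned loads above is reading a short loop tail: lanes whose mask bit is clear are zeroed and are not read from memory, so the load does not fault past the end of the buffer. A sketch with the same assumptions as before; load_tail is an illustrative helper, not part of the API.

```rust
use std::arch::x86_64::*;

// Load up to 4 trailing i32s; masked-off lanes are zeroed rather than read.
unsafe fn load_tail(tail: &[i32]) -> __m128i {
    assert!(tail.len() <= 4);
    let k: __mmask8 = (1u8 << tail.len()) - 1; // one mask bit per remaining element
    _mm_maskz_loadu_epi32(k, tail.as_ptr())
}

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let data = [7, 8, 9];
        let out: [i32; 4] = core::mem::transmute(load_tail(&data));
        assert_eq!(out, [7, 8, 9, 0]);
    }
}
```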
- _mm_maskz_ ⚠lzcnt_ epi32 Experimental (x86 or x86-64) and avx512cd,avx512vlCount the number of leading zero bits in each packed 32-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠lzcnt_ epi64 Experimental (x86 or x86-64) and avx512cd,avx512vlCount the number of leading zero bits in each packed 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
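A quick sketch of the masked leading-zero count (same toolchain assumptions as before, with AVX-512CD in the runtime check):

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512cd") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_setr_epi32(1, 0x0000_FF00, -1, 0);
        // Per-lane leading-zero counts would be [31, 16, 0, 32]; lane 3 is masked off.
        let r: [i32; 4] = core::mem::transmute(_mm_maskz_lzcnt_epi32(0b0111, a));
        assert_eq!(r, [31, 16, 0, 0]);
    }
}
```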
- _mm_maskz_ ⚠madd52hi_ epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠madd52lo_ epu64 Experimental (x86 or x86-64) and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠madd_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠maddubs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
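The madd entry above is essentially a pairwise 16-bit dot product; a sketch assuming AVX-512BW/VL and the same toolchain caveats:

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
        let b = _mm_set1_epi16(10);
        // Each 32-bit lane sums two adjacent 16-bit products:
        // [1*10 + 2*10, 3*10 + 4*10, 5*10 + 6*10, 7*10 + 8*10]
        let r: [i32; 4] = core::mem::transmute(_mm_maskz_madd_epi16(0b1111, a, b));
        assert_eq!(r, [30, 70, 110, 150]);
    }
}
```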
- _mm_maskz_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_ ⚠max_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠max_ round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠max_ round_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_ ⚠max_ round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠max_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠max_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_maskz_ ⚠max_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed signed 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 8-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlCompare packed unsigned 16-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 32-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_ ⚠min_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠min_ round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠min_ round_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_ ⚠min_ round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠min_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠min_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_maskz_ ⚠min_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
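The masked max/min entries all share the pattern shown below (same CPU and toolchain assumptions as the earlier sketches):

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        let a = _mm_setr_epi32(5, -3, 10, 0);
        let b = _mm_setr_epi32(2, 4, 10, -7);
        // Lane-wise signed max/min; mask bit 3 is clear, so that lane becomes 0.
        let hi: [i32; 4] = core::mem::transmute(_mm_maskz_max_epi32(0b0111, a, b));
        let lo: [i32; 4] = core::mem::transmute(_mm_maskz_min_epi32(0b0111, a, b));
        assert_eq!(hi, [5, 4, 10, 0]);
        assert_eq!(lo, [2, -3, 10, 0]);
    }
}
```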
- _mm_maskz_ ⚠mov_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 8-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mov_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMove packed 16-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mov_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 32-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mov_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlMove packed 64-bit integers from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mov_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMove packed double-precision (64-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mov_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMove packed single-precision (32-bit) floating-point elements from a into dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠move_ sd Experimental (x86 or x86-64) and avx512fMove the lower double-precision (64-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠move_ sh Experimental (x86 or x86-64) and avx512fp16Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠move_ ss Experimental (x86 or x86-64) and avx512fMove the lower single-precision (32-bit) floating-point element from b to the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠movedup_ pd Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed double-precision (64-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠movehdup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate odd-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠moveldup_ ps Experimental (x86 or x86-64) and avx512f,avx512vlDuplicate even-indexed single-precision (32-bit) floating-point elements from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mul_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mul_ epu32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mul_ pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst using zeromask k (the element is zeroed out when corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠mul_ pd Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mul_ ps Experimental (x86 or x86-64) and avx512f,avx512vlMultiply packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mul_ round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠mul_ round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠mul_ round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
 (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC) // round to nearest, and suppress exceptions
 (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) // round down, and suppress exceptions
 (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC) // round up, and suppress exceptions
 (_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) // truncate, and suppress exceptions
 _MM_FROUND_CUR_DIRECTION // use MXCSR.RC; see _MM_SET_ROUNDING_MODE
- _mm_maskz_ ⚠mul_ round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠mul_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_maskz_ ⚠mul_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠mul_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
- _mm_maskz_ ⚠mul_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠mulhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mulhi_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mulhrs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
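The bit-level description of mulhrs above is exactly a rounded Q15 fixed-point multiply; working one value through it (same assumptions as the earlier sketches):

```rust
use std::arch::x86_64::*;

fn main() {
    if !(is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl")) {
        return;
    }
    unsafe {
        // Q15 fixed point: 0x4000 = 0.5 and 0x2000 = 0.25.
        let a = _mm_set1_epi16(0x4000);
        let b = _mm_set1_epi16(0x2000);
        // 0x4000*0x2000 = 0x0800_0000; >>14 gives 0x2000; +1 gives 0x2001;
        // bits [16:1] give 0x1000, i.e. 0.125 in Q15.
        let r: [i16; 8] = core::mem::transmute(_mm_maskz_mulhrs_epi16(0xFF, a, b));
        assert!(r.iter().all(|&lane| lane == 0x1000));
    }
}
```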
- _mm_maskz_ ⚠mullo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlMultiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mullo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlMultiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠mullo_ epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠multishift_ epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠or_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠or_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠or_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠or_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise OR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠packs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠packs_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠packus_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠packus_ epi32 Experimental (x86 or x86-64) and avx512bw,avx512vlConvert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
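For the masked pack entries above, the mask applies to the narrower output lanes (8 of them for a 128-bit epi32 pack). A hedged sketch of _mm_maskz_packs_epi32 under the same nightly/feature-gate assumptions as the earlier snippet; the demo function name is hypothetical:

```rust
// Saturating pack of 32-bit to 16-bit integers with a zeromask (sketch only).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn maskz_packs_demo() -> [i16; 8] {
    let a = _mm_set_epi32(4, 3, -100_000, 100_000); // lanes: 100_000, -100_000, 3, 4
    let b = _mm_set_epi32(8, 7, 6, 5);              // lanes: 5, 6, 7, 8
    // Output lanes 0..3 come from `a` (saturated), lanes 4..7 from `b`.
    // Zeromask 0b0011_1111 zeroes the two highest output lanes.
    let r = _mm_maskz_packs_epi32(0b0011_1111, a, b);
    core::mem::transmute::<__m128i, [i16; 8]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(
            unsafe { maskz_packs_demo() },
            [i16::MAX, i16::MIN, 3, 4, 5, 6, 0, 0]
        );
    }
}
```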
- _mm_maskz_ ⚠permute_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permute_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutevar_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutevar_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a within 128-bit lanes using the control in b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutex2var_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutex2var_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutex2var_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutex2var_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutex2var_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutex2var_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
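The permutex2var entries above implement a two-source table lookup: for the 128-bit 32-bit-element forms, bits 1:0 of each index pick the lane and bit 2 picks between a (0) and b (1). A sketch of _mm_maskz_permutex2var_epi32 under the same nightly assumptions; the demo function name is hypothetical:

```rust
// Two-source lane lookup with a zeromask (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_permutex2var_demo() -> [i32; 4] {
    let a = _mm_set_epi32(13, 12, 11, 10); // lanes: 10, 11, 12, 13
    let b = _mm_set_epi32(23, 22, 21, 20); // lanes: 20, 21, 22, 23
    let idx = _mm_set_epi32(1, 5, 0, 4);   // lanes: 4 (b[0]), 0 (a[0]), 5 (b[1]), 1 (a[1])
    // Zeromask 0b1110 zeroes lane 0 of the result.
    let r = _mm_maskz_permutex2var_epi32(0b1110, a, idx, b);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_permutex2var_demo() }, [0, 10, 21, 11]);
    }
}
```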
- _mm_maskz_ ⚠permutexvar_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠permutexvar_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠popcnt_ epi8 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 8-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠popcnt_ epi16 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 16-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠popcnt_ epi32 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 32-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠popcnt_ epi64 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 64-bit integer in a, map the value to the number of logical 1 bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
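A hedged sketch of the per-lane population count with a zeromask, using _mm_maskz_popcnt_epi32 from the entries above; it assumes nightly Rust with the stdarch_x86_avx512 gate (gate name is an assumption) and a CPU with AVX512VPOPCNTDQ and AVX-512VL; the demo function name is hypothetical:

```rust
// Per-lane popcount with a zeromask (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vpopcntdq,avx512vl")]
unsafe fn maskz_popcnt_demo() -> [i32; 4] {
    let a = _mm_set_epi32(-1, 0xFF, 0b1010, 1); // lanes: 1, 0b1010, 0xFF, -1
    // Zeromask 0b0111 zeroes the highest lane.
    let r = _mm_maskz_popcnt_epi32(0b0111, a);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512vpopcntdq") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_popcnt_demo() }, [1, 2, 8, 0]);
    }
}
```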
- _mm_maskz_ ⚠range_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_ ⚠range_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_ ⚠range_ round_ sd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠range_ round_ ss Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_maskz_ ⚠range_ sd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_ ⚠range_ ss Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The lower 2 bits of imm8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. The upper 2 bits of imm8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_maskz_ ⚠rcp14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rcp14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rcp14_ sd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rcp14_ ss Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
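The rcp14 entries above only guarantee a relative error below 2^-14, so results should be checked against a tolerance rather than exact values. An illustrative sketch of _mm_maskz_rcp14_ps under the same nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Approximate reciprocal with a zeromask (sketch only).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_rcp14_demo() -> [f32; 4] {
    let a = _mm_set_ps(8.0, 4.0, 2.0, 1.0); // lanes: 1.0, 2.0, 4.0, 8.0
    // Zeromask 0b1010 keeps lanes 1 and 3 only.
    let r = _mm_maskz_rcp14_ps(0b1010, a);
    core::mem::transmute::<__m128, [f32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        let out = unsafe { maskz_rcp14_demo() };
        assert_eq!(out[0], 0.0);                // masked out
        assert!((out[1] - 0.5).abs() < 1e-3);   // approximately 1/2
        assert_eq!(out[2], 0.0);                // masked out
        assert!((out[3] - 0.125).abs() < 1e-3); // approximately 1/8
    }
}
```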
- _mm_maskz_ ⚠rcp_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ ⚠rcp_ sh Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ ⚠reduce_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm_maskz_ ⚠reduce_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlExtract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠reduce_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out if the corresponding mask bit is not set). Rounding is done according to the imm8 parameter, which can be one of:
- _mm_maskz_ ⚠reduce_ round_ sd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_maskz_ ⚠reduce_ round_ sh Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠reduce_ round_ ss Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_maskz_ ⚠reduce_ sd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_maskz_ ⚠reduce_ sh Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠reduce_ ss Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_maskz_ ⚠rol_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠rol_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠rolv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠rolv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠ror_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠ror_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠rorv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠rorv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
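The rotate entries above come in immediate (rol/ror) and per-lane variable (rolv/rorv) forms. A sketch of _mm_maskz_rolv_epi32 under the same nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Per-lane variable left rotation with a zeromask (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_rolv_demo() -> [u32; 4] {
    let a = _mm_set1_epi32(0x8000_0001_u32 as i32); // every lane: 0x8000_0001
    let counts = _mm_set_epi32(8, 4, 1, 0);         // lanes: rotate by 0, 1, 4, 8
    // Zeromask 0b0111 zeroes the highest lane.
    let r = _mm_maskz_rolv_epi32(0b0111, a, counts);
    core::mem::transmute::<__m128i, [u32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_rolv_demo() }, [0x8000_0001, 0x0000_0003, 0x0000_0018, 0]);
    }
}
```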
- _mm_maskz_ ⚠roundscale_ pd Experimental (x86 or x86-64) and avx512f,avx512vlRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠roundscale_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlRound packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠roundscale_ ps Experimental (x86 or x86-64) and avx512f,avx512vlRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠roundscale_ round_ sd Experimental (x86 or x86-64) and avx512fRound the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠roundscale_ round_ sh Experimental (x86 or x86-64) and avx512fp16Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠roundscale_ round_ ss Experimental (x86 or x86-64) and avx512fRound the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠roundscale_ sd Experimental (x86 or x86-64) and avx512fRound the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠roundscale_ sh Experimental (x86 or x86-64) and avx512fp16Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠roundscale_ ss Experimental (x86 or x86-64) and avx512fRound the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_maskz_ ⚠rsqrt14_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rsqrt14_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rsqrt14_ sd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rsqrt14_ ss Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_maskz_ ⚠rsqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ ⚠rsqrt_ sh Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_maskz_ ⚠scalef_ pd Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠scalef_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlScale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠scalef_ ps Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠scalef_ round_ sd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠scalef_ round_ sh Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠scalef_ round_ ss Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠scalef_ sd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠scalef_ sh Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠scalef_ ss Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠set1_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast 8-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠set1_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlBroadcast the low packed 16-bit integer from a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠set1_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 32-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠set1_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBroadcast 64-bit integer a to all elements of dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
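The masked set1 entries above broadcast a scalar into the selected lanes only. A minimal sketch of _mm_maskz_set1_epi32 under the same nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Masked broadcast that zeroes unselected lanes (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_set1_demo() -> [i32; 4] {
    // Broadcast 7 into the lanes selected by the zeromask 0b1001.
    let r = _mm_maskz_set1_epi32(0b1001, 7);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_set1_demo() }, [7, 0, 0, 7]);
    }
}
```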
- _mm_maskz_ ⚠shldi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shldi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shldi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shldv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shldv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shldv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shrdi_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shrdi_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shrdi_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shrdv_ epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shrdv_ epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shrdv_ epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shuffle_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle packed 8-bit integers in a according to shuffle control mask in the corresponding 8-bit element of b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shuffle_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shuffle_ pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shuffle_ ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a using the control in imm8, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shufflehi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of dst, with the low 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠shufflelo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of dst, with the high 64 bits of 128-bit lanes being copied from a to dst, using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sll_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sll_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sll_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠slli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠slli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠slli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sllv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sllv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sllv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
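The shift entries above take the count either from a vector register (sll), an immediate (slli), or per lane (sllv). A sketch of _mm_maskz_sllv_epi32 under the same nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Per-lane variable left shift with a zeromask (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_sllv_demo() -> [i32; 4] {
    let a = _mm_set1_epi32(1);              // every lane: 1
    let counts = _mm_set_epi32(3, 2, 1, 0); // lanes: shift by 0, 1, 2, 3
    // Zeromask 0b1011 zeroes lane 2.
    let r = _mm_maskz_sllv_epi32(0b1011, a, counts);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_sllv_demo() }, [1, 2, 0, 8]);
    }
}
```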
- _mm_maskz_ ⚠sqrt_ pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sqrt_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sqrt_ ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sqrt_ round_ sd Experimental (x86 or x86-64) and avx512fCompute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠sqrt_ round_ sh Experimental (x86 or x86-64) and avx512fp16Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_maskz_ ⚠sqrt_ round_ ss Experimental (x86 or x86-64) and avx512fCompute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠sqrt_ sd Experimental (x86 or x86-64) and avx512fCompute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠sqrt_ sh Experimental (x86 or x86-64) and avx512fp16Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠sqrt_ ss Experimental (x86 or x86-64) and avx512fCompute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠sra_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sra_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sra_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srai_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srai_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srai_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srav_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srav_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srav_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srl_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srl_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srl_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srli_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srli_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srli_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srlv_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srlv_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠srlv_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 8-bit integers in b from packed 8-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed 16-bit integers in b from packed 16-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 32-bit integers in b from packed 32-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed 64-bit integers in b from packed 64-bit integers in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
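A minimal sketch of the masked integer subtraction entries above, using _mm_maskz_sub_epi32 under the same nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Masked subtraction that zeroes unselected lanes (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_sub_demo() -> [i32; 4] {
    let a = _mm_set_epi32(40, 30, 20, 10); // lanes: 10, 20, 30, 40
    let b = _mm_set_epi32(4, 3, 2, 1);     // lanes: 1, 2, 3, 4
    // Zeromask 0b0110 keeps only the middle two lanes.
    let r = _mm_maskz_sub_epi32(0b0110, a, b);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_sub_demo() }, [0, 18, 27, 0]);
    }
}
```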
- _mm_maskz_ ⚠sub_ pd Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlSubtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ ps Experimental (x86 or x86-64) and avx512f,avx512vlSubtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠sub_ round_ sd Experimental (x86 or x86-64) and avx512fSubtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠sub_ round_ sh Experimental (x86 or x86-64) and avx512fp16Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set). Rounding is done according to the rounding parameter, which can be one of:
- _mm_maskz_ ⚠sub_ round_ ss Experimental (x86 or x86-64) and avx512fSubtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠sub_ sd Experimental (x86 or x86-64) and avx512fSubtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper element from a to the upper element of dst.
- _mm_maskz_ ⚠sub_ sh Experimental (x86 or x86-64) and avx512fp16Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst using zeromask k (the element is zeroed out when mask bit 0 is not set).
- _mm_maskz_ ⚠sub_ ss Experimental (x86 or x86-64) and avx512fSubtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst using zeromask k (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_maskz_ ⚠subs_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 8-bit integers in b from packed 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠subs_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed signed 16-bit integers in b from packed 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠subs_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠subs_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlSubtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠ternarylogic_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠ternarylogic_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bit from a, b, and c are used to form a 3 bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst using zeromask k at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
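For the ternarylogic entries above, the imm8 value is an 8-entry truth table indexed by the corresponding bits of a, b, and c (a as the most significant index bit); 0xE8 encodes the bitwise majority function. The sketch below assumes the immediate is passed as a const generic, as with other stdarch immediate-operand intrinsics, plus the usual nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Truth-table (ternary logic) bit operation with a zeromask (sketch only).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_ternarylogic_demo() -> [i32; 4] {
    let a = _mm_set1_epi32(0b1100);
    let b = _mm_set1_epi32(0b1010);
    let c = _mm_set1_epi32(0b0110);
    // imm8 = 0xE8: each result bit is the majority of the three input bits.
    // Zeromask 0b0001 keeps only lane 0.
    let r = _mm_maskz_ternarylogic_epi32::<0xE8>(0b0001, a, b, c);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_ternarylogic_demo() }, [0b1110, 0, 0, 0]);
    }
}
```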
- _mm_maskz_ ⚠unpackhi_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpackhi_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpackhi_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpackhi_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpackhi_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpackhi_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpacklo_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpacklo_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlUnpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpacklo_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpacklo_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpacklo_ pd Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠unpacklo_ ps Experimental (x86 or x86-64) and avx512f,avx512vlUnpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
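The unpack entries above interleave elements from the low or high half of each 128-bit lane before the mask is applied. A sketch of _mm_maskz_unpacklo_epi32 under the same nightly/feature-gate assumptions; the demo function name is hypothetical:

```rust
// Masked low-half interleave (illustrative sketch).
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_unpacklo_demo() -> [i32; 4] {
    let a = _mm_set_epi32(13, 12, 11, 10); // lanes: 10, 11, 12, 13
    let b = _mm_set_epi32(23, 22, 21, 20); // lanes: 20, 21, 22, 23
    // Unmasked interleave of the low halves would give [10, 20, 11, 21];
    // zeromask 0b0111 zeroes the highest lane.
    let r = _mm_maskz_unpacklo_epi32(0b0111, a, b);
    core::mem::transmute::<__m128i, [i32; 4]>(r)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: required CPU features were detected at runtime.
        assert_eq!(unsafe { maskz_unpacklo_demo() }, [10, 20, 11, 0]);
    }
}
```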
- _mm_maskz_ ⚠xor_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠xor_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
- _mm_maskz_ ⚠xor_ pd Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed double-precision (64-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_maskz_ ⚠xor_ ps Experimental (x86 or x86-64) and avx512dq,avx512vlCompute the bitwise XOR of packed single-precision (32-bit) floating point numbers in a and b and store the results in dst using zeromask k (elements are zeroed out if the corresponding bit is not set).
- _mm_max_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed maximum values in dst.
- _mm_max_ ⚠epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed maximum values in dst.
- _mm_max_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed maximum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_max_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_max_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_max_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_max_ ⚠sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the maximum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) maximum value when inputs are NaN or signed-zero values.
- _mm_min_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed signed 64-bit integers in a and b, and store packed minimum values in dst.
- _mm_min_ ⚠epu64 Experimental (x86 or x86-64) and avx512f,avx512vlCompare packed unsigned 64-bit integers in a and b, and store packed minimum values in dst.
- _mm_min_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare packed half-precision (16-bit) floating-point elements in a and b, and store packed minimum values in dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_min_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fCompare the lower double-precision (64-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_min_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_min_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fCompare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_min_ ⚠sh Experimental (x86 or x86-64) and avx512fp16,avx512vlCompare the lower half-precision (16-bit) floating-point elements in a and b, store the minimum value in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Does not follow the IEEE Standard for Floating-Point Arithmetic (IEEE 754) minimum value when inputs are NaN or signed-zero values.
- _mm_mmask_ ⚠i32gather_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 32-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i32gather_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 2 64-bit integer elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i32gather_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoads 2 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i32gather_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoads 4 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 32-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ epi32 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 2 32-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ epi64 Experimental (x86 or x86-64) and avx512f,avx512vlLoads 2 64-bit integer elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ pd Experimental (x86 or x86-64) and avx512f,avx512vlLoads 2 double-precision (64-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_mmask_ ⚠i64gather_ ps Experimental (x86 or x86-64) and avx512f,avx512vlLoads 2 single-precision (32-bit) floating-point elements from memory starting at location base_addr at packed 64-bit integer indices stored in vindex scaled by scale using writemask k (elements are copied from src when the corresponding mask bit is not set).
- _mm_move_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Move the lower half-precision (16-bit) floating-point element from b to the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_movepi8_ ⚠mask Experimental (x86 or x86-64) and avx512bw,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 8-bit integer in a.
- _mm_movepi16_ ⚠mask Experimental (x86 or x86-64) and avx512bw,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 16-bit integer in a.
- _mm_movepi32_ ⚠mask Experimental (x86 or x86-64) and avx512dq,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 32-bit integer in a.
- _mm_movepi64_ ⚠mask Experimental (x86 or x86-64) and avx512dq,avx512vlSet each bit of mask register k based on the most significant bit of the corresponding packed 64-bit integer in a.
- _mm_movm_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlSet each packed 8-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_movm_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlSet each packed 16-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_movm_ ⚠epi32 Experimental (x86 or x86-64) and avx512dq,avx512vlSet each packed 32-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_movm_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlSet each packed 64-bit integer in dst to all ones or all zeros based on the value of the corresponding bit in k.
- _mm_mul_ ⚠pch Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed complex numbers in a and b, and store the results in dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlMultiply packed half-precision (16-bit) floating-point elements in a and b, and store the results in dst.
- _mm_mul_ ⚠round_ sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fMultiply the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_mul_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_mul_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fMultiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_mul_ ⚠sch Experimental (x86 or x86-64) and avx512fp16Multiply the lower complex numbers in a and b, and store the result in the lower elements of dst, and copy the upper 6 packed elements from a to the upper elements of dst. Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number complex = vec.fp16[0] + i * vec.fp16[1].
- _mm_mul_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Multiply the lower half-precision (16-bit) floating-point elements in a and b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_mullo_ ⚠epi64 Experimental (x86 or x86-64) and avx512dq,avx512vlMultiply packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst.
- _mm_multishift_ ⚠epi64_ epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlFor each 64-bit element in b, select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of a, and store the 8 assembled bytes to the corresponding 64-bit element of dst.
- _mm_or_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 32-bit integers in a and b, and store the results in dst.
- _mm_or_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise OR of packed 64-bit integers in a and b, and store the results in dst.
- _mm_permutex2var_ ⚠epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 32-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShuffle 64-bit integers in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlShuffle double-precision (64-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlShuffle half-precision (16-bit) floating-point elements in a and b using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutex2var_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlShuffle single-precision (32-bit) floating-point elements in a and b across lanes using the corresponding selector and index in idx, and store the results in dst.
- _mm_permutexvar_ ⚠epi8 Experimental (x86 or x86-64) and avx512vbmi,avx512vlShuffle 8-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm_permutexvar_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShuffle 16-bit integers in a across lanes using the corresponding index in idx, and store the results in dst.
- _mm_permutexvar_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlShuffle half-precision (16-bit) floating-point elements in a using the corresponding index in idx, and store the results in dst.
- _mm_popcnt_ ⚠epi8 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 8-bit integer, map the value to the number of logical 1 bits.
- _mm_popcnt_ ⚠epi16 Experimental (x86 or x86-64) and avx512bitalg,avx512vlFor each packed 16-bit integer, map the value to the number of logical 1 bits.
- _mm_popcnt_ ⚠epi32 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 32-bit integer, map the value to the number of logical 1 bits.
- _mm_popcnt_ ⚠epi64 Experimental (x86 or x86-64) and avx512vpopcntdq,avx512vlFor each packed 64-bit integer, map the value to the number of logical 1 bits.
- _mm_range_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed double-precision (64-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_range_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for packed single-precision (32-bit) floating-point elements in a and b, and store the results in dst. Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
- _mm_range_ ⚠round_ sd Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower double-precision (64-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_range_ ⚠round_ ss Experimental (x86 or x86-64) and avx512dqCalculate the max, min, absolute max, or absolute min (depending on control in imm8) for the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. Lower 2 bits of IMM8 specify the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. Upper 2 bits of IMM8 specify the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the sae parameter.
- _mm_rcp14_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp14_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp14_ ⚠sd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp14_ ⚠ss Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rcp_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_rcp_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_reduce_ ⚠add_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by addition. Returns the sum of all elements in a.
- _mm_reduce_ ⚠add_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by addition. Returns the sum of all elements in a. (A reduction sketch using this intrinsic appears after this list.)
- _mm_reduce_ ⚠add_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by addition. Returns the sum of all elements in a.
- _mm_reduce_ ⚠and_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm_reduce_ ⚠and_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise AND. Returns the bitwise AND of all elements in a.
- _mm_reduce_ ⚠max_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ ⚠max_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ ⚠max_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ ⚠max_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ ⚠max_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by maximum. Returns the maximum of all elements in a.
- _mm_reduce_ ⚠min_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ ⚠min_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ ⚠min_ epu8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 8-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ ⚠min_ epu16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed unsigned 16-bit integers in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ ⚠min_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by minimum. Returns the minimum of all elements in a.
- _mm_reduce_ ⚠mul_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm_reduce_ ⚠mul_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by multiplication. Returns the product of all elements in a.
- _mm_reduce_ ⚠mul_ ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReduce the packed half-precision (16-bit) floating-point elements in a by multiplication. Returns the product of all elements in a.
- _mm_reduce_ ⚠or_ epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 8-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm_reduce_ ⚠or_ epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlReduce the packed 16-bit integers in a by bitwise OR. Returns the bitwise OR of all elements in a.
- _mm_reduce_ ⚠pd Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed double-precision (64-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlExtract the reduced argument of packed half-precision (16-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst.
- _mm_reduce_ ⚠ps Experimental (x86 or x86-64) and avx512dq,avx512vlExtract the reduced argument of packed single-precision (32-bit) floating-point elements in a by the number of bits specified by imm8, and store the results in dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ⚠round_ sd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_reduce_ ⚠round_ ss Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ⚠sd Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower double-precision (64-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_reduce_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Extract the reduced argument of the lower half-precision (16-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_reduce_ ⚠ss Experimental (x86 or x86-64) and avx512dqExtract the reduced argument of the lower single-precision (32-bit) floating-point element in b by the number of bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. Rounding is done according to the imm8 parameter, which can be one of:
- _mm_rol_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst. (A rotation sketch using this intrinsic appears after this list.)
- _mm_rol_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in imm8, and store the results in dst.
- _mm_rolv_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_rolv_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the left by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_ror_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm_ror_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in imm8, and store the results in dst.
- _mm_rorv_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 32-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_rorv_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlRotate the bits in each packed 64-bit integer in a to the right by the number of bits specified in the corresponding element of b, and store the results in dst.
- _mm_roundscale_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlRound packed double-precision (64-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlRound packed half-precision (16-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
- _mm_roundscale_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlRound packed single-precision (32-bit) floating-point elements in a to the number of fraction bits specified by imm8, and store the results in dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fRound the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_roundscale_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fRound the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ⚠sd Experimental (x86 or x86-64) and avx512fRound the lower double-precision (64-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_roundscale_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Round the lower half-precision (16-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_roundscale_ ⚠ss Experimental (x86 or x86-64) and avx512fRound the lower single-precision (32-bit) floating-point element in b to the number of fraction bits specified by imm8, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
 Rounding is done according to the imm8[2:0] parameter, which can be one of:
- _mm_rsqrt14_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt14_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlCompute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt14_ ⚠sd Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt14_ ⚠ss Experimental (x86 or x86-64) and avx512fCompute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 2^-14.
- _mm_rsqrt_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_rsqrt_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. The maximum relative error for this approximation is less than 1.5*2^-12.
- _mm_scalef_ ⚠pd Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed double-precision (64-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlScale the packed half-precision (16-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_ ⚠ps Experimental (x86 or x86-64) and avx512f,avx512vlScale the packed single-precision (32-bit) floating-point elements in a using values from b, and store the results in dst.
- _mm_scalef_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_scalef_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_scalef_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_scalef_ ⚠sd Experimental (x86 or x86-64) and avx512fScale the packed double-precision (64-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_scalef_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Scale the packed half-precision (16-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_scalef_ ⚠ss Experimental (x86 or x86-64) and avx512fScale the packed single-precision (32-bit) floating-point elements in a using values from b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_set1_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Broadcast the half-precision (16-bit) floating-point value a to all elements of dst.
- _mm_set_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Set packed half-precision (16-bit) floating-point elements in dst with the supplied values.
- _mm_set_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Copy the half-precision (16-bit) floating-point element from a to the lower element of dst, and zero the upper 7 elements.
- _mm_setr_ ⚠ph Experimental (x86 or x86-64) and avx512fp16Set packed half-precision (16-bit) floating-point elements in dst with the supplied values in reverse order.
- _mm_setzero_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReturn vector of type __m128h with all elements set to zero.
- _mm_shldi_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by imm8 bits, and store the upper 16 bits in dst. (A funnel-shift sketch using this intrinsic appears after this list.)
- _mm_shldi_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by imm8 bits, and store the upper 32-bits in dst.
- _mm_shldi_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by imm8 bits, and store the upper 64 bits in dst.
- _mm_shldv_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in a and b producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 16-bits in dst.
- _mm_shldv_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in a and b producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 32-bits in dst.
- _mm_shldv_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in a and b producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of c, and store the upper 64-bits in dst.
- _mm_shrdi_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by imm8 bits, and store the lower 16-bits in dst.
- _mm_shrdi_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by imm8 bits, and store the lower 32-bits in dst.
- _mm_shrdi_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by imm8 bits, and store the lower 64-bits in dst.
- _mm_shrdv_ ⚠epi16 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 16-bit integers in b and a producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 16-bits in dst.
- _mm_shrdv_ ⚠epi32 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
- _mm_shrdv_ ⚠epi64 Experimental (x86 or x86-64) and avx512vbmi2,avx512vlConcatenate packed 64-bit integers in b and a producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 64-bits in dst.
- _mm_sllv_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm_sqrt_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlCompute the square root of packed half-precision (16-bit) floating-point elements in a, and store the results in dst.
- _mm_sqrt_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fCompute the square root of the lower double-precision (64-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_sqrt_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_sqrt_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fCompute the square root of the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_sqrt_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compute the square root of the lower half-precision (16-bit) floating-point element in b, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_sra_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by count while shifting in sign bits, and store the results in dst.
- _mm_srai_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by imm8 while shifting in sign bits, and store the results in dst.
- _mm_srav_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm_srav_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlShift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits, and store the results in dst.
- _mm_srlv_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlShift packed 16-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeros, and store the results in dst.
- _mm_store_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore 128-bits (composed of 4 packed 32-bit integers) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore 128-bits (composed of 2 packed 64-bit integers) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
- _mm_store_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlStore 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address must be aligned to 16 bytes or a general-protection exception may be generated.
- _mm_store_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Store the lower half-precision (16-bit) floating-point element from a into memory.
- _mm_storeu_ ⚠epi8 Experimental (x86 or x86-64) and avx512bw,avx512vlStore 128-bits (composed of 16 packed 8-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠epi16 Experimental (x86 or x86-64) and avx512bw,avx512vlStore 128-bits (composed of 8 packed 16-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlStore 128-bits (composed of 4 packed 32-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlStore 128-bits (composed of 2 packed 64-bit integers) from a into memory. mem_addr does not need to be aligned on any particular boundary.
- _mm_storeu_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlStore 128-bits (composed of 8 packed half-precision (16-bit) floating-point elements) from a into memory. The address does not need to be aligned to any particular boundary.
- _mm_sub_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlSubtract packed half-precision (16-bit) floating-point elements in b from a, and store the results in dst.
- _mm_sub_ ⚠round_ sd Experimental (x86 or x86-64) and avx512fSubtract the lower double-precision (64-bit) floating-point element in b from the lower double-precision (64-bit) floating-point element in a, store the result in the lower element of dst, and copy the upper element from a to the upper element of dst.
- _mm_sub_ ⚠round_ sh Experimental (x86 or x86-64) and avx512fp16Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst. Rounding is done according to the rounding parameter, which can be one of:
- _mm_sub_ ⚠round_ ss Experimental (x86 or x86-64) and avx512fSubtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the result in the lower element of dst, and copy the upper 3 packed elements from a to the upper elements of dst.
- _mm_sub_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Subtract the lower half-precision (16-bit) floating-point elements in b from a, store the result in the lower element of dst, and copy the upper 7 packed elements from a to the upper elements of dst.
- _mm_ternarylogic_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 32-bit integer, the corresponding bits from a, b, and c are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst. (A ternary-logic sketch using this intrinsic appears after this list.)
- _mm_ternarylogic_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlBitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in imm8. For each bit in each packed 64-bit integer, the corresponding bits from a, b, and c are used to form a 3-bit index into imm8, and the value at that bit in imm8 is written to the corresponding bit in dst.
- _mm_test_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise AND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_test_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise AND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is non-zero.
- _mm_testn_ ⚠epi8_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 8-bit integers in a and b, producing intermediate 8-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testn_ ⚠epi16_ mask Experimental (x86 or x86-64) and avx512bw,avx512vlCompute the bitwise NAND of packed 16-bit integers in a and b, producing intermediate 16-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testn_ ⚠epi32_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 32-bit integers in a and b, producing intermediate 32-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_testn_ ⚠epi64_ mask Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise NAND of packed 64-bit integers in a and b, producing intermediate 64-bit values, and set the corresponding bit in result mask k if the intermediate value is zero.
- _mm_ucomieq_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomige_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomigt_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomile_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomilt_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_ucomineq_ ⚠sh Experimental (x86 or x86-64) and avx512fp16Compare the lower half-precision (16-bit) floating-point elements in a and b for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
- _mm_undefined_ ⚠ph Experimental (x86 or x86-64) and avx512fp16,avx512vlReturn vector of type __m128h with undefined elements. In practice, this returns the all-zero vector.
- _mm_xor_ ⚠epi32 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 32-bit integers in a and b, and store the results in dst.
- _mm_xor_ ⚠epi64 Experimental (x86 or x86-64) and avx512f,avx512vlCompute the bitwise XOR of packed 64-bit integers in a and b, and store the results in dst.
- _store_mask8 ⚠Experimental (x86 or x86-64) and avx512dqStore 8-bit mask from a into memory.
- _store_mask16 ⚠Experimental (x86 or x86-64) and avx512fStore 16-bit mask from a into memory.
- _store_mask32 ⚠Experimental (x86 or x86-64) and avx512bwStore 32-bit mask from a into memory.
- _store_mask64 ⚠Experimental (x86 or x86-64) and avx512bwStore 64-bit mask from a into memory.
- _tile_cmmimfp16ps ⚠Experimental amx-complexPerform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single precision tile. Each dword element in input tiles a and b is interpreted as a complex number with FP16 real part and FP16 imaginary part. Calculates the imaginary part of the result. For each possible combination of (row of a, column of b), it performs a set of multiplication and accumulations on all corresponding complex numbers (one from a and one from b). The imaginary part of the a element is multiplied with the real part of the corresponding b element, and the real part of the a element is multiplied with the imaginary part of the corresponding b elements. The two accumulated results are added, and then accumulated into the corresponding row and column of dst.
- _tile_cmmrlfp16ps ⚠Experimental amx-complexPerform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single precision tile. Each dword element in input tiles a and b is interpreted as a complex number with FP16 real part and FP16 imaginary part. Calculates the real part of the result. For each possible combination of (row of a, column of b), it performs a set of multiplication and accumulations on all corresponding complex numbers (one from a and one from b). The real part of the a element is multiplied with the real part of the corresponding b element, and the negated imaginary part of the a element is multiplied with the imaginary part of the corresponding b elements. The two accumulated results are added, and then accumulated into the corresponding row and column of dst.
- _tile_dpbf16ps ⚠Experimental amx-bf16Compute dot-product of BF16 (16-bit) floating-point pairs in tiles a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in dst, and store the 32-bit result back to tile dst.
- _tile_dpbssd ⚠Experimental amx-int8Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
- _tile_dpbsud ⚠Experimental amx-int8Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
- _tile_dpbusd ⚠Experimental amx-int8Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
- _tile_dpbuud ⚠Experimental amx-int8Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding unsigned 8-bit integers in b, producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in dst, and store the 32-bit result back to tile dst.
- _tile_dpfp16ps ⚠Experimental amx-fp16Compute dot-product of FP16 (16-bit) floating-point pairs in tiles a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in dst, and store the 32-bit result back to tile dst.
- _tile_loadconfig ⚠Experimental amx-tileLoad tile configuration from a 64-byte memory location specified by mem_addr. The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If the specified palette_id is zero, that signifies the init state for both the tile config and the tile data, and the tiles are zeroed. Any invalid configurations will result in a #GP fault.
- _tile_loadd ⚠Experimental amx-tileLoad tile rows from memory specified by base address and stride into destination tile dst using the tile configuration previously configured via _tile_loadconfig.
- _tile_release ⚠Experimental amx-tileRelease the tile configuration to return to the init state, which releases all storage it currently holds.
- _tile_storeconfig ⚠Experimental amx-tileStore the current tile configuration to a 64-byte memory location specified by mem_addr. The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If tiles are not configured, all zeroes will be stored to memory.
- _tile_stored ⚠Experimental amx-tileStore the tile specified by src to memory specified by base address and stride using the tile configuration previously configured via _tile_loadconfig.
- _tile_stream_ ⚠loadd Experimental amx-tileLoad tile rows from memory specified by base address and stride into destination tile dst using the tile configuration previously configured via _tile_loadconfig. This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly.
- _tile_zero ⚠Experimental amx-tileZero the tile specified by tdest.
- _xabort⚠Experimental (x86 or x86-64) and rtmForces a restricted transactional memory (RTM) region to abort.
- _xabort_code Experimental x86 or x86-64 Extracts the 8-bit code that was passed to _xabort from the status value returned by _xbegin.
- _xbegin⚠Experimental (x86 or x86-64) and rtmSpecifies the start of a restricted transactional memory (RTM) code region and returns a value indicating status. (An RTM sketch using this intrinsic appears after this list.)
- _xend⚠Experimental (x86 or x86-64) and rtmSpecifies the end of a restricted transactional memory (RTM) code region.
- _xtest⚠Experimental (x86 or x86-64) and rtmQueries whether the processor is executing in a transactional region identified by restricted transactional memory (RTM) or hardware lock elision (HLE).
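The sketches below illustrate a few of the intrinsic families listed above. They are editorial examples, not part of the listing: each assumes a CPU with the named target features and a toolchain where these experimental intrinsics are available (a nightly feature gate may still be required).

Zero-masking sketch: the _mm_maskz_* forms force lanes whose mask bit is clear to zero instead of merging them from a source vector, shown here with _mm_maskz_xor_epi32 (avx512f and avx512vl assumed).

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn maskz_xor_demo() -> [i32; 4] {
    let a = _mm_set_epi32(3, 2, 1, 0); // lanes, low to high: 0, 1, 2, 3
    let b = _mm_set1_epi32(7);
    // Mask 0b0101 keeps lanes 0 and 2; lanes 1 and 3 are zeroed, not merged.
    let r = _mm_maskz_xor_epi32(0b0101, a, b);
    let mut out = [0i32; 4];
    _mm_storeu_si128(out.as_mut_ptr().cast(), r);
    out // [7, 0, 5, 0]
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were checked above.
        println!("{:?}", unsafe { maskz_xor_demo() });
    }
}
```

By contrast, the _mm_mask_* (merge-masking) forms take an extra src vector and copy unselected lanes from it rather than zeroing them.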
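Reduction sketch: the _mm_reduce_* helpers are horizontal reductions that collapse every lane of a vector into one scalar, shown here with _mm_reduce_add_epi16 (avx512bw and avx512vl assumed).

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512bw,avx512vl")]
unsafe fn sum_lanes(v: __m128i) -> i16 {
    // Adds all eight 16-bit lanes of v and returns the scalar total.
    _mm_reduce_add_epi16(v)
}

fn main() {
    if is_x86_feature_detected!("avx512bw") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were checked above.
        let total = unsafe { sum_lanes(_mm_set1_epi16(3)) };
        assert_eq!(total, 24); // 8 lanes x 3
    }
}
```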
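Rotation sketch: _mm_rol_epi32/_mm_ror_epi32 rotate each lane by an immediate, while the rolv/rorv variants take a per-lane count from a second vector. A minimal sketch of the immediate form (avx512f and avx512vl assumed):

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn rotl8_each_lane(a: __m128i) -> __m128i {
    // Rotate every 32-bit lane left by 8 bits (the immediate form).
    _mm_rol_epi32::<8>(a)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were checked above.
        unsafe {
            let r = rotl8_each_lane(_mm_set1_epi32(0x0000_00FF));
            // 0x000000FF rotated left by 8 is 0x0000FF00 in every lane.
            assert_eq!(_mm_cvtsi128_si32(r), 0x0000_FF00);
        }
    }
}
```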
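Funnel-shift sketch: _mm_shldi_*/_mm_shrdi_* concatenate the corresponding lanes of a and b into a double-width value, shift it by imm8, and keep one half, shown here with _mm_shldi_epi16 (avx512vbmi2 and avx512vl assumed).

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512vbmi2,avx512vl")]
unsafe fn funnel_demo() -> [u16; 8] {
    let a = _mm_set1_epi16(0x00FF);
    let b = _mm_set1_epi16(0xAB00u16 as i16);
    // Per lane: form the 32-bit value a:b, shift it left by 8, keep the high
    // 16 bits. 0x00FF_AB00 << 8 = 0xFFAB_0000, so every lane becomes 0xFFAB.
    let r = _mm_shldi_epi16::<8>(a, b);
    let mut out = [0u16; 8];
    _mm_storeu_si128(out.as_mut_ptr().cast(), r);
    out
}

fn main() {
    if is_x86_feature_detected!("avx512vbmi2") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were checked above.
        assert_eq!(unsafe { funnel_demo() }, [0xFFAB; 8]);
    }
}
```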
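Ternary-logic sketch: _mm_ternarylogic_epi32 evaluates an arbitrary three-input boolean function whose 8-entry truth table is encoded in imm8; 0x96, for example, is the table of a XOR b XOR c (avx512f and avx512vl assumed).

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512f,avx512vl")]
unsafe fn xor3(a: __m128i, b: __m128i, c: __m128i) -> __m128i {
    // The three input bits at each position form an index into imm8 = 0x96,
    // whose bits spell out the truth table of a XOR b XOR c.
    _mm_ternarylogic_epi32::<0x96>(a, b, c)
}

fn main() {
    if is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were checked above.
        unsafe {
            let r = xor3(
                _mm_set1_epi32(0b1100),
                _mm_set1_epi32(0b1010),
                _mm_set1_epi32(0b1001),
            );
            assert_eq!(_mm_cvtsi128_si32(r), 0b1100 ^ 0b1010 ^ 0b1001);
        }
    }
}
```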
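RTM sketch: _xbegin starts a hardware transaction and returns _XBEGIN_STARTED on entry or an abort status otherwise, and _xend commits it. Because a transaction can abort at any time (and RTM is disabled on many CPUs), code must always keep a non-transactional fallback. A hedged sketch, assuming the rtm target feature and a toolchain where these experimental intrinsics are available:

```rust
use std::arch::x86_64::*;

fn increment(counter: &mut u64) {
    if is_x86_feature_detected!("rtm") {
        // SAFETY: the rtm feature was checked above.
        unsafe {
            if _xbegin() == _XBEGIN_STARTED {
                *counter += 1; // executed transactionally
                _xend();
                return;
            }
            // Execution resumes here with an abort status if the transaction aborted.
        }
    }
    // Fallback path: RTM unavailable or the transaction aborted.
    *counter += 1;
}

fn main() {
    let mut n = 0;
    increment(&mut n);
    assert_eq!(n, 1);
}
```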
Type Aliases§
- _MM_CMPINT_ ENUM Experimental x86 or x86-64 The _MM_CMPINT_ENUM type used to specify comparison operations in AVX-512 intrinsics.
- _MM_MANTISSA_ NORM_ ENUM Experimental x86 or x86-64 The _MM_MANTISSA_NORM_ENUM type used to specify mantissa normalization operations in AVX-512 intrinsics.
- _MM_MANTISSA_ SIGN_ ENUM Experimental x86 or x86-64 The _MM_MANTISSA_SIGN_ENUM type used to specify mantissa sign operations in AVX-512 intrinsics.
- _MM_PERM_ ENUM Experimental x86 or x86-64 The _MM_PERM_ENUM type used to specify shuffle operations in AVX-512 intrinsics.
- __mmask8 Experimental x86 or x86-64 The __mmask8 type used in AVX-512 intrinsics, an 8-bit integer. (A mask-register sketch using this type appears after this list.)
- __mmask16 Experimental x86 or x86-64 The __mmask16 type used in AVX-512 intrinsics, a 16-bit integer.
- __mmask32 Experimental x86 or x86-64 The __mmask32 type used in AVX-512 intrinsics, a 32-bit integer.
- __mmask64 Experimental x86 or x86-64 The __mmask64 type used in AVX-512 intrinsics, a 64-bit integer.
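Mask-register sketch: a __mmask8 is an ordinary 8-bit integer carrying one bit per lane, shown here by collecting the sign bits of four 32-bit lanes with _mm_movepi32_mask (avx512dq and avx512vl assumed; these intrinsics are experimental, so a nightly feature gate may still be required).

```rust
use std::arch::x86_64::*;

#[target_feature(enable = "avx512dq,avx512vl")]
unsafe fn sign_mask(v: __m128i) -> __mmask8 {
    // One mask bit per 32-bit lane, taken from that lane's sign bit; only the
    // low 4 bits of the __mmask8 are meaningful for a 128-bit vector.
    _mm_movepi32_mask(v)
}

fn main() {
    if is_x86_feature_detected!("avx512dq") && is_x86_feature_detected!("avx512vl") {
        // SAFETY: the required CPU features were checked above.
        unsafe {
            let v = _mm_set_epi32(-1, 2, -3, 4); // lanes, low to high: 4, -3, 2, -1
            assert_eq!(sign_mask(v), 0b1010); // bits 1 and 3: the negative lanes
        }
    }
}
```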