Module core::arch::x86_64 (since 1.27.0)
Platform-specific intrinsics for the x86_64 platform.
See the module documentation for more details.
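As a rough illustration of how the intrinsics in this module are typically called, the sketch below adds two four-lane `f32` vectors with SSE. The helper name `add4` is hypothetical and not part of `core::arch`. The sketch assumes SSE is available because it is part of the x86_64 baseline, and it carries a scalar fallback so the snippet also builds on other targets.

```rust
/// Adds two 4-element `f32` arrays lane by lane using SSE intrinsics.
/// `add4` is a hypothetical helper written for this example.
#[cfg(target_arch = "x86_64")]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let mut out = [0.0f32; 4];
    // SAFETY: SSE is always present on x86_64 targets, both input arrays hold
    // four readable f32 values, `out` holds four writable f32 values, and the
    // unaligned load/store variants impose no alignment requirement.
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

/// Scalar fallback with the same signature for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling `add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` yields `[11.0, 22.0, 33.0, 44.0]` on either code path.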
Structs
| CpuidResult | x86-64 Result of the |
| __m128i | x86-64 128-bit wide integer vector type, x86-specific |
| __m128 | x86-64 128-bit wide set of four |
| __m128d | x86-64 128-bit wide set of two |
| __m256i | x86-64 256-bit wide integer vector type, x86-specific |
| __m256 | x86-64 256-bit wide set of eight |
| __m256d | x86-64 256-bit wide set of four |
| __m64 | Experimental x86-64 64-bit wide integer vector type, x86-specific |
| __m512i | Experimental x86-64 512-bit wide integer vector type, x86-specific |
| __m512 | Experimental x86-64 512-bit wide set of sixteen |
| __m512d | Experimental x86-64 512-bit wide set of eight |
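To make the "integer vector type" wording more concrete, here is a small sketch showing that `__m128i` is just 128 bits whose lane width depends on how it is used. The function names are hypothetical. The sketch assumes an x86_64 target, where SSE2 is part of the baseline and memory is little-endian.

```rust
/// Sizes in bytes of a 128-bit and a 256-bit vector type.
#[cfg(target_arch = "x86_64")]
pub fn vector_sizes() -> (usize, usize) {
    use core::arch::x86_64::{__m128i, __m256d};
    (core::mem::size_of::<__m128i>(), core::mem::size_of::<__m256d>())
}

/// Views one `__m128i` value as four 32-bit lanes.
#[cfg(target_arch = "x86_64")]
pub fn as_i32_lanes() -> [i32; 4] {
    use core::arch::x86_64::{__m128i, _mm_set_epi32};
    // SAFETY: SSE2 is part of the x86_64 baseline, and both types are
    // plain 16-byte values, so the transmute only reinterprets the bits.
    unsafe {
        // `_mm_set_epi32` takes the highest lane first, so lane 0 holds 0.
        let v = _mm_set_epi32(3, 2, 1, 0);
        core::mem::transmute::<__m128i, [i32; 4]>(v)
    }
}

/// Views the same bit pattern as eight 16-bit lanes.
#[cfg(target_arch = "x86_64")]
pub fn as_i16_lanes() -> [i16; 8] {
    use core::arch::x86_64::{__m128i, _mm_set_epi32};
    // SAFETY: same reasoning as in `as_i32_lanes`.
    unsafe {
        let v = _mm_set_epi32(3, 2, 1, 0);
        core::mem::transmute::<__m128i, [i16; 8]>(v)
    }
}
```

On x86_64, `vector_sizes()` returns `(16, 32)`, `as_i32_lanes()` returns `[0, 1, 2, 3]`, and `as_i16_lanes()` returns `[0, 0, 1, 0, 2, 0, 3, 0]`, because each 32-bit lane splits into a low and a high 16-bit half.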
Constants
| _CMP_EQ_OQ | x86-64 Equal (ordered, non-signaling) |
| _CMP_EQ_OS | x86-64 Equal (ordered, signaling) |
| _CMP_EQ_UQ | x86-64 Equal (unordered, non-signaling) |
| _CMP_EQ_US | x86-64 Equal (unordered, signaling) |
| _CMP_FALSE_OQ | x86-64 False (ordered, non-signaling) |
| _CMP_FALSE_OS | x86-64 False (ordered, signaling) |
| _CMP_GE_OQ | x86-64 Greater-than-or-equal (ordered, non-signaling) |
| _CMP_GE_OS | x86-64 Greater-than-or-equal (ordered, signaling) |
| _CMP_GT_OQ | x86-64 Greater-than (ordered, non-signaling) |
| _CMP_GT_OS | x86-64 Greater-than (ordered, signaling) |
| _CMP_LE_OQ | x86-64 Less-than-or-equal (ordered, non-signaling) |
| _CMP_LE_OS | x86-64 Less-than-or-equal (ordered, signaling) |
| _CMP_LT_OQ | x86-64 Less-than (ordered, non-signaling) |
| _CMP_LT_OS | x86-64 Less-than (ordered, signaling) |
| _CMP_NEQ_OQ | x86-64 Not-equal (ordered, non-signaling) |
| _CMP_NEQ_OS | x86-64 Not-equal (ordered, signaling) |
| _CMP_NEQ_UQ | x86-64 Not-equal (unordered, non-signaling) |
| _CMP_NEQ_US | x86-64 Not-equal (unordered, signaling) |
| _CMP_NGE_UQ | x86-64 Not-greater-than-or-equal (unordered, non-signaling) |
| _CMP_NGE_US | x86-64 Not-greater-than-or-equal (unordered, signaling) |
| _CMP_NGT_UQ | x86-64 Not-greater-than (unordered, non-signaling) |
| _CMP_NGT_US | x86-64 Not-greater-than (unordered, signaling) |
| _CMP_NLE_UQ | x86-64 Not-less-than-or-equal (unordered, non-signaling) |
| _CMP_NLE_US | x86-64 Not-less-than-or-equal (unordered, signaling) |
| _CMP_NLT_UQ | x86-64 Not-less-than (unordered, non-signaling) |
| _CMP_NLT_US | x86-64 Not-less-than (unordered, signaling) |
| _CMP_ORD_Q | x86-64 Ordered (non-signaling) |
| _CMP_ORD_S | x86-64 Ordered (signaling) |
| _CMP_TRUE_UQ | x86-64 True (unordered, non-signaling) |
| _CMP_TRUE_US | x86-64 True (unordered, signaling) |
| _CMP_UNORD_Q | x86-64 Unordered (non-signaling) |
| _CMP_UNORD_S | x86-64 Unordered (signaling) |
| _MM_EXCEPT_DENORM | x86-64 See |
| _MM_EXCEPT_DIV_ZERO | x86-64 See |
| _MM_EXCEPT_INEXACT | x86-64 See |
| _MM_EXCEPT_INVALID | x86-64 See |
| _MM_EXCEPT_MASK | x86-64 |
| _MM_EXCEPT_OVERFLOW | x86-64 See |
| _MM_EXCEPT_UNDERFLOW | x86-64 See |
| _MM_FLUSH_ZERO_MASK | x86-64 |
| _MM_FLUSH_ZERO_OFF | x86-64 See |
| _MM_FLUSH_ZERO_ON | x86-64 See |
| _MM_FROUND_CEIL | x86-64 round up and do not suppress exceptions |
| _MM_FROUND_CUR_DIRECTION | x86-64 use MXCSR.RC; see |
| _MM_FROUND_FLOOR | x86-64 round down and do not suppress exceptions |
| _MM_FROUND_NEARBYINT | x86-64 use MXCSR.RC and suppress exceptions; see |
| _MM_FROUND_NINT | x86-64 round to nearest and do not suppress exceptions |
| _MM_FROUND_NO_EXC | x86-64 suppress exceptions |
| _MM_FROUND_RAISE_EXC | x86-64 do not suppress exceptions |
| _MM_FROUND_RINT | x86-64 use MXCSR.RC and do not suppress exceptions; see |
| _MM_FROUND_TO_NEAREST_INT | x86-64 round to nearest |
| _MM_FROUND_TO_NEG_INF | x86-64 round down |
| _MM_FROUND_TO_POS_INF | x86-64 round up |
| _MM_FROUND_TO_ZERO | x86-64 truncate |
| _MM_FROUND_TRUNC | x86-64 truncate and do not suppress exceptions |
| _MM_HINT_NTA | x86-64 See |
| _MM_HINT_T0 | x86-64 See |
| _MM_HINT_T1 | x86-64 See |
| _MM_HINT_T2 | x86-64 See |
| _MM_MASK_DENORM | x86-64 See |
| _MM_MASK_DIV_ZERO | x86-64 See |
| _MM_MASK_INEXACT | x86-64 See |
| _MM_MASK_INVALID | x86-64 See |
| _MM_MASK_MASK | x86-64 |
| _MM_MASK_OVERFLOW | x86-64 See |
| _MM_MASK_UNDERFLOW | x86-64 See |
| _MM_ROUND_DOWN | x86-64 See |
| _MM_ROUND_MASK | x86-64 |
| _MM_ROUND_NEAREST | x86-64 See |
| _MM_ROUND_TOWARD_ZERO | x86-64 See |
| _MM_ROUND_UP | x86-64 See |
| _SIDD_BIT_MASK | x86-64 Mask only: return the bit mask |
| _SIDD_CMP_EQUAL_ANY | x86-64 For each character in |
| _SIDD_CMP_EQUAL_EACH | x86-64 The strings defined by |
| _SIDD_CMP_EQUAL_ORDERED | x86-64 Search for the defined substring in the target |
| _SIDD_CMP_RANGES | x86-64 For each character in |
| _SIDD_LEAST_SIGNIFICANT | x86-64 Index only: return the least significant bit (Default) |
| _SIDD_MASKED_NEGATIVE_POLARITY | x86-64 Negates results only before the end of the string |
| _SIDD_MASKED_POSITIVE_POLARITY | x86-64 Do not negate results before the end of the string |
| _SIDD_MOST_SIGNIFICANT | x86-64 Index only: return the most significant bit |
| _SIDD_NEGATIVE_POLARITY | x86-64 Negates results |
| _SIDD_POSITIVE_POLARITY | x86-64 Do not negate results (Default) |
| _SIDD_SBYTE_OPS | x86-64 String contains signed 8-bit characters |
| _SIDD_SWORD_OPS | x86-64 String contains signed 16-bit characters |
| _SIDD_UBYTE_OPS | x86-64 String contains unsigned 8-bit characters (Default) |
| _SIDD_UNIT_MASK | x86-64 Mask only: return the byte mask |
| _SIDD_UWORD_OPS | x86-64 String contains unsigned 16-bit characters |
| _XCR_XFEATURE_ENABLED_MASK | x86-64 |
| _XABORT_CAPACITY | Experimental x86-64 Transaction abort due to the transaction using too much memory. |
| _XABORT_CONFLICT | Experimental x86-64 Transaction abort due to a memory conflict with another thread. |
| _XABORT_DEBUG | Experimental x86-64 Transaction abort due to a debug trap. |
| _XABORT_EXPLICIT | Experimental x86-64 Transaction explicitly aborted with xabort. The parameter passed to xabort is available with |
| _XABORT_NESTED | Experimental x86-64 Transaction abort in an inner nested transaction. |
| _XABORT_RETRY | Experimental x86-64 Transaction retry is possible. |
| _XBEGIN_STARTED | Experimental x86-64 Transaction successfully started. |
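The `_MM_FROUND_*` descriptions above suggest that the named rounding modes are combinations of a direction flag and an exception flag. The following sketch checks that relationship directly. The function name is hypothetical, and the snippet assumes an x86_64 target; it compares the constants with each other and does not depend on their numeric values.

```rust
/// Returns true if each composite rounding constant equals the bitwise OR
/// of the direction and exception flags named in its description.
#[cfg(target_arch = "x86_64")]
pub fn fround_composites_match() -> bool {
    use core::arch::x86_64::{
        _MM_FROUND_CEIL, _MM_FROUND_CUR_DIRECTION, _MM_FROUND_FLOOR,
        _MM_FROUND_NEARBYINT, _MM_FROUND_NINT, _MM_FROUND_NO_EXC,
        _MM_FROUND_RAISE_EXC, _MM_FROUND_RINT, _MM_FROUND_TO_NEAREST_INT,
        _MM_FROUND_TO_NEG_INF, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_ZERO,
        _MM_FROUND_TRUNC,
    };

    // "round to nearest and do not suppress exceptions"
    _MM_FROUND_NINT == (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
        // "round down and do not suppress exceptions"
        && _MM_FROUND_FLOOR == (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
        // "round up and do not suppress exceptions"
        && _MM_FROUND_CEIL == (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
        // "truncate and do not suppress exceptions"
        && _MM_FROUND_TRUNC == (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
        // "use MXCSR.RC and do not suppress exceptions"
        && _MM_FROUND_RINT == (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
        // "use MXCSR.RC and suppress exceptions"
        && _MM_FROUND_NEARBYINT == (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
}
```

Read this way, a custom mode such as "round down and suppress exceptions" would be written as `_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC`.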
Functions
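Most entries in this section pair `x86-64` with a CPU feature name such as `avx`, and the ⚠ marker flags a function as unsafe to call. One common way to use such a function is sketched here: check for the feature at run time, then call a wrapper compiled with that feature enabled. The names `add8` and `add8_avx` are hypothetical. The detection macro `is_x86_feature_detected!` comes from `std`, not `core`, so this sketch assumes a `std` environment.

```rust
/// AVX implementation; callers must ensure the CPU supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add8_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    use core::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

    let mut out = [0.0f32; 8];
    // SAFETY: each array holds eight f32 values, and the unaligned
    // load/store variants impose no alignment requirement.
    unsafe {
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
    }
    out
}

/// Adds two 8-element arrays, using AVX when the running CPU supports it.
pub fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: the runtime check above confirmed AVX support.
            return unsafe { add8_avx(a, b) };
        }
    }
    // Portable scalar fallback.
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}
```

Keeping the `unsafe` call behind a safe wrapper such as `add8` confines the feature check to one place, so callers cannot reach the AVX path on a CPU that lacks it.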
| _MM_GET_EXCEPTION_MASK⚠ | x86-64 and sse See |
| _MM_GET_EXCEPTION_STATE⚠ | x86-64 and sse See |
| _MM_GET_FLUSH_ZERO_MODE⚠ | x86-64 and sse See |
| _MM_GET_ROUNDING_MODE⚠ | x86-64 and sse See |
| _MM_SET_EXCEPTION_MASK⚠ | x86-64 and sse See |
| _MM_SET_EXCEPTION_STATE⚠ | x86-64 and sse See |
| _MM_SET_FLUSH_ZERO_MODE⚠ | x86-64 and sse See |
| _MM_SET_ROUNDING_MODE⚠ | x86-64 and sse See |
| _MM_TRANSPOSE4_PS⚠ | x86-64 and sse Transpose the 4x4 matrix formed by 4 rows of __m128 in place. |
| __cpuid⚠ | x86-64 See |
| __cpuid_count⚠ | x86-64 Returns the result of the |
| __get_cpuid_max⚠ | x86-64 Returns the highest-supported |
| __rdtscp⚠ | x86-64 Reads the current value of the processor’s time-stamp counter and the |
| _addcarry_u32⚠ | x86-64 Adds unsigned 32-bit integers |
| _addcarry_u64⚠ | x86-64 Adds unsigned 64-bit integers |
| _addcarryx_u32⚠ | x86-64 and adx Adds unsigned 32-bit integers |
| _addcarryx_u64⚠ | x86-64 and adx Adds unsigned 64-bit integers |
| _andn_u32⚠ | x86-64 and bmi1 Bitwise logical |
| _andn_u64⚠ | x86-64 and bmi1 Bitwise logical |
| _bextr2_u32⚠ | x86-64 and bmi1 Extracts bits of |
| _bextr2_u64⚠ | x86-64 and bmi1 Extracts bits of |
| _bextr_u32⚠ | x86-64 and bmi1 Extracts bits in range [ |
| _bextr_u64⚠ | x86-64 and bmi1 Extracts bits in range [ |
| _blcfill_u32⚠ | x86-64 and tbm Clears all bits below the least significant zero bit of |
| _blcfill_u64⚠ | x86-64 and tbm Clears all bits below the least significant zero bit of |
| _blci_u32⚠ | x86-64 and tbm Sets all bits of |
| _blci_u64⚠ | x86-64 and tbm Sets all bits of |
| _blcic_u32⚠ | x86-64 and tbm Sets the least significant zero bit of |
| _blcic_u64⚠ | x86-64 and tbm Sets the least significant zero bit of |
| _blcmsk_u32⚠ | x86-64 and tbm Sets the least significant zero bit of |
| _blcmsk_u64⚠ | x86-64 and tbm Sets the least significant zero bit of |
| _blcs_u32⚠ | x86-64 and tbm Sets the least significant zero bit of |
| _blcs_u64⚠ | x86-64 and tbm Sets the least significant zero bit of |
| _blsfill_u32⚠ | x86-64 and tbm Sets all bits of |
| _blsfill_u64⚠ | x86-64 and tbm Sets all bits of |
| _blsi_u32⚠ | x86-64 and bmi1 Extracts lowest set isolated bit. |
| _blsi_u64⚠ | x86-64 and bmi1 Extracts lowest set isolated bit. |
| _blsic_u32⚠ | x86-64 and tbm Clears least significant bit and sets all other bits. |
| _blsic_u64⚠ | x86-64 and tbm Clears least significant bit and sets all other bits. |
| _blsmsk_u32⚠ | x86-64 and bmi1 Gets mask up to lowest set bit. |
| _blsmsk_u64⚠ | x86-64 and bmi1 Gets mask up to lowest set bit. |
| _blsr_u32⚠ | x86-64 and bmi1 Resets the lowest set bit of |
| _blsr_u64⚠ | x86-64 and bmi1 Resets the lowest set bit of |
| _bswap⚠ | x86-64 Returns an integer with the reversed byte order of x |
| _bswap64⚠ | x86-64 Returns an integer with the reversed byte order of x |
| _bzhi_u32⚠ | x86-64 and bmi2 Zeroes higher bits of |
| _bzhi_u64⚠ | x86-64 and bmi2 Zeroes higher bits of |
| _fxrstor⚠ | x86-64 and fxsr Restores the |
| _fxrstor64⚠ | x86-64 and fxsr Restores the |
| _fxsave⚠ | x86-64 and fxsr Saves the |
| _fxsave64⚠ | x86-64 and fxsr Saves the |
| _lzcnt_u32⚠ | x86-64 and lzcnt Counts the leading most significant zero bits. |
| _lzcnt_u64⚠ | x86-64 and lzcnt Counts the leading most significant zero bits. |
| _mm256_add_pd⚠ | x86-64 and avx Adds packed double-precision (64-bit) floating-point elements in |
| _mm256_add_ps⚠ | x86-64 and avx Adds packed single-precision (32-bit) floating-point elements in |
| _mm256_and_pd⚠ | x86-64 and avx Computes the bitwise AND of packed double-precision (64-bit) floating-point elements in |
| _mm256_and_ps⚠ | x86-64 and avx Computes the bitwise AND of packed single-precision (32-bit) floating-point elements in |
| _mm256_or_pd⚠ | x86-64 and avx Computes the bitwise OR of packed double-precision (64-bit) floating-point elements in |
| _mm256_or_ps⚠ | x86-64 and avx Computes the bitwise OR of packed single-precision (32-bit) floating-point elements in |
| _mm256_shuffle_pd⚠ | x86-64 and avx Shuffles double-precision (64-bit) floating-point elements within 128-bit lanes using the control in |
| _mm256_shuffle_ps⚠ | x86-64 and avx Shuffles single-precision (32-bit) floating-point elements in |
| _mm256_andnot_pd⚠ | x86-64 and avx Computes the bitwise NOT of packed double-precision (64-bit) floating-point elements in |
| _mm256_andnot_ps⚠ | x86-64 and avx Computes the bitwise NOT of packed single-precision (32-bit) floating-point elements in |
| _mm256_max_pd⚠ | x86-64 and avx Compares packed double-precision (64-bit) floating-point elements in |
| _mm256_max_ps⚠ | x86-64 and avx Compares packed single-precision (32-bit) floating-point elements in |
| _mm256_min_pd⚠ | x86-64 and avx Compares packed double-precision (64-bit) floating-point elements in |
| _mm256_min_ps⚠ | x86-64 and avx Compares packed single-precision (32-bit) floating-point elements in |
| _mm256_mul_pd⚠ | x86-64 and avx Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_mul_ps⚠ | x86-64 and avx Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_addsub_pd⚠ | x86-64 and avx Alternately adds and subtracts packed double-precision (64-bit) floating-point elements in |
| _mm256_addsub_ps⚠ | x86-64 and avx Alternately adds and subtracts packed single-precision (32-bit) floating-point elements in |
| _mm256_sub_pd⚠ | x86-64 and avx Subtracts packed double-precision (64-bit) floating-point elements in |
| _mm256_sub_ps⚠ | x86-64 and avx Subtracts packed single-precision (32-bit) floating-point elements in |
| _mm256_div_ps⚠ | x86-64 and avx Computes the division of each of the 8 packed 32-bit floating-point elements in |
| _mm256_div_pd⚠ | x86-64 and avx Computes the division of each of the 4 packed 64-bit floating-point elements in |
| _mm256_round_pd⚠ | x86-64 and avx Rounds packed double-precision (64-bit) floating-point elements in |
| _mm256_ceil_pd⚠ | x86-64 and avx Rounds packed double-precision (64-bit) floating-point elements in |
| _mm256_floor_pd⚠ | x86-64 and avx Rounds packed double-precision (64-bit) floating-point elements in |
| _mm256_round_ps⚠ | x86-64 and avx Rounds packed single-precision (32-bit) floating-point elements in |
| _mm256_ceil_ps⚠ | x86-64 and avx Rounds packed single-precision (32-bit) floating-point elements in |
| _mm256_floor_ps⚠ | x86-64 and avx Rounds packed single-precision (32-bit) floating-point elements in |
| _mm256_sqrt_ps⚠ | x86-64 and avx Returns the square root of packed single-precision (32-bit) floating-point elements in |
| _mm256_sqrt_pd⚠ | x86-64 and avx Returns the square root of packed double-precision (64-bit) floating-point elements in |
| _mm256_blend_pd⚠ | x86-64 and avx Blends packed double-precision (64-bit) floating-point elements from |
| _mm256_blend_ps⚠ | x86-64 and avx Blends packed single-precision (32-bit) floating-point elements from |
| _mm256_blendv_pd⚠ | x86-64 and avx Blends packed double-precision (64-bit) floating-point elements from |
| _mm256_blendv_ps⚠ | x86-64 and avx Blends packed single-precision (32-bit) floating-point elements from |
| _mm256_dp_ps⚠ | x86-64 and avx Conditionally multiplies the packed single-precision (32-bit) floating-point elements in |
| _mm256_hadd_pd⚠ | x86-64 and avx Horizontal addition of adjacent pairs in the two packed vectors of 4 64-bit floating points |
| _mm256_hadd_ps⚠ | x86-64 and avx Horizontal addition of adjacent pairs in the two packed vectors of 8 32-bit floating points |
| _mm256_hsub_pd⚠ | x86-64 and avx Horizontal subtraction of adjacent pairs in the two packed vectors of 4 64-bit floating points |
| _mm256_hsub_ps⚠ | x86-64 and avx Horizontal subtraction of adjacent pairs in the two packed vectors of 8 32-bit floating points |
| _mm256_xor_pd⚠ | x86-64 and avx Computes the bitwise XOR of packed double-precision (64-bit) floating-point elements in |
| _mm256_xor_ps⚠ | x86-64 and avx Computes the bitwise XOR of packed single-precision (32-bit) floating-point elements in |
| _mm256_cmp_pd⚠ | x86-64 and avx Compares packed double-precision (64-bit) floating-point elements in |
| _mm256_cmp_ps⚠ | x86-64 and avx Compares packed single-precision (32-bit) floating-point elements in |
| _mm256_cvtpd_ps⚠ | x86-64 and avx Converts packed double-precision (64-bit) floating-point elements in |
| _mm256_cvtps_pd⚠ | x86-64 and avx Converts packed single-precision (32-bit) floating-point elements in |
| _mm256_zeroall⚠ | x86-64 and avx Zeroes the contents of all XMM or YMM registers. |
| _mm256_zeroupper⚠ | x86-64 and avx Zeroes the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified. |
| _mm256_permutevar_ps⚠ | x86-64 and avx Shuffles single-precision (32-bit) floating-point elements in |
| _mm256_permute_ps⚠ | x86-64 and avx Shuffles single-precision (32-bit) floating-point elements in |
| _mm256_permutevar_pd⚠ | x86-64 and avx Shuffles double-precision (64-bit) floating-point elements in |
| _mm256_permute_pd⚠ | x86-64 and avx Shuffles double-precision (64-bit) floating-point elements in |
| _mm256_broadcast_ss⚠ | x86-64 and avx Broadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. |
| _mm256_broadcast_sd⚠ | x86-64 and avx Broadcasts a double-precision (64-bit) floating-point element from memory to all elements of the returned vector. |
| _mm256_broadcast_ps⚠ | x86-64 and avx Broadcasts 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of the returned vector. |
| _mm256_broadcast_pd⚠ | x86-64 and avx Broadcasts 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of the returned vector. |
| _mm256_load_pd⚠ | x86-64 and avx Loads 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. |
| _mm256_store_pd⚠ | x86-64 and avx Stores 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from |
| _mm256_load_ps⚠ | x86-64 and avx Loads 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. |
| _mm256_store_ps⚠ | x86-64 and avx Stores 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from |
| _mm256_loadu_pd⚠ | x86-64 and avx Loads 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into result. |
| _mm256_storeu_pd⚠ | x86-64 and avx Stores 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from |
| _mm256_loadu_ps⚠ | x86-64 and avx Loads 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into result. |
| _mm256_storeu_ps⚠ | x86-64 and avx Stores 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from |
| _mm256_maskload_pd⚠ | x86-64 and avx Loads packed double-precision (64-bit) floating-point elements from memory into result using |
| _mm256_maskstore_pd⚠ | x86-64 and avx Stores packed double-precision (64-bit) floating-point elements from |
| _mm256_maskload_ps⚠ | x86-64 and avx Loads packed single-precision (32-bit) floating-point elements from memory into result using |
| _mm256_maskstore_ps⚠ | x86-64 and avx Stores packed single-precision (32-bit) floating-point elements from |
| _mm256_movehdup_ps⚠ | x86-64 and avx Duplicates odd-indexed single-precision (32-bit) floating-point elements from |
| _mm256_moveldup_ps⚠ | x86-64 and avx Duplicates even-indexed single-precision (32-bit) floating-point elements from |
| _mm256_movedup_pd⚠ | x86-64 and avx Duplicates even-indexed double-precision (64-bit) floating-point elements from |
| _mm256_stream_pd⚠ | x86-64 and avx Moves double-precision values from a 256-bit vector of |
| _mm256_stream_ps⚠ | x86-64 and avx Moves single-precision floating-point values from a 256-bit vector of |
| _mm256_rcp_ps⚠ | x86-64 and avx Computes the approximate reciprocal of packed single-precision (32-bit) floating-point elements in |
| _mm256_rsqrt_ps⚠ | x86-64 and avx Computes the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in |
| _mm256_unpackhi_pd⚠ | x86-64 and avx Unpacks and interleaves double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in |
| _mm256_unpackhi_ps⚠ | x86-64 and avx Unpacks and interleaves single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in |
| _mm256_unpacklo_pd⚠ | x86-64 and avx Unpacks and interleaves double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in |
| _mm256_unpacklo_ps⚠ | x86-64 and avx Unpacks and interleaves single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in |
| _mm256_testz_pd⚠ | x86-64 and avx Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in |
| _mm256_testc_pd⚠ | x86-64 and avx Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in |
| _mm256_testnzc_pd⚠ | x86-64 and avx Computes the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in |
| _mm256_testz_ps⚠ | x86-64 and avx Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in |
| _mm256_testc_ps⚠ | x86-64 and avx Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in |
| _mm256_testnzc_ps⚠ | x86-64 and avx Computes the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in |
| _mm256_movemask_pd⚠ | x86-64 and avx Sets each bit of the returned mask based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in |
| _mm256_movemask_ps⚠ | x86-64 and avx Sets each bit of the returned mask based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in |
| _mm256_setzero_pd⚠ | x86-64 and avx Returns vector of type __m256d with all elements set to zero. |
| _mm256_setzero_ps⚠ | x86-64 and avx Returns vector of type __m256 with all elements set to zero. |
| _mm256_set_pd⚠ | x86-64 and avx Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values. |
| _mm256_set_ps⚠ | x86-64 and avx Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values. |
| _mm256_setr_pd⚠ | x86-64 and avx Sets packed double-precision (64-bit) floating-point elements in returned vector with the supplied values in reverse order. |
| _mm256_setr_ps⚠ | x86-64 and avx Sets packed single-precision (32-bit) floating-point elements in returned vector with the supplied values in reverse order. |
| _mm256_castpd_ps⚠ | x86-64 and avx Casts vector of type __m256d to type __m256. |
| _mm256_castps_pd⚠ | x86-64 and avx Casts vector of type __m256 to type __m256d. |
| _mm256_undefined_ps⚠ | x86-64 and avx Returns vector of type |
| _mm256_undefined_pd⚠ | x86-64 and avx Returns vector of type |
| _mm256_broadcastsd_pd⚠ | x86-64 and avx2 Broadcasts the low double-precision (64-bit) floating-point element from |
| _mm256_broadcastss_ps⚠ | x86-64 and avx2 Broadcasts the low single-precision (32-bit) floating-point element from |
| _mm256_fmadd_pd⚠ | x86-64 and fma Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_fmadd_ps⚠ | x86-64 and fma Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_fmaddsub_pd⚠ | x86-64 and fma Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_fmaddsub_ps⚠ | x86-64 and fma Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_fmsub_pd⚠ | x86-64 and fma Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_fmsub_ps⚠ | x86-64 and fma Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_fmsubadd_pd⚠ | x86-64 and fma Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_fmsubadd_ps⚠ | x86-64 and fma Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_fnmadd_pd⚠ | x86-64 and fma Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_fnmadd_ps⚠ | x86-64 and fma Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_fnmsub_pd⚠ | x86-64 and fma Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm256_fnmsub_ps⚠ | x86-64 and fma Multiplies packed single-precision (32-bit) floating-point elements in |
| _mm256_abs_epi8⚠ | x86-64 and avx2 Computes the absolute values of packed 8-bit integers in |
| _mm256_abs_epi16⚠ | x86-64 and avx2 Computes the absolute values of packed 16-bit integers in |
| _mm256_abs_epi32⚠ | x86-64 and avx2 Computes the absolute values of packed 32-bit integers in |
| _mm256_add_epi8⚠ | x86-64 and avx2 Adds packed 8-bit integers in |
| _mm256_add_epi16⚠ | x86-64 and avx2 Adds packed 16-bit integers in |
| _mm256_add_epi32⚠ | x86-64 and avx2 Adds packed 32-bit integers in |
| _mm256_add_epi64⚠ | x86-64 and avx2 Adds packed 64-bit integers in |
| _mm256_adds_epi8⚠ | x86-64 and avx2 Adds packed 8-bit integers in |
| _mm256_adds_epi16⚠ | x86-64 and avx2 Adds packed 16-bit integers in |
| _mm256_adds_epu8⚠ | x86-64 and avx2 Adds packed unsigned 8-bit integers in |
| _mm256_adds_epu16⚠ | x86-64 and avx2 Adds packed unsigned 16-bit integers in |
| _mm256_alignr_epi8⚠ | x86-64 and avx2 Concatenates pairs of 16-byte blocks in |
| _mm256_and_si256⚠ | x86-64 and avx2 Computes the bitwise AND of 256 bits (representing integer data) in |
| _mm256_andnot_si256⚠ | x86-64 and avx2 Computes the bitwise NOT of 256 bits (representing integer data) in |
| _mm256_avg_epu8⚠ | x86-64 and avx2 Averages packed unsigned 8-bit integers in |
| _mm256_avg_epu16⚠ | x86-64 and avx2 Averages packed unsigned 16-bit integers in |
| _mm256_blend_epi16⚠ | x86-64 and avx2 Blends packed 16-bit integers from |
| _mm256_blend_epi32⚠ | x86-64 and avx2 Blends packed 32-bit integers from |
| _mm256_blendv_epi8⚠ | x86-64 and avx2 Blends packed 8-bit integers from |
| _mm256_broadcastb_epi8⚠ | x86-64 and avx2 Broadcasts the low packed 8-bit integer from |
| _mm256_broadcastd_epi32⚠ | x86-64 and avx2 Broadcasts the low packed 32-bit integer from |
| _mm256_broadcastq_epi64⚠ | x86-64 and avx2 Broadcasts the low packed 64-bit integer from |
| _mm256_broadcastsi128_si256⚠ | x86-64 and avx2 Broadcasts 128 bits of integer data from a to all 128-bit lanes in the 256-bit returned value. |
| _mm256_broadcastw_epi16⚠ | x86-64 and avx2 Broadcasts the low packed 16-bit integer from a to all elements of the 256-bit returned value |
| _mm256_bslli_epi128⚠ | x86-64 and avx2 Shifts 128-bit lanes in |
| _mm256_bsrli_epi128⚠ | x86-64 and avx2 Shifts 128-bit lanes in |
| _mm256_castpd128_pd256⚠ | x86-64 and avx Casts vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. |
| _mm256_castpd256_pd128⚠ | x86-64 and avx Casts vector of type __m256d to type __m128d. |
| _mm256_castpd_si256⚠ | x86-64 and avx Casts vector of type __m256d to type __m256i. |
| _mm256_castps128_ps256⚠ | x86-64 and avx Casts vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. |
| _mm256_castps256_ps128⚠ | x86-64 and avx Casts vector of type __m256 to type __m128. |
| _mm256_castps_si256⚠ | x86-64 and avx Casts vector of type __m256 to type __m256i. |
| _mm256_castsi256_ps⚠ | x86-64 and avx Casts vector of type __m256i to type __m256. |
| _mm256_castsi256_pd⚠ | x86-64 and avx Casts vector of type __m256i to type __m256d. |
| _mm256_castsi128_si256⚠ | x86-64 and avx Casts vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. |
| _mm256_castsi256_si128⚠ | x86-64 and avx Casts vector of type __m256i to type __m128i. |
| _mm256_cmpeq_epi8⚠ | x86-64 and avx2 Compares packed 8-bit integers in |
| _mm256_cmpeq_epi16⚠ | x86-64 and avx2 Compares packed 16-bit integers in |
| _mm256_cmpeq_epi32⚠ | x86-64 and avx2 Compares packed 32-bit integers in |
| _mm256_cmpeq_epi64⚠ | x86-64 and avx2 Compares packed 64-bit integers in |
| _mm256_cmpgt_epi8⚠ | x86-64 and avx2 Compares packed 8-bit integers in |
| _mm256_cmpgt_epi16⚠ | x86-64 and avx2 Compares packed 16-bit integers in |
| _mm256_cmpgt_epi32⚠ | x86-64 and avx2 Compares packed 32-bit integers in |
| _mm256_cmpgt_epi64⚠ | x86-64 and avx2 Compares packed 64-bit integers in |
| _mm256_cvtepi32_pd⚠ | x86-64 and avx Converts packed 32-bit integers in |
| _mm256_cvtepi32_ps⚠ | x86-64 and avx Converts packed 32-bit integers in |
| _mm256_cvtepi16_epi32⚠ | x86-64 and avx2 Sign-extend 16-bit integers to 32-bit integers. |
| _mm256_cvtepi16_epi64⚠ | x86-64 and avx2 Sign-extend 16-bit integers to 64-bit integers. |
| _mm256_cvtepi32_epi64⚠ | x86-64 and avx2 Sign-extend 32-bit integers to 64-bit integers. |
| _mm256_cvtepi8_epi16⚠ | x86-64 and avx2 Sign-extend 8-bit integers to 16-bit integers. |
| _mm256_cvtepi8_epi32⚠ | x86-64 and avx2 Sign-extend 8-bit integers to 32-bit integers. |
| _mm256_cvtepi8_epi64⚠ | x86-64 and avx2 Sign-extend 8-bit integers to 64-bit integers. |
| _mm256_cvtepu16_epi32⚠ | x86-64 and avx2 Zero-extend packed unsigned 16-bit integers in |
| _mm256_cvtepu16_epi64⚠ | x86-64 and avx2 Zero-extend the lower four unsigned 16-bit integers in |
| _mm256_cvtepu32_epi64⚠ | x86-64 and avx2 Zero-extend unsigned 32-bit integers in |
| _mm256_cvtepu8_epi16⚠ | x86-64 and avx2 Zero-extend unsigned 8-bit integers in |
| _mm256_cvtepu8_epi32⚠ | x86-64 and avx2 Zero-extend the lower eight unsigned 8-bit integers in |
| _mm256_cvtepu8_epi64⚠ | x86-64 and avx2 Zero-extend the lower four unsigned 8-bit integers in |
| _mm256_cvtpd_epi32⚠ | x86-64 and avx Converts packed double-precision (64-bit) floating-point elements in |
| _mm256_cvtps_epi32⚠ | x86-64 and avx Converts packed single-precision (32-bit) floating-point elements in |
| _mm256_cvtsd_f64⚠ | x86-64 and avx2 Returns the first element of the input vector of |
| _mm256_cvtsi256_si32⚠ | x86-64 and avx2 Returns the first element of the input vector of |
| _mm256_cvtss_f32⚠ | x86-64 and avx Returns the first element of the input vector of |
| _mm256_cvttpd_epi32⚠ | x86-64 and avx Converts packed double-precision (64-bit) floating-point elements in |
| _mm256_cvttps_epi32⚠ | x86-64 and avx Converts packed single-precision (32-bit) floating-point elements in |
| _mm256_extract_epi8⚠ | x86-64 and avx2 Extracts an 8-bit integer from |
| _mm256_extract_epi16⚠ | x86-64 and avx2 Extracts a 16-bit integer from |
| _mm256_extract_epi32⚠ | x86-64 and avx2 Extracts a 32-bit integer from |
| _mm256_extract_epi64⚠ | x86-64 and avx2 Extracts a 64-bit integer from |
| _mm256_extractf128_ps⚠ | x86-64 and avx Extracts 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from |
| _mm256_extractf128_pd⚠ | x86-64 and avx Extracts 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from |
| _mm256_extractf128_si256⚠ | x86-64 and avx Extracts 128 bits (composed of integer data) from |
| _mm256_extracti128_si256⚠ | x86-64 and avx2 Extracts 128 bits (of integer data) from |
| _mm256_hadd_epi16⚠ | x86-64 and avx2 Horizontally adds adjacent pairs of 16-bit integers in |
| _mm256_hadd_epi32⚠ | x86-64 and avx2 Horizontally adds adjacent pairs of 32-bit integers in |
| _mm256_hadds_epi16⚠ | x86-64 and avx2 Horizontally adds adjacent pairs of 16-bit integers in |
| _mm256_hsub_epi16⚠ | x86-64 and avx2 Horizontally subtracts adjacent pairs of 16-bit integers in |
| _mm256_hsub_epi32⚠ | x86-64 and avx2 Horizontally subtracts adjacent pairs of 32-bit integers in |
| _mm256_hsubs_epi16⚠ | x86-64 and avx2 Horizontally subtracts adjacent pairs of 16-bit integers in |
| _mm256_i32gather_ps⚠ | x86-64 and avx2 Returns values from |
| _mm256_i32gather_pd⚠ | x86-64 and avx2 Returns values from |
| _mm256_i64gather_ps⚠ | x86-64 and avx2 Returns values from |
| _mm256_i64gather_pd⚠ | x86-64 and avx2 Returns values from |
| _mm256_i32gather_epi32⚠ | x86-64 and avx2 Returns values from |
| _mm256_i32gather_epi64⚠ | x86-64 and avx2 Returns values from |
| _mm256_i64gather_epi32⚠ | x86-64 and avx2 Returns values from |
| _mm256_i64gather_epi64⚠ | x86-64 and avx2 Returns values from |
| _mm256_insert_epi8⚠ | x86-64 and avx Copies |
| _mm256_insert_epi16⚠ | x86-64 and avx Copies |
| _mm256_insert_epi32⚠ | x86-64 and avx Copies |
| _mm256_insert_epi64⚠ | x86-64 and avx Copies |
| _mm256_insertf128_ps⚠ | x86-64 and avx Copies |
| _mm256_insertf128_pd⚠ | x86-64 and avx Copies |
| _mm256_insertf128_si256⚠ | x86-64 and avx Copies |
| _mm256_inserti128_si256⚠ | x86-64 and avx2 Copies |
| _mm256_lddqu_si256⚠ | x86-64 and avx Loads 256 bits of integer data from unaligned memory into result. This intrinsic may perform better than |
| _mm256_load_si256⚠ | x86-64 and avx Loads 256 bits of integer data from memory into result. |
| _mm256_loadu2_m128⚠ | x86-64 and avx,sse Loads two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combines them into a 256-bit value. |
| _mm256_loadu2_m128d⚠ | x86-64 and avx,sse2 Loads two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combines them into a 256-bit value. |
| _mm256_loadu2_m128i⚠ | x86-64 and avx,sse2 Loads two 128-bit values (composed of integer data) from memory, and combines them into a 256-bit value. |
| _mm256_loadu_si256⚠ | x86-64 and avx Loads 256 bits of integer data from memory into result. |
| _mm256_madd_epi16⚠ | x86-64 and avx2 Multiplies packed signed 16-bit integers in |
| _mm256_maddubs_epi16⚠ | x86-64 and avx2 Vertically multiplies each unsigned 8-bit integer from |
| _mm256_mask_i32gather_ps⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i32gather_pd⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i64gather_ps⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i64gather_pd⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i32gather_epi32⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i32gather_epi64⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i64gather_epi32⚠ | x86-64 and avx2 Returns values from |
| _mm256_mask_i64gather_epi64⚠ | x86-64 and avx2 Returns values from |
| _mm256_maskload_epi32⚠ | x86-64 and avx2 Loads packed 32-bit integers from memory pointed by |
| _mm256_maskload_epi64⚠ | x86-64 and avx2 Loads packed 64-bit integers from memory pointed by |
| _mm256_maskstore_epi32⚠ | x86-64 and avx2 Stores packed 32-bit integers from |
| _mm256_maskstore_epi64⚠ | x86-64 and avx2 Stores packed 64-bit integers from |
| _mm256_max_epi8⚠ | x86-64 and avx2 Compares packed 8-bit integers in |
| _mm256_max_epi16⚠ | x86-64 and avx2 Compares packed 16-bit integers in |
| _mm256_max_epi32⚠ | x86-64 and avx2 Compares packed 32-bit integers in |
| _mm256_max_epu8⚠ | x86-64 and avx2 Compares packed unsigned 8-bit integers in |
| _mm256_max_epu16⚠ | x86-64 and avx2 Compares packed unsigned 16-bit integers in |
| _mm256_max_epu32⚠ | x86-64 and avx2 Compares packed unsigned 32-bit integers in |
| _mm256_min_epi8⚠ | x86-64 and avx2 Compares packed 8-bit integers in |
| _mm256_min_epi16⚠ | x86-64 and avx2 Compares packed 16-bit integers in |
| _mm256_min_epi32⚠ | x86-64 and avx2 Compares packed 32-bit integers in |
| _mm256_min_epu8⚠ | x86-64 and avx2 Compares packed unsigned 8-bit integers in |
| _mm256_min_epu16⚠ | x86-64 and avx2 Compares packed unsigned 16-bit integers in |
| _mm256_min_epu32⚠ | x86-64 and avx2 Compares packed unsigned 32-bit integers in |
| _mm256_movemask_epi8⚠ | x86-64 and avx2 Creates mask from the most significant bit of each 8-bit element in |
| _mm256_mpsadbw_epu8⚠ | x86-64 and avx2 Computes the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in |
| _mm256_mul_epi32⚠ | x86-64 and avx2Multiplies the low 32-bit integers from each packed 64-bit element in
|
| _mm256_mul_epu32⚠ | x86-64 and avx2Multiplies the low unsigned 32-bit integers from each packed 64-bit
element in |
| _mm256_mulhi_epi16⚠ | x86-64 and avx2Multiplies the packed 16-bit integers in |
| _mm256_mulhi_epu16⚠ | x86-64 and avx2Multiplies the packed unsigned 16-bit integers in |
| _mm256_mulhrs_epi16⚠ | x86-64 and avx2Multiplies packed 16-bit integers in |
| _mm256_mullo_epi16⚠ | x86-64 and avx2Multiplies the packed 16-bit integers in |
| _mm256_mullo_epi32⚠ | x86-64 and avx2Multiplies the packed 32-bit integers in |
| _mm256_or_si256⚠ | x86-64 and avx2Computes the bitwise OR of 256 bits (representing integer data) in |
| _mm256_packs_epi16⚠ | x86-64 and avx2Converts packed 16-bit integers from |
| _mm256_packs_epi32⚠ | x86-64 and avx2Converts packed 32-bit integers from |
| _mm256_packus_epi16⚠ | x86-64 and avx2Converts packed 16-bit integers from |
| _mm256_packus_epi32⚠ | x86-64 and avx2Converts packed 32-bit integers from |
| _mm256_permute2f128_ps⚠ | x86-64 and avxShuffles 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) selected by |
| _mm256_permute2f128_pd⚠ | x86-64 and avxShuffles 256 bits (composed of 4 packed double-precision (64-bit)
floating-point elements) selected by |
| _mm256_permute2f128_si256⚠ | x86-64 and avxShuffles 256 bits (composed of integer data) selected by |
| _mm256_permute2x128_si256⚠ | x86-64 and avx2Shuffles 128 bits of integer data selected by |
| _mm256_permute4x64_pd⚠ | x86-64 and avx2Shuffles 64-bit floating-point elements in |
| _mm256_permute4x64_epi64⚠ | x86-64 and avx2Permutes 64-bit integers from |
| _mm256_permutevar8x32_ps⚠ | x86-64 and avx2Shuffles eight 32-bit floating-point elements in |
| _mm256_permutevar8x32_epi32⚠ | x86-64 and avx2Permutes packed 32-bit integers from |
| _mm256_sad_epu8⚠ | x86-64 and avx2Computes the absolute differences of packed unsigned 8-bit integers in |
| _mm256_set1_pd⚠ | x86-64 and avxBroadcasts double-precision (64-bit) floating-point value |
| _mm256_set1_ps⚠ | x86-64 and avxBroadcasts single-precision (32-bit) floating-point value |
| _mm256_set1_epi8⚠ | x86-64 and avxBroadcasts 8-bit integer |
| _mm256_set1_epi16⚠ | x86-64 and avxBroadcasts 16-bit integer |
| _mm256_set1_epi32⚠ | x86-64 and avxBroadcasts 32-bit integer |
| _mm256_set1_epi64x⚠ | x86-64 and avxBroadcasts 64-bit integer |
| _mm256_set_epi8⚠ | x86-64 and avxSets packed 8-bit integers in returned vector with the supplied values. |
| _mm256_set_epi16⚠ | x86-64 and avxSets packed 16-bit integers in returned vector with the supplied values. |
| _mm256_set_epi32⚠ | x86-64 and avxSets packed 32-bit integers in returned vector with the supplied values. |
| _mm256_set_epi64x⚠ | x86-64 and avxSets packed 64-bit integers in returned vector with the supplied values. |
| _mm256_set_m128⚠ | x86-64 and avxSets packed __m256 returned vector with the supplied values. |
| _mm256_set_m128d⚠ | x86-64 and avxSets packed __m256d returned vector with the supplied values. |
| _mm256_set_m128i⚠ | x86-64 and avxSets packed __m256i returned vector with the supplied values. |
| _mm256_setr_epi8⚠ | x86-64 and avxSets packed 8-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_epi16⚠ | x86-64 and avxSets packed 16-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_epi32⚠ | x86-64 and avxSets packed 32-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_epi64x⚠ | x86-64 and avxSets packed 64-bit integers in returned vector with the supplied values in reverse order. |
| _mm256_setr_m128⚠ | x86-64 and avxSets packed __m256 returned vector with the supplied values. |
| _mm256_setr_m128d⚠ | x86-64 and avxSets packed __m256d returned vector with the supplied values. |
| _mm256_setr_m128i⚠ | x86-64 and avxSets packed __m256i returned vector with the supplied values. |
| _mm256_setzero_si256⚠ | x86-64 and avxReturns vector of type __m256i with all elements set to zero. |
| _mm256_shuffle_epi8⚠ | x86-64 and avx2Shuffles bytes from |
| _mm256_shuffle_epi32⚠ | x86-64 and avx2Shuffles 32-bit integers in 128-bit lanes of |
| _mm256_shufflehi_epi16⚠ | x86-64 and avx2Shuffles 16-bit integers in the high 64 bits of 128-bit lanes of |
| _mm256_shufflelo_epi16⚠ | x86-64 and avx2Shuffles 16-bit integers in the low 64 bits of 128-bit lanes of |
| _mm256_sign_epi8⚠ | x86-64 and avx2Negates packed 8-bit integers in |
| _mm256_sign_epi16⚠ | x86-64 and avx2Negates packed 16-bit integers in |
| _mm256_sign_epi32⚠ | x86-64 and avx2Negates packed 32-bit integers in |
| _mm256_sll_epi16⚠ | x86-64 and avx2Shifts packed 16-bit integers in |
| _mm256_sll_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_sll_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm256_slli_epi16⚠ | x86-64 and avx2Shifts packed 16-bit integers in |
| _mm256_slli_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_slli_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm256_slli_si256⚠ | x86-64 and avx2Shifts 128-bit lanes in |
| _mm256_sllv_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_sllv_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm256_sra_epi16⚠ | x86-64 and avx2Shifts packed 16-bit integers in |
| _mm256_sra_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_srai_epi16⚠ | x86-64 and avx2Shifts packed 16-bit integers in |
| _mm256_srai_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_srav_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_srl_epi16⚠ | x86-64 and avx2Shifts packed 16-bit integers in |
| _mm256_srl_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_srl_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm256_srli_epi16⚠ | x86-64 and avx2Shifts packed 16-bit integers in |
| _mm256_srli_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_srli_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm256_srli_si256⚠ | x86-64 and avx2Shifts 128-bit lanes in |
| _mm256_srlv_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm256_srlv_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm256_store_si256⚠ | x86-64 and avxStores 256 bits of integer data from |
| _mm256_storeu2_m128⚠ | x86-64 and avx,sseStores the high and low 128-bit halves (each composed of 4 packed
single-precision (32-bit) floating-point elements) from |
| _mm256_storeu2_m128d⚠ | x86-64 and avx,sse2Stores the high and low 128-bit halves (each composed of 2 packed
double-precision (64-bit) floating-point elements) from |
| _mm256_storeu2_m128i⚠ | x86-64 and avx,sse2Stores the high and low 128-bit halves (each composed of integer data) from
|
| _mm256_storeu_si256⚠ | x86-64 and avxStores 256 bits of integer data from |
| _mm256_stream_si256⚠ | x86-64 and avxMoves integer data from a 256-bit integer vector to a 32-byte aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm256_sub_epi8⚠ | x86-64 and avx2Subtracts packed 8-bit integers in |
| _mm256_sub_epi16⚠ | x86-64 and avx2Subtracts packed 16-bit integers in |
| _mm256_sub_epi32⚠ | x86-64 and avx2Subtracts packed 32-bit integers in |
| _mm256_sub_epi64⚠ | x86-64 and avx2Subtracts packed 64-bit integers in |
| _mm256_subs_epi8⚠ | x86-64 and avx2Subtracts packed 8-bit integers in |
| _mm256_subs_epi16⚠ | x86-64 and avx2Subtracts packed 16-bit integers in |
| _mm256_subs_epu8⚠ | x86-64 and avx2Subtracts packed unsigned 8-bit integers in |
| _mm256_subs_epu16⚠ | x86-64 and avx2Subtracts packed unsigned 16-bit integers in |
| _mm256_testc_si256⚠ | x86-64 and avxComputes the bitwise AND of 256 bits (representing integer data) in |
| _mm256_testnzc_si256⚠ | x86-64 and avxComputes the bitwise AND of 256 bits (representing integer data) in |
| _mm256_testz_si256⚠ | x86-64 and avxComputes the bitwise AND of 256 bits (representing integer data) in |
| _mm256_undefined_si256⚠ | x86-64 and avxReturns vector of type __m256i with undefined elements. |
| _mm256_unpackhi_epi8⚠ | x86-64 and avx2Unpacks and interleaves 8-bit integers from the high half of each
128-bit lane in |
| _mm256_unpackhi_epi16⚠ | x86-64 and avx2Unpacks and interleaves 16-bit integers from the high half of each
128-bit lane of |
| _mm256_unpackhi_epi32⚠ | x86-64 and avx2Unpacks and interleaves 32-bit integers from the high half of each
128-bit lane of |
| _mm256_unpackhi_epi64⚠ | x86-64 and avx2Unpacks and interleaves 64-bit integers from the high half of each
128-bit lane of |
| _mm256_unpacklo_epi8⚠ | x86-64 and avx2Unpacks and interleaves 8-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_epi16⚠ | x86-64 and avx2Unpacks and interleaves 16-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_epi32⚠ | x86-64 and avx2Unpacks and interleaves 32-bit integers from the low half of each
128-bit lane of |
| _mm256_unpacklo_epi64⚠ | x86-64 and avx2Unpacks and interleaves 64-bit integers from the low half of each
128-bit lane of |
| _mm256_xor_si256⚠ | x86-64 and avx2Computes the bitwise XOR of 256 bits (representing integer data)
in |
| _mm256_zextpd128_pd256⚠ | x86-64 and avx,sse2Constructs a 256-bit floating-point vector of |
| _mm256_zextps128_ps256⚠ | x86-64 and avx,sseConstructs a 256-bit floating-point vector of |
| _mm256_zextsi128_si256⚠ | x86-64 and avx,sse2Constructs a 256-bit integer vector from a 128-bit integer vector. The lower 128 bits contain the value of the source vector. The upper 128 bits are set to zero. |
| _mm_abs_epi8⚠ | x86-64 and ssse3Computes the absolute value of packed 8-bit signed integers in |
| _mm_abs_epi16⚠ | x86-64 and ssse3Computes the absolute value of each of the packed 16-bit signed integers in
|
| _mm_abs_epi32⚠ | x86-64 and ssse3Computes the absolute value of each of the packed 32-bit signed integers in
|
| _mm_add_epi8⚠ | x86-64 and sse2Adds packed 8-bit integers in |
| _mm_add_epi16⚠ | x86-64 and sse2Adds packed 16-bit integers in |
| _mm_add_epi32⚠ | x86-64 and sse2Adds packed 32-bit integers in |
| _mm_add_epi64⚠ | x86-64 and sse2Adds packed 64-bit integers in |
| _mm_add_pd⚠ | x86-64 and sse2Adds packed double-precision (64-bit) floating-point elements in |
| _mm_add_ps⚠ | x86-64 and sseAdds __m128 vectors. |
| _mm_add_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_add_ss⚠ | x86-64 and sseAdds the first component of |
| _mm_adds_epi8⚠ | x86-64 and sse2Adds packed 8-bit integers in |
| _mm_adds_epi16⚠ | x86-64 and sse2Adds packed 16-bit integers in |
| _mm_adds_epu8⚠ | x86-64 and sse2Adds packed unsigned 8-bit integers in |
| _mm_adds_epu16⚠ | x86-64 and sse2Adds packed unsigned 16-bit integers in |
| _mm_addsub_pd⚠ | x86-64 and sse3Alternately adds and subtracts packed double-precision (64-bit)
floating-point elements in |
| _mm_addsub_ps⚠ | x86-64 and sse3Alternately adds and subtracts packed single-precision (32-bit)
floating-point elements in |
| _mm_aesdec_si128⚠ | x86-64 and aesPerforms one round of an AES decryption flow on data (state) in |
| _mm_aesdeclast_si128⚠ | x86-64 and aesPerforms the last round of an AES decryption flow on data (state) in |
| _mm_aesenc_si128⚠ | x86-64 and aesPerforms one round of an AES encryption flow on data (state) in |
| _mm_aesenclast_si128⚠ | x86-64 and aesPerforms the last round of an AES encryption flow on data (state) in |
| _mm_aesimc_si128⚠ | x86-64 and aesPerforms the |
| _mm_aeskeygenassist_si128⚠ | x86-64 and aesAssists in expanding the AES cipher key. |
| _mm_alignr_epi8⚠ | x86-64 and ssse3Concatenates 16-byte blocks in |
| _mm_and_pd⚠ | x86-64 and sse2Computes the bitwise AND of packed double-precision (64-bit) floating-point
elements in |
| _mm_and_ps⚠ | x86-64 and sseBitwise AND of packed single-precision (32-bit) floating-point elements. |
| _mm_and_si128⚠ | x86-64 and sse2Computes the bitwise AND of 128 bits (representing integer data) in |
| _mm_andnot_pd⚠ | x86-64 and sse2Computes the bitwise NOT of |
| _mm_andnot_ps⚠ | x86-64 and sseBitwise AND-NOT of packed single-precision (32-bit) floating-point elements. |
| _mm_andnot_si128⚠ | x86-64 and sse2Computes the bitwise NOT of 128 bits (representing integer data) in |
| _mm_avg_epu8⚠ | x86-64 and sse2Averages packed unsigned 8-bit integers in |
| _mm_avg_epu16⚠ | x86-64 and sse2Averages packed unsigned 16-bit integers in |
| _mm_blend_epi16⚠ | x86-64 and sse4.1Blends packed 16-bit integers from |
| _mm_blend_epi32⚠ | x86-64 and avx2Blends packed 32-bit integers from |
| _mm_blend_pd⚠ | x86-64 and sse4.1Blends packed double-precision (64-bit) floating-point elements from |
| _mm_blend_ps⚠ | x86-64 and sse4.1Blends packed single-precision (32-bit) floating-point elements from |
| _mm_blendv_epi8⚠ | x86-64 and sse4.1Blends packed 8-bit integers from |
| _mm_blendv_pd⚠ | x86-64 and sse4.1Blends packed double-precision (64-bit) floating-point elements from |
| _mm_blendv_ps⚠ | x86-64 and sse4.1Blends packed single-precision (32-bit) floating-point elements from |
| _mm_broadcast_ss⚠ | x86-64 and avxBroadcasts a single-precision (32-bit) floating-point element from memory to all elements of the returned vector. |
| _mm_broadcastb_epi8⚠ | x86-64 and avx2Broadcasts the low packed 8-bit integer from |
| _mm_broadcastd_epi32⚠ | x86-64 and avx2Broadcasts the low packed 32-bit integer from |
| _mm_broadcastq_epi64⚠ | x86-64 and avx2Broadcasts the low packed 64-bit integer from |
| _mm_broadcastsd_pd⚠ | x86-64 and avx2Broadcasts the low double-precision (64-bit) floating-point element
from |
| _mm_broadcastss_ps⚠ | x86-64 and avx2Broadcasts the low single-precision (32-bit) floating-point element
from |
| _mm_broadcastw_epi16⚠ | x86-64 and avx2Broadcasts the low packed 16-bit integer from a to all elements of the 128-bit returned value |
| _mm_bslli_si128⚠ | x86-64 and sse2Shifts |
| _mm_bsrli_si128⚠ | x86-64 and sse2Shifts |
| _mm_castpd_ps⚠ | x86-64 and sse2Casts a 128-bit floating-point vector of |
| _mm_castpd_si128⚠ | x86-64 and sse2Casts a 128-bit floating-point vector of |
| _mm_castps_pd⚠ | x86-64 and sse2Casts a 128-bit floating-point vector of |
| _mm_castps_si128⚠ | x86-64 and sse2Casts a 128-bit floating-point vector of |
| _mm_castsi128_pd⚠ | x86-64 and sse2Casts a 128-bit integer vector into a 128-bit floating-point vector
of |
| _mm_castsi128_ps⚠ | x86-64 and sse2Casts a 128-bit integer vector into a 128-bit floating-point vector
of |
| _mm_ceil_pd⚠ | x86-64 and sse4.1Rounds the packed double-precision (64-bit) floating-point elements in |
| _mm_ceil_ps⚠ | x86-64 and sse4.1Rounds the packed single-precision (32-bit) floating-point elements in |
| _mm_ceil_sd⚠ | x86-64 and sse4.1Rounds the lower double-precision (64-bit) floating-point element in |
| _mm_ceil_ss⚠ | x86-64 and sse4.1Rounds the lower single-precision (32-bit) floating-point element in |
| _mm_clflush⚠ | x86-64 and sse2Invalidates and flushes the cache line that contains |
| _mm_clmulepi64_si128⚠ | x86-64 and pclmulqdqPerforms a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k). |
| _mm_cmp_pd⚠ | x86-64 and avx,sse2Compares packed double-precision (64-bit) floating-point
elements in |
| _mm_cmp_ps⚠ | x86-64 and avx,sseCompares packed single-precision (32-bit) floating-point
elements in |
| _mm_cmp_sd⚠ | x86-64 and avx,sse2Compares the lower double-precision (64-bit) floating-point element in
|
| _mm_cmp_ss⚠ | x86-64 and avx,sseCompares the lower single-precision (32-bit) floating-point element in
|
| _mm_cmpeq_epi8⚠ | x86-64 and sse2Compares packed 8-bit integers in |
| _mm_cmpeq_epi16⚠ | x86-64 and sse2Compares packed 16-bit integers in |
| _mm_cmpeq_epi32⚠ | x86-64 and sse2Compares packed 32-bit integers in |
| _mm_cmpeq_epi64⚠ | x86-64 and sse4.1Compares packed 64-bit integers in |
| _mm_cmpeq_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpeq_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpeq_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpeq_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpestra⚠ | x86-64 and sse4.2Compares packed strings in |
| _mm_cmpestrc⚠ | x86-64 and sse4.2Compares packed strings in |
| _mm_cmpestri⚠ | x86-64 and sse4.2Compares packed strings |
| _mm_cmpestrm⚠ | x86-64 and sse4.2Compares packed strings in |
| _mm_cmpestro⚠ | x86-64 and sse4.2Compares packed strings in |
| _mm_cmpestrs⚠ | x86-64 and sse4.2Compares packed strings in |
| _mm_cmpestrz⚠ | x86-64 and sse4.2Compares packed strings in |
| _mm_cmpge_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpge_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpge_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpge_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpgt_epi8⚠ | x86-64 and sse2Compares packed 8-bit integers in |
| _mm_cmpgt_epi16⚠ | x86-64 and sse2Compares packed 16-bit integers in |
| _mm_cmpgt_epi32⚠ | x86-64 and sse2Compares packed 32-bit integers in |
| _mm_cmpgt_epi64⚠ | x86-64 and sse4.2Compares packed 64-bit integers in |
| _mm_cmpgt_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpgt_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpgt_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpgt_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpistra⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmpistrc⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmpistri⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmpistrm⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmpistro⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmpistrs⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmpistrz⚠ | x86-64 and sse4.2Compares packed strings with implicit lengths in |
| _mm_cmple_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmple_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmple_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmple_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmplt_epi8⚠ | x86-64 and sse2Compares packed 8-bit integers in |
| _mm_cmplt_epi16⚠ | x86-64 and sse2Compares packed 16-bit integers in |
| _mm_cmplt_epi32⚠ | x86-64 and sse2Compares packed 32-bit integers in |
| _mm_cmplt_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmplt_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmplt_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmplt_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpneq_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpneq_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpneq_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpneq_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpnge_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpnge_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpnge_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpnge_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpngt_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpngt_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpngt_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpngt_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpnle_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpnle_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpnle_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpnle_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpnlt_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpnlt_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpnlt_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpnlt_ss⚠ | x86-64 and sseCompares the lowest |
| _mm_cmpord_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpord_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpord_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpord_ss⚠ | x86-64 and sseChecks if the lowest |
| _mm_cmpunord_pd⚠ | x86-64 and sse2Compares corresponding elements in |
| _mm_cmpunord_ps⚠ | x86-64 and sseCompares each of the four floats in |
| _mm_cmpunord_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_cmpunord_ss⚠ | x86-64 and sseChecks if the lowest |
| _mm_comieq_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_comieq_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_comige_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_comige_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_comigt_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_comigt_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_comile_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_comile_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_comilt_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_comilt_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_comineq_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_comineq_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_crc32_u8⚠ | x86-64 and sse4.2Starting with the initial value in |
| _mm_crc32_u16⚠ | x86-64 and sse4.2Starting with the initial value in |
| _mm_crc32_u32⚠ | x86-64 and sse4.2Starting with the initial value in |
| _mm_crc32_u64⚠ | x86-64 and sse4.2Starting with the initial value in |
| _mm_cvt_si2ss⚠ | x86-64 and sseAlias for |
| _mm_cvt_ss2si⚠ | x86-64 and sseAlias for |
| _mm_cvtepi32_pd⚠ | x86-64 and sse2Converts the lower two packed 32-bit integers in |
| _mm_cvtepi32_ps⚠ | x86-64 and sse2Converts packed 32-bit integers in |
| _mm_cvtepi16_epi32⚠ | x86-64 and sse4.1Sign-extends packed 16-bit integers in |
| _mm_cvtepi16_epi64⚠ | x86-64 and sse4.1Sign-extends packed 16-bit integers in |
| _mm_cvtepi32_epi64⚠ | x86-64 and sse4.1Sign-extends packed 32-bit integers in |
| _mm_cvtepi8_epi16⚠ | x86-64 and sse4.1Sign-extends packed 8-bit integers in |
| _mm_cvtepi8_epi32⚠ | x86-64 and sse4.1Sign-extends packed 8-bit integers in |
| _mm_cvtepi8_epi64⚠ | x86-64 and sse4.1Sign-extends packed 8-bit integers in the low 8 bytes of |
| _mm_cvtepu16_epi32⚠ | x86-64 and sse4.1Zero-extends packed unsigned 16-bit integers in |
| _mm_cvtepu16_epi64⚠ | x86-64 and sse4.1Zero-extends packed unsigned 16-bit integers in |
| _mm_cvtepu32_epi64⚠ | x86-64 and sse4.1Zero-extends packed unsigned 32-bit integers in |
| _mm_cvtepu8_epi16⚠ | x86-64 and sse4.1Zero-extends packed unsigned 8-bit integers in |
| _mm_cvtepu8_epi32⚠ | x86-64 and sse4.1Zero-extends packed unsigned 8-bit integers in |
| _mm_cvtepu8_epi64⚠ | x86-64 and sse4.1Zero-extends packed unsigned 8-bit integers in |
| _mm_cvtpd_epi32⚠ | x86-64 and sse2Converts packed double-precision (64-bit) floating-point elements in |
| _mm_cvtpd_ps⚠ | x86-64 and sse2Converts packed double-precision (64-bit) floating-point elements in |
| _mm_cvtps_epi32⚠ | x86-64 and sse2Converts packed single-precision (32-bit) floating-point elements in |
| _mm_cvtps_pd⚠ | x86-64 and sse2Converts packed single-precision (32-bit) floating-point elements in |
| _mm_cvtsd_f64⚠ | x86-64 and sse2Returns the lower double-precision (64-bit) floating-point element of |
| _mm_cvtsd_si32⚠ | x86-64 and sse2Converts the lower double-precision (64-bit) floating-point element in a to a 32-bit integer. |
| _mm_cvtsd_si64⚠ | x86-64 and sse2Converts the lower double-precision (64-bit) floating-point element in a to a 64-bit integer. |
| _mm_cvtsd_si64x⚠ | x86-64 and sse2Alias for |
| _mm_cvtsd_ss⚠ | x86-64 and sse2Converts the lower double-precision (64-bit) floating-point element in |
| _mm_cvtsi32_ss⚠ | x86-64 and sseConverts a 32-bit integer to a 32-bit float. The result vector is the input
vector |
| _mm_cvtsi32_sd⚠ | x86-64 and sse2Returns |
| _mm_cvtsi64_ss⚠ | x86-64 and sseConverts a 64-bit integer to a 32-bit float. The result vector is the input
vector |
| _mm_cvtsi64_sd⚠ | x86-64 and sse2Returns |
| _mm_cvtsi64x_sd⚠ | x86-64 and sse2Returns |
| _mm_cvtsi128_si32⚠ | x86-64 and sse2Returns the lowest element of |
| _mm_cvtsi128_si64⚠ | x86-64 and sse2Returns the lowest element of |
| _mm_cvtsi128_si64x⚠ | x86-64 and sse2Returns the lowest element of |
| _mm_cvtsi32_si128⚠ | x86-64 and sse2Returns a vector whose lowest element is |
| _mm_cvtsi64_si128⚠ | x86-64 and sse2Returns a vector whose lowest element is |
| _mm_cvtsi64x_si128⚠ | x86-64 and sse2Returns a vector whose lowest element is |
| _mm_cvtss_f32⚠ | x86-64 and sseExtracts the lowest 32-bit float from the input vector. |
| _mm_cvtss_sd⚠ | x86-64 and sse2Converts the lower single-precision (32-bit) floating-point element in |
| _mm_cvtss_si32⚠ | x86-64 and sseConverts the lowest 32-bit float in the input vector to a 32-bit integer. |
| _mm_cvtss_si64⚠ | x86-64 and sseConverts the lowest 32-bit float in the input vector to a 64-bit integer. |
| _mm_cvtt_ss2si⚠ | x86-64 and sseAlias for |
| _mm_cvttpd_epi32⚠ | x86-64 and sse2Converts packed double-precision (64-bit) floating-point elements in |
| _mm_cvttps_epi32⚠ | x86-64 and sse2Converts packed single-precision (32-bit) floating-point elements in |
| _mm_cvttsd_si32⚠ | x86-64 and sse2Converts the lower double-precision (64-bit) floating-point element in |
| _mm_cvttsd_si64⚠ | x86-64 and sse2Converts the lower double-precision (64-bit) floating-point element in |
| _mm_cvttsd_si64x⚠ | x86-64 and sse2Alias for |
| _mm_cvttss_si32⚠ | x86-64 and sseConverts the lowest 32-bit float in the input vector to a 32-bit integer with truncation. |
| _mm_cvttss_si64⚠ | x86-64 and sseConverts the lowest 32-bit float in the input vector to a 64-bit integer with truncation. |
| _mm_div_pd⚠ | x86-64 and sse2Divides packed double-precision (64-bit) floating-point elements in |
| _mm_div_ps⚠ | x86-64 and sseDivides __m128 vectors. |
| _mm_div_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_div_ss⚠ | x86-64 and sseDivides the first component of |
| _mm_dp_pd⚠ | x86-64 and sse4.1Returns the dot product of two __m128d vectors. |
| _mm_dp_ps⚠ | x86-64 and sse4.1Returns the dot product of two __m128 vectors. |
| _mm_extract_epi8⚠ | x86-64 and sse4.1Extracts an 8-bit integer from |
| _mm_extract_epi16⚠ | x86-64 and sse2Returns the |
| _mm_extract_epi32⚠ | x86-64 and sse4.1Extracts a 32-bit integer from |
| _mm_extract_epi64⚠ | x86-64 and sse4.1Extracts a 64-bit integer from |
| _mm_extract_ps⚠ | x86-64 and sse4.1Extracts a single-precision (32-bit) floating-point element from |
| _mm_extract_si64⚠ | x86-64 and sse4aExtracts the bit range specified by |
| _mm_floor_pd⚠ | x86-64 and sse4.1Rounds the packed double-precision (64-bit) floating-point elements in |
| _mm_floor_ps⚠ | x86-64 and sse4.1Rounds the packed single-precision (32-bit) floating-point elements in |
| _mm_floor_sd⚠ | x86-64 and sse4.1Rounds the lower double-precision (64-bit) floating-point element in |
| _mm_floor_ss⚠ | x86-64 and sse4.1Rounds the lower single-precision (32-bit) floating-point element in |
| _mm_fmadd_pd⚠ | x86-64 and fmaMultiplies packed double-precision (64-bit) floating-point elements in |
| _mm_fmadd_ps⚠ | x86-64 and fmaMultiplies packed single-precision (32-bit) floating-point elements in |
| _mm_fmadd_sd⚠ | x86-64 and fmaMultiplies the lower double-precision (64-bit) floating-point elements in
|
| _mm_fmadd_ss⚠ | x86-64 and fmaMultiplies the lower single-precision (32-bit) floating-point elements in
|
| _mm_fmaddsub_pd⚠ | x86-64 and fmaMultiplies packed double-precision (64-bit) floating-point elements in |
| _mm_fmaddsub_ps⚠ | x86-64 and fmaMultiplies packed single-precision (32-bit) floating-point elements in |
| _mm_fmsub_pd⚠ | x86-64 and fmaMultiplies packed double-precision (64-bit) floating-point elements in |
| _mm_fmsub_ps⚠ | x86-64 and fmaMultiplies packed single-precision (32-bit) floating-point elements in |
| _mm_fmsub_sd⚠ | x86-64 and fmaMultiplies the lower double-precision (64-bit) floating-point elements in
|
| _mm_fmsub_ss⚠ | x86-64 and fmaMultiplies the lower single-precision (32-bit) floating-point elements in
|
| _mm_fmsubadd_pd⚠ | x86-64 and fmaMultiplies packed double-precision (64-bit) floating-point elements in |
| _mm_fmsubadd_ps⚠ | x86-64 and fmaMultiplies packed single-precision (32-bit) floating-point elements in |
| _mm_fnmadd_pd⚠ | x86-64 and fmaMultiplies packed double-precision (64-bit) floating-point elements in |
| _mm_fnmadd_ps⚠ | x86-64 and fmaMultiplies packed single-precision (32-bit) floating-point elements in |
| _mm_fnmadd_sd⚠ | x86-64 and fmaMultiplies the lower double-precision (64-bit) floating-point elements in
|
| _mm_fnmadd_ss⚠ | x86-64 and fmaMultiplies the lower single-precision (32-bit) floating-point elements in
|
| _mm_fnmsub_pd⚠ | x86-64 and fmaMultiplies packed double-precision (64-bit) floating-point elements in |
| _mm_fnmsub_ps⚠ | x86-64 and fmaMultiplies packed single-precision (32-bit) floating-point elements in |
| _mm_fnmsub_sd⚠ | x86-64 and fmaMultiplies the lower double-precision (64-bit) floating-point elements in
|
| _mm_fnmsub_ss⚠ | x86-64 and fmaMultiplies the lower single-precision (32-bit) floating-point elements in
|
| _mm_getcsr⚠ | x86-64 and sseGets the unsigned 32-bit value of the MXCSR control and status register. |
| _mm_hadd_epi16⚠ | x86-64 and ssse3Horizontally adds the adjacent pairs of values contained in 2 packed
128-bit vectors of |
| _mm_hadd_epi32⚠ | x86-64 and ssse3Horizontally adds the adjacent pairs of values contained in 2 packed
128-bit vectors of |
| _mm_hadd_pd⚠ | x86-64 and sse3Horizontally adds adjacent pairs of double-precision (64-bit)
floating-point elements in |
| _mm_hadd_ps⚠ | x86-64 and sse3Horizontally adds adjacent pairs of single-precision (32-bit)
floating-point elements in |
| _mm_hadds_epi16⚠ | x86-64 and ssse3Horizontally adds the adjacent pairs of values contained in 2 packed
128-bit vectors of |
| _mm_hsub_epi16⚠ | x86-64 and ssse3Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of |
| _mm_hsub_epi32⚠ | x86-64 and ssse3Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of |
| _mm_hsub_pd⚠ | x86-64 and sse3Horizontally subtracts adjacent pairs of double-precision (64-bit)
floating-point elements in |
| _mm_hsub_ps⚠ | x86-64 and sse3Horizontally subtracts adjacent pairs of single-precision (32-bit)
floating-point elements in |
| _mm_hsubs_epi16⚠ | x86-64 and ssse3Horizontally subtracts the adjacent pairs of values contained in 2
packed 128-bit vectors of |
| _mm_i32gather_ps⚠ | x86-64 and avx2Returns values from |
| _mm_i32gather_pd⚠ | x86-64 and avx2Returns values from |
| _mm_i64gather_ps⚠ | x86-64 and avx2Returns values from |
| _mm_i64gather_pd⚠ | x86-64 and avx2Returns values from |
| _mm_i32gather_epi32⚠ | x86-64 and avx2Returns values from |
| _mm_i32gather_epi64⚠ | x86-64 and avx2Returns values from |
| _mm_i64gather_epi32⚠ | x86-64 and avx2Returns values from |
| _mm_i64gather_epi64⚠ | x86-64 and avx2Returns values from |
| _mm_insert_epi8⚠ | x86-64 and sse4.1Returns a copy of |
| _mm_insert_epi16⚠ | x86-64 and sse2Returns a new vector where the |
| _mm_insert_epi32⚠ | x86-64 and sse4.1Returns a copy of |
| _mm_insert_epi64⚠ | x86-64 and sse4.1Returns a copy of |
| _mm_insert_ps⚠ | x86-64 and sse4.1Selects a single value in |
| _mm_insert_si64⚠ | x86-64 and sse4aInserts the |
| _mm_lddqu_si128⚠ | x86-64 and sse3Loads 128 bits of integer data from unaligned memory.
This intrinsic may perform better than |
| _mm_lfence⚠ | x86-64 and sse2Performs a serializing operation on all load-from-memory instructions that were issued prior to this instruction. |
| _mm_load1_ps⚠ | x86-64 and sseConstructs a |
| _mm_load1_pd⚠ | x86-64 and sse2Loads a double-precision (64-bit) floating-point element from memory into both elements of the returned vector. |
| _mm_load_pd⚠ | x86-64 and sse2Loads 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
|
| _mm_load_pd1⚠ | x86-64 and sse2Loads a double-precision (64-bit) floating-point element from memory into both elements of the returned vector. |
| _mm_load_ps⚠ | x86-64 and sseLoads four |
| _mm_load_ps1⚠ | x86-64 and sseAlias for |
| _mm_load_sd⚠ | x86-64 and sse2Loads a 64-bit double-precision value to the low element of a 128-bit integer vector and clears the upper element. |
| _mm_load_si128⚠ | x86-64 and sse2Loads 128 bits of integer data from memory into a new vector. |
| _mm_load_ss⚠ | x86-64 and sseConstructs a |
| _mm_loaddup_pd⚠ | x86-64 and sse3Loads a double-precision (64-bit) floating-point element from memory into both elements of the returned vector. |
| _mm_loadh_pd⚠ | x86-64 and sse2Loads a double-precision value into the high-order bits of a 128-bit
vector of |
| _mm_loadl_epi64⚠ | x86-64 and sse2Loads a 64-bit integer from memory into the first element of the returned vector. |
| _mm_loadl_pd⚠ | x86-64 and sse2Loads a double-precision value into the low-order bits of a 128-bit
vector of |
| _mm_loadr_pd⚠ | x86-64 and sse2Loads 2 double-precision (64-bit) floating-point elements from memory into
the returned vector in reverse order. |
| _mm_loadr_ps⚠ | x86-64 and sseLoads four |
| _mm_loadu_pd⚠ | x86-64 and sse2Loads 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from memory into the returned vector.
|
| _mm_loadu_ps⚠ | x86-64 and sseLoads four |
| _mm_loadu_si128⚠ | x86-64 and sse2Loads 128 bits of integer data from memory into a new vector. |
| _mm_madd_epi16⚠ | x86-64 and sse2Multiplies and then horizontally adds signed 16-bit integers in |
| _mm_maddubs_epi16⚠ | x86-64 and ssse3Multiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination. |
| _mm_mask_i32gather_ps⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i32gather_pd⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i64gather_ps⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i64gather_pd⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i32gather_epi32⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i32gather_epi64⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i64gather_epi32⚠ | x86-64 and avx2Returns values from |
| _mm_mask_i64gather_epi64⚠ | x86-64 and avx2Returns values from |
| _mm_maskload_epi32⚠ | x86-64 and avx2Loads packed 32-bit integers from memory pointed by |
| _mm_maskload_epi64⚠ | x86-64 and avx2Loads packed 64-bit integers from memory pointed by |
| _mm_maskload_pd⚠ | x86-64 and avxLoads packed double-precision (64-bit) floating-point elements from memory
into result using |
| _mm_maskload_ps⚠ | x86-64 and avxLoads packed single-precision (32-bit) floating-point elements from memory
into result using |
| _mm_maskmoveu_si128⚠ | x86-64 and sse2Conditionally stores 8-bit integer elements from |
| _mm_maskstore_epi32⚠ | x86-64 and avx2Stores packed 32-bit integers from |
| _mm_maskstore_epi64⚠ | x86-64 and avx2Stores packed 64-bit integers from |
| _mm_maskstore_pd⚠ | x86-64 and avxStores packed double-precision (64-bit) floating-point elements from |
| _mm_maskstore_ps⚠ | x86-64 and avxStores packed single-precision (32-bit) floating-point elements from |
| _mm_max_epi8⚠ | x86-64 and sse4.1Compares packed 8-bit integers in |
| _mm_max_epi16⚠ | x86-64 and sse2Compares packed 16-bit integers in |
| _mm_max_epi32⚠ | x86-64 and sse4.1Compares packed 32-bit integers in |
| _mm_max_epu8⚠ | x86-64 and sse2Compares packed unsigned 8-bit integers in |
| _mm_max_epu16⚠ | x86-64 and sse4.1Compares packed unsigned 16-bit integers in |
| _mm_max_epu32⚠ | x86-64 and sse4.1Compares packed unsigned 32-bit integers in |
| _mm_max_pd⚠ | x86-64 and sse2Returns a new vector with the maximum values from corresponding elements in
|
| _mm_max_ps⚠ | x86-64 and sseCompares packed single-precision (32-bit) floating-point elements in |
| _mm_max_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_max_ss⚠ | x86-64 and sseCompares the first single-precision (32-bit) floating-point element of |
| _mm_mfence⚠ | x86-64 and sse2Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. |
| _mm_min_epi8⚠ | x86-64 and sse4.1Compares packed 8-bit integers in |
| _mm_min_epi16⚠ | x86-64 and sse2Compares packed 16-bit integers in |
| _mm_min_epi32⚠ | x86-64 and sse4.1Compares packed 32-bit integers in |
| _mm_min_epu8⚠ | x86-64 and sse2Compares packed unsigned 8-bit integers in |
| _mm_min_epu16⚠ | x86-64 and sse4.1Compares packed unsigned 16-bit integers in |
| _mm_min_epu32⚠ | x86-64 and sse4.1Compares packed unsigned 32-bit integers in |
| _mm_min_pd⚠ | x86-64 and sse2Returns a new vector with the minimum values from corresponding elements in
|
| _mm_min_ps⚠ | x86-64 and sseCompares packed single-precision (32-bit) floating-point elements in |
| _mm_min_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_min_ss⚠ | x86-64 and sseCompares the first single-precision (32-bit) floating-point element of |
| _mm_minpos_epu16⚠ | x86-64 and sse4.1Finds the minimum unsigned 16-bit element in the 128-bit __m128i vector, returning a vector containing its value in its first position, and its index in its second position; all other elements are set to zero. |
| _mm_move_epi64⚠ | x86-64 and sse2Returns a vector where the low element is extracted from |
| _mm_move_sd⚠ | x86-64 and sse2Constructs a 128-bit floating-point vector of |
| _mm_move_ss⚠ | x86-64 and sseReturns a |
| _mm_movedup_pd⚠ | x86-64 and sse3Duplicates the low double-precision (64-bit) floating-point element
from |
| _mm_movehdup_ps⚠ | x86-64 and sse3Duplicates odd-indexed single-precision (32-bit) floating-point elements
from |
| _mm_movehl_ps⚠ | x86-64 and sseCombines higher half of |
| _mm_moveldup_ps⚠ | x86-64 and sse3Duplicates even-indexed single-precision (32-bit) floating-point elements
from |
| _mm_movelh_ps⚠ | x86-64 and sseCombines lower half of |
| _mm_movemask_epi8⚠ | x86-64 and sse2Returns a mask of the most significant bit of each element in |
| _mm_movemask_pd⚠ | x86-64 and sse2Returns a mask of the most significant bit of each element in |
| _mm_movemask_ps⚠ | x86-64 and sseReturns a mask of the most significant bit of each element in |
| _mm_mpsadbw_epu8⚠ | x86-64 and sse4.1Subtracts 8-bit unsigned integer values and computes the absolute values of the differences. Sums of the absolute differences are then returned according to the bit fields in the immediate operand. |
| _mm_mul_epi32⚠ | x86-64 and sse4.1Multiplies the low 32-bit integers from each packed 64-bit
element in |
| _mm_mul_epu32⚠ | x86-64 and sse2Multiplies the low unsigned 32-bit integers from each packed 64-bit element
in |
| _mm_mul_pd⚠ | x86-64 and sse2Multiplies packed double-precision (64-bit) floating-point elements in |
| _mm_mul_ps⚠ | x86-64 and sseMultiplies __m128 vectors. |
| _mm_mul_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_mul_ss⚠ | x86-64 and sseMultiplies the first component of |
| _mm_mulhi_epi16⚠ | x86-64 and sse2Multiplies the packed 16-bit integers in |
| _mm_mulhi_epu16⚠ | x86-64 and sse2Multiplies the packed unsigned 16-bit integers in |
| _mm_mulhrs_epi16⚠ | x86-64 and ssse3Multiplies packed 16-bit signed integer values, truncates the 32-bit
product to the 18 most significant bits by right-shifting, rounds the
truncated value by adding 1, and writes bits |
| _mm_mullo_epi16⚠ | x86-64 and sse2Multiplies the packed 16-bit integers in |
| _mm_mullo_epi32⚠ | x86-64 and sse4.1Multiplies the packed 32-bit integers in |
| _mm_or_pd⚠ | x86-64 and sse2Computes the bitwise OR of |
| _mm_or_ps⚠ | x86-64 and sseBitwise OR of packed single-precision (32-bit) floating-point elements. |
| _mm_or_si128⚠ | x86-64 and sse2Computes the bitwise OR of 128 bits (representing integer data) in |
| _mm_packs_epi16⚠ | x86-64 and sse2Converts packed 16-bit integers from |
| _mm_packs_epi32⚠ | x86-64 and sse2Converts packed 32-bit integers from |
| _mm_packus_epi16⚠ | x86-64 and sse2Converts packed 16-bit integers from |
| _mm_packus_epi32⚠ | x86-64 and sse4.1Converts packed 32-bit integers from |
| _mm_pause⚠ | x86-64 Provides a hint to the processor that the code sequence is a spin-wait loop. |
| _mm_permute_pd⚠ | x86-64 and avx,sse2Shuffles double-precision (64-bit) floating-point elements in |
| _mm_permute_ps⚠ | x86-64 and avx,sseShuffles single-precision (32-bit) floating-point elements in |
| _mm_permutevar_pd⚠ | x86-64 and avxShuffles double-precision (64-bit) floating-point elements in |
| _mm_permutevar_ps⚠ | x86-64 and avxShuffles single-precision (32-bit) floating-point elements in |
| _mm_prefetch⚠ | x86-64 and sseFetches the cache line that contains address |
| _mm_rcp_ps⚠ | x86-64 and sseReturns the approximate reciprocal of packed single-precision (32-bit)
floating-point elements in |
| _mm_rcp_ss⚠ | x86-64 and sseReturns the approximate reciprocal of the first single-precision
(32-bit) floating-point element in |
| _mm_round_pd⚠ | x86-64 and sse4.1Rounds the packed double-precision (64-bit) floating-point elements in |
| _mm_round_ps⚠ | x86-64 and sse4.1Rounds the packed single-precision (32-bit) floating-point elements in |
| _mm_round_sd⚠ | x86-64 and sse4.1Rounds the lower double-precision (64-bit) floating-point element in |
| _mm_round_ss⚠ | x86-64 and sse4.1Rounds the lower single-precision (32-bit) floating-point element in |
| _mm_rsqrt_ps⚠ | x86-64 and sseReturns the approximate reciprocal square root of packed single-precision
(32-bit) floating-point elements in |
| _mm_rsqrt_ss⚠ | x86-64 and sseReturns the approximate reciprocal square root of the first single-precision
(32-bit) floating-point element in |
| _mm_sad_epu8⚠ | x86-64 and sse2Sums the absolute differences of packed unsigned 8-bit integers. |
| _mm_set1_ps⚠ | x86-64 and sseConstructs a |
| _mm_set1_pd⚠ | x86-64 and sse2Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value. |
| _mm_set1_epi8⚠ | x86-64 and sse2Broadcasts 8-bit integer |
| _mm_set1_epi16⚠ | x86-64 and sse2Broadcasts 16-bit integer |
| _mm_set1_epi32⚠ | x86-64 and sse2Broadcasts 32-bit integer |
| _mm_set1_epi64x⚠ | x86-64 and sse2Broadcasts 64-bit integer |
| _mm_set_epi8⚠ | x86-64 and sse2Sets packed 8-bit integers with the supplied values. |
| _mm_set_epi16⚠ | x86-64 and sse2Sets packed 16-bit integers with the supplied values. |
| _mm_set_epi32⚠ | x86-64 and sse2Sets packed 32-bit integers with the supplied values. |
| _mm_set_epi64x⚠ | x86-64 and sse2Sets packed 64-bit integers with the supplied values, from highest to lowest. |
| _mm_set_pd⚠ | x86-64 and sse2Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values. |
| _mm_set_pd1⚠ | x86-64 and sse2Broadcasts double-precision (64-bit) floating-point value a to all elements of the return value. |
| _mm_set_ps⚠ | x86-64 and sseConstructs a |
| _mm_set_ps1⚠ | x86-64 and sseAlias for |
| _mm_set_sd⚠ | x86-64 and sse2Copies double-precision (64-bit) floating-point element |
| _mm_set_ss⚠ | x86-64 and sseConstructs a |
| _mm_setcsr⚠ | x86-64 and sseSets the MXCSR register with the 32-bit unsigned integer value. |
| _mm_setr_epi8⚠ | x86-64 and sse2Sets packed 8-bit integers with the supplied values in reverse order. |
| _mm_setr_epi16⚠ | x86-64 and sse2Sets packed 16-bit integers with the supplied values in reverse order. |
| _mm_setr_epi32⚠ | x86-64 and sse2Sets packed 32-bit integers with the supplied values in reverse order. |
| _mm_setr_pd⚠ | x86-64 and sse2Sets packed double-precision (64-bit) floating-point elements in the return value with the supplied values in reverse order. |
| _mm_setr_ps⚠ | x86-64 and sseConstructs a |
| _mm_setzero_pd⚠ | x86-64 and sse2Returns packed double-precision (64-bit) floating-point elements with all zeros. |
| _mm_setzero_ps⚠ | x86-64 and sseConstructs a |
| _mm_setzero_si128⚠ | x86-64 and sse2Returns a vector with all elements set to zero. |
| _mm_sfence⚠ | x86-64 and ssePerforms a serializing operation on all store-to-memory instructions that were issued prior to this instruction. |
| _mm_sha1msg1_epu32⚠ | x86-64 and shaPerforms an intermediate calculation for the next four SHA1 message values
(unsigned 32-bit integers) using previous message values from |
| _mm_sha1msg2_epu32⚠ | x86-64 and shaPerforms the final calculation for the next four SHA1 message values
(unsigned 32-bit integers) using the intermediate result in |
| _mm_sha1nexte_epu32⚠ | x86-64 and shaCalculates SHA1 state variable E after four rounds of operation from the
current SHA1 state variable |
| _mm_sha1rnds4_epu32⚠ | x86-64 and shaPerforms four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D)
from |
| _mm_sha256msg1_epu32⚠ | x86-64 and shaPerforms an intermediate calculation for the next four SHA256 message values
(unsigned 32-bit integers) using previous message values from |
| _mm_sha256msg2_epu32⚠ | x86-64 and shaPerforms the final calculation for the next four SHA256 message values
(unsigned 32-bit integers) using previous message values from |
| _mm_sha256rnds2_epu32⚠ | x86-64 and shaPerforms 2 rounds of SHA256 operation using an initial SHA256 state
(C,D,G,H) from |
| _mm_shuffle_epi8⚠ | x86-64 and ssse3Shuffles bytes from |
| _mm_shuffle_epi32⚠ | x86-64 and sse2Shuffles 32-bit integers in |
| _mm_shuffle_pd⚠ | x86-64 and sse2Constructs a 128-bit floating-point vector of |
| _mm_shuffle_ps⚠ | x86-64 and sseShuffles packed single-precision (32-bit) floating-point elements in |
| _mm_shufflehi_epi16⚠ | x86-64 and sse2Shuffles 16-bit integers in the high 64 bits of |
| _mm_shufflelo_epi16⚠ | x86-64 and sse2Shuffles 16-bit integers in the low 64 bits of |
| _mm_sign_epi8⚠ | x86-64 and ssse3Negates packed 8-bit integers in |
| _mm_sign_epi16⚠ | x86-64 and ssse3Negates packed 16-bit integers in |
| _mm_sign_epi32⚠ | x86-64 and ssse3Negates packed 32-bit integers in |
| _mm_sll_epi16⚠ | x86-64 and sse2Shifts packed 16-bit integers in |
| _mm_sll_epi32⚠ | x86-64 and sse2Shifts packed 32-bit integers in |
| _mm_sll_epi64⚠ | x86-64 and sse2Shifts packed 64-bit integers in |
| _mm_slli_epi16⚠ | x86-64 and sse2Shifts packed 16-bit integers in |
| _mm_slli_epi32⚠ | x86-64 and sse2Shifts packed 32-bit integers in |
| _mm_slli_epi64⚠ | x86-64 and sse2Shifts packed 64-bit integers in |
| _mm_slli_si128⚠ | x86-64 and sse2Shifts |
| _mm_sllv_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm_sllv_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm_sqrt_pd⚠ | x86-64 and sse2Returns a new vector with the square root of each of the values in |
| _mm_sqrt_ps⚠ | x86-64 and sseReturns the square root of packed single-precision (32-bit) floating-point
elements in |
| _mm_sqrt_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_sqrt_ss⚠ | x86-64 and sseReturns the square root of the first single-precision (32-bit)
floating-point element in |
| _mm_sra_epi16⚠ | x86-64 and sse2Shifts packed 16-bit integers in |
| _mm_sra_epi32⚠ | x86-64 and sse2Shifts packed 32-bit integers in |
| _mm_srai_epi16⚠ | x86-64 and sse2Shifts packed 16-bit integers in |
| _mm_srai_epi32⚠ | x86-64 and sse2Shifts packed 32-bit integers in |
| _mm_srav_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm_srl_epi16⚠ | x86-64 and sse2Shifts packed 16-bit integers in |
| _mm_srl_epi32⚠ | x86-64 and sse2Shifts packed 32-bit integers in |
| _mm_srl_epi64⚠ | x86-64 and sse2Shifts packed 64-bit integers in |
| _mm_srli_epi16⚠ | x86-64 and sse2Shifts packed 16-bit integers in |
| _mm_srli_epi32⚠ | x86-64 and sse2Shifts packed 32-bit integers in |
| _mm_srli_epi64⚠ | x86-64 and sse2Shifts packed 64-bit integers in |
| _mm_srli_si128⚠ | x86-64 and sse2Shifts |
| _mm_srlv_epi32⚠ | x86-64 and avx2Shifts packed 32-bit integers in |
| _mm_srlv_epi64⚠ | x86-64 and avx2Shifts packed 64-bit integers in |
| _mm_store1_ps⚠ | x86-64 and sseStores the lowest 32-bit float of |
| _mm_store1_pd⚠ | x86-64 and sse2Stores the lower double-precision (64-bit) floating-point element from |
| _mm_store_pd⚠ | x86-64 and sse2Stores 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
| _mm_store_pd1⚠ | x86-64 and sse2Stores the lower double-precision (64-bit) floating-point element from |
| _mm_store_ps⚠ | x86-64 and sseStores four 32-bit floats into aligned memory. |
| _mm_store_ps1⚠ | x86-64 and sseAlias for |
| _mm_store_sd⚠ | x86-64 and sse2Stores the lower 64 bits of a 128-bit vector of |
| _mm_store_si128⚠ | x86-64 and sse2Stores 128 bits of integer data from |
| _mm_store_ss⚠ | x86-64 and sseStores the lowest 32-bit float of |
| _mm_storeh_pd⚠ | x86-64 and sse2Stores the upper 64 bits of a 128-bit vector of |
| _mm_storel_epi64⚠ | x86-64 and sse2Stores the lower 64-bit integer |
| _mm_storel_pd⚠ | x86-64 and sse2Stores the lower 64 bits of a 128-bit vector of |
| _mm_storer_pd⚠ | x86-64 and sse2Stores 2 double-precision (64-bit) floating-point elements from |
| _mm_storer_ps⚠ | x86-64 and sseStores four 32-bit floats into aligned memory in reverse order. |
| _mm_storeu_pd⚠ | x86-64 and sse2Stores 128 bits (composed of 2 packed double-precision (64-bit)
floating-point elements) from |
| _mm_storeu_ps⚠ | x86-64 and sseStores four 32-bit floats into memory. There are no restrictions on memory
alignment. For aligned memory |
| _mm_storeu_si128⚠ | x86-64 and sse2Stores 128 bits of integer data from |
| _mm_stream_pd⚠ | x86-64 and sse2Stores a 128-bit floating point vector of |
| _mm_stream_ps⚠ | x86-64 and sseStores |
| _mm_stream_sd⚠ | x86-64 and sse4aNon-temporal store of |
| _mm_stream_si32⚠ | x86-64 and sse2Stores a 32-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm_stream_si64⚠ | x86-64 and sse2Stores a 64-bit integer value in the specified memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm_stream_si128⚠ | x86-64 and sse2Stores a 128-bit integer vector to a 128-bit aligned memory location. To minimize caching, the data is flagged as non-temporal (unlikely to be used again soon). |
| _mm_stream_ss⚠ | x86-64 and sse4aNon-temporal store of |
| _mm_sub_epi8⚠ | x86-64 and sse2Subtracts packed 8-bit integers in |
| _mm_sub_epi16⚠ | x86-64 and sse2Subtracts packed 16-bit integers in |
| _mm_sub_epi32⚠ | x86-64 and sse2Subtracts packed 32-bit integers in |
| _mm_sub_epi64⚠ | x86-64 and sse2Subtracts packed 64-bit integers in |
| _mm_sub_pd⚠ | x86-64 and sse2Subtracts packed double-precision (64-bit) floating-point elements in |
| _mm_sub_ps⚠ | x86-64 and sseSubtracts __m128 vectors. |
| _mm_sub_sd⚠ | x86-64 and sse2Returns a new vector with the low element of |
| _mm_sub_ss⚠ | x86-64 and sseSubtracts the first component of |
| _mm_subs_epi8⚠ | x86-64 and sse2Subtracts packed 8-bit integers in |
| _mm_subs_epi16⚠ | x86-64 and sse2Subtracts packed 16-bit integers in |
| _mm_subs_epu8⚠ | x86-64 and sse2Subtracts packed unsigned 8-bit integers in |
| _mm_subs_epu16⚠ | x86-64 and sse2Subtracts packed unsigned 16-bit integers in |
| _mm_test_all_ones⚠ | x86-64 and sse4.1Tests whether the specified bits in |
| _mm_test_all_zeros⚠ | x86-64 and sse4.1Tests whether the specified bits in a 128-bit integer vector are all zeros. |
| _mm_test_mix_ones_zeros⚠ | x86-64 and sse4.1Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. |
| _mm_testc_pd⚠ | x86-64 and avxComputes the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm_testc_ps⚠ | x86-64 and avxComputes the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm_testc_si128⚠ | x86-64 and sse4.1Tests whether the specified bits in a 128-bit integer vector are all ones. |
| _mm_testnzc_pd⚠ | x86-64 and avxComputes the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm_testnzc_ps⚠ | x86-64 and avxComputes the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm_testnzc_si128⚠ | x86-64 and sse4.1Tests whether the specified bits in a 128-bit integer vector are neither all zeros nor all ones. |
| _mm_testz_pd⚠ | x86-64 and avxComputes the bitwise AND of 128 bits (representing double-precision (64-bit)
floating-point elements) in |
| _mm_testz_ps⚠ | x86-64 and avxComputes the bitwise AND of 128 bits (representing single-precision (32-bit)
floating-point elements) in |
| _mm_testz_si128⚠ | x86-64 and sse4.1Tests whether the specified bits in a 128-bit integer vector are all zeros. |
| _mm_tzcnt_32⚠ | x86-64 and bmi1Counts the number of trailing least significant zero bits. |
| _mm_tzcnt_64⚠ | x86-64 and bmi1Counts the number of trailing least significant zero bits. |
| _mm_ucomieq_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_ucomieq_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_ucomige_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_ucomige_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_ucomigt_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_ucomigt_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_ucomile_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_ucomile_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_ucomilt_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_ucomilt_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_ucomineq_sd⚠ | x86-64 and sse2Compares the lower element of |
| _mm_ucomineq_ss⚠ | x86-64 and sseCompares two 32-bit floats from the low-order bits of |
| _mm_undefined_pd⚠ | x86-64 and sse2Returns vector of type __m128d with undefined elements. |
| _mm_undefined_ps⚠ | x86-64 and sseReturns vector of type __m128 with undefined elements. |
| _mm_undefined_si128⚠ | x86-64 and sse2Returns vector of type __m128i with undefined elements. |
| _mm_unpackhi_epi8⚠ | x86-64 and sse2Unpacks and interleaves 8-bit integers from the high half of |
| _mm_unpackhi_epi16⚠ | x86-64 and sse2Unpacks and interleaves 16-bit integers from the high half of |
| _mm_unpackhi_epi32⚠ | x86-64 and sse2Unpacks and interleaves 32-bit integers from the high half of |
| _mm_unpackhi_epi64⚠ | x86-64 and sse2Unpacks and interleaves 64-bit integers from the high half of |
| _mm_unpackhi_pd⚠ | x86-64 and sse2The resulting |
| _mm_unpackhi_ps⚠ | x86-64 and sseUnpacks and interleaves single-precision (32-bit) floating-point elements
from the higher half of |
| _mm_unpacklo_epi8⚠ | x86-64 and sse2Unpacks and interleaves 8-bit integers from the low half of |
| _mm_unpacklo_epi16⚠ | x86-64 and sse2Unpacks and interleaves 16-bit integers from the low half of |
| _mm_unpacklo_epi32⚠ | x86-64 and sse2Unpacks and interleaves 32-bit integers from the low half of |
| _mm_unpacklo_epi64⚠ | x86-64 and sse2Unpacks and interleaves 64-bit integers from the low half of |
| _mm_unpacklo_pd⚠ | x86-64 and sse2The resulting |
| _mm_unpacklo_ps⚠ | x86-64 and sseUnpacks and interleaves single-precision (32-bit) floating-point elements
from the lower half of |
| _mm_xor_pd⚠ | x86-64 and sse2Computes the bitwise XOR of |
| _mm_xor_ps⚠ | x86-64 and sseBitwise exclusive OR of packed single-precision (32-bit) floating-point elements. |
| _mm_xor_si128⚠ | x86-64 and sse2Computes the bitwise XOR of 128 bits (representing integer data) in |
| _mulx_u32⚠ | x86-64 and bmi2Unsigned multiply without affecting flags. |
| _mulx_u64⚠ | x86-64 and bmi2Unsigned multiply without affecting flags. |
| _pdep_u32⚠ | x86-64 and bmi2Scatters contiguous low order bits of |
| _pdep_u64⚠ | x86-64 and bmi2Scatters contiguous low order bits of |
| _pext_u32⚠ | x86-64 and bmi2Gathers the bits of |
| _pext_u64⚠ | x86-64 and bmi2Gathers the bits of |
| _popcnt32⚠ | x86-64 and popcntCounts the bits that are set. |
| _popcnt64⚠ | x86-64 and popcntCounts the bits that are set. |
| _rdrand16_step⚠ | x86-64 and rdrandReads a hardware generated 16-bit random value and stores the result in val. Returns 1 if a random value was generated, and 0 otherwise. |
| _rdrand32_step⚠ | x86-64 and rdrandReads a hardware generated 32-bit random value and stores the result in val. Returns 1 if a random value was generated, and 0 otherwise. |
| _rdrand64_step⚠ | x86-64 and rdrandReads a hardware generated 64-bit random value and stores the result in val. Returns 1 if a random value was generated, and 0 otherwise. |
| _rdseed16_step⚠ | x86-64 and rdseedReads a 16-bit NIST SP800-90B and SP800-90C compliant random value and stores it in val. Returns 1 if a random value was generated, and 0 otherwise. |
| _rdseed32_step⚠ | x86-64 and rdseedReads a 32-bit NIST SP800-90B and SP800-90C compliant random value and stores it in val. Returns 1 if a random value was generated, and 0 otherwise. |
| _rdseed64_step⚠ | x86-64 and rdseedReads a 64-bit NIST SP800-90B and SP800-90C compliant random value and stores it in val. Returns 1 if a random value was generated, and 0 otherwise. |
| _rdtsc⚠ | x86-64 Reads the current value of the processor’s time-stamp counter. |
| _subborrow_u32⚠ | x86-64 Subtracts unsigned 32-bit integers |
| _subborrow_u64⚠ | x86-64 Subtracts unsigned 64-bit integers |
| _t1mskc_u32⚠ | x86-64 and tbmClears all bits below the least significant zero of |
| _t1mskc_u64⚠ | x86-64 and tbmClears all bits below the least significant zero of |
| _tzcnt_u32⚠ | x86-64 and bmi1Counts the number of trailing least significant zero bits. |
| _tzcnt_u64⚠ | x86-64 and bmi1Counts the number of trailing least significant zero bits. |
| _tzmsk_u32⚠ | x86-64 and tbmSets all bits below the least significant one of |
| _tzmsk_u64⚠ | x86-64 and tbmSets all bits below the least significant one of |
| _xgetbv⚠ | x86-64 and xsaveReads the contents of the extended control register |
| _xrstor⚠ | x86-64 and xsavePerforms a full or partial restore of the enabled processor states using
the state information stored in memory at |
| _xrstor64⚠ | x86-64 and xsavePerforms a full or partial restore of the enabled processor states using
the state information stored in memory at |
| _xrstors⚠ | x86-64 and xsave,xsavesPerforms a full or partial restore of the enabled processor states using the
state information stored in memory at |
| _xrstors64⚠ | x86-64 and xsave,xsavesPerforms a full or partial restore of the enabled processor states using the
state information stored in memory at |
| _xsave⚠ | x86-64 and xsavePerforms a full or partial save of the enabled processor states to memory at
|
| _xsave64⚠ | x86-64 and xsavePerforms a full or partial save of the enabled processor states to memory at
|
| _xsavec⚠ | x86-64 and xsave,xsavecPerforms a full or partial save of the enabled processor states to memory
at |
| _xsavec64⚠ | x86-64 and xsave,xsavecPerforms a full or partial save of the enabled processor states to memory
at |
| _xsaveopt⚠ | x86-64 and xsave,xsaveoptPerforms a full or partial save of the enabled processor states to memory at
|
| _xsaveopt64⚠ | x86-64 and xsave,xsaveoptPerforms a full or partial save of the enabled processor states to memory at
|
| _xsaves⚠ | x86-64 and xsave,xsavesPerforms a full or partial save of the enabled processor states to memory at
|
| _xsaves64⚠ | x86-64 and xsave,xsavesPerforms a full or partial save of the enabled processor states to memory at
|
| _xsetbv⚠ | x86-64 and xsaveCopies 64 bits from |
| _MM_SHUFFLE | Experimentalx86-64 A utility function for creating masks to use with Intel shuffle and permute intrinsics. |
| _bittest⚠ | Experimentalx86-64 Returns the bit in position |
| _bittest64⚠ | Experimentalx86-64 Returns the bit in position |
| _bittestandcomplement⚠ | Experimentalx86-64 Returns the bit in position |
| _bittestandcomplement64⚠ | Experimentalx86-64 Returns the bit in position |
| _bittestandreset⚠ | Experimentalx86-64 Returns the bit in position |
| _bittestandreset64⚠ | Experimentalx86-64 Returns the bit in position |
| _bittestandset⚠ | Experimentalx86-64 Returns the bit in position |
| _bittestandset64⚠ | Experimentalx86-64 Returns the bit in position |
| _m_empty⚠ | Experimentalx86-64 and mmxEmpties the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures. |
| _m_maskmovq⚠ | Experimentalx86-64 and sse,mmxConditionally copies the values from each 8-bit element in the first 64-bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64-bit integer vector operand. |
| _m_paddb⚠ | Experimentalx86-64 and mmxAdds packed 8-bit integers in |
| _m_paddd⚠ | Experimentalx86-64 and mmxAdds packed 32-bit integers in |
| _m_paddsb⚠ | Experimentalx86-64 and mmxAdds packed 8-bit integers in |
| _m_paddsw⚠ | Experimentalx86-64 and mmxAdds packed 16-bit integers in |
| _m_paddusb⚠ | Experimentalx86-64 and mmxAdds packed unsigned 8-bit integers in |
| _m_paddusw⚠ | Experimentalx86-64 and mmxAdds packed unsigned 16-bit integers in |
| _m_paddw⚠ | Experimentalx86-64 and mmxAdds packed 16-bit integers in |
| _m_pavgb⚠ | Experimentalx86-64 and sse,mmxComputes the rounded averages of the packed unsigned 8-bit integer values and writes the averages to the corresponding bits in the destination. |
| _m_pavgw⚠ | Experimentalx86-64 and sse,mmxComputes the rounded averages of the packed unsigned 16-bit integer values and writes the averages to the corresponding bits in the destination. |
| _m_pextrw⚠ | Experimentalx86-64 and sse,mmxExtracts 16-bit element from a 64-bit vector of |
| _m_pinsrw⚠ | Experimentalx86-64 and sse,mmxCopies data from the 64-bit vector of |
| _m_pmaxsw⚠ | Experimentalx86-64 and sse,mmxCompares the packed 16-bit signed integers of |
| _m_pmaxub⚠ | Experimentalx86-64 and sse,mmxCompares the packed 8-bit unsigned integers of |
| _m_pminsw⚠ | Experimentalx86-64 and sse,mmxCompares the packed 16-bit signed integers of |
| _m_pminub⚠ | Experimentalx86-64 and sse,mmxCompares the packed 8-bit unsigned integers of |
| _m_pmovmskb⚠ | Experimentalx86-64 and sse,mmxTakes the most significant bit from each 8-bit element in a 64-bit integer vector to create an 8-bit mask value. Zero-extends the value to a 32-bit integer and writes it to the destination. |
| _m_pmulhuw⚠ | Experimentalx86-64 and sse,mmxMultiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. |
| _m_psadbw⚠ | Experimentalx86-64 and sse,mmxSubtracts the corresponding 8-bit unsigned integer values of the two 64-bit vector operands and computes the absolute value of each difference. The sum of the 8 absolute differences is then written to the bits |
| _m_pshufw⚠ | Experimentalx86-64 and sse,mmxShuffles the 4 16-bit integers from a 64-bit integer vector to the destination, as specified by the immediate value operand. |
| _m_psubb⚠ | Experimentalx86-64 and mmxSubtract packed 8-bit integers in |
| _m_psubd⚠ | Experimentalx86-64 and mmxSubtract packed 32-bit integers in |
| _m_psubsb⚠ | Experimentalx86-64 and mmxSubtract packed 8-bit integers in |
| _m_psubsw⚠ | Experimentalx86-64 and mmxSubtract packed 16-bit integers in |
| _m_psubusb⚠ | Experimentalx86-64 and mmxSubtract packed unsigned 8-bit integers in |
| _m_psubusw⚠ | Experimentalx86-64 and mmxSubtract packed unsigned 16-bit integers in |
| _m_psubw⚠ | Experimentalx86-64 and mmxSubtract packed 16-bit integers in |
| _mm256_cvtph_ps⚠ | Experimentalx86-64 and f16cConverts the 8 x 16-bit half-precision float values in the 128-bit vector |
| _mm256_cvtps_ph⚠ | Experimentalx86-64 and f16cConverts the 8 x 32-bit float values in the 256-bit vector |
| _mm256_madd52hi_epu64⚠ | Experimentalx86-64 and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of |
| _mm256_madd52lo_epu64⚠ | Experimentalx86-64 and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of |
| _mm512_abs_epi32⚠ | Experimentalx86-64 and avx512fComputes the absolute values of packed 32-bit integers in |
| _mm512_madd52hi_epu64⚠ | Experimentalx86-64 and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of |
| _mm512_madd52lo_epu64⚠ | Experimentalx86-64 and avx512ifmaMultiply packed unsigned 52-bit integers in each 64-bit element of |
| _mm512_mask_abs_epi32⚠ | Experimentalx86-64 and avx512fComputes the absolute value of packed 32-bit integers in |
| _mm512_maskz_abs_epi32⚠ | Experimentalx86-64 and avx512fComputes the absolute value of packed 32-bit integers in |
| _mm512_set1_epi64⚠ | Experimentalx86-64 and avx512fBroadcast 64-bit integer |
| _mm512_setr_epi32⚠ | Experimentalx86-64 and avx512fSets packed 32-bit integers in |
| _mm512_setzero_si512⚠ | Experimentalx86-64 and avx512fReturns vector of type |
| _mm_abs_pi8⚠ | Experimentalx86-64 and ssse3,mmxComputes the absolute value of packed 8-bit integers in |
| _mm_abs_pi16⚠ | Experimentalx86-64 and ssse3,mmxComputes the absolute value of packed 16-bit integers in |
| _mm_abs_pi32⚠ | Experimentalx86-64 and ssse3,mmxComputes the absolute value of packed 32-bit integers in |
| _mm_add_pi8⚠ | Experimentalx86-64 and mmxAdds packed 8-bit integers in |
| _mm_add_pi16⚠ | Experimentalx86-64 and mmxAdds packed 16-bit integers in |
| _mm_add_pi32⚠ | Experimentalx86-64 and mmxAdds packed 32-bit integers in |
| _mm_add_si64⚠ | Experimentalx86-64 and sse2,mmxAdds two signed or unsigned 64-bit integer values, returning the lower 64 bits of the sum. |
| _mm_adds_pi8⚠ | Experimentalx86-64 and mmxAdds packed 8-bit integers in |
| _mm_adds_pi16⚠ | Experimentalx86-64 and mmxAdds packed 16-bit integers in |
| _mm_adds_pu8⚠ | Experimentalx86-64 and mmxAdds packed unsigned 8-bit integers in |
| _mm_adds_pu16⚠ | Experimentalx86-64 and mmxAdds packed unsigned 16-bit integers in |
| _mm_alignr_pi8⚠ | Experimentalx86-64 and ssse3,mmxConcatenates the two 64-bit integer vector operands, and right-shifts the result by the number of bytes specified in the immediate operand. |
| _mm_avg_pu8⚠ | Experimentalx86-64 and sse,mmxComputes the rounded averages of the packed unsigned 8-bit integer values and writes the averages to the corresponding bits in the destination. |
| _mm_avg_pu16⚠ | Experimentalx86-64 and sse,mmxComputes the rounded averages of the packed unsigned 16-bit integer values and writes the averages to the corresponding bits in the destination. |
| _mm_cmpgt_pi8⚠ | Experimentalx86-64 and mmxCompares whether each element of |
| _mm_cmpgt_pi16⚠ | Experimentalx86-64 and mmxCompares whether each element of |
| _mm_cmpgt_pi32⚠ | Experimentalx86-64 and mmxCompares whether each element of |
| _mm_cvt_pi2ps⚠ | Experimentalx86-64 and sse,mmxConverts two elements of a 64-bit vector of |
| _mm_cvt_ps2pi⚠ | Experimentalx86-64 and sse,mmxConverts the two lower packed single-precision (32-bit) floating-point elements in |
| _mm_cvtpd_pi32⚠ | Experimentalx86-64 and sse2,mmxConverts the two double-precision floating-point elements of a 128-bit vector of |
| _mm_cvtph_ps⚠ | Experimentalx86-64 and f16cConverts the 4 x 16-bit half-precision float values in the lowest 64 bits of the 128-bit vector |
| _mm_cvtpi8_ps⚠ | Experimentalx86-64 and sse,mmxConverts the lower 4 8-bit values of |
| _mm_cvtpi16_ps⚠ | Experimentalx86-64 and sse,mmxConverts a 64-bit vector of |
| _mm_cvtpi32_ps⚠ | Experimentalx86-64 and sse,mmxConverts two elements of a 64-bit vector of |
| _mm_cvtpi32_pd⚠ | Experimentalx86-64 and sse2,mmxConverts the two signed 32-bit integer elements of a 64-bit vector of |
| _mm_cvtpi32x2_ps⚠ | Experimentalx86-64 and sse,mmxConverts the two 32-bit signed integer values from each 64-bit vector operand of |
| _mm_cvtps_ph⚠ | Experimentalx86-64 and f16cConverts the 4 x 32-bit float values in the 128-bit vector |
| _mm_cvtps_pi8⚠ | Experimentalx86-64 and sse,mmxConverts packed single-precision (32-bit) floating-point elements in |
| _mm_cvtps_pi16⚠ | Experimentalx86-64 and sse,mmxConverts packed single-precision (32-bit) floating-point elements in |
| _mm_cvtps_pi32⚠ | Experimentalx86-64 and sse,mmxConverts the two lower packed single-precision (32-bit) floating-point elements in |
| _mm_cvtpu8_ps⚠ | Experimentalx86-64 and sse,mmxConverts the lower 4 8-bit values of |
| _mm_cvtpu16_ps⚠ | Experimentalx86-64 and sse,mmxConverts a 64-bit vector of |
| _mm_cvtsi32_si64⚠ | Experimentalx86-64 and mmxCopies 32-bit integer |
| _mm_cvtsi64_si32⚠ | Experimentalx86-64 and mmxReturns the lower 32-bit integer in |
| _mm_cvtt_ps2pi⚠ | Experimentalx86-64 and sse,mmxConverts the two lower packed single-precision (32-bit) floating-point elements in |
| _mm_cvttpd_pi32⚠ | Experimentalx86-64 and sse2,mmxConverts the two double-precision floating-point elements of a 128-bit vector of |
| _mm_cvttps_pi32⚠ | Experimentalx86-64 and sse,mmxConverts the two lower packed single-precision (32-bit) floating-point elements in |
| _mm_empty⚠ | Experimentalx86-64 and mmxEmpty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures. |
| _mm_extract_pi16⚠ | Experimentalx86-64 and sse,mmxExtracts 16-bit element from a 64-bit vector of |
| _mm_hadd_pi16⚠ | Experimentalx86-64 and ssse3,mmxHorizontally adds the adjacent pairs of values contained in 2 packed 64-bit vectors of |
| _mm_hadd_pi32⚠ | Experimentalx86-64 and ssse3,mmxHorizontally adds the adjacent pairs of values contained in 2 packed 64-bit vectors of |
| _mm_hadds_pi16⚠ | Experimentalx86-64 and ssse3,mmxHorizontally adds the adjacent pairs of values contained in 2 packed 64-bit vectors of |
| _mm_hsub_pi16⚠ | Experimentalx86-64 and ssse3,mmxHorizontally subtracts the adjacent pairs of values contained in 2 packed 64-bit vectors of |
| _mm_hsub_pi32⚠ | Experimentalx86-64 and ssse3,mmxHorizontally subtracts the adjacent pairs of values contained in 2 packed 64-bit vectors of |
| _mm_hsubs_pi16⚠ | Experimentalx86-64 and ssse3,mmxHorizontally subtracts the adjacent pairs of values contained in 2 packed 64-bit vectors of |
| _mm_insert_pi16⚠ | Experimentalx86-64 and sse,mmxCopies data from the 64-bit vector of |
| _mm_loadh_pi⚠ | Experimentalx86-64 and sseSets the upper two single-precision floating-point values with 64 bits of data loaded from the address |
| _mm_loadl_pi⚠ | Experimentalx86-64 and sseLoads two floats from |
| _mm_madd52hi_epu64⚠ | Experimentalx86-64 and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of |
| _mm_madd52lo_epu64⚠ | Experimentalx86-64 and avx512ifma,avx512vlMultiply packed unsigned 52-bit integers in each 64-bit element of |
| _mm_maddubs_pi16⚠ | Experimentalx86-64 and ssse3,mmxMultiplies corresponding pairs of packed 8-bit unsigned integer values contained in the first source operand and packed 8-bit signed integer values contained in the second source operand, adds pairs of contiguous products with signed saturation, and writes the 16-bit sums to the corresponding bits in the destination. |
| _mm_maskmove_si64⚠ | Experimentalx86-64 and sse,mmxConditionally copies the values from each 8-bit element in the first 64-bit integer vector operand to the specified memory location, as specified by the most significant bit in the corresponding element in the second 64-bit integer vector operand. |
| _mm_max_pi16⚠ | Experimentalx86-64 and sse,mmxCompares the packed 16-bit signed integers of |
| _mm_max_pu8⚠ | Experimentalx86-64 and sse,mmxCompares the packed 8-bit unsigned integers of |
| _mm_min_pi16⚠ | Experimentalx86-64 and sse,mmxCompares the packed 16-bit signed integers of |
| _mm_min_pu8⚠ | Experimentalx86-64 and sse,mmxCompares the packed 8-bit unsigned integers of |
| _mm_movemask_pi8⚠ | Experimentalx86-64 and sse,mmxTakes the most significant bit from each 8-bit element in a 64-bit integer vector to create an 8-bit mask value. Zero-extends the value to a 32-bit integer and writes it to the destination. |
| _mm_movepi64_pi64⚠ | Experimentalx86-64 and sse2,mmxReturns the lower 64 bits of a 128-bit integer vector as a 64-bit integer. |
| _mm_movpi64_epi64⚠ | Experimentalx86-64 and sse2,mmxMoves the 64-bit operand to a 128-bit integer vector, zeroing the upper bits. |
| _mm_mul_su32⚠ | Experimentalx86-64 and sse2,mmxMultiplies 32-bit unsigned integer values contained in the lower bits of the two 64-bit integer vectors and returns the 64-bit unsigned product. |
| _mm_mulhi_pu16⚠ | Experimentalx86-64 and sse,mmxMultiplies packed 16-bit unsigned integer values and writes the high-order 16 bits of each 32-bit product to the corresponding bits in the destination. |
| _mm_mulhrs_pi16⚠ | Experimentalx86-64 and ssse3,mmxMultiplies packed 16-bit signed integer values, truncates the 32-bit products to the 18 most significant bits by right-shifting, rounds the truncated value by adding 1, and writes bits |
| _mm_mullo_pi16⚠ | Experimentalx86-64 and sse,mmxMultiplies packed 16-bit integer values and writes the low-order 16 bits of each 32-bit product to the corresponding bits in the destination. |
| _mm_packs_pi16⚠ | Experimentalx86-64 and mmxConverts packed 16-bit integers from |
| _mm_packs_pi32⚠ | Experimentalx86-64 and mmxConverts packed 32-bit integers from |
| _mm_sad_pu8⚠ | Experimentalx86-64 and sse,mmxSubtracts the corresponding 8-bit unsigned integer values of the two 64-bit vector operands and computes the absolute value of each difference. The sum of the 8 absolute differences is then written to the bits |
| _mm_set1_epi64⚠ | Experimentalx86-64 and sse2,mmxInitializes both values in a 128-bit vector of |
| _mm_set1_pi8⚠ | Experimentalx86-64 and mmxBroadcasts 8-bit integer a to all elements of dst. |
| _mm_set1_pi16⚠ | Experimentalx86-64 and mmxBroadcasts 16-bit integer a to all elements of dst. |
| _mm_set1_pi32⚠ | Experimentalx86-64 and mmxBroadcasts 32-bit integer a to all elements of dst. |
| _mm_set_epi64⚠ | Experimentalx86-64 and sse2,mmxInitializes both 64-bit values in a 128-bit vector of |
| _mm_set_pi8⚠ | Experimentalx86-64 and mmxSets packed 8-bit integers in dst with the supplied values. |
| _mm_set_pi16⚠ | Experimentalx86-64 and mmxSets packed 16-bit integers in dst with the supplied values. |
| _mm_set_pi32⚠ | Experimentalx86-64 and mmxSets packed 32-bit integers in dst with the supplied values. |
| _mm_setr_epi64⚠ | Experimentalx86-64 and sse2,mmxConstructs a 128-bit integer vector, initialized in reverse order with the specified 64-bit integral values. |
| _mm_setr_pi8⚠ | Experimentalx86-64 and mmxSets packed 8-bit integers in dst with the supplied values in reverse order. |
| _mm_setr_pi16⚠ | Experimentalx86-64 and mmxSets packed 16-bit integers in dst with the supplied values in reverse order. |
| _mm_setr_pi32⚠ | Experimentalx86-64 and mmxSets packed 32-bit integers in dst with the supplied values in reverse order. |
| _mm_setzero_si64⚠ | Experimentalx86-64 and mmxConstructs a 64-bit integer vector initialized to zero. |
| _mm_shuffle_pi8⚠ | Experimentalx86-64 and ssse3,mmxShuffles packed 8-bit integers in |
| _mm_shuffle_pi16⚠ | Experimentalx86-64 and sse,mmxShuffles the 4 16-bit integers from a 64-bit integer vector to the destination, as specified by the immediate value operand. |
| _mm_sign_pi8⚠ | Experimentalx86-64 and ssse3,mmxNegates packed 8-bit integers in |
| _mm_sign_pi16⚠ | Experimentalx86-64 and ssse3,mmxNegates packed 16-bit integers in |
| _mm_sign_pi32⚠ | Experimentalx86-64 and ssse3,mmxNegates packed 32-bit integers in |
| _mm_storeh_pi⚠ | Experimentalx86-64 and sseStores the upper half of |
| _mm_storel_pi⚠ | Experimentalx86-64 and sseStores the lower half of |
| _mm_stream_pi⚠ | Experimentalx86-64 and sse,mmxStores 64 bits of integer data from a into memory using a non-temporal memory hint. |
| _mm_sub_pi8⚠ | Experimentalx86-64 and mmxSubtract packed 8-bit integers in |
| _mm_sub_pi16⚠ | Experimentalx86-64 and mmxSubtract packed 16-bit integers in |
| _mm_sub_pi32⚠ | Experimentalx86-64 and mmxSubtract packed 32-bit integers in |
| _mm_sub_si64⚠ | Experimentalx86-64 and sse2,mmxSubtracts signed or unsigned 64-bit integer values and writes the difference to the corresponding bits in the destination. |
| _mm_subs_pi8⚠ | Experimentalx86-64 and mmxSubtract packed 8-bit integers in |
| _mm_subs_pi16⚠ | Experimentalx86-64 and mmxSubtract packed 16-bit integers in |
| _mm_subs_pu8⚠ | Experimentalx86-64 and mmxSubtract packed unsigned 8-bit integers in |
| _mm_subs_pu16⚠ | Experimentalx86-64 and mmxSubtract packed unsigned 16-bit integers in |
| _mm_unpackhi_pi8⚠ | Experimentalx86-64 and mmxUnpacks the upper four elements from two |
| _mm_unpackhi_pi16⚠ | Experimentalx86-64 and mmxUnpacks the upper two elements from two |
| _mm_unpackhi_pi32⚠ | Experimentalx86-64 and mmxUnpacks the upper element from two |
| _mm_unpacklo_pi8⚠ | Experimentalx86-64 and mmxUnpacks the lower four elements from two |
| _mm_unpacklo_pi16⚠ | Experimentalx86-64 and mmxUnpacks the lower two elements from two |
| _mm_unpacklo_pi32⚠ | Experimentalx86-64 and mmxUnpacks the lower element from two |
| _xabort⚠ | Experimentalx86-64 and rtmForces a restricted transactional memory (RTM) region to abort. |
| _xabort_code | Experimentalx86-64 Retrieves the parameter passed to |
| _xbegin⚠ | Experimentalx86-64 and rtmSpecifies the start of a restricted transactional memory (RTM) code region and returns a value indicating status. |
| _xend⚠ | Experimentalx86-64 and rtmSpecifies the end of a restricted transactional memory (RTM) code region. |
| _xtest⚠ | Experimentalx86-64 and rtmQueries whether the processor is executing in a transactional region identified by restricted transactional memory (RTM) or hardware lock elision (HLE). |
| cmpxchg16b⚠ | Experimentalx86-64 and cmpxchg16bCompares and exchanges 16 bytes (128 bits) of data atomically. |
| has_cpuid | Experimentalx86-64 Does the host support the |
| ud2⚠ | Experimentalx86-64 Generates the trap instruction |
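The `_MM_SHUFFLE` and `_mm_shuffle_pi16` entries above describe building a mask and shuffling four 16-bit lanes "as specified by the immediate value operand". The following is a minimal scalar sketch of that idea in plain Rust, not the library's implementation. The names `shuffle_mask` and `shuffle_lanes` are hypothetical, and the sketch assumes the conventional Intel encoding in which each destination lane is selected by two bits of the immediate, with the highest lane's selector passed first.

```rust
/// Hypothetical stand-in for the mask helper (assumed encoding): packs four
/// 2-bit lane selectors into one 8-bit immediate. `z` selects the source for
/// destination lane 3, `y` for lane 2, `x` for lane 1 and `w` for lane 0.
const fn shuffle_mask(z: u32, y: u32, x: u32, w: u32) -> i32 {
    ((z << 6) | (y << 4) | (x << 2) | w) as i32
}

/// Scalar model of a four-lane 16-bit shuffle: destination lane `i` takes
/// the source lane named by the two bits of `imm8` starting at bit `2 * i`.
fn shuffle_lanes(src: [u16; 4], imm8: i32) -> [u16; 4] {
    let mut dst = [0u16; 4];
    for (i, lane) in dst.iter_mut().enumerate() {
        let sel = ((imm8 >> (2 * i)) & 0b11) as usize;
        *lane = src[sel];
    }
    dst
}
```

Under these assumptions, `shuffle_lanes([10, 11, 12, 13], shuffle_mask(0, 1, 2, 3))` reverses the lanes, while `shuffle_mask(3, 2, 1, 0)` (the value `0xE4`) leaves them in place. A scalar model like this can serve as a reference when testing code that uses the real intrinsics, which require the listed target features and an `unsafe` call.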
Type Definitions
| __mmask16 | Experimentalx86-64 The |
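The `_mm512_mask_abs_epi32` and `_mm512_maskz_abs_epi32` entries in the Functions table both take a 16-bit mask. The sketch below models the usual AVX-512 distinction between merge-masking and zero-masking in scalar Rust. It is an illustration under stated assumptions rather than the library's code: the function names `mask_abs` and `maskz_abs` are hypothetical, the mask is modelled as a plain `u16`, and each vector is modelled as `[i32; 16]`.

```rust
/// Scalar model of merge-masking (assumed semantics): lane `i` receives the
/// absolute value of `a[i]` when bit `i` of `k` is set, and keeps `src[i]`
/// otherwise.
fn mask_abs(src: [i32; 16], k: u16, a: [i32; 16]) -> [i32; 16] {
    let mut dst = src;
    for (i, lane) in dst.iter_mut().enumerate() {
        if (k >> i) & 1 == 1 {
            // `wrapping_abs` maps `i32::MIN` to itself instead of panicking.
            *lane = a[i].wrapping_abs();
        }
    }
    dst
}

/// Scalar model of zero-masking (assumed semantics): lanes whose mask bit is
/// clear are set to zero instead of being taken from a source vector.
fn maskz_abs(k: u16, a: [i32; 16]) -> [i32; 16] {
    mask_abs([0; 16], k, a)
}
```

With `k = 0b101`, only lanes 0 and 2 are replaced by absolute values; the remaining lanes come from `src` in `mask_abs` and are zero in `maskz_abs`.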