Available on x86-64 and target feature avx only.
Moves the eight single-precision floating-point values of a 256-bit vector
of [8 x float] to a 32-byte aligned memory location. To minimize cache
pollution, the store is flagged as non-temporal (the data is unlikely to be
used again soon).
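
The following is a minimal sketch of how such a store might be used, assuming the intrinsic described here is `_mm256_stream_ps` from `std::arch::x86_64`. The names `Aligned8`, `stream_splat`, and `splat` are hypothetical helpers introduced only for illustration. The sketch checks for AVX at runtime, guarantees the 32-byte alignment through a `#[repr(align(32))]` wrapper, and issues `_mm_sfence` after the store, because non-temporal stores are weakly ordered with respect to other memory operations.

```rust
/// Hypothetical wrapper: eight f32 lanes with the 32-byte alignment
/// that the non-temporal store requires.
#[repr(C, align(32))]
pub struct Aligned8(pub [f32; 8]);

/// Hypothetical helper: broadcasts `value` to all eight lanes and writes
/// them to `dst` with a non-temporal store.
///
/// Safety: the caller must ensure the CPU supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn stream_splat(dst: &mut Aligned8, value: f32) {
    use std::arch::x86_64::{_mm256_set1_ps, _mm256_stream_ps, _mm_sfence};
    unsafe {
        let v = _mm256_set1_ps(value);
        // `dst` is 32-byte aligned because of `#[repr(align(32))]`.
        _mm256_stream_ps(dst.0.as_mut_ptr(), v);
        // Order the non-temporal store before any later stores.
        _mm_sfence();
    }
}

/// Hypothetical safe entry point: uses the streaming store when AVX is
/// detected at runtime and falls back to a plain array write otherwise.
pub fn splat(value: f32) -> Aligned8 {
    let mut out = Aligned8([0.0; 8]);
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Sound: AVX support was just verified.
            unsafe { stream_splat(&mut out, value) };
            return out;
        }
    }
    out.0 = [value; 8];
    out
}
```

Calling `splat(1.5)` yields an `Aligned8` whose eight lanes all hold `1.5`. In practice, streaming stores tend to pay off only for large buffers that will not be read back soon; a single 32-byte write like this one is purely illustrative.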