Available on x86-64 and target feature avx only.
Loads 256 bits (composed of 8 packed single-precision (32-bit)
floating-point elements) from memory into the result.
mem_addr does not need to be aligned on any particular boundary.
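
A minimal usage sketch follows, assuming the intrinsic described here is `_mm256_loadu_ps` from `std::arch::x86_64` (the name is not stated in this excerpt). The helper `load8_unaligned` is hypothetical and exists only for illustration. It checks for AVX at runtime, bounds-checks the slice, loads 8 floats from an address that is only guaranteed to be 4-byte aligned, and stores them back out with `_mm256_storeu_ps` so the lanes can be inspected.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_loadu_ps, _mm256_storeu_ps};

/// Hypothetical helper (not part of std): reads the 8 floats starting at
/// `src[offset]` through a 256-bit register and returns them as an array.
/// Returns `None` if AVX is unavailable or fewer than 8 elements remain.
#[cfg(target_arch = "x86_64")]
fn load8_unaligned(src: &[f32], offset: usize) -> Option<[f32; 8]> {
    // The intrinsic must only run on CPUs that actually support AVX.
    if !is_x86_feature_detected!("avx") {
        return None;
    }
    // Bounds check: the load reads exactly 8 consecutive f32 values.
    let window = src.get(offset..offset.checked_add(8)?)?;
    let mut out = [0.0f32; 8];
    // SAFETY: AVX support was verified above, `window` holds exactly 8
    // readable f32 values, and `out` holds 8 writable f32 values. Neither
    // pointer needs 32-byte alignment for the unaligned load/store forms.
    unsafe {
        let v = _mm256_loadu_ps(window.as_ptr());
        _mm256_storeu_ps(out.as_mut_ptr(), v);
    }
    Some(out)
}

/// Fallback so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn load8_unaligned(_src: &[f32], _offset: usize) -> Option<[f32; 8]> {
    None
}
```

Calling `load8_unaligned(&data, 1)` on a slice of at least 9 elements starts the load one element past the beginning of the slice, an address that is generally not 32-byte aligned. That is the case this unaligned load is meant to handle.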