Available on x86-64 and target feature fma only.
Multiplies the lower single-precision (32-bit) floating-point elements in
a and b, and adds the intermediate result to the lower element in c. The
product is not rounded before the addition, so only the final sum is rounded.
Stores the result in the lower element of the returned value, and copies the
3 upper elements from a to the upper elements of the result.
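The following is a minimal sketch of calling this intrinsic from safe code. It assumes an x86-64 target and checks for FMA at runtime with `is_x86_feature_detected!`. The helper names `fmadd_ss_lanes` and `fmadd_ss` are hypothetical and exist only for illustration. The inputs are chosen so that `x * x` is an exact rounding tie in f32. A separate multiply and add therefore yields 0, while the fused operation keeps the 2^-24 residue. The upper lanes show the values copied from `a`.

```rust
// Hypothetical helpers for illustration; only the std::arch items are real APIs.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_fmadd_ss, _mm_setr_ps, _mm_storeu_ps};

/// Packs three [f32; 4] arrays into __m128 values, calls `_mm_fmadd_ss`,
/// and unpacks the result. The caller must ensure the CPU supports FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "fma")]
unsafe fn fmadd_ss_lanes(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> [f32; 4] {
    // _mm_setr_ps places its first argument in lane 0 (the "lower" element).
    let va = _mm_setr_ps(a[0], a[1], a[2], a[3]);
    let vb = _mm_setr_ps(b[0], b[1], b[2], b[3]);
    let vc = _mm_setr_ps(c[0], c[1], c[2], c[3]);
    // Lane 0: a[0] * b[0] + c[0] with a single rounding; lanes 1..=3 come from `a`.
    let r = _mm_fmadd_ss(va, vb, vc);
    let mut out = [0.0f32; 4];
    _mm_storeu_ps(out.as_mut_ptr(), r);
    out
}

/// Safe wrapper: returns None off x86-64 or when FMA is not detected at runtime.
fn fmadd_ss(a: [f32; 4], b: [f32; 4], c: [f32; 4]) -> Option<[f32; 4]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("fma") {
            // SAFETY: FMA support was verified just above.
            return Some(unsafe { fmadd_ss_lanes(a, b, c) });
        }
    }
    let _ = (a, b, c); // avoid unused-parameter warnings on other targets
    None
}

fn main() {
    let eps = 1.0f32 / 4096.0; // 2^-12, exact in f32
    let x = 1.0 + eps; // 1 + 2^-12, so x * x = 1 + 2^-11 + 2^-24 exactly
    let a = [x, 1.0, 2.0, 3.0];
    let b = [x, 9.0, 9.0, 9.0];
    let c = [-(1.0 + 2.0 * eps), 9.0, 9.0, 9.0]; // lower lane: -(1 + 2^-11)

    // Unfused: x * x rounds (tie to even) to 1 + 2^-11 before the add.
    let unfused = x * x + c[0];
    match fmadd_ss(a, b, c) {
        Some(r) => println!("fused = {:?}, unfused lower = {}", r, unfused),
        None => println!("FMA not available; unfused lower = {}", unfused),
    }
}
```

The runtime check keeps the `unsafe` call sound on CPUs without FMA. When only FMA-capable targets matter, compiling with `-C target-feature=+fma` is an alternative.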