Streaming SIMD Extensions Intrinsics Implementation
Regular Streaming SIMD Extensions (SSE) intrinsics operate on four 32-bit
single-precision values. On Itanium®-based systems, basic operations such as
add and compare each require two SIMD instructions. Both instructions
can be executed in the same cycle, so the throughput is one basic SSE operation
per cycle, or four 32-bit single-precision operations per cycle.
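To make the "four values per operation" point concrete, the following minimal sketch shows a packed add and a packed compare written with standard SSE intrinsics from xmmintrin.h. The helper names add4 and less_than_mask4 are hypothetical and chosen only for illustration. The sketch shows the source-level form only; how the compiler lowers each intrinsic to native instructions on a given target is not shown here.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, _mm_cmplt_ps, ... */

/* Hypothetical helper: adds four packed single-precision values in one
 * SSE operation and stores the four sums to out. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);       /* load 4 floats, no alignment required */
    __m128 vb = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   /* one packed add covers all 4 lanes */
    _mm_storeu_ps(out, sum);
}

/* Hypothetical helper: compares four pairs of values in one SSE operation.
 * Returns a 4-bit mask in which bit i is set when a[i] < b[i]. */
int less_than_mask4(const float *a, const float *b)
{
    __m128 cmp = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return _mm_movemask_ps(cmp);       /* collect the sign bit of each lane */
}
```

Calling add4 on two arrays of four floats produces all four sums from a single _mm_add_ps, which is the "one basic SSE operation" counted in the throughput figure above.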
Key to the table entries
- A = Expected to give a significant performance gain
over equivalent non-intrinsic-based code.
- B = Non-intrinsic-based source code would be better;
the intrinsic's implementation may map directly to native instructions,
but it offers no significant performance gain.
- C = Requires a contorted implementation on a particular
microarchitecture. Results in very poor performance if used.