Integer Intrinsics Using Streaming SIMD Extensions

The results of each intrinsic operation are placed in registers. The information about what is placed in each register appears in the tables below, in the detailed explanation of each intrinsic. R, R0, R1...R7 represent the registers in which results are placed.

Detailed information about each intrinsic follows the summary table below.

The prototypes for Streaming SIMD Extensions (SSE) intrinsics are in the xmmintrin.h header file.

Before using these intrinsics, you must empty the multimedia state for the MMX(TM) technology register. See The EMMS Instruction: Why You Need It for more details.

Intrinsic Name       Operation                             Corresponding SSE Instruction
_mm_extract_pi16     Extract one of four words             PEXTRW
_mm_insert_pi16      Insert word                           PINSRW
_mm_max_pi16         Compute maximum                       PMAXSW
_mm_max_pu8          Compute maximum, unsigned             PMAXUB
_mm_min_pi16         Compute minimum                       PMINSW
_mm_min_pu8          Compute minimum, unsigned             PMINUB
_mm_movemask_pi8     Create eight-bit mask                 PMOVMSKB
_mm_mulhi_pu16       Multiply, return high bits            PMULHUW
_mm_shuffle_pi16     Return a combination of four words    PSHUFW
_mm_maskmove_si64    Conditional store                     MASKMOVQ
_mm_avg_pu8          Compute rounded average               PAVGB
_mm_avg_pu16         Compute rounded average               PAVGW
_mm_sad_pu8          Compute sum of absolute differences   PSADBW

 

int _mm_extract_pi16(__m64 a, int n)

Extracts one of the four words of a. The selector n must be an immediate.

R
(n==0) ? a0 : ( (n==1) ? a1 : ( (n==2) ? a2 : a3 ) )
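As a minimal sketch of the selector rule above, the following wraps _mm_extract_pi16 with a literal selector of 2. The function name extract_word2 is hypothetical and chosen only for illustration; the sketch assumes an x86 compiler that provides the __m64 intrinsics.

```c
#include <xmmintrin.h>  /* SSE integer intrinsics operating on __m64 */

/* Hypothetical helper: builds a vector from four words and returns word 2. */
int extract_word2(short w0, short w1, short w2, short w3)
{
    /* _mm_set_pi16 takes its arguments from the highest word to the lowest. */
    __m64 a = _mm_set_pi16(w3, w2, w1, w0);
    int r = _mm_extract_pi16(a, 2);  /* selector written as a literal constant */
    _mm_empty();                     /* clear the MMX state before returning */
    return r;
}
```

Because the selector must be an immediate, it is written as a literal here rather than passed in as a function parameter.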

 

__m64 _mm_insert_pi16(__m64 a, int d, int n)

Inserts word d into one of four words of a. The selector n must be an immediate.

R0 R1 R2 R3
(n==0) ? d : a0; (n==1) ? d : a1; (n==2) ? d : a2; (n==3) ? d : a3;
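The table above can be illustrated with a small sketch that replaces word 1 and leaves the other three words unchanged. The helper name insert_word1 and the array-based interface are assumptions made for this example only.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: replaces word 1 of in[0..3] with d and writes the
   four result words to out[0..3]. */
void insert_word1(const int16_t in[4], int d, int16_t out[4])
{
    __m64 a;
    memcpy(&a, in, sizeof a);            /* in[0] becomes word 0 on x86 */
    __m64 r = _mm_insert_pi16(a, d, 1);  /* selector is a literal constant */
    memcpy(out, &r, sizeof r);
    _mm_empty();                         /* clear the MMX state */
}
```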

 

__m64 _mm_max_pi16(__m64 a, __m64 b)

Computes the element-wise maximum of the words in a and b.

R0 R1 R2 R3
max(a0, b0) max(a1, b1) max(a2, b2) max(a3, b3)

 

__m64 _mm_max_pu8(__m64 a, __m64 b)

Computes the element-wise maximum of the unsigned bytes in a and b.

R0 R1 ... R7
max(a0, b0) max(a1, b1) ... max(a7, b7)

 

__m64 _mm_min_pi16(__m64 a, __m64 b)

Computes the element-wise minimum of the words in a and b.

R0 R1 R2 R3
min(a0, b0) min(a1, b1) min(a2, b2) min(a3, b3)

 

__m64 _mm_min_pu8(__m64 a, __m64 b)

Computes the element-wise minimum of the unsigned bytes in a and b.

R0 R1 ... R7
min(a0, b0) min(a1, b1) ... min(a7, b7)
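One common use of the minimum and maximum intrinsics is clamping values to a range. The sketch below combines _mm_max_pi16 and _mm_min_pi16 to clamp four signed words; the name clamp_words and its parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: clamps each signed word of in[] to [lo, hi]. */
void clamp_words(const int16_t in[4], int16_t lo, int16_t hi, int16_t out[4])
{
    __m64 v;
    memcpy(&v, in, sizeof v);
    v = _mm_max_pi16(v, _mm_set1_pi16(lo));  /* raise words below lo */
    v = _mm_min_pi16(v, _mm_set1_pi16(hi));  /* lower words above hi */
    memcpy(out, &v, sizeof v);
    _mm_empty();                             /* clear the MMX state */
}
```

The same pattern should apply to unsigned bytes with _mm_max_pu8 and _mm_min_pu8, where values such as 0x80 compare as 128 rather than as a negative number.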

 

int _mm_movemask_pi8(__m64 a)

Creates an 8-bit mask from the most significant bits of the bytes in a.

R
sign(a7)<<7 | sign(a6)<<6 |... | sign(a0)
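The mask construction above can be sketched as follows. The helper name byte_sign_mask is hypothetical; bit i of the returned value is set when byte i has its most significant bit set.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: gathers the top bit of each of eight bytes. */
int byte_sign_mask(const uint8_t bytes[8])
{
    __m64 a;
    memcpy(&a, bytes, sizeof a);     /* bytes[0] becomes byte 0 on x86 */
    int mask = _mm_movemask_pi8(a);  /* bit i = most significant bit of byte i */
    _mm_empty();                     /* clear the MMX state */
    return mask;
}
```

For example, bytes with the high bit set at positions 0, 2 and 7 give a mask with bits 0, 2 and 7 set.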

 

__m64 _mm_mulhi_pu16(__m64 a, __m64 b)

Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.

R0 R1 R2 R3
hiword(a0 * b0) hiword(a1 * b1) hiword(a2 * b2) hiword(a3 * b3)
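Taking the high word of an unsigned product is a convenient way to scale by a fixed-point fraction. In the sketch below, the hypothetical helper scale_words treats factor as a fraction with 16 fractional bits, so 0x8000 stands for 0.5. This interpretation is an assumption of the example, not something the intrinsic imposes.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: computes (in[i] * factor) >> 16 for four unsigned
   words, that is, scales each word by factor / 65536 (truncating). */
void scale_words(const uint16_t in[4], uint16_t factor, uint16_t out[4])
{
    uint16_t f[4] = {factor, factor, factor, factor};
    __m64 v, vf;
    memcpy(&v, in, sizeof v);
    memcpy(&vf, f, sizeof vf);
    v = _mm_mulhi_pu16(v, vf);  /* upper 16 bits of each 32-bit product */
    memcpy(out, &v, sizeof v);
    _mm_empty();                /* clear the MMX state */
}
```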

 

__m64 _mm_shuffle_pi16(__m64 a, int n)

Returns a combination of the four words of a. The selector n must be an immediate.

R0                    R1                         R2                         R3
word (n&0x3) of a     word ((n>>2)&0x3) of a     word ((n>>4)&0x3) of a     word ((n>>6)&0x3) of a
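As an illustration of how the selector bits pick source words, the sketch below reverses the order of the four words. It uses the _MM_SHUFFLE macro from xmmintrin.h to build the immediate; the helper name reverse_words is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: writes the words of in[] to out[] in reverse order. */
void reverse_words(const int16_t in[4], int16_t out[4])
{
    __m64 a;
    memcpy(&a, in, sizeof a);
    /* _MM_SHUFFLE(0, 1, 2, 3) is 0x1B: bits 1:0 are 3, so R0 is word 3;
       bits 3:2 are 2, so R1 is word 2; and so on down to R3 = word 0. */
    __m64 r = _mm_shuffle_pi16(a, _MM_SHUFFLE(0, 1, 2, 3));
    memcpy(out, &r, sizeof r);
    _mm_empty();  /* clear the MMX state */
}
```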

 

void _mm_maskmove_si64(__m64 d, __m64 n, char *p)

Conditionally stores byte elements of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte in d will be stored.

if (sign(n0)) p[0] := d0
if (sign(n1)) p[1] := d1
...
if (sign(n7)) p[7] := d7
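The conditional store can be sketched as below. The helper name store_selected and its array parameters are assumptions for illustration; bytes of dst whose selector byte has a clear high bit are left untouched.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: copies data[i] to dst[i] only where the high bit of
   mask[i] is set; other bytes of dst keep their previous contents. */
void store_selected(const uint8_t data[8], const uint8_t mask[8], uint8_t dst[8])
{
    __m64 d, n;
    memcpy(&d, data, sizeof d);
    memcpy(&n, mask, sizeof n);
    _mm_maskmove_si64(d, n, (char *)dst);  /* byte-wise conditional store */
    _mm_empty();                           /* clear the MMX state */
}
```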

 

__m64 _mm_avg_pu8(__m64 a, __m64 b)

Computes the (rounded) averages of the unsigned bytes in a and b.

R0                R1                ...   R7
(t0 + 1) >> 1     (t1 + 1) >> 1     ...   (t7 + 1) >> 1
where ti = (unsigned char)ai + (unsigned char)bi, evaluated with more than 8 bits so the sum does not overflow
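The rounding behavior of _mm_avg_pu8 can be checked with a short sketch. PAVGB adds one to the widened sum before shifting right, so an average that ends in .5 rounds up. The helper name average_bytes is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: out[i] = (a[i] + b[i] + 1) >> 1 for eight unsigned
   bytes, computed by the PAVGB instruction without intermediate overflow. */
void average_bytes(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    __m64 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    __m64 r = _mm_avg_pu8(va, vb);
    memcpy(out, &r, sizeof r);
    _mm_empty();  /* clear the MMX state */
}
```

For instance, averaging 1 and 2 yields 2, and averaging 255 and 255 yields 255 rather than wrapping around.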

 

__m64 _mm_avg_pu16(__m64 a, __m64 b)

Computes the (rounded) averages of the unsigned words in a and b.

R0                R1                R2                R3
(t0 + 1) >> 1     (t1 + 1) >> 1     (t2 + 1) >> 1     (t3 + 1) >> 1
where ti = (unsigned int)ai + (unsigned int)bi

 

__m64 _mm_sad_pu8(__m64 a, __m64 b)

Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the lower word. The upper three words are cleared.

R0                               R1   R2   R3
abs(a0-b0) + ... + abs(a7-b7)    0    0    0
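A minimal sketch of the sum of absolute differences follows. The helper name sum_abs_diff is hypothetical. Since the largest possible sum is 8 * 255 = 2040, the result always fits in the low word, which the sketch reads back with _mm_extract_pi16.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper: returns |a[0]-b[0]| + ... + |a[7]-b[7]| for
   unsigned bytes. */
int sum_abs_diff(const uint8_t a[8], const uint8_t b[8])
{
    __m64 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    __m64 r = _mm_sad_pu8(va, vb);     /* sum in word 0, words 1..3 are zero */
    int sum = _mm_extract_pi16(r, 0);  /* read the low word */
    _mm_empty();                       /* clear the MMX state */
    return sum;
}
```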