Multimedia Additions

The prototypes for these intrinsics are in the ia64intrin.h header file.

For detailed information about an intrinsic, click on the intrinsic name in the following table.

Intrinsic	Operation	Corresponding Itanium Instruction
_m64_czx1l	Compute Zero Index	czx1.l
_m64_czx1r	Compute Zero Index	czx1.r
_m64_czx2l	Compute Zero Index	czx2.l
_m64_czx2r	Compute Zero Index	czx2.r
_m64_mix1l	Mix	mix1.l
_m64_mix1r	Mix	mix1.r
_m64_mix2l	Mix	mix2.l
_m64_mix2r	Mix	mix2.r
_m64_mix4l	Mix	mix4.l
_m64_mix4r	Mix	mix4.r
_m64_mux1	Permutation	mux1
_m64_mux2	Permutation	mux2
_m64_padd1uus	Parallel add	padd1.uus
_m64_padd2uus	Parallel add	padd2.uus
_m64_pavg1_nraz	Parallel average	pavg1
_m64_pavg2_nraz	Parallel average	pavg2
_m64_pavgsub1	Parallel average subtract	pavgsub1
_m64_pavgsub2	Parallel average subtract	pavgsub2
_m64_pmpy2r	Parallel multiply	pmpy2.r
_m64_pmpy2l	Parallel multiply	pmpy2.l
_m64_pmpyshr2	Parallel multiply and shift right	pmpyshr2
_m64_pmpyshr2u	Parallel multiply and shift right	pmpyshr2.u
_m64_pshladd2	Parallel shift left and add	pshladd2
_m64_pshradd2	Parallel shift right and add	pshradd2
_m64_psub1uus	Parallel subtract	psub1.uus
_m64_psub2uus	Parallel subtract	psub2.uus

__int64 _m64_czx1l(__m64 a)

The 64-bit value a is scanned for a zero element from the most significant element to the least significant element, and the index of the first zero element is returned. The element width is 8 bits, so the range of the result is from 0 - 7. If no zero element is found, the default result is 8.

__int64 _m64_czx1r(__m64 a)

The 64-bit value a is scanned for a zero element from the least significant element to the most significant element, and the index of the first zero element is returned. The element width is 8 bits, so the range of the result is from 0 - 7. If no zero element is found, the default result is 8.

__int64 _m64_czx2l(__m64 a)

The 64-bit value a is scanned for a zero element from the most significant element to the least significant element, and the index of the first zero element is returned. The element width is 16 bits, so the range of the result is from 0 - 3. If no zero element is found, the default result is 4.

__int64 _m64_czx2r(__m64 a)

The 64-bit value a is scanned for a zero element from the least significant element to the most significant element, and the index of the first zero element is returned. The element width is 16 bits, so the range of the result is from 0 - 3. If no zero element is found, the default result is 4.

__m64 _m64_mix1l(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 1-byte groups, starting from the left, as shown in Figure 1, and return the result.

__m64 _m64_mix1r(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 1-byte groups, starting from the right, as shown in Figure 2, and return the result.

__m64 _m64_mix2l(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 2-byte groups, starting from the left, as shown in Figure 3, and return the result.

__m64 _m64_mix2r(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 2-byte groups, starting from the right, as shown in Figure 4, and return the result.

__m64 _m64_mix4l(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 4-byte groups, starting from the left, as shown in Figure 5, and return the result.

__m64 _m64_mix4r(__m64 a, __m64 b)

Interleave 64-bit quantities a and b in 4-byte groups, starting from the right, as shown in Figure 6, and return the result.

__m64 _m64_mux1(__m64 a, const int n)

Based on the value of n, a permutation is performed on a as shown in Figure 7, and the result is returned. Table 1 shows the possible values of n.

Table 1. Values of n for m64_mux1 Operation

	n
@brcst	0
@mix	8
@shuf	9
@alt	0xA
@rev	0xB

__m64 _m64_mux2(__m64 a, const int n)

Based on the value of n, a permutation is performed on a as shown in Figure 8, and the result is returned.

__m64 _m64_pavgsub1(__m64 a, __m64 b)

The unsigned data elements (bytes) of b are subtracted from the unsigned data elements (bytes) of a and the results of the subtraction are then each independently shifted to the right by one position. The high-order bits of each element are filled with the borrow bits of the subtraction.

__m64 _m64_pavgsub2(__m64 a, __m64 b)

The unsigned data elements (double bytes) of b are subtracted from the unsigned data elements (double bytes) of a and the results of the subtraction are then each independently shifted to the right by one position. The high-order bits of each element are filled with the borrow bits of the subtraction.

__m64 _m64_pmpy2l(__m64 a, __m64 b)

Two signed 16-bit data elements of a, starting with the most significant data element, are multiplied by the corresponding two signed 16-bit data elements of b, and the two 32-bit results are returned as shown in Figure 9.

__m64 _m64_pmpy2r(__m64 a, __m64 b)

Two signed 16-bit data elements of a, starting with the least significant data element, are multiplied by the corresponding two signed 16-bit data elements of b, and the two 32-bit results are returned as shown in Figure 10.

__m64 _m64_pmpyshr2(__m64 a, __m64 b, const int count)

The four signed 16-bit data elements of a are multiplied by the corresponding signed 16-bit data elements of b, yielding four 32-bit products. Each product is then shifted to the right count bits and the least significant 16 bits of each shifted product form 4 16-bit results, which are returned as one 64-bit word.

__m64 _m64_pmpyshr2u(__m64 a, __m64 b, const int count)

The four unsigned 16-bit data elements of a are multiplied by the corresponding unsigned 16-bit data elements of b, yielding four 32-bit products. Each product is then shifted to the right count bits and the least significant 16 bits of each shifted product form 4 16-bit results, which are returned as one 64-bit word.

__m64 _m64_pshladd2(__m64 a, const int count, __m64 b)

a is shifted to the left by count bits and then is added to b. The upper 32 bits of the result are forced to 0, and then bits [31:30] of b are copied to bits [62:61] of the result. The result is returned.

__m64 _m64_pshradd2(__m64 a, const int count, __m64 b)

The four signed 16-bit data elements of a are each independently shifted to the right by count bits (the high order bits of each element are filled with the initial value of the sign bits of the data elements in a); they are then added to the four signed 16-bit data elements of b. The result is returned.

__m64 _m64_padd1uus(__m64 a, __m64 b)

a is added to b as eight separate byte-wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_padd2uus(__m64 a, __m64 b)

a is added to b as four separate 16-bit wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_psub1uus(__m64 a, __m64 b)

a is subtracted from b as eight separate byte-wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_psub2uus(__m64 a, __m64 b)

a is subtracted from b as four separate 16-bit wide elements. The elements of a are treated as unsigned, while the elements of b are treated as signed. The results are treated as unsigned and are returned as one 64-bit word.

__m64 _m64_pavg1_nraz(__m64 a, __m64 b)

The unsigned byte-wide data elements of a are added to the unsigned byte-wide data elements of b and the results of each add are then independently shifted to the right by one position. The high-order bits of each element are filled with the carry bits of the sums.

__m64 _m64_pavg2_nraz(__m64 a, __m64 b)

The unsigned 16-bit wide data elements of a are added to the unsigned 16-bit wide data elements of b and the results of each add are then independently shifted to the right by one position. The high-order bits of each element are filled with the carry bits of the sums.