We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 7 years ago .

In C/C++, the explicit vectorization intrinsics provided by immintrin.h are, I would argue, a kludge. That is, for each CPU instruction set (e.g. SSE, AVX2, AVX512, ...) and for each number type (i.e. float, double, int, etc.), there is a unique function for the same fundamental operation, such as _mm_add_epi8, _mm_add_epi16, _mm256_add_epi8, _mm256_add_epi16, all for the basic +. So if you code with intrinsics for AVX, you have to recode when you upgrade to AVX2, then again for AVX512, and so on.

On the other hand, some of the compiler's basic built-in operators (i.e. '+', '-', etc.) seem to work nicely (provided the data is aligned) on all types (vector and not) and on some mixed-type operations (as illustrated in the snippet below), leading to better readability and scalability (across SIMD instruction sets).

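// no need for #include "immintrin.h"
#ifndef __AVX2__
#define SIMD_LEN 16
#else
#define SIMD_LEN 32
#endif
#define N 1024  // example size (an assumption); must be a multiple of SIMD_LEN/sizeof(num_t)
typedef int num_t;
// the arrays must be aligned to SIMD_LEN for the vector casts below
num_t a[N] __attribute__ ((aligned (SIMD_LEN)));
num_t b[N] __attribute__ ((aligned (SIMD_LEN)));
// for any num_t and SIMD_LEN, explicitly vectorize b[n] = 2*a[n] + 4
// __may_alias__ lets vec_t pointers alias the num_t arrays without breaking strict aliasing
typedef num_t    vec_t __attribute__ ((__vector_size__ (SIMD_LEN), __may_alias__));
vec_t *vA = (vec_t*)a;
vec_t *vB = (vec_t*)b;
int nNums = SIMD_LEN/sizeof(num_t);
for (int n=0; n < (N/nNums); n++)
    vB[n] = 2*vA[n] + 4;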
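
The fragment above assumes that N is a multiple of the lane count and that both arrays are aligned to SIMD_LEN. A minimal sketch of the same loop as a standalone function, assuming GCC or Clang vector extensions, might look like the following. The name scale_add is hypothetical; loads and stores go through memcpy so that no alignment is required, and a scalar loop handles the leftover elements.

```cpp
#include <cstddef>
#include <cstring>

#ifdef __AVX2__
#define SIMD_LEN 32
#else
#define SIMD_LEN 16
#endif

typedef int num_t;
typedef num_t vec_t __attribute__((__vector_size__(SIMD_LEN)));

// Hypothetical helper: computes b[i] = 2*a[i] + 4 for i in [0, n).
// Works for any n and any alignment of a and b.
void scale_add(const num_t* a, num_t* b, std::size_t n) {
    const std::size_t lanes = SIMD_LEN / sizeof(num_t);
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes) {
        vec_t v;
        std::memcpy(&v, a + i, sizeof v);  // alignment-agnostic load
        v = 2 * v + 4;                     // scalars are broadcast across lanes
        std::memcpy(b + i, &v, sizeof v);  // alignment-agnostic store
    }
    for (; i < n; ++i)                     // scalar tail for leftover elements
        b[i] = 2 * a[i] + 4;
}
```

Changing num_t or SIMD_LEN should not require touching the loop body, which is the property the question is after; whether the compiler emits the instructions you expect is still worth checking in the generated assembly.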

Obviously such flexibility/scalability is not going to be available for ALL operations, but it seems that immintrin.h is not structured to allow natural expressions as much as it could.

To that end, is there an alternative "intrinsic" header to the immintrin.h family that allows more natural expression, as illustrated above? At least one that covers many of the universal, scalable ops, like horizontal add, unaligned load, compare, etc.?
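
For illustration only, here is a rough sketch of how the three operations named above (unaligned load, horizontal add, compare) could be written once against the GCC/Clang vector extensions rather than per instruction set. The helper names load_u, hsum and gt_mask are hypothetical and do not come from any existing header.

```cpp
#include <cstddef>
#include <cstring>

#ifdef __AVX2__
#define SIMD_LEN 32
#else
#define SIMD_LEN 16
#endif

typedef int num_t;
typedef num_t vec_t __attribute__((__vector_size__(SIMD_LEN)));
constexpr std::size_t kLanes = SIMD_LEN / sizeof(num_t);

// Unaligned load: memcpy leaves the choice of move instruction to the compiler.
inline vec_t load_u(const num_t* p) {
    vec_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Horizontal add: sums all lanes; the compiler decides how to reduce.
inline num_t hsum(vec_t v) {
    num_t s = 0;
    for (std::size_t i = 0; i < kLanes; ++i) s += v[i];
    return s;
}

// Lane-wise compare: each lane is -1 (all bits set) where a > b, else 0.
inline vec_t gt_mask(vec_t a, vec_t b) {
    return a > b;
}
```

None of these names depend on the vector width, so moving from SSE to AVX2 only changes SIMD_LEN. The trade-off is that instruction selection is left to the compiler, which may or may not match what a hand-picked intrinsic would give.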

And for the purpose of this question, I'm not interested in "just let the compiler vectorize". That simply answers the question of whether to use intrinsics or not.

gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html are partially gcc-specific, but fairly nice to use. – EOF May 2, 2016 at 15:49

something interesting about gcc vector extension gcc.gnu.org/bugzilla/show_bug.cgi?id=68123 – user3528438 May 2, 2016 at 17:06

I'm aware of the built-in vector extensions described at https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html. Indeed, immintrin.h is simply macros and functions built on them. What I'm looking for is an alternative to immintrin.h that builds upon the same vector extensions but uses a different syntax that will not break when migrating code from SSE to AVX to AVX2 to AVX512 to whatever. – codechimp May 3, 2016 at 0:00

The main argument for using a typedef with __attribute__ ((__vector_size__... is that it produces more readable source code.

The main argument for preferring immintrin.h is that it is less compiler-specific.

You can find out more about the limitations of each by web-searching for the combination of immintrin and gcc vector extension.

In any case, the rest of your application should hardly notice which of them you are using:

I would try to defer the decision as long as possible by abstracting all of this into a mathvector class/struct. It can have a simple non-vectorized implementation at first. Develop all other parts of your application first. You can then always make the mathvector class become vectorized in the future.
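
A minimal sketch of what such a wrapper could look like, assuming a plain scalar implementation to start with. Only the name mathvector comes from the suggestion above; the members lane, splat and hsum are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cstddef>

// Scalar reference implementation. The loops could later be swapped for
// intrinsics or compiler vector extensions without changing calling code.
template <typename T, std::size_t N>
struct mathvector {
    std::array<T, N> lane{};

    // Broadcast one scalar into every lane.
    static mathvector splat(T x) {
        mathvector r;
        r.lane.fill(x);
        return r;
    }

    friend mathvector operator+(mathvector a, const mathvector& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] += b.lane[i];
        return a;
    }

    friend mathvector operator*(mathvector a, const mathvector& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] *= b.lane[i];
        return a;
    }

    // Horizontal add over all lanes.
    T hsum() const {
        T s{};
        for (std::size_t i = 0; i < N; ++i) s += lane[i];
        return s;
    }
};
```

With this in place, application code can write V::splat(2) * a + V::splat(4) for some V = mathvector<int, 4>, and only the struct's internals need to change if a vectorized backend is added later.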

immintrin.h, as it is, may be less compiler-specific, but it is very much CPU- and variable-type-specific, which is NOT exactly a good trade-off. In any case, I'm not making an argument in favor of using __attribute.... Indeed, I'm inclined not to use it. But I am looking for something better than the de facto immintrin.h, which in fact is just a set of inline functions and macros that define a standard syntax for explicit vectorization through the use of compiler-specific __attribute__.... I'm just unsatisfied with said 'syntax'. – codechimp May 2, 2016 at 23:54