In C/C++, the explicit vectorization intrinsics provided by immintrin.h are, I would argue, a kludge. That is, for each CPU instruction set (e.g. SSE, AVX2, AVX512, ...) and for each number type (float, double, int, etc.) there is a unique function for the same fundamental operation, such as _mm_add_epi8, _mm_add_epi16, _mm256_add_epi8, and _mm256_add_epi16, all for the basic +. So if you code with intrinsics for AVX, you have to recode when you upgrade to AVX2, then again for AVX512, and so on.
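For example, a minimal sketch of that duplication, assuming SSE2 and AVX2 are both available (the wrapper names are made up purely for illustration):

#include <immintrin.h>

// The same element-wise 32-bit integer addition, written once per ISA.
// 128-bit SSE2 version: adds 4 packed ints
static inline __m128i add4_i32(__m128i a, __m128i b) { return _mm_add_epi32(a, b); }

// 256-bit AVX2 version: same operation, different type and intrinsic name
static inline __m256i add8_i32(__m256i a, __m256i b) { return _mm256_add_epi32(a, b); }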
On the other hand, the compiler's basic built-in operators (i.e. '+', '-', etc.) seem to work nicely (provided the data is aligned) on all types (vector and scalar) and even on some mixed-type operations (as illustrated in the snippet below), leading to better readability and better scalability across SIMD instruction sets.
// no need for #include "immintrin.h"
#ifndef __AVX2__
#define SIMD_LEN 16   // 128-bit vectors (SSE)
#else
#define SIMD_LEN 32   // 256-bit vectors (AVX2)
#endif
#define N 1024        // array length, assumed to be a multiple of the vector width

typedef int num_t;
// a vector of num_t filling one SIMD register, for any num_t and SIMD_LEN
typedef num_t vec_t __attribute__ ((__vector_size__ (SIMD_LEN)));

// align the arrays so the vec_t* casts below are valid
num_t a[N] __attribute__ ((aligned (SIMD_LEN)));
num_t b[N] __attribute__ ((aligned (SIMD_LEN)));

void scale_and_offset(void)   // explicitly vectorize b[n] = 2*a[n] + 4
{
    vec_t *vA = (vec_t *)a;
    vec_t *vB = (vec_t *)b;
    int nNums = SIMD_LEN / sizeof(num_t);
    for (int n = 0; n < N / nNums; n++)
        vB[n] = 2*vA[n] + 4;
}
Obviously such flexibility/scalability is not going to be available for ALL operations, but it seems that immintrin.h is not structured to allow natural expression as much as it could be.
To that end, is there an alternative "intrinsic" header to the immintrin.h family that allows more natural expression, as illustrated above? At least one that covers many of the universal, scalable ops, like horizontal add, unaligned load, compare, etc.?
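For concreteness, a rough sketch of what a few of those ops could look like using only the GCC/Clang vector extension (the typedef and function names are invented for illustration, not taken from any header):

typedef int v8si __attribute__ ((__vector_size__ (32)));

// compare: relational operators on vector types yield a mask vector of 0 / -1
v8si greater_mask(v8si a, v8si b) { return a > b; }

// unaligned load: memcpy into a vector object lets the compiler emit an
// unaligned load instead of requiring an aligned pointer cast
v8si load_unaligned(const int *p) {
    v8si v;
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

// horizontal add: no built-in operator for this, so a scalar reduction
// (or an intrinsic) is still needed
int hadd(v8si v) {
    int s = 0;
    for (int i = 0; i < 8; i++) s += v[i];
    return s;
}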
And for the purpose of this question, I'm not interested in "just let the compiler vectorize". That simply answers the question of whether to use intrinsics or not.
The main argument for using a typedef with __attribute__ ((__vector_size__ (...))) is that it produces simpler source code.
The main argument for preferring immintrin.h is that it is less compiler-specific.
You can find out more about the limitations of each by web-searching for the combination of immintrin and gcc vector extension.
In any case, the rest of your application should hardly notice which of them you are using:
I would try to defer the decision as long as possible by abstracting all of this into a mathvector class/struct. It can have a simple, non-vectorized implementation at first. Develop all other parts of your application first; you can always make the mathvector class vectorized later.
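A minimal sketch of such an abstraction (the class name and interface are only illustrative); the scalar loops inside the operators can later be replaced with intrinsics or a __vector_size__ typedef without touching calling code:

#include <cstddef>

template <typename T, std::size_t N>
struct mathvector {
    T v[N];

    // element-wise addition, plain scalar loop for now
    mathvector operator+(const mathvector &o) const {
        mathvector r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = v[i] + o.v[i];
        return r;
    }

    // multiply every element by a scalar
    mathvector operator*(T s) const {
        mathvector r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = v[i] * s;
        return r;
    }
};

// usage: c = a * 2 + b;  (reads like the scalar expression)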