I am currently working on the Model part of Sigma, i.e. the World data storage (currently only orientation and absolute position) and the methods used to access it.
I laid out the data in memory to suit vectorization and to improve memory transfer bandwidth when iterating over the arrays, and I duplicated the storage in a map to speed up requests by entity id.
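To make the layout concrete, here is a minimal C++ sketch of a structure-of-arrays store with a lookup by entity id. All names (`WorldStorage`, `EntityId`, the `pos_*`/`ori_*` arrays, `slot_of`) are hypothetical, and the quaternion orientation is an assumption; the actual Sigma types may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical entity id type; the real Sigma type may differ.
using EntityId = std::uint64_t;

// Structure-of-arrays storage: each component lives in its own contiguous
// array, so a loop over one component streams through memory and maps
// directly onto SIMD lanes.
struct WorldStorage {
    std::vector<float> pos_x, pos_y, pos_z;          // absolute position
    std::vector<float> ori_x, ori_y, ori_z, ori_w;   // orientation (quaternion, assumed)
    std::vector<EntityId> ids;                       // ids[i] owns slot i
    std::unordered_map<EntityId, std::size_t> index; // id -> slot, average O(1) lookup

    // Appends an entity and returns its slot.
    std::size_t add(EntityId id, float x, float y, float z) {
        assert(index.find(id) == index.end() && "duplicate entity id");
        const std::size_t slot = ids.size();
        pos_x.push_back(x);
        pos_y.push_back(y);
        pos_z.push_back(z);
        // Identity rotation as the default orientation.
        ori_x.push_back(0.f);
        ori_y.push_back(0.f);
        ori_z.push_back(0.f);
        ori_w.push_back(1.f);
        ids.push_back(id);
        index.emplace(id, slot);
        return slot;
    }

    // Returns the slot of an entity, or -1 if the id is unknown.
    long slot_of(EntityId id) const {
        auto it = index.find(id);
        return it == index.end() ? -1 : static_cast<long>(it->second);
    }
};
```

Instead of duplicating the data itself in the map, this variant keeps only the slot index there. That avoids keeping two copies in sync, at the cost of one extra indirection per lookup.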
I implemented the "Absolute to Relative" function, which is heavily vectorized.
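The exact computation behind "Absolute to Relative" is not described here. Assuming it expresses every entity position relative to a reference point (translation only, orientation ignored), a vectorized loop over the structure-of-arrays data could look like the following sketch, using SSE intrinsics for four entities at a time and a scalar tail. The function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical reading of "Absolute to Relative": r = a - ref for every
// entity, over separate x/y/z arrays (structure-of-arrays layout).
void absolute_to_relative(const float* ax, const float* ay, const float* az,
                          float* rx, float* ry, float* rz, std::size_t n,
                          float refx, float refy, float refz) {
    assert(n == 0 || (ax && ay && az && rx && ry && rz));
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    // Broadcast the reference point into all four lanes once.
    const __m128 vx = _mm_set1_ps(refx);
    const __m128 vy = _mm_set1_ps(refy);
    const __m128 vz = _mm_set1_ps(refz);
    // Four entities per iteration; unaligned loads/stores keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(rx + i, _mm_sub_ps(_mm_loadu_ps(ax + i), vx));
        _mm_storeu_ps(ry + i, _mm_sub_ps(_mm_loadu_ps(ay + i), vy));
        _mm_storeu_ps(rz + i, _mm_sub_ps(_mm_loadu_ps(az + i), vz));
    }
#endif
    // Scalar tail, or the whole array when SSE is not available at build time.
    for (; i < n; ++i) {
        rx[i] = ax[i] - refx;
        ry[i] = ay[i] - refy;
        rz[i] = az[i] - refz;
    }
}
```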
I had to switch to inline asm for SSE, since the intrinsics caused an overhead in the form of an additional memory fetch, which I was able to identify (I recommend the free tool CodeXL if you have an AMD CPU). Since inline asm is compiler-dependent, I am currently writing the SSE inline code for gcc (AT&T syntax); doing the same for VS (Intel syntax) is a TODO. Naturally, if the CPU does not support SSE, the scalar version is used instead. An AVX version should also be written, but I don't have a CPU that supports it.
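For comparison, a similar kernel written as gcc extended inline asm in AT&T syntax, with a runtime check that falls back to the scalar loop, might look like the sketch below. The names `sub4_sse_asm` and `sub_ref` are hypothetical, and `__builtin_cpu_supports` is a GCC/Clang builtin, so this only covers the gcc side.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAVE_GCC_ASM 1
// dst[0..3] = src[0..3] - ref, as GCC extended inline asm in AT&T syntax
// (source operand first, destination last).
static void sub4_sse_asm(float* dst, const float* src, float ref) {
    __asm__ volatile(
        "movss   %2, %%xmm1\n\t"          // xmm1 = [ref, 0, 0, 0]
        "shufps  $0, %%xmm1, %%xmm1\n\t"  // broadcast lane 0: xmm1 = [ref x4]
        "movups  (%1), %%xmm0\n\t"        // xmm0 = src[0..3], unaligned load
        "subps   %%xmm1, %%xmm0\n\t"      // xmm0 = xmm0 - xmm1
        "movups  %%xmm0, (%0)\n\t"        // dst[0..3] = xmm0
        :                                 // no register outputs
        : "r"(dst), "r"(src), "m"(ref)
        : "xmm0", "xmm1", "memory");      // clobbers xmm0/xmm1, reads and writes memory
}
#endif

// Subtracts ref from n floats. The asm path runs only when the CPU reports
// SSE at runtime; otherwise (and for the tail) the scalar loop runs.
void sub_ref(float* dst, const float* src, std::size_t n, float ref) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    std::size_t i = 0;
#ifdef SKETCH_HAVE_GCC_ASM
    if (__builtin_cpu_supports("sse")) {
        for (; i + 4 <= n; i += 4) sub4_sse_asm(dst + i, src + i, ref);
    }
#endif
    for (; i < n; ++i) dst[i] = src[i] - ref;
}
```

The `memory` clobber tells the compiler that the asm reads and writes through the pointers, which it cannot infer from the operand list alone. Note also that the Microsoft compiler does not accept inline assembly when targeting x64 (only 32-bit x86), so the VS version may have to use intrinsics or a separate assembler file.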
I have some questions regarding code that uses CPU-specific capabilities:
- Do we compile a separate version for each technology used (e.g. SSE, AVX, etc.), or do we detect capabilities at runtime?
- If we compile one version per technology, how do we integrate the specific code in the source (#include, or code surrounded by #ifdef/#endif)? Where should the source files go?
- Do we stick to a single compiler to avoid duplicating compiler-specific code (such as CPU detection code or inline asm)? Is gcc widely used enough to stick to it?
- If the answer to the above is "no", how do we integrate the compiler-specific code in the source (#include, or code surrounded by #ifdef/#endif)? Where should the source files go?
- Does anyone have good CPU capability detection code that compiles and works everywhere?
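As one possible starting point for the last question, CPUID can be queried directly with `__get_cpuid` from GCC/Clang's `<cpuid.h>` and `__cpuid` from MSVC's `<intrin.h>`. The sketch below only checks SSE, SSE2, SSE4.1 and AVX, and `CpuFeatures`/`detect_cpu_features` are hypothetical names. For AVX it also checks that the OS saves the YMM registers (OSXSAVE and XCR0), since the CPUID bit alone is not enough.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <intrin.h>     // __cpuid
#  include <immintrin.h>  // _xgetbv
#  define SKETCH_MSVC_X86 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#  include <cpuid.h>      // __get_cpuid
#  define SKETCH_GNU_X86 1
#endif

struct CpuFeatures {
    bool sse = false;
    bool sse2 = false;
    bool sse41 = false;
    bool avx = false;
};

// Reads XCR0, which says which register states the OS saves on context
// switches. Only call it when CPUID reports OSXSAVE, otherwise xgetbv faults.
static unsigned long long read_xcr0() {
#if defined(SKETCH_MSVC_X86)
    return _xgetbv(0);
#elif defined(SKETCH_GNU_X86)
    unsigned lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<unsigned long long>(hi) << 32) | lo;
#else
    return 0;
#endif
}

// Queries CPUID leaf 1. Bit positions follow the Intel/AMD manuals:
// EDX bit 25 = SSE, EDX bit 26 = SSE2,
// ECX bit 19 = SSE4.1, ECX bit 27 = OSXSAVE, ECX bit 28 = AVX.
CpuFeatures detect_cpu_features() {
    CpuFeatures f;
    unsigned ecx = 0, edx = 0;
#if defined(SKETCH_MSVC_X86)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    ecx = static_cast<unsigned>(regs[2]);
    edx = static_cast<unsigned>(regs[3]);
#elif defined(SKETCH_GNU_X86)
    unsigned eax = 0, ebx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return f;
#endif
    // On non-x86 targets ecx and edx stay 0, so every feature reads as absent.
    f.sse   = (edx >> 25) & 1u;
    f.sse2  = (edx >> 26) & 1u;
    f.sse41 = (ecx >> 19) & 1u;
    const bool osxsave = (ecx >> 27) & 1u;
    const bool avx_bit = (ecx >> 28) & 1u;
    // AVX is usable only if the OS also saves XMM (bit 1) and YMM (bit 2) state.
    if (osxsave && avx_bit) f.avx = (read_xcr0() & 0x6) == 0x6;
    return f;
}
```

The result can be computed once at startup and used to pick between function pointers (scalar, SSE, AVX), which is one way to do runtime dispatch from a single build.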