
Clang and GCC take different approaches to intrinsics: Clang is more likely than GCC to deviate from the opcodes and algorithms specified in the Intel guide, which is particularly noticeable with AVX-512 instructions. That's understandable given their respective architectures. Sometimes the result is an improvement, sometimes a detriment.

A couple of years ago I worked on a heavily vectorized project that was intended to compile with either, and wound up maintaining inline asm and .S files in the repository for specific targets alongside the C reference version. That made for some ugly Makefile shenanigans, and also meant including benchmarking as part of the test suite. It adds up to a considerable maintenance burden, so the takeaway for me was that using intrinsics as a low-level means to improve on the autovectorizer should be considered only very sparingly.
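For anyone who hasn't seen this kind of setup, here is a minimal sketch of the "C reference plus intrinsics plus inline asm" arrangement. It is not code from that project: the function names (add4_ref, add4_intrin, add4_asm, add4_variants_agree) are made up for illustration, and it uses a trivial SSE2 add rather than AVX-512 so that it builds on any x86-64 target, with a plain C fallback elsewhere. The asm variant assumes GCC/Clang extended asm syntax.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Reference version: plain C, left to the autovectorizer. */
static void add4_ref(uint32_t *dst, const uint32_t *a, const uint32_t *b) {
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}

#if HAVE_SSE2
/* Intrinsics version: the compiler is free to choose a different
 * instruction sequence than the one the intrinsic name suggests. */
static void add4_intrin(uint32_t *dst, const uint32_t *a, const uint32_t *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_add_epi32(va, vb));
}

/* Inline-asm version: pins the add to a literal paddd (AT&T operand order),
 * at the cost of hiding it from the optimizer. */
static void add4_asm(uint32_t *dst, const uint32_t *a, const uint32_t *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __asm__("paddd %1, %0" : "+x"(va) : "x"(vb));
    _mm_storeu_si128((__m128i *)dst, va);
}
#endif

/* Test-suite style check: every variant must match the C reference.
 * Returns 1 if all available variants agree, 0 otherwise. */
int add4_variants_agree(void) {
    const uint32_t a[4] = {1, 2, 3, 0xFFFFFFFFu};
    const uint32_t b[4] = {10, 20, 30, 1};
    uint32_t ref[4], out[4];

    add4_ref(ref, a, b);
#if HAVE_SSE2
    add4_intrin(out, a, b);
    if (memcmp(ref, out, sizeof ref) != 0) return 0;
    add4_asm(out, a, b);
    if (memcmp(ref, out, sizeof ref) != 0) return 0;
#else
    (void)out; /* only the reference version exists on this target */
#endif
    return 1;
}
```

In a real tree each variant would sit behind a per-target build switch, and the same harness that checks agreement would also time the variants, which is where the Makefile and benchmarking overhead comes from.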

Edit to add: quick example, from my notes during that project, https://godbolt.org/z/T4Pjhrz5d ; the GCC output is what was expected, the Clang output was a surprise, and noticeably slower in practice, even when inlined. When looped (or similarly if unrolled), uiCA clocks it at 7 cycles to GCC's 4, and this was borne out by their benchmark performance in application code, in which this function was performed a few billion times in the course of a brute-forcing algorithm (which is to say, it mattered). I recall finding other issues where a dive into the LLVM codebase suggested that Clang 16 might be entirely unable to issue some masked AVX-512 instructions due to internal refactorings.



I've run into the same behavior with clang and intrinsics. While I appreciate the fact that they're trying to optimize the intrinsics usage, there really does need to be a flag or pragma you can pass that says something along the lines of "no really, give me what I asked for." In some cases I have found that the code it produces is a significant pessimization compared to what I had selected.


Did you file bugs with testcases?


Sadly didn’t have time (this was not a funded project and I am far from sufficiently up to speed on LLVM internals). I still hope to get around to writing something up during the next break.



