"UB only if the hardware doesn't like it" sounds like you want to shift the complexity from the developers who know the problem domain best to the packagers.
As soon as the thing is packaged to run on a Raspberry Pi or something else that doesn't like it, it will start generating CVEs and become a major pain.
This shouldn't ever be a security vulnerability, outside of perhaps denial of service from segfaults. Even that seems a stretch: I'm pretty sure you'd find hardware with no page faults before finding hardware with pages smaller than 4 KB. And if you didn't want to hard-code 4 KB, it should be possible for a compiler to provide a "minimum page size" constant for the target architecture, which could return 1 on page-less hardware. But yes, as with many optimizations, getting this wrong could end badly.
For the case of specific vector extensions that imply specific cache line sizes, and loads that do not span multiple cache lines, I don't think you could run into issues.
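To make the page-size argument concrete, here is a minimal sketch in C of the check I have in mind. `MIN_PAGE_SIZE` and `load_stays_in_page` are hypothetical names, not something any compiler provides today; the 4096 value is an assumption and would be 1 on page-less hardware. It only covers the hardware-fault side: at the ISO C level, reading past the end of an object is still undefined behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the smallest page size the target guarantees.
 * Assumed to be a power of two; 4096 here, 1 on page-less hardware. */
#define MIN_PAGE_SIZE 4096u

/* Returns true if a load of `width` bytes starting at `p` stays inside a
 * single MIN_PAGE_SIZE-sized page. If the first byte is mapped, such a
 * load cannot touch an unmapped page, so it cannot page-fault. */
static bool load_stays_in_page(const void *p, size_t width)
{
    uintptr_t offset = (uintptr_t)p & (uintptr_t)(MIN_PAGE_SIZE - 1u);
    return offset + width <= MIN_PAGE_SIZE;
}
```

With `MIN_PAGE_SIZE` set to 1 the check only passes for loads of at most one byte, so the wide over-reading path is simply never taken. The same masking trick with the cache line size in place of the page size covers the "load does not span multiple cache lines" case.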