
Originally Posted by Bulat Ziganshin
on haswell, unaligned access is free and afair, evex encoded (avx) instructions allow to use unaligned memory operands
It's certainly not true for all Intel architectures or all SSE instructions though, as I did indeed have crashes caused by this. I had code that I optimised like this:
Code:
unsigned char *cp = fp->uncomp_p;
i = 0;
#ifdef ALLOW_UAC
/* Bulk loop: add '!' to four quality bytes per iteration. */
int n = b->len & ~3;
for (; i < n; i += 4) {
    //*cp++ = *dat++ + '!';
    *(uint32_t *)cp = *(uint32_t *)dat + 0x21212121;
    cp  += 4;
    dat += 4;
}
#endif
/* Tail: handle the remaining 0-3 bytes one at a time. */
for (; i < b->len; i++) {
    *cp++ = *dat++ + '!';
}
ALLOW_UAC being a #define on architectures that permit unaligned memory accesses. Basically it's adding '!' to a quality string to turn it from raw binary to printable ascii (FASTQ format conversion), and trying to do the copy + '!' 32-bits at a time. It crashed here as cp and/or dat isn't word aligned.
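For what it's worth, the usual way to get the same 4-bytes-at-a-time effect without relying on unaligned pointer casts is to go through memcpy, which compilers typically lower to single unaligned-tolerant loads and stores. A minimal sketch along those lines (the function name and parameters are just illustrative, not from my actual code):
Code:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: adds '!' (0x21) to each of len quality bytes, four at a
 * time, without casting to uint32_t*.  memcpy tells the compiler the access
 * may be unaligned, so it doesn't have to assume alignment.  Like the
 * original, this relies on each byte + 0x21 staying below 0x100 so the
 * per-byte additions don't carry. */
static void qual_to_printable(unsigned char *cp, const unsigned char *dat,
                              size_t len)
{
    size_t i = 0, n = len & ~(size_t)3;

    for (; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, dat + i, 4);     /* unaligned-safe load  */
        w += 0x21212121u;           /* add '!' to all four bytes */
        memcpy(cp + i, &w, 4);      /* unaligned-safe store */
    }

    for (; i < len; i++)            /* remaining 0-3 bytes */
        cp[i] = dat[i] + '!';
}
The optimiser should still vectorise this at -O3, but with the alignment assumption gone it has to use unaligned loads/stores rather than the aligned form that crashed below.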
Code:
0x426a43 <bam_put_seq+3219>: and $0x18,%al
0x426a45 <bam_put_seq+3221>: add $0x1,%r12d
0x426a49 <bam_put_seq+3225>: mov 0x18(%rsp),%rax
0x426a4e <bam_put_seq+3230>: lea 0x0(,%r12,4),%r10d
0x426a56 <bam_put_seq+3238>: lea -0x28(%rbx,%r8,1),%r15
0x426a5b <bam_put_seq+3243>: xor %r8d,%r8d
=> 0x426a5e <bam_put_seq+3246>: movdqa (%rax,%r8,1),%xmm1
0x426a64 <bam_put_seq+3252>: add $0x1,%ebp
0x426a67 <bam_put_seq+3255>: paddd %xmm0,%xmm1
0x426a6b <bam_put_seq+3259>: movups %xmm1,(%r15,%r8,1)
0x426a70 <bam_put_seq+3264>: add $0x10,%r8
0x426a74 <bam_put_seq+3268>: cmp %r12d,%ebp
0x426a77 <bam_put_seq+3271>: jb 0x426a5e <bam_put_seq+3246>
This was generated by gcc 4.9.2 at -O3; with -O2 it doesn't do this. The faulting movdqa requires a 16-byte-aligned memory operand, and the vectoriser only arranges that alignment on the assumption that the uint32_t pointers are at least 4-byte aligned to begin with, which the casts above don't actually guarantee. I resolved it hackily by putting __attribute__((optimize("no-tree-vectorize"))) on the function, but it's not particularly ideal! (The alternative is making the loop counter volatile, but that's an even more bizarre workaround.)
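For reference, the attribute goes on the function definition itself (GCC-specific; the function name and body below are just a placeholder, not the real bam_put_seq):
Code:
/* GCC-specific: keep -O3 for the rest of the file but disable the tree
 * vectoriser for this one function.  Placeholder name and body only. */
__attribute__((optimize("no-tree-vectorize")))
static void convert_quals(unsigned char *cp, const unsigned char *dat, int len)
{
    int i;
    for (i = 0; i < len; i++)
        cp[i] = dat[i] + '!';
}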
Anyway, try compiling with -O2 to see if the problem goes away.