Thursday, May 5, 2011

FLAGS dependencies on x86

One big annoyance I have with x86 in my work on DynamoRIO is the special FLAGS register. You can't read or write to it like a normal register, and 2/3 of the bits are reserved or always 0 or 1. The only easy way to get flags in or out is to use pushf and popf, but in DynamoRIO we can't use those without switching off the application stack, since it could be doing things like using the x86_64 Linux red zone. So, we try to avoid clobbering FLAGS in inserted code whenever possible. For example:

You wanted to subtract a constant? Instead of "sub $-1, %rax" try "lea -1(%rax), %rax". Works for add, too.

With the more complex base, displacement, index, scale, and offset form of the x86 memory operand, you can get even more fancy, accomplishing small shifts and multiplies and register to register adds using the LEA instruction. You can even get a 3-operand add like so:
lea 0(%rax, %rbx, 1), %rcx

Which adds %rax and %rbx and stores the result in %rcx. Another old trick is multiplication by 5, which works like so:
lea 0(%rax, %rax, 4), %rax

This is powerful enough to perform register to register adds and simple arithmetic with constants, but we can't yet subtract a register from another. Unfortunately, SUB and NEG both touch FLAGS. However, the NOT instruction does not. This means, assuming %rax is dead, we can rewrite "sub %rax, %rbx" like so:
not %rax
lea 1(%rax, %rbx, 1), %rbx

Remembering how two's complement works, negation is equivalent to a complement and add one. Addition is commutative, so we can fold the increment into the displacement field of the LEA.

Finally, you can also use JRCXZ (JECXZ in 32-bit land) to perform a jump if RCX is zero, without looking at flags. Putting it all together, you can implement equality checks without changing flags like so (assuming one of %rdi or %rsi and %rcx are dead or have been saved):

Flags code:
cmp %rdi, %rsi
je .Llabel

No flags code:
not %rsi
lea 1(%rdi, %rsi, 1), %rcx
jrcxz .Llabel

I haven't checked, but I imagine the compare is faster. However, in DynamoRIO, we have to save the flags for the cmp, which requires three instructions to save and to restore. With that factored in, I have a feeling the no flags version may be preferable. I will have to implement the no-flags transformation and see if it performs better.

Unfortunately, the toy client I am developing with has code like this:
lea -1(%rdx), %eax
test %rax, %rdi
jz .Llabel

Which performs an alignment check (assuming rdx is a power of two). I'm not sure how to implement an arithmetic AND (which is what TEST does) without touching flags.