Bogus code is clobbering a global variable.
TEST:
bug-37afb963-f2f2-46a3-b222-7cd883af7539.c
RESULT:
FAIL
SRC:
void printf();
long a;
typedef struct b {
int c;
long d;
unsigned e;
} S;
S f;
S g() { }
int main() {
int h = &h == 0;
f = g();
printf("%llx\n", a);
}
CC-OUTPUT:
QBE_PATH:
/nix/store/cr39hzmlxpic2zml505ashd2vpj0ypy4-single-exec/test.qbe
ASSEMBLY_PATH:
/nix/store/cr39hzmlxpic2zml505ashd2vpj0ypy4-single-exec/test.s
TEST-OUTPUT-DIFF:
--- output.expected 2019-02-21 03:22:02.885670528 +0000
+++ output.actual 2019-02-21 03:22:02.970670696 +0000
@@ -1 +0,0 @@
-0
EXIT-CODE:
139
I thought if we make return an assignment to an alloca then return that alloca, then zero init all allocas, it might fix this. Though should probably work out the root cause.
Since this struct is longer than 16 bytes, it is passed in memory. The caller (
main
) allocates storage for the struct, and passes it in$rdi
. However, the callee (g
) is expected to return that same address in$rax
, but since we are currently just emitting a plainret
, it probably just ends up reading from whatever address$rax
pointed to previously.On my system I get a segfault. I'm not really sure how it could end up modifying
a
, since it only reads from that address.In any case, I think the fix you suggested is a reasonable way to solve this.
Fix applied. We can't return an uninitialized struct filled with garbage anymore because we zero init it.
I reopened this issue as #12 was reverted, it can be retested after qbe fixes.
Isn't this program triggering undefined behavior anyway? One possible solution could be to return the constant 0 on fall-through returns for non-void functions. What do you think?
Yeah, this is undefined behavior.
I think returning 0 on fall-through returns is a good idea. Though I haven't thought through what would happen for functions returning structs.
It looks like it is only undefined if the caller uses the result. i.e. if
main
didn't assign the result ofg()
tof
, the example would be fine.
I checked the standard and you're right, undefined behavior is triggered by the assignment. But even then, I think cc+qbe are not doing anything wrong because as far as I understand the clobber happens at the assignment in
main
.I now remember that
ret
with no argument is actually explicitly allowed in qbe to permit the compilation of fall-through returns. Usingret 0
is actually not an idea as good as I thought because that would make the test program crash insideg
; this behavior would violate the standard in case the return value is not used by the caller.That "dynamic" aspect of C about fall-through returns is really badly broken: 1.
return;
in a function which has a non-void return type is inconsistent with falling through; and 2. any function with "return type" T is actually a function which, upon return, either gives you a T back or void...
Strangely, I did not get a crash with
ret 0
. It gets compiled to.text .globl g g: pushq %rbp movq %rsp, %rbp movq %rdi, %rax leaq 16(%rip), %rcx movq (%rcx), %rcx movq %rcx, 16(%rax) leaq 8(%rip), %rcx movq (%rcx), %rcx movq %rcx, 8(%rax) leaq 0(%rip), %rcx movq (%rcx), %rcx movq %rcx, 0(%rax) leave ret /* end function g */
I'm not quite sure, but I think it may be copying instructions of
g
into the result and not dereferencingNULL
as I expected. But anyway, this probably doesn't matter since as we realized, we can't crash ing
.For 1,
return;
is not allowed in a non-void function, and it causes a constraint error (6.8.6.4p1). We just error out in this case.For 2, does this cause any problems in practice? In QBE, we would have
%result =:b.1 call $g()
regardless of whether the result is used, so we just need to make sure that this does not do anything bad ifg
didn't return anything and result is not used later on. If there is an issue, maybe it be resolved if QBE just always setsrax
to the caller passedrdi
on an emptyret
when the return type has class MEMORY? It looks like this is what gcc does, and is even required by the ABI:If the type has class MEMORY, ... On return
%rax
will contain the address that has been passed in by the caller in%rdi
.
fwiw, I ran these tests through compcert -interp and clang -fsanitize=undefined which supposedly captures all undefined behavior as an error, though I didn't read the standard myself.
I guess it depends on what it means to "use" a value. The standard says:
If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
To clarify, my points 1 and 2 were mostly rants about the standard, not questions or anything really actionable.
The behavior of
ret 0
is indeed bogus, but it's a bug that only triggers when accessing memory at a constant location (e.g., like one would do when writing an OS), so I'll postpone these memory references are truly used.Returning in
rax
therdi
passed seems like a good idea. I'll see if that's easy to support.