Missing return in function returning struct

Bogus code is clobbering a global variable.

void printf();
long a;

typedef struct b {
  int c;
  long d;
  unsigned e;
} S;

S f;

S g() { }

int main() {
  int h = &h == 0;
  f = g();
  printf("%llx\n", a);
}

--- output.expected     2019-02-21 03:22:02.885670528 +0000
+++ output.actual       2019-02-21 03:22:02.970670696 +0000
@@ -1 +0,0 @@

~ach 3 years ago

I thought if we make return an assignment to an alloca then return that alloca, then zero init all allocas, it might fix this. Though should probably work out the root cause.

~mcf 3 years ago

Since this struct is longer than 16 bytes, it is passed in memory. The caller (main) allocates storage for the struct, and passes it in $rdi. However, the callee (g) is expected to return that same address in $rax, but since we are currently just emitting a plain ret, it probably just ends up reading from whatever address $rax pointed to previously.
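As a rough check of the size argument above, here is a minimal sketch. It assumes an LP64 target such as x86-64 Linux, where long is 8 bytes. The helper returned_in_memory is hypothetical and encodes only the "larger than 16 bytes" part of the System V classification, not the full algorithm.

```c
#include <stddef.h>
#include <stdio.h>

/* Same layout as the struct in the report. */
typedef struct b {
  int c;
  long d;
  unsigned e;
} S;

/* Hypothetical helper: only the size rule of the System V x86-64
   classification (aggregates larger than 16 bytes get class MEMORY).
   The real algorithm has further cases that are ignored here. */
int returned_in_memory(size_t size) {
  return size > 16;
}

/* Prints the size of S and the resulting guess. */
void report_S(void) {
  printf("sizeof(S) = %zu, returned in memory: %d\n",
         sizeof(S), returned_in_memory(sizeof(S)));
}
```

On an LP64 target sizeof(S) should be 24 (4 bytes for c, 4 of padding, 8 for d, 4 for e, 4 of tail padding), which is over the 16-byte limit.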

On my system I get a segfault. I'm not really sure how it could end up modifying a, since it only reads from that address.

In any case, I think the fix you suggested is a reasonable way to solve this.

~ach REPORTED FIXED 3 years ago

Fix applied. We can't return an uninitialized struct filled with garbage anymore because we zero init it.

~ach FIXED REPORTED 3 years ago

~ach 3 years ago

I reopened this issue as #12 was reverted, it can be retested after qbe fixes.

~qcx 3 years ago

Isn't this program triggering undefined behavior anyway? One possible solution could be to return the constant 0 on fall-through returns for non-void functions. What do you think?

~mcf 3 years ago

Yeah, this is undefined behavior.

I think returning 0 on fall-through returns is a good idea. Though I haven't thought through what would happen for functions returning structs.

~mcf 3 years ago

It looks like it is only undefined if the caller uses the result. i.e. if main didn't assign the result of g() to f, the example would be fine.
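To illustrate the distinction, here is a small sketch with an int-returning function rather than the struct from the report. The names maybe_value and use_only_defined_result are made up for this example. Most compilers will warn about the missing return; that warning is the point of the demonstration.

```c
/* Falls off the end when flag is 0. Compilers usually warn here;
   that is expected for this demonstration. */
int maybe_value(int flag) {
  if (flag)
    return 42;
}

/* Calls maybe_value twice and only uses the result of the call
   that actually executed a return statement. */
int use_only_defined_result(void) {
  maybe_value(0);        /* falls through, result discarded: defined */
  return maybe_value(1); /* executes return 42, result used: defined */
}
```

Writing int x = maybe_value(0); instead of discarding the result would be the undefined case, which is what f = g(); does in the original test.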

~qcx 3 years ago

I checked the standard and you're right, undefined behavior is triggered by the assignment. But even then, I think cc+qbe are not doing anything wrong because as far as I understand the clobber happens at the assignment in main.

I now remember that ret with no argument is explicitly allowed in qbe to permit the compilation of fall-through returns. Using ret 0 is not as good an idea as I thought, because it would make the test program crash inside g; that behavior would violate the standard when the return value is not used by the caller.

That "dynamic" aspect of C about fall-through returns is really badly broken: 1. return; in a function which has a non-void return type is inconsistent with falling through; and 2. any function with "return type" T is actually a function which, upon return, either gives you a T back or void...

~mcf 3 years ago

Strangely, I did not get a crash with ret 0. It gets compiled to

.globl g
	pushq %rbp
	movq %rsp, %rbp
	movq %rdi, %rax
	leaq 16(%rip), %rcx
	movq (%rcx), %rcx
	movq %rcx, 16(%rax)
	leaq 8(%rip), %rcx
	movq (%rcx), %rcx
	movq %rcx, 8(%rax)
	leaq 0(%rip), %rcx
	movq (%rcx), %rcx
	movq %rcx, 0(%rax)
/* end function g */

I'm not quite sure, but I think it may be copying instructions of g into the result and not dereferencing NULL as I expected. But anyway, this probably doesn't matter since as we realized, we can't crash in g.

For 1, return; is not allowed in a non-void function, and it causes a constraint error. We just error out in this case.

For 2, does this cause any problems in practice? In QBE, we would have %result =:b.1 call $g() regardless of whether the result is used, so we just need to make sure that this does not do anything bad if g didn't return anything and result is not used later on. If there is an issue, maybe it could be resolved if QBE just always sets rax to the caller-passed rdi on an empty ret when the return type has class MEMORY? It looks like this is what gcc does, and it is even required by the ABI:

If the type has class MEMORY, ... On return %rax will contain the address that has been passed in by the caller in %rdi.
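A C-level model of that rule might look like the sketch below. It is not QBE's code generation; g_lowered, hidden, and callee_returns_hidden_pointer are hypothetical names, with the explicit pointer parameter standing in for %rdi and the return value standing in for %rax.

```c
/* Same layout as the struct in the report. */
typedef struct b {
  int c;
  long d;
  unsigned e;
} S;

/* Hypothetical model of g after lowering: the caller's buffer arrives
   as an explicit pointer (the role of %rdi), and the function hands
   the same pointer back (the role of %rax) even though the body
   stored nothing into it. */
S *g_lowered(S *hidden) {
  /* empty body: fall-through return */
  return hidden;
}

/* Model of a caller that ignores the result: it still provides the
   buffer but never reads it. Returns 1 if the callee handed back the
   caller's own address. */
int callee_returns_hidden_pointer(void) {
  S tmp;
  return g_lowered(&tmp) == &tmp;
}
```

With this shape, a caller that copies out of the returned address at least reads from its own temporary instead of from wherever %rax happened to point before the call.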

~ach 3 years ago

fwiw, I ran these tests through compcert -interp and clang -fsanitize=undefined which supposedly captures all undefined behavior as an error, though I didn't read the standard myself.

~mcf 3 years ago

I guess it depends on what it means to "use" a value. The standard says:

If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.

~qcx 3 years ago

To clarify, my points 1 and 2 were mostly rants about the standard, not questions or anything really actionable.

The behavior of ret 0 is indeed bogus, but it's a bug that only triggers when accessing memory at a constant location (e.g., like one would do when writing an OS), so I'll postpone fixing it until these memory references are truly used.

Returning in rax the rdi passed seems like a good idea. I'll see if that's easy to support.
