~mcf/cproc#1: 
Variable-length arrays

This probably needs a stack save/restore feature in QBE. I looked at what clang/LLVM does for this, and they have stacksave and stackrestore intrinsic functions. I'm imagining something similar for QBE

    %stack =l stacksave
    %vla =l alloc8 %size
    ...
    stackrestore %stack

I'm not sure if this would be easy to implement, or if it would be tricky due to interference with QBE's own use of the stack.

Status
REPORTED
Submitter
~mcf
Assigned to
No-one
Submitted
4 years ago
Updated
9 months ago
Labels
qbe

~mpech 3 years ago

I will wait this feature to test PicoLisp (C implementation).

~sircmpwn 3 years ago

~mcf, can you write up a summary of your thoughts on this feature here? What blocks it, and possible workarounds? I remember it was something with qbe.

~sircmpwn 3 years ago

I wrote up an example of an implementation that might work for VLAs in a loop:

export function w $main() {
@start
	%iters =l copy 8
	%a.sz =l copy 8
	%a.0 =l alloc8 %a.sz
	%i =l copy %iters
	%sz =l copy %a.sz
@loop.start
@loop
	# Ensure that %a is at least as big as %sz
	%a.resize =w cugel %a.sz, %sz
	jnz %a.resize, @a.sufficient, @a.resize
@a.resize
	%a.oldsz =l copy %a.sz
	%a.sz =l copy %sz
	%a.1 =l alloc8 %a.sz
	%a.i =l copy 0
@a.copy
	%a.0.n =l add %a.0, %a.i
	%a.1.n =l add %a.1, %a.i
	%a.n =l loadl %a.0.n
	storel %a.n, %a.1.n
	%a.i =l add %a.i, 1
	%a.d =l sub %a.sz, %a.i
	jnz %a.d, @a.copy, @a.resized
@a.resized
	%a.0 =l copy %a.1
@a.sufficient
	call $printf(l $fmt, l %a.0)

	%i =w sub %i, 1
	%sz =l add %sz, 1
	jnz %i, @loop, @end
@end
	ret 0
}

data $fmt = { b "a.0: %p\n", b 0 }

Obviously the resulting code is going to suck but that's what you get for putting a VLA in a loop.

~mcf 3 years ago

I updated the description with what I concluded last time I investigated this.

Regarding your proposal, I think that has worst-case quadratic memory usage, which might not be acceptable. For example, something that should be safe, like

for (size_t i = 0; i < 8192; ++i) {
	char a[i];
	f(a, i);
}

would use 32 MiB on the stack. A possible variation could be to allocate in powers of 2, which should use use at most twice the required size.

I don't think the copy is necessary since VLAs can't be resized in C.

In any case, I don't it's worth trying to emulate full VLAs until QBE has the proper support. For now, supporting VLAs that don't appear under any labels or loop bodies is probably sufficient for most code.

~sircmpwn 3 years ago

I'm not convinced that qbe would do anything much different from this. Do you think otherwise?

~sircmpwn 3 years ago

Though +1 to allocating by powers of 2.

~mcf 3 years ago

Yes, I think QBE should be able to save and restore the stack pointer.

Consider the following program:

#include <stddef.h>
size_t x;
void g(char *);
void f(void) {
	for (;;) {
		char a[x];
		g(a);
	}
}

clang turns this into this LLVM IR:

define dso_local void @f() local_unnamed_addr #0 {
  br label %1

1:                                                ; preds = %1, %0
  %2 = load i64, i64* @x, align 8, !tbaa !2
  %3 = call i8* @llvm.stacksave()
  %4 = alloca i8, i64 %2, align 16
  call void @g(i8* nonnull %4) <a href="/~mcf/cproc/1" title="~mcf/cproc#1: Variable-length arrays">#1</a>
  call void @llvm.stackrestore(i8* %3)
  br label %1
}

which gets translated to this x86_64 asm:

f:                                      # @f
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %rbx
        pushq   %rax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movq    %rsp, %rbx
        movq    x(%rip), %rax
        movq    %rsp, %rdi
        addq    $15, %rax
        andq    $-16, %rax
        subq    %rax, %rdi
        movq    %rdi, %rsp
        callq   g
        movq    %rbx, %rsp
        jmp     .LBB0_1

Rather than allocating more and more stack space, the stack pointer is saved at the beginning of the loop body, and restored at the end.

D. Olsson 1 year, 1 month ago · edit

When possible, I would love to help with testing, documentation or anything else that a not-compiler programmer can help with regarding this todo.

I'm trying to compile skalibs, s6 and redo-c with cproc. And all three of these projects require VLAs.

/D. Olsson

Nathanael Möcking 9 months ago · edit

If I may interject, but what would be the problems of implementing VLAs as syntactic sugar for the apparently implemented __builtin_alloca, which has a very similar functionality? This seems to be a very straight-forward way to go.

~mcf 9 months ago

See the earlier discussion. The problem is that alloca and VLAs behave differently in a loop. Memory allocated by alloca gets deallocated when the function returns, but VLAs get deallocated at the end of the block. Naively compiling VLAs as alloca could result in stack overflows in some of the examples above.

Register here or Log in to comment, or comment via email.