This probably needs a stack save/restore feature in QBE. I looked at what clang/LLVM does for this, and they have stacksave
and stackrestore
intrinsic functions. I'm imagining something similar for QBE
%stack =l stacksave
%vla =l alloc8 %size
...
stackrestore %stack
I'm not sure if this would be easy to implement, or if it would be tricky due to interference with QBE's own use of the stack.
I will wait this feature to test PicoLisp (C implementation).
~mcf, can you write up a summary of your thoughts on this feature here? What blocks it, and possible workarounds? I remember it was something with qbe.
I wrote up an example of an implementation that might work for VLAs in a loop:
export function w $main() { @start %iters =l copy 8 %a.sz =l copy 8 %a.0 =l alloc8 %a.sz %i =l copy %iters %sz =l copy %a.sz @loop.start @loop # Ensure that %a is at least as big as %sz %a.resize =w cugel %a.sz, %sz jnz %a.resize, @a.sufficient, @a.resize @a.resize %a.oldsz =l copy %a.sz %a.sz =l copy %sz %a.1 =l alloc8 %a.sz %a.i =l copy 0 @a.copy %a.0.n =l add %a.0, %a.i %a.1.n =l add %a.1, %a.i %a.n =l loadl %a.0.n storel %a.n, %a.1.n %a.i =l add %a.i, 1 %a.d =l sub %a.sz, %a.i jnz %a.d, @a.copy, @a.resized @a.resized %a.0 =l copy %a.1 @a.sufficient call $printf(l $fmt, l %a.0) %i =w sub %i, 1 %sz =l add %sz, 1 jnz %i, @loop, @end @end ret 0 } data $fmt = { b "a.0: %p\n", b 0 }
Obviously the resulting code is going to suck but that's what you get for putting a VLA in a loop.
I updated the description with what I concluded last time I investigated this.
Regarding your proposal, I think that has worst-case quadratic memory usage, which might not be acceptable. For example, something that should be safe, like
for (size_t i = 0; i < 8192; ++i) { char a[i]; f(a, i); }would use 32 MiB on the stack. A possible variation could be to allocate in powers of 2, which should use use at most twice the required size.
I don't think the copy is necessary since VLAs can't be resized in C.
In any case, I don't it's worth trying to emulate full VLAs until QBE has the proper support. For now, supporting VLAs that don't appear under any labels or loop bodies is probably sufficient for most code.
I'm not convinced that qbe would do anything much different from this. Do you think otherwise?
Though +1 to allocating by powers of 2.
Yes, I think QBE should be able to save and restore the stack pointer.
Consider the following program:
#include <stddef.h> size_t x; void g(char *); void f(void) { for (;;) { char a[x]; g(a); } }clang turns this into this LLVM IR:
define dso_local void @f() local_unnamed_addr #0 { br label %1 1: ; preds = %1, %0 %2 = load i64, i64* @x, align 8, !tbaa !2 %3 = call i8* @llvm.stacksave() %4 = alloca i8, i64 %2, align 16 call void @g(i8* nonnull %4) <a href="/~mcf/cproc/1" title="~mcf/cproc#1: Variable-length arrays">#1</a> call void @llvm.stackrestore(i8* %3) br label %1 }
which gets translated to this x86_64 asm:
f: # @f pushq %rbp movq %rsp, %rbp pushq %rbx pushq %rax .LBB0_1: # =>This Inner Loop Header: Depth=1 movq %rsp, %rbx movq x(%rip), %rax movq %rsp, %rdi addq $15, %rax andq $-16, %rax subq %rax, %rdi movq %rdi, %rsp callq g movq %rbx, %rsp jmp .LBB0_1Rather than allocating more and more stack space, the stack pointer is saved at the beginning of the loop body, and restored at the end.
When possible, I would love to help with testing, documentation or anything else that a not-compiler programmer can help with regarding this todo.
I'm trying to compile skalibs, s6 and redo-c with cproc. And all three of these projects require VLAs.
/D. Olsson
If I may interject, but what would be the problems of implementing VLAs as syntactic sugar for the apparently implemented __builtin_alloca, which has a very similar functionality? This seems to be a very straight-forward way to go.
See the earlier discussion. The problem is that alloca and VLAs behave differently in a loop. Memory allocated by alloca gets deallocated when the function returns, but VLAs get deallocated at the end of the block. Naively compiling VLAs as alloca could result in stack overflows in some of the examples above.