This probably needs a stack save/restore feature in QBE. I looked at what clang/LLVM does for this, and they have stacksave
and stackrestore
intrinsic functions. I'm imagining something similar for QBE
%stack =l stacksave
%vla =l alloc8 %size
...
stackrestore %stack
I'm not sure if this would be easy to implement, or if it would be tricky due to interference with QBE's own use of the stack.
I will wait this feature to test PicoLisp (C implementation).
~mcf, can you write up a summary of your thoughts on this feature here? What blocks it, and possible workarounds? I remember it was something with qbe.
I wrote up an example of an implementation that might work for VLAs in a loop:
export function w $main() { @start %iters =l copy 8 %a.sz =l copy 8 %a.0 =l alloc8 %a.sz %i =l copy %iters %sz =l copy %a.sz @loop.start @loop # Ensure that %a is at least as big as %sz %a.resize =w cugel %a.sz, %sz jnz %a.resize, @a.sufficient, @a.resize @a.resize %a.oldsz =l copy %a.sz %a.sz =l copy %sz %a.1 =l alloc8 %a.sz %a.i =l copy 0 @a.copy %a.0.n =l add %a.0, %a.i %a.1.n =l add %a.1, %a.i %a.n =l loadl %a.0.n storel %a.n, %a.1.n %a.i =l add %a.i, 1 %a.d =l sub %a.sz, %a.i jnz %a.d, @a.copy, @a.resized @a.resized %a.0 =l copy %a.1 @a.sufficient call $printf(l $fmt, l %a.0) %i =w sub %i, 1 %sz =l add %sz, 1 jnz %i, @loop, @end @end ret 0 } data $fmt = { b "a.0: %p\n", b 0 }
Obviously the resulting code is going to suck but that's what you get for putting a VLA in a loop.
I updated the description with what I concluded last time I investigated this.
Regarding your proposal, I think that has worst-case quadratic memory usage, which might not be acceptable. For example, something that should be safe, like
for (size_t i = 0; i < 8192; ++i) { char a[i]; f(a, i); }would use 32 MiB on the stack. A possible variation could be to allocate in powers of 2, which should use use at most twice the required size.
I don't think the copy is necessary since VLAs can't be resized in C.
In any case, I don't it's worth trying to emulate full VLAs until QBE has the proper support. For now, supporting VLAs that don't appear under any labels or loop bodies is probably sufficient for most code.
I'm not convinced that qbe would do anything much different from this. Do you think otherwise?
Though +1 to allocating by powers of 2.
Yes, I think QBE should be able to save and restore the stack pointer.
Consider the following program:
#include <stddef.h> size_t x; void g(char *); void f(void) { for (;;) { char a[x]; g(a); } }clang turns this into this LLVM IR:
define dso_local void @f() local_unnamed_addr #0 { br label %1 1: ; preds = %1, %0 %2 = load i64, i64* @x, align 8, !tbaa !2 %3 = call i8* @llvm.stacksave() %4 = alloca i8, i64 %2, align 16 call void @g(i8* nonnull %4) <a href="/~mcf/cproc/1" title="~mcf/cproc#1: Variable-length arrays">#1</a> call void @llvm.stackrestore(i8* %3) br label %1 }
which gets translated to this x86_64 asm:
f: # @f pushq %rbp movq %rsp, %rbp pushq %rbx pushq %rax .LBB0_1: # =>This Inner Loop Header: Depth=1 movq %rsp, %rbx movq x(%rip), %rax movq %rsp, %rdi addq $15, %rax andq $-16, %rax subq %rax, %rdi movq %rdi, %rsp callq g movq %rbx, %rsp jmp .LBB0_1Rather than allocating more and more stack space, the stack pointer is saved at the beginning of the loop body, and restored at the end.
When possible, I would love to help with testing, documentation or anything else that a not-compiler programmer can help with regarding this todo.
I'm trying to compile skalibs, s6 and redo-c with cproc. And all three of these projects require VLAs.
/D. Olsson
If I may interject, but what would be the problems of implementing VLAs as syntactic sugar for the apparently implemented __builtin_alloca, which has a very similar functionality? This seems to be a very straight-forward way to go.
See the earlier discussion. The problem is that alloca and VLAs behave differently in a loop. Memory allocated by alloca gets deallocated when the function returns, but VLAs get deallocated at the end of the block. Naively compiling VLAs as alloca could result in stack overflows in some of the examples above.
I had a silly idea. For-loop bodies can be thought of as closures, where the environment is the set of local variables in a function. The body of a for-loop could be extracted into a separate function which takes an parameter to the local variables in the function containing the loop. This would take care of saving and restoring the stack.
For nested for-loops, additional parameters could be added to point to the environment of each nested block.
Of course this has a lot of downsides, but I think qbe is technically already capable of implementing VLAs without the memory overhead.
Maybe this could work, but I think it should actually be pretty easy to add support for VLAs to QBE, just a matter of syntax.
In terms of implementing in cproc, I think the best order would be:
- Add support for variably modified types, such as
int (*array)[f()]
. To do this, we'd add a new fieldstruct expr *expr
to the array struct instruct type
, and perhaps astruct value *size
field as well. The size would be computed by evaluating the expression at the appropriate time (I'd need to do some more reading of the spec). Several operators, such assizeof
, and pointer arithmetic would have to be changed to use the dynamic size instead of the constant size in the type.- Most uses of VLAs occur at the top-level of the function, so we should be able to support those easily with just one alloc.
- The general case is what needs QBE support. I think stacksave/stackrestore instructions would be the best interface for frontends.
Michael Forney referenced this ticket in commit 1e66f1c.
Added #88 to track the VLA loop issue. For now, we will just ignore it to unblock VLA support.
Nihal Jere referenced this ticket in commit 4f206ac.