~icefox/garnetc#6: 
Thoughts on backends

About QBE: I've looked at this project as well, and it seems quite nice. Very small at just 10k LOC! A couple of cons though: it generates ASM instead of machine code directly for some reason, and this actually appears to degrade the performance a fair bit. Also, upstream development went on hiatus in 2019-05, after which one Michael Forney started maintaining a personal fork at https://github.com/michaelforney/qbe. Now it seems upstream has merged these changes and become a bit more active again, but just as an FYI: in case it stalls again, you might want to check out Michael's fork.

Personally, I've become very interested in MIR by vnmakarov on github [https://github.com/vnmakarov/mir]. Similar to QBE, it's quite small at just 15k LOC, and it aims to reach 70% of the performance of GCC. It's primarily a JIT, but it does AOT compilation as well. It's being much more actively developed than QBE right now (not that activity is necessarily a measure of quality, but still). This post is a good introduction [https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/], and the README has a section where the author makes some terse comparisons to the alternatives in the same category, including QBE, LIBJIT, Cranelift, and others. [https://github.com/vnmakarov/mir#mir-project-competitors]

More user experiences with QBE: https://briancallahan.net/blog/20210829.html

This post got me thinking more about what I actually want to do. Current loose plans:

  • Currently, garnetc compiles to Rust. This Is Fine.
  • When garnetc self-hosts, it will be designed to have multiple backends. This may or may not be a stable API, but should be relatively consistent and easy to get going.
  • The end goal for instruction sets to support will be amd64, arm32, arm64, riscv32, riscv64, wasm32, and wasm64 once it exists. 32-bit x86 would also be nice but IMO adds relatively little value right now: it's not common on desktops, not common on mobile hardware, and not common in embedded hardware. Maybe a compelling reason to support it will show up.
  • I would also like to have a backend that generates C code. This isn't strictly necessary but would be kinda nice for portability: instead of distributing bootstrap binaries you could generate and distribute portable C code, compile it with a C compiler, and then use the resulting garnetc to bootstrap itself from the Garnet sources. A little ironic, considering my goal is to make C entirely obsolete, but still it sounds neat.
  • So, once the compiler is self-hosted, having a QBE backend and a C backend would get us most of these targets. Contributing a wasm backend to the QBE codebase sounds like it would probably not be horrible, and would be good for everyone.
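
For illustration, here is a rough sketch of what "designed to have multiple backends" could look like, with a C backend and a Rust backend behind one interface. Everything in it is hypothetical: the `Backend` trait, the toy `Expr` IR, `CBackend`, `RustBackend`, and `pick` are made-up names, not anything that exists in garnetc.

```rust
// Hypothetical sketch: none of these names exist in garnetc.

/// A toy IR: integer literals and binary arithmetic.
pub enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Every backend turns the same IR into target source text.
pub trait Backend {
    /// Short name, usable for a command-line switch.
    fn name(&self) -> &'static str;
    /// Render one expression in the target language.
    fn expr(&self, e: &Expr) -> String;
    /// Wrap an expression in a complete function definition.
    fn function(&self, fn_name: &str, body: &Expr) -> String;
}

pub struct CBackend;
pub struct RustBackend;

/// Both targets use infix arithmetic, so the tree walk is shared and only
/// the literal syntax differs.
fn infix(e: &Expr, lit: &dyn Fn(i64) -> String) -> String {
    match e {
        Expr::Lit(n) => lit(*n),
        Expr::Add(a, b) => format!("({} + {})", infix(a, lit), infix(b, lit)),
        Expr::Mul(a, b) => format!("({} * {})", infix(a, lit), infix(b, lit)),
    }
}

impl Backend for CBackend {
    fn name(&self) -> &'static str {
        "c"
    }
    fn expr(&self, e: &Expr) -> String {
        // INT64_C comes from <stdint.h>, which the generated file would include.
        infix(e, &|n: i64| format!("INT64_C({})", n))
    }
    fn function(&self, fn_name: &str, body: &Expr) -> String {
        format!("int64_t {}(void) {{ return {}; }}", fn_name, self.expr(body))
    }
}

impl Backend for RustBackend {
    fn name(&self) -> &'static str {
        "rust"
    }
    fn expr(&self, e: &Expr) -> String {
        infix(e, &|n: i64| format!("{}_i64", n))
    }
    fn function(&self, fn_name: &str, body: &Expr) -> String {
        format!("pub fn {}() -> i64 {{ {} }}", fn_name, self.expr(body))
    }
}

/// Select a backend by name; unknown names yield None.
pub fn pick(name: &str) -> Option<Box<dyn Backend>> {
    match name {
        "c" => Some(Box::new(CBackend)),
        "rust" => Some(Box::new(RustBackend)),
        _ => None,
    }
}
```

With this shape, a QBE or wasm backend would be one more `impl Backend` plus one more arm in `pick`. A real IR would need types, control flow, and calls, so the trait would grow accordingly.
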

Status: REPORTED
Submitter: ~icefox
Assigned to: No-one
Submitted: 1 year, 10 months ago
Updated: 19 days ago
Labels: T-LATER

~icefox 1 year, 8 months ago

Slurped in from readme file:

Something I need to consider a little is what I want in terms of a compiler backend, since emitting x86_64 opcodes myself basically sounds like the least fun thing ever.

Goals:

  • Not huge
  • Operates pretty fast
  • Outputs pretty good/fast/small code
  • Doesn't require binding to C++ code (pure C may be acceptable)
  • Produces x86_64, ideally also Aarch64 and WASM, SPIR-V would be a nice bonus

Non-goals:

  • Makes best code evar
  • Super cool innovative research project
  • Supports every platform evar, or anything less than 32-bits (it'd be cool, but it's not a goal)

Options:

  • Write our own -- ideal choice in the long run, worst choice in the short run
  • LLVM -- Fails at "not huge", "operates fast" and "doesn't require C++ bindings"
  • Cranelift -- Might actually be a good choice, but word on the street (as of early 2020) is it's poorly documented and unstable. Investigate more. As of 2023 might be in an ok place?
  • QBE -- Fails at "doesn't require C compiler", but initially looks good for everything else. Played around with it a little and it left a bad taste in my mouth for some reasons, but worth revisiting.
  • WASM -- Just output straight-up WASM and use wasmtime to run it. Cool idea in the short term, WASM is easy to output and doesn't need us to optimize it much in theory, and would work well enough to let us bootstrap the compiler if we want to. Much easier to output than raw asm, there's good libraries to output it, and I know how to do it.
  • C -- Just output C Code. The traditional solution, complicates build process, but will work.
  • Rust -- Rust compiles slow and it complicates the build process, but those are the only downsides, and it will work. Might be useful if we can check whatever borrow-checking type stuff we implement against Rust's.
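
To give a feel for why the WASM option counts as "easy to output": wasm is a stack machine, so a post-order walk over an expression tree already yields valid instruction order. The sketch below writes WebAssembly text format by hand for a toy expression type. The `Expr` type and the `emit` and `to_wat` functions are hypothetical names for illustration only; a real backend would more likely use an existing wasm-encoding library and emit the binary format.

```rust
// Hypothetical sketch: hand-written WebAssembly text output for a toy IR.

pub enum Expr {
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Post-order walk: push both operands, then the operator. That is exactly
/// the order a stack machine wants, so no register allocation is needed.
fn emit(e: &Expr, out: &mut Vec<String>) {
    match e {
        Expr::Const(n) => out.push(format!("i64.const {}", n)),
        Expr::Add(a, b) => {
            emit(a, out);
            emit(b, out);
            out.push("i64.add".to_string());
        }
        Expr::Mul(a, b) => {
            emit(a, out);
            emit(b, out);
            out.push("i64.mul".to_string());
        }
    }
}

/// Wrap the instruction stream in a module exporting one function, "main".
pub fn to_wat(e: &Expr) -> String {
    let mut body = Vec::new();
    emit(e, &mut body);
    format!(
        "(module (func (export \"main\") (result i64) {}))",
        body.join(" ")
    )
}
```

The resulting text should be loadable by any runtime or tool that accepts the text format. Straight-line arithmetic is the easy part, though; control flow is where wasm gets awkward.
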

Output Rust for right now, bootstrap the compiler, then think about it.

Trying out QBE and Cranelift both seem like reasonable choices, and writing a not-super-sophisticated backend that outputs many targets seems semi-reasonable. Outputting WASM is probably the simplest low-level thing to get started with, but is a little weird since it is kinda an IR itself: it only has structured control flow, so to turn an SSA IR into wasm you need a restructuring step such as Emscripten's "relooper". Outputting C is similarly fine in the short term and has Problems in the longer term.
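
To make the control-flow problem concrete: an SSA IR has arbitrary jumps between basic blocks, while wasm only has structured blocks, loops, and ifs. The sketch below is NOT the relooper. It is the crude fallback that always works: one loop around a dispatch on a `label` variable, with every jump becoming an assignment to `label`. It renders Rust-flavoured text, since that is what garnetc emits today. The `Block` and `Term` types and the `dispatch_loop` function are hypothetical names, and the statement strings are assumed to be pre-rendered elsewhere.

```rust
// Hypothetical sketch of the "loop + label variable" fallback, not the relooper.

/// How a basic block ends.
pub enum Term {
    Goto(usize),
    Branch { cond: String, then_to: usize, else_to: usize },
    Return(String),
}

pub struct Block {
    /// Straight-line statements, already rendered as target text.
    pub body: Vec<String>,
    pub term: Term,
}

/// Render an arbitrary CFG as a single loop around a match on `label`.
/// Block 0 is taken to be the entry block.
pub fn dispatch_loop(blocks: &[Block]) -> String {
    let mut out = String::from("let mut label = 0usize;\nloop {\n    match label {\n");
    for (i, b) in blocks.iter().enumerate() {
        out.push_str(&format!("        {} => {{\n", i));
        for stmt in &b.body {
            out.push_str(&format!("            {}\n", stmt));
        }
        // Every jump becomes "set the label, go around the loop again".
        let term = match &b.term {
            Term::Goto(t) => format!("label = {};", t),
            Term::Branch { cond, then_to, else_to } => {
                format!("label = if {} {{ {} }} else {{ {} }};", cond, then_to, else_to)
            }
            Term::Return(v) => format!("return {};", v),
        };
        out.push_str(&format!("            {}\n        }}\n", term));
    }
    out.push_str("        _ => unreachable!(),\n    }\n}\n");
    out
}
```

In wasm the same shape would be a `loop` with a `br_table` inside. It is correct for any CFG, but every jump pays for a dispatch, which is why a smarter restructuring pass tries to recover real loops and ifs first and only falls back to this for irreducible control flow.
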

~icefox 1 year, 7 months ago*

Cranelift now advertises itself as "production ready", so while it's not Stable or Complete it's probably a reasonable place to start.

Potential long-term gotchas:

  • "Cranelift does not provide assemblers and disassemblers, so it is not necessary to be able to represent every weird instruction in an ISA. Only those instructions that the code generator emits have a representation." So making an inline assembler will be Tricky.
  • No wasm output, might or might not ever add it given that its primary use is to compile wasm to machine code. Ah, nope, looks like it's on the wishlist: https://github.com/bytecodealliance/wasmtime/issues/2566
  • "Integer types are limited to powers of two from i8 to i64"
  • "Addresses are represented as integers---There are no Cranelift pointer types. ... Cranelift may add a single address type too."
  • More sophisticated arch-specific features like SIMD or x86 addressing modes are present but don't appear super powerful yet.
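
As a small illustration of what the integer-type restriction would mean for a frontend: if the source language ever allowed odd widths (say a 24-bit integer, purely an assumption here), the frontend would have to pick the next supported width and wrap results itself. The sketch below is plain Rust and does not call any Cranelift API; `storage_bits` and `wrap_unsigned` are hypothetical helper names.

```rust
// Hypothetical frontend helpers; no Cranelift API is used here.

/// Smallest width in {8, 16, 32, 64} that can hold `bits` bits,
/// or None if the request is zero or wider than 64.
pub fn storage_bits(bits: u32) -> Option<u32> {
    match bits {
        1..=64 => Some(bits.next_power_of_two().max(8)),
        _ => None,
    }
}

/// After doing arithmetic in the wider storage type, mask an unsigned
/// result back down to the declared width.
pub fn wrap_unsigned(value: u64, bits: u32) -> u64 {
    if bits >= 64 {
        value
    } else {
        value & ((1u64 << bits) - 1)
    }
}
```

For example, `storage_bits(24)` gives `Some(32)`, and `wrap_unsigned` would be applied after each add or multiply on such a value. Signed types would need sign extension instead of a plain mask.
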

~icefox 1 year, 2 months ago

Oh, check this shit out: https://github.com/vnmakarov/mir

If it lives up to its claims, I'm fukkin sold.

~icefox 19 days ago
