~icefox/garnet#52: 
Packages and namespaces and stuff

There’s a very basic proof-of-concept for namespaces/packages that let you break a program up over several files, but nothing particularly functional. It will work basically like Rust’s modules, give or take, and that’s just fine. On the other hand I really want to make Garnet compile fast, and that means being able to parallelize every stage of compilation possible, so the package system may shake out a little differently than Rust’s “glom all files together into one compilation unit”. Need to do some more research there; Zig has some crazy and awesome ideas.

It has been pointed out that ML modules are better for parallel compilation than typeclasses are. Typeclasses require global knowledge to enforce the orphan rules. ML modules just need each module's signature to be exported in some form other packages can read, so you have to walk the whole program only once to produce those signatures; once that's done, each file (or whatever the unit is) can be compiled entirely in parallel.

This is orthogonal to having packages/namespaces actually be ML modules, though. The barrier there is basically that if they are, it involves adding associated consts and types to structs. If they aren't, then structs remain simpler.

Status: REPORTED
Submitter: ~icefox
Assigned to: No-one
Submitted: 1 year, 1 day ago
Updated: 5 months ago
Labels: T-DESIGN

~icefox 11 months ago

~matklad any thoughts on this matter with regard to ML modules? It seems like something you might have opinions on. Basically, my question right now is whether it's worth it to be able to have type definitions in our structs/modules, à la associated types. It's an ingredient in the conventional ML module approach, but I personally find it rather annoying to understand and think about, and so far I haven't managed to write any code that needs them which can't just be written via normal generics and functors. (At least, it works in my head: basically any types in the module get hoisted out into type params that are inputs to the functor, and the functor can return another functor if it feels like it.)
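To make the hoisting idea in that parenthetical concrete, here is a minimal sketch in Rust (not Garnet syntax). It assumes a "module" can be modelled as a plain struct of function pointers and a functor as an ordinary generic function; OrdMod, SetMod, make_set and compare_i32 are hypothetical names made up for illustration. The element type that an ML signature would declare inside the module (an associated type) is instead the type parameter T.

```rust
use std::cmp::Ordering;

// A hypothetical "module" value. Instead of declaring its element type inside
// itself (an associated type), the type is hoisted out as the parameter `T`.
struct OrdMod<T> {
    compare: fn(&T, &T) -> Ordering,
}

// A hypothetical output "module": set operations over a sorted Vec<T>.
struct SetMod<T> {
    compare: fn(&T, &T) -> Ordering,
}

impl<T> SetMod<T> {
    fn empty(&self) -> Vec<T> {
        Vec::new()
    }

    fn insert(&self, set: &mut Vec<T>, x: T) {
        // Keep the vector sorted and free of duplicates.
        let found = set.binary_search_by(|probe| (self.compare)(probe, &x));
        if let Err(pos) = found {
            set.insert(pos, x);
        }
    }

    fn contains(&self, set: &[T], x: &T) -> bool {
        set.binary_search_by(|probe| (self.compare)(probe, x)).is_ok()
    }
}

// The "functor" is just a generic function from one module value to another;
// `T` is an ordinary input type parameter.
fn make_set<T>(ord: OrdMod<T>) -> SetMod<T> {
    SetMod { compare: ord.compare }
}

fn compare_i32(a: &i32, b: &i32) -> Ordering {
    a.cmp(b)
}

fn main() {
    let ints = make_set(OrdMod::<i32> { compare: compare_i32 });
    let mut s = ints.empty();
    for x in [3, 1, 3, 2] {
        ints.insert(&mut s, x);
    }
    println!("{:?}, contains 2: {}", s, ints.contains(&s, &2));
}
```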

But if files/packages are modules, then our modules need to be able to include type definitions. Not really any way around that I can see. (And constant values, but that's a lot simpler.) But once you allow that you start getting into messes with HKTs (higher-kinded types) and that whole Thing, and I just don't want to handle it right now.

~matklad 11 months ago

Yup, a whole bunch of opinions here!

I guess the top-level one is that I think that "modules as a unit of polymorphism" and "modules as a unit of compilation/distribution" are almost entirely orthogonal concepts, and they only accidentally share the same name. E.g., "should files be ML modules" is the same as "should files be structs" (they are in Zig) or "should files be classes" (I guess they could have been in Java).

I have more confident thoughts about CU-modules than ML-modules. I think Rust gets CU-modules almost right; here's what I'd keep and what I'd change:

  • keep the two-level structure with crates and modules (but I'd maybe just call them "libraries" and "modules"). Having a library as a first-class concept is powerful. Allowing hierarchical namespaces within a library leads to reasonably-sized libraries.

  • keep libraries anonymous, acyclic, and with explicitly declared dependencies. This helps compilation and logical organization. In large projects (e.g., rust-analyzer), it's very helpful to be able to say "this crate should not (transitively) depend on that crate".

  • keep tree-based module structure within a library, keep cyclic dependencies between modules (but see https://lobste.rs/s/vx8hbs/rust_module_system_encourages_poor for some arguments why flat structure within a library might be better). You need hierarchy and cyclic deps at some granularity, CU seems the right one.

  • promote pub(crate) visibility, it absolutely should be one keyword. I would suggest the following set of visibility: api for "visible outside the current CU", equivalent to Rust's pub with -Dunreachable-pub, pub for "visible throughout CU" (= pub(crate)), and nothing for "visible in current file".

  • remove "imports are items". That is, imports should not be visible in submodules. Otherwise, it's just confusing and makes name resolution hard (you need to protect against an import importing itself. Basically, Rust name res is a fixed point iteration loop, and that seems like gratuitous accidental complexity).

  • if meta-programming is syntax-based (syntactic proc macros), it should happen before a tree structure of modules is assembled and imports are resolved. You should be able to first expand derives in parallel in every module, and then figure out who imports what.

  • remove mod foo; items, infer module paths from the file system. This is less confusing for the users, and faster to compile.

  • remove ::, use . throughout. The type/value separation is confusing, increases syntactic soup, and doesn't actually work in Rust.

  • aim for the following compilation model within CU:

    • use a parallel walkdir to list all files in the CU and read them in parallel
    • sequentially assemble all files into a tree
    • for each file in parallel, parse it
    • for each file in parallel, apply syntactic meta-programming, if you have proc macros
    • for each file in parallel, compute a map of symbols defined in file
    • for each import in parallel, resolve import to a symbol (here we use that imports themselves are not symbols)
    • for each symbol, resolve the types it mentions (e.g., for fn foo(bar: baz::Quux), we first register that we have function foo in this file, then we resolve all imports in all files, and then we resolve baz::Quux to what it points to).

    I think we don't need any sequential/synchronizing phase besides "assemble modules into a tree"; everything else looks like it can be embarrassingly parallel.

  • add header/signature files. Make it the compiler's job to parse the whole CU and then spit out a single .api file, which is added to git, committed to VCS and is a part of a published package. The .api file gives an "at a glance" view of the API of a particular library. It's essentially rustdoc, but you don't need a browser, as it's just a single text file. Additionally, .api enables parallel compilation and a compilation firewall (if .api doesn't change, no need to recompile downstream).

  • use signatures for the following compilation model of many CUs:

    • using explicit graph of CUs (lockfile), download all CUs in parallel
    • compile ".api" files following the graph. This isn't embarrassingly parallel (parallelism is limited by the graph's critical path), but it is very fast, because .api files are a fraction of the size of .impl files.
    • now that the .api files are compiled, compile the implementation of each CU in parallel. This is not in graph order, as CU impls depend only on APIs, not on other impls.
    • link the stuff together.
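The within-CU steps listed above can be sketched roughly as follows. This is a minimal illustration under heavy assumptions, not a compiler design: sources are already in memory as (module path, text) pairs, the "parser" understands only a made-up two-form syntax (fn NAME and use MODULE.NAME), and every name in the code is hypothetical. It uses std::thread::scope for the parallel phases. Because imports are never entered into the symbol table (imports are not items), each import is resolved against definitions in a single pass, with no fixed-point loop.

```rust
use std::collections::HashMap;
use std::thread;

// Toy item syntax (hypothetical, not real Garnet):
//   "fn NAME"          defines NAME in the file's module
//   "use MODULE.NAME"  imports NAME from another module
enum Item {
    Def(String),
    Import { module: String, name: String },
}

struct ParsedFile {
    module: String, // module path, e.g. inferred from the file's location
    items: Vec<Item>,
}

fn parse(module: &str, src: &str) -> ParsedFile {
    let items = src
        .lines()
        .map(str::trim)
        .filter_map(|line| {
            if let Some(name) = line.strip_prefix("fn ") {
                Some(Item::Def(name.to_string()))
            } else if let Some(path) = line.strip_prefix("use ") {
                let (m, n) = path.rsplit_once('.')?;
                Some(Item::Import { module: m.to_string(), name: n.to_string() })
            } else {
                None
            }
        })
        .collect();
    ParsedFile { module: module.to_string(), items }
}

// Parallel: parse every file on its own scoped thread.
fn parse_all(files: &[(&str, &str)]) -> Vec<ParsedFile> {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|&(module, src)| s.spawn(move || parse(module, src)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

// Merge per-file definitions into one table (done sequentially here for
// brevity). Imports are deliberately NOT added: imports are not items.
fn symbol_table(files: &[ParsedFile]) -> HashMap<&str, Vec<&str>> {
    let mut table = HashMap::new();
    for f in files {
        let defs = f.items.iter().filter_map(|i| match i {
            Item::Def(name) => Some(name.as_str()),
            _ => None,
        });
        table.entry(f.module.as_str()).or_insert_with(Vec::new).extend(defs);
    }
    table
}

// Parallel: resolve each file's imports against definitions only, in one pass.
fn resolve_imports(files: &[ParsedFile], table: &HashMap<&str, Vec<&str>>) -> Vec<Result<(), String>> {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|f| {
                s.spawn(move || {
                    for item in &f.items {
                        if let Item::Import { module, name } = item {
                            let defined = table
                                .get(module.as_str())
                                .map_or(false, |defs| defs.iter().any(|d| *d == name.as_str()));
                            if !defined {
                                return Err(format!("{}: unresolved import {}.{}", f.module, module, name));
                            }
                        }
                    }
                    Ok(())
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Listing and reading the files (the parallel directory walk) is left out.
    let files = [("geom", "fn point\nfn distance"), ("app", "use geom.point\nuse geom.area\nfn main")];
    let parsed = parse_all(&files);
    let table = symbol_table(&parsed);
    for r in resolve_imports(&parsed, &table) {
        println!("{:?}", r);
    }
}
```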

This compilation model is pretty much "let's add a bunch of indirection everywhere so that we actually have .api files due to stable ABI, and also let's rewrite linkers", not sure if it'll work for Garnet.
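The multi-CU model above orders .api compilation by the dependency graph. A minimal sketch, assuming the lockfile has already been turned into a map from every CU name to the CUs it depends on (every CU must appear as a key), groups CUs into waves with Kahn's topological sort: the .api files within one wave can be compiled in parallel, and the number of waves is the length of the critical path. The function api_waves is a hypothetical name. Implementations can then all be compiled at once, since they depend only on .api files.

```rust
use std::collections::HashMap;

// Group CUs into "waves" with Kahn's topological sort. Every CU in a wave depends
// only on CUs from earlier waves, so the .api files of one wave can be compiled
// in parallel. Returns None if the graph has a cycle (CU deps must be acyclic).
fn api_waves<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Option<Vec<Vec<&'a str>>> {
    // Count unfinished dependencies per CU and record reverse edges.
    let mut pending: HashMap<&str, usize> = deps.keys().map(|&cu| (cu, 0)).collect();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (&cu, ds) in deps {
        for &d in ds {
            *pending.get_mut(cu).unwrap() += 1;
            dependents.entry(d).or_default().push(cu);
        }
    }

    let mut wave: Vec<&str> = pending.iter().filter(|&(_, &n)| n == 0).map(|(&cu, _)| cu).collect();
    let mut waves = Vec::new();
    let mut done = 0;
    while !wave.is_empty() {
        wave.sort(); // deterministic order within a wave
        done += wave.len();
        let mut next = Vec::new();
        for &cu in &wave {
            // This CU's .api is done: release the CUs waiting on it.
            for &dep in dependents.get(cu).map(|v| v.as_slice()).unwrap_or(&[]) {
                let n = pending.get_mut(dep).unwrap();
                *n -= 1;
                if *n == 0 {
                    next.push(dep);
                }
            }
        }
        waves.push(wave);
        wave = next;
    }
    if done == deps.len() { Some(waves) } else { None }
}

fn main() {
    let deps: HashMap<&str, Vec<&str>> = HashMap::from([
        ("core", vec![]),
        ("alloc", vec!["core"]),
        ("net", vec!["core"]),
        ("app", vec!["alloc", "net"]),
    ]);
    // Compile .api files wave by wave (parallel within a wave); afterwards,
    // every CU's implementation can be compiled at once, in any order.
    for (i, wave) in api_waves(&deps).expect("dependency cycle").iter().enumerate() {
        println!("api wave {}: {:?}", i, wave);
    }
}
```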

Now, whether to integrate this all with ML-modules is unclear. Zig is nice in that "everything is a struct" (e.g., files are structs), so you could do that. But also, in Zig, it gets confusing whether stuff is a static field or not:

struct {
  this_is_an_instance_field: u32,
  const this_is_a_const_in_the_type: u32 = 92;
  fn this_is_another_name_in_the_type() void {}
}

If I go for "everything is a module", then I probably wouldn't allow attaching methods to structs syntactically. So, something like this:

module M {
    struct Point { x: i32, y: i32 } // structs can _only_ declare fields

    // This is `M.get_x`, not `Point.get_x`, but we use some sort of ADL-shaped syntactic sugar to make `p.get_x()` syntax work.
    fun get_x(self: Point) -> i32 { self.x }
}

~icefox 11 months ago

Oh heck, thanks. That was a lot more than I expected! I agree most of it is quite a good design.

I think that "modules as a unit of polymorphism" and "modules as a unit of compilation/distribution" are almost entirely orthogonal concepts, and they only accidentally share the same name.

Hard agree here. I'm just trying to figure out how they should overlap, if at all. There's also "modules as a unit of code organization", a la Rust's modules, which is orthogonal to all the above -- what I call "namespaces" here. Generally agree with most of the stuff after that.

I would suggest the following set of visibility: api for "visible outside the current CU", equivalent to Rust's pub with -Dunreachable-pub, pub for "visible throughout CU" (= pub(crate)), and nothing for "visible in current file".

Oooh, that is not a bad set of keywords. I might ponder using pub for "visible outside the current CU" because that's what it means in lots of other languages, and maybe loc (local) or something for pub(crate). But the siren song of api as a keyword is strong, especially with its semantic link to signature files using the .api extension.
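A minimal sketch of the three proposed visibility levels as a visibility check, written in Rust with hypothetical names (Visibility, Site, is_visible, exported). It assumes each item records the CU and file it was defined in; whichever spelling wins (api/pub or pub/loc), the logic stays the same.

```rust
// Hypothetical model of the three proposed visibility levels.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Visibility {
    Api,     // `api`: visible outside the defining CU (goes into the .api file)
    Pub,     // `pub`: visible anywhere in the defining CU, like Rust's pub(crate)
    Private, // no keyword: visible only in the defining file
}

// Where an item is defined or used: which CU, and which file within it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Site<'a> {
    cu: &'a str,
    file: &'a str,
}

fn is_visible(vis: Visibility, def: Site, usage: Site) -> bool {
    match vis {
        Visibility::Api => true,
        Visibility::Pub => def.cu == usage.cu,
        Visibility::Private => def.cu == usage.cu && def.file == usage.file,
    }
}

// Only `api` items would be emitted into the CU's .api signature file.
fn exported<'a>(items: &[(&'a str, Visibility)]) -> Vec<&'a str> {
    items.iter().filter(|(_, v)| *v == Visibility::Api).map(|(name, _)| *name).collect()
}

fn main() {
    let point = Site { cu: "geom", file: "point.gt" };
    let shape = Site { cu: "geom", file: "shape.gt" };
    let app = Site { cu: "app", file: "main.gt" };
    for vis in [Visibility::Api, Visibility::Pub, Visibility::Private] {
        println!(
            "{:?}: same file {}, same CU {}, other CU {}",
            vis,
            is_visible(vis, point, point),
            is_visible(vis, point, shape),
            is_visible(vis, point, app)
        );
    }
}
```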

Basically, Rust name res is a fixed point iteration loop, and that seems like gratuitous accidental complexity

C'monnnnn, it's guaranteed to terminate, that's all that really matters right??? :-P

if meta-programming is syntax-based (syntactic proc macros), it should happen before a tree structure of modules is assembled and imports are resolved. You should be able to first expand derives in parallel in every module, and then figure out who imports what.

Oooh, that is a nice approach. I'm not particularly sold on proc macros as Rust does them, but I do want syntactic macros of some kind or another and I require some #[derive]-like functionality, and unless I think of anything better it may as well be macros. (Templates and Zig-style comptime are arguably ways of getting something kinda like the same functionality but I don't know as much about them.) So knowing we can make macro expansion work in parallel is nice. Though if you expand derives before resolving imports, what do you do if you derive something defined in a separate file?

remove mod foo; items, infer module paths from the file system. This is less confusing for the users, and faster to compile.

Yeah, ezpz. Cargo's canonical crate layout has taken some time to get there, but IMO it's now good enough you don't need the metadata included.
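With module paths inferred from the file system, the directory listing itself is the module list, so no mod declarations are needed. A minimal sketch, assuming (hypothetically) that a CU's sources live under a root directory, use the .gt extension seen in examples later in this thread, and that src/geom/point.gt maps to module geom.point; module_path is a made-up name.

```rust
use std::path::{Component, Path};

// Map a source file to a dotted module path, e.g. src/geom/point.gt -> geom.point.
// Returns None for files outside the root, without the .gt extension, or with
// non-UTF-8 or non-normal path components (such as `..`).
fn module_path(root: &Path, file: &Path) -> Option<String> {
    let rel = file.strip_prefix(root).ok()?;
    if rel.extension()? != "gt" {
        return None;
    }
    let stem = rel.with_extension("");
    let parts: Option<Vec<&str>> = stem
        .components()
        .map(|c| match c {
            Component::Normal(s) => s.to_str(),
            _ => None,
        })
        .collect();
    Some(parts?.join("."))
}

fn main() {
    let root = Path::new("src");
    for f in ["src/main.gt", "src/geom/point.gt", "src/README.md"] {
        println!("{} -> {:?}", f, module_path(root, Path::new(f)));
    }
}
```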

remove ::, use . throughout.

Agree.

aim for the following compilation model within CU:

Okay you've thought about this in a lot more detail than I have! I'm sure that parts of optimization, inlining and linking aren't going to be embarrassingly parallel, or may be bottlenecked on a critical path similar to how crate compilation sometimes is. But that's a separate problem and one amenable to different solutions.

add header/signature files

I was low key intending for this data to just be distributed as a section within a .lib file, but either way works for me. It's just a matter of whether you prioritize machine readability and ease of distribution, or human readability and tinker-ability. Similar but not the same tradeoff as embedded debug symbols vs. a .pdb file... which time has shown to slowly churn in favor of the .pdb file, now that I think of it.

if .api doesn't change, no need to recompile downstream

Still need to re-link, which may involve LTO/cross-module inlining, but yes.
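A minimal sketch of the compilation-firewall check, assuming a hypothetical front end that hands over each item's signature text, its body, and whether it is api. Only the rendered .api text is compared, so a body-only change yields RelinkOnly, which, as noted above, still means relinking (and possibly LTO). Item, render_api, Action and plan are hypothetical names, and the signature strings use made-up syntax.

```rust
// Hypothetical per-item output of a front end for one CU.
struct Item<'a> {
    signature: &'a str, // made-up signature syntax
    #[allow(dead_code)] // the body is intentionally never read by render_api
    body: &'a str,
    is_api: bool,
}

// Render the .api text: `api` items only, signatures only, sorted so the output
// is deterministic and diffs cleanly in version control.
fn render_api(items: &[Item]) -> String {
    let mut sigs: Vec<&str> = items.iter().filter(|i| i.is_api).map(|i| i.signature).collect();
    sigs.sort_unstable();
    sigs.iter().map(|s| format!("{}\n", s)).collect()
}

#[derive(Debug, PartialEq)]
enum Action {
    RelinkOnly,          // .api unchanged: dependents are relinked, not recompiled
    RecompileDownstream, // .api changed or is new: dependents must be rebuilt
}

// Compare the previously published .api text (if any) with the new one.
fn plan(old_api: Option<&str>, new_api: &str) -> Action {
    match old_api {
        Some(old) if old == new_api => Action::RelinkOnly,
        _ => Action::RecompileDownstream,
    }
}

fn main() {
    let before = [Item { signature: "fn area(s Shape) F64", body: "v1", is_api: true }];
    let after = [Item { signature: "fn area(s Shape) F64", body: "v2", is_api: true }];
    let old = render_api(&before);
    println!("{:?}", plan(Some(old.as_str()), &render_api(&after)));
}
```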

let's add a bunch of indirection everywhere so that we actually have .api files due to stable ABI,...

Agree, it's just a tough problem for the backend to actually take all that indirection and then get rid of it where possible to produce highly optimized binaries. But it sounds like Swift and Zig are fighting the same problem, and are having some success, so we're not tackling it alone. Worst case we have blazing-fast Debug build times and Rust-slow Release build times, at least until we put in enough work to make better linkers/whatever.

"...so that we actually have .api files due to stable ABI, and also let's rewrite linkers", not sure if it'll work for garnet.

I have low-key wished for a tool many times in Rust that could tell you whether or not you've changed your public API, so having that ability built in to the compiler and/or build system sounds pretty good. I don't especially want to rewrite linkers, but it may end up a little necessary no matter what and I'm kinda resigned to doing it sooner or later.

Now, whether to integrate this all with ML-modules is unclear...

Dammit, that was like my one real question :-P

But nothing described above really has anything to do with ML-modules at all, besides the existence of signatures, so it seems safe to say that there's no reason to merge them for now. Zig's "everything is a struct" is indeed nice, and that's where I was basically going with this, but it's also possible for things to be so consistent and mirror-like that they become confusing. Looking at a type and having to ask yourself "okay is this being used as like a runtime struct, or a compile-time ML-module, or a file's namespace, or some kind of cursed combination of all three?" sounds like a potential hazard.

...but we use some sort of ADL-shaped syntactic sugar to make p.get_x() syntax work.

What do you mean by "ADL-shaped"? Not familiar with "ADL" in this context. I do want a p.get_x() syntax to work, or maybe a Lua-like UFCS like p:get_x() desugaring into Something.get_x(Something, p), but I haven't figured out a good way for that to exist yet. I had a solution for an earlier ML-module-ish system but it ended up getting thrown away during the Great Typechecker Wars of '21, and re-exploring it has been fairly low priority.

~matklad 11 months ago

C'monnnnn, it's guaranteed to terminate

Not really! In the presence of macros, the thing we are iterating a fixed-point on isn’t actually a monotonic function. So it terminates by virtue of various dirty hacks, and not because the model naturally terminates.

Though if you expand derives before resolving imports, what do you do if you derive something defined in a separate file?

I don’t understand the question, so I’ll just tell you what I know :x) I think there are two models that can be implemented with some amount of reasonableness.

In the first model, input to a proc macro is strictly the source code of a specific item. As the proc macro looks only at a single item in a single file, you can run this in parallel, before any semantic analysis is done.

In the second model, input to a proc-macro is a fully analyzed crate, with types and such. In this model, you first compile code “normally”, and then run proc macros. And there’s some sort of fixed point thing going on, where proc-macro dumps outputs to some file, and, if the file is updated, the whole compilation is restarted. Or you can imagine that the user writes signatures, and the proc macro only ever provides impls. https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/source-generators-overview is similar.
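The first model can be sketched as follows: a derive is a plain function from one item to new source text, so every item can be expanded on its own thread before any name resolution happens. This is a minimal illustration with hypothetical names (StructItem, DeriveFn, derive_field_count, expand_all) and a made-up output syntax; it assumes the registry of available derives comes from already-compiled dependencies rather than from the CU being expanded.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical parsed item: a struct declaration plus the derives requested on it.
struct StructItem {
    name: String,
    fields: Vec<String>,
    derives: Vec<String>,
}

// A derive sees only the syntax of a single item and returns new source text;
// it needs no name resolution or type information.
type DeriveFn = fn(&StructItem) -> String;

fn derive_field_count(item: &StructItem) -> String {
    // Made-up output syntax: a constant holding the number of fields.
    format!("const {}_FIELD_COUNT = {}", item.name.to_uppercase(), item.fields.len())
}

// Expand every item on its own thread. Unknown derive names become errors.
fn expand_all(items: &[StructItem], registry: &HashMap<&str, DeriveFn>) -> Vec<Result<Vec<String>, String>> {
    thread::scope(|s| {
        let handles: Vec<_> = items
            .iter()
            .map(|item| {
                s.spawn(move || {
                    item.derives
                        .iter()
                        .map(|d| match registry.get(d.as_str()) {
                            Some(f) => Ok(f(item)),
                            None => Err(format!("unknown derive `{}` on {}", d, item.name)),
                        })
                        .collect::<Result<Vec<String>, String>>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let mut registry: HashMap<&str, DeriveFn> = HashMap::new();
    registry.insert("FieldCount", derive_field_count);
    let items = vec![StructItem {
        name: "Point".to_string(),
        fields: vec!["x".to_string(), "y".to_string()],
        derives: vec!["FieldCount".to_string()],
    }];
    for out in expand_all(&items, &registry) {
        println!("{:?}", out);
    }
}
```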

I was low key intending for this data to just be distributed as a section within a .lib

Yeah, as long as .api is a part of distributed artifact, that works.

What do you mean by "ADL-shaped"?

Argument-dependent lookup, also known as Koenig lookup. That’s a C++ hack which gives the same power as traits/modules: the ability to extend types with generic functionality outside of the type itself. The specific thing I meant here is this:

Using structs for both fields and methods is confusing: fields belong to an instance, methods belong to a type. If you do modules, you can keep structs only for fields, keep “methods” as top-level functions, and add sugar à la “functions defined in the same module as a struct can be called using dot syntax”.
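A minimal sketch of that sugar, using the module M / Point / get_x example from earlier in the thread. It assumes the type checker already knows the receiver's type, and it implements only the simplified rule described here (look in the module that defines the receiver's type and require a matching first parameter); real C++ ADL searches the namespaces of all argument types and has many more rules. Module and desugar_method are hypothetical names, and extra call arguments are omitted.

```rust
use std::collections::HashMap;

// Hypothetical front-end tables for one module: structs carry only fields, and
// every function lives at module level. `first_param` maps each function name
// to the type of its first parameter.
struct Module {
    structs: Vec<&'static str>,
    first_param: HashMap<&'static str, &'static str>,
}

// Desugar `recv.method()` given the (already type-checked) type of `recv`:
// look only in the module that defines the receiver's type, and require the
// function's first parameter to have that type.
fn desugar_method(modules: &HashMap<&str, Module>, recv: &str, recv_ty: &str, method: &str) -> Option<String> {
    let (mod_name, module) = modules
        .iter()
        .find(|(_, m)| m.structs.iter().any(|s| *s == recv_ty))?;
    match module.first_param.get(method) {
        Some(&ty) if ty == recv_ty => Some(format!("{}.{}({})", mod_name, method, recv)),
        _ => None,
    }
}

fn main() {
    let mut modules = HashMap::new();
    modules.insert("M", Module { structs: vec!["Point"], first_param: HashMap::from([("get_x", "Point")]) });
    // `p.get_x()` with `p: Point`
    println!("{:?}", desugar_method(&modules, "p", "Point", "get_x"));
}
```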

~icefox 11 months ago

I don’t understand the question...

Basically, if we have a compilation unit with file A that defines a macro and file B that uses the macro, it feels like we'd have to fully or at least mostly process file A before we can progress past the parsing stage for file B. I suppose it doesn't really matter whether it's a proc macro or something else, I think this applies to any macro?

We could put the macro definition in file A's .api file, but now the .api files contain more or less arbitrary code definitions and become a less happy and simple thing. ...Though I suppose that if we want to have a lib that exports a macro for other things to use we need some way of doing that no matter what. This is turning into a general-purpose problem with macros I guess.

Though if we always compile macros to native code functions that just accept an AST and return a new one, macro signatures can be defined in an .api file and implemented in the compiled .lib file. It feels slightly insane to make macros be literal compiler plugins, but also like a perfect distillation of what a macro actually is.

Argument-dependent lookup, also known as Koenig lookup.

Aha, interesting. Sounds somewhat like Lua's method syntax sugar, but done at compile time with namespaces instead of at runtime with vtables/metatables. Anything C++ does is automatically suspect, but it's certainly an idea to ponder. If I'm understanding it correctly we do get something like this:

-- file1.gt
type Thing = struct ... end
fn foo(t Thing) = ... end

-- file2.gt
use file1
let t = file1.Thing { ... }
t:foo()  -- desugars to file1.foo(t) since Thing is defined in file1

That's not quite how C++ does it, but C++ also has an entire essay on how it DOES do it, soooo yeah. As presented it's not a perfect solution but handy for common use cases.

Another alternative is to have something like Rust impl blocks, where you can just attach any ol' functions to any ol' type's namespace. It doesn't really mesh well with Garnet as I imagine it right now, but there's nothing fundamentally wrong with it. You could use it to optionally construct methods you can call with value:method() that desugar to calling some_file.method(implicit SomeModule, value), which then looks for the implicit module arg as normal.

...anyway, this is getting a tad off-topic XD

~icefox referenced this from #53 11 months ago

~matklad 11 months ago

Basically, if we have a compilation unit with file A that defines a macro and file B that uses the macro

Yeah, I sort of assume that you don’t want to keep macros in the same CU, as that creates all kinds of problems with order of compilation.

~safinaskar 9 months ago*

matklad:

Make it the compiler's job to parse the whole CU and then spit out a single .api file, which is added to git, committed to VCS and is a part of a published package

I don't like this, because I believe in the principle that "generated files should never be added to VCS". For example, it will add unnecessary info to git diffs and will make merges more difficult.

~matklad 9 months ago

I don’t know how this would actually play out if implemented, but I have a strong suspicion that the ability to review “change of the interface” would actually be a great force multiplier for larger projects. Interface changes are costly, and automatically flagging them for more careful review makes it much easier to triage PRs, and to explain system-level considerations during the actual review.

I am 0.8 sure that this thing has big drawbacks and big benefits, so applying a ready-made principle probably won’t give the correct answer for the correct reason here.

~icefox 9 months ago

...because I believe in the principle that "generated files should never be added to VCS".

idk, adding .lock files to VCS for an application is pretty reasonable in my experience. On the other hand adding binary files to VCS always sucks, even when it's the best solution to a problem. On the third hand, checking stuff into a VCS that is trivial to automatically reproduce from the stuff already there is pretty unhelpful. On the fourth hand--

...Ok, I think I'll just agree with matklad that this will have big drawbacks and big benefits and we don't necessarily know what they are yet. The only things I've seen in production that are similar to this in concept are C/C++ .h files, which are such a terrible hassle they're a great warning of what not to do, and OCaml .mli files, which I don't have much experience with at scale. Will have to try them out in practice and see what happens.

~matklad 9 months ago

From today's Go blog, it seems they built something similar:

https://go.dev/blog/compat#api

https://github.com/golang/go/blob/master/api/go1.txt

~icefox referenced this from #46 6 months ago

~akavel 5 months ago

The only things I've seen in production that are similar to this in concept are C/C++ .h files, which are such a terrible hassle they're a great warning of what not to do (...)

IMO they're a hassle because you need to freaking write and maintain them by hand. OCaml's .mli files were always suspicious to me for the same reason, though I didn't go further than hello-world in OCaml. I would assume that at the time C was invented (not sure about OCaml; it seems CVS already existed, but maybe it was not yet popular enough?) there was no good tooling around versioning, so letting the compiler auto-modify something as important as an "api file" could be seen as too risky. Also, at the time of early C, several hours of highly educated human labor were probably still way cheaper than a bunch of CPU cycles. I remember what a revolutionary idea the wiki was, and how long it took me to grasp and internalize that letting people from around the world "break a website" is totally fine when there are diffs and history.

Personally, storing and versioning an auto-generated .api file sounds super cool to me! That would be an awesome item to scrutinize; Go taught me the value of good APIs. And yet it doesn't have to be mandatory: those who don't want to scrutinize them could add them to .gitignore instead. There could be an option to generate them either into source directories, for those who appreciate this, or into a non-versioned build-artifacts directory, for those who don't.
