~icefox/garnet#7: 
Thoughts on matches/tree-walking (and data in general)

Ok, got it. Semantically, what I want is a sum type S, and an arbitrary function f from S to whatever, where it does a match on one of S's variants and does something to it, and ignores the rest. Then I want to compose multiple f's together and prove they don't overlap on what variants they affect, without having to write out giant bloody match statements for each application.

This is the visitor pattern, but ML style. Both the visitor pattern and lots-of-functions-with-giant-bloody-match-statements accomplish this, but both involve writing a ton of annoying boilerplate.

I'm envisioning something like:

enum S {
    Foo{ x: i32, y: String},
    Bar{ a: f32, b: Box<S>,
}

fn f(S::Foo{x, y}) -> ... { ... }
fn g(S::Bar{a, b}) -> ... { ... }

visit(some_s, [f, g]);

Currently in any language I'm aware of, you have to do something like:

fn f(s: S) -> ... {
    match s {
        S::Foo{x, y} -> do_stuff_with(x, y),
        x -> x
    }
}
fn g(s: S) -> ... {
    match s {
        S::Bar{a, b} -> do_stuff_with(a, b),
        x -> x
    }
}
fn visit(s: S, visitor_funcs: &[...]) -> ... {
    visitor_funcs.fold(s, |old_s, f| f(old_s))
}
// or
fn visit(s: S) {
    match s {
        S::Foo{..} -> f(s),
        S::Bar{..} -> g(s),
    }
}

Plus a lot more, which gets old pretty quickly.

Status
REPORTED
Submitter
~icefox
Assigned to
No-one
Submitted
1 year, 3 months ago
Updated
4 months ago
Labels
T-THOUGHTS

~icefox 1 year, 3 months ago

Related:

Evrey: @icefox Make them tuple variants with structs inside, then you at least don't have to unpack the individual fields. icefox: That's an improvement but it still just shuffles the tedious bookkeeping around.

I think Garnet is going to have C-Like enums, then Real Enums that only take a single value, plus a RealEnum::discriminant(&self) -> CLikeEnum method, and also some way of going the other way around. And structs and tuples that are interchangable.

I guess I'm thinking that if you have the ability to do SumType{a,b,c} <-> CLikeEnum, {a|b|c} and struct {a:A,b:B,c:C} <-> tuple (A,B,C) then a lot of tedious bookkeeping vanishes, or at least can be automated. Maybe add something like Python's swizzle operators too.

Differentiating "C Like enums" as enum and "Real sum types" as sum seems like a good idea.

~icefox 1 year, 1 month ago

What I Want for enums is something like this:

sum S {
    Foo{ x: i32, y: String},
    Bar{ a: f32, b: Box<S>},
}
// Produces something like this automatically:
enum S_Discriminant {
    Foo,
    Bar
}

impl S {
    fn discriminant(&self) -> S_Discriminant { ... }
    fn from_parts(disc: S_Discriminant, data: ???) -> Self { ... }
}

Is data there a Union? Maybe. Seems reasonable so far. Key properties that CLike enums have: they can always be turned into numbers, they can sometimes be used as bitsets, they are always Copy, it is possible to turn a limited set of numbers into them...

~icefox 1 year, 1 month ago

Oh, they also can always be ordered and hashed, it is possible to construct a list/set of all of them, that list/set has a size known at compile time, so you can iterate over every variant...

~icefox 6 months ago*

Okay, our types are as follows:

  • enum. Ordered integers with names. What the ordering is can be specified but doesn't matter to the type system. Whether or not there's helpers to treat them as bitsets also doesn't matter right now. Fixed size, though not all enums might be the same size. Always can map sensibly to an integer. Always can attempt to turn an integer into an enum, might fail at runtime.
  • tuple. An ordered sequence of different values that may be different types. Each element is referred to by index.
  • struct. An ordered sequence of different values that may be different types. Each element is referred to by name. Forcing structs to be ordered is a tradeoff. On the one hand you remove packing/optimization options that Rust does, on the other hand you gain the ability to reason about struct layouts.
  • sum. An enum plus a struct or a tuple.

So this means that we can do tuple <-> struct and we can do (enum, tuple/struct) <-> sum. This also means that if we have structs with a known layout then we can make structs that share fields and that sort of thing. Vtables can simply be structs that contain structs.

Things still consider: bitfields, Erlang-like bit syntax.

~icefox 5 months ago

On the topic of struct data layout and the pro's and cons of such: https://queue.acm.org/detail.cfm?id=3212479 is fantastic.

~icefox 5 months ago

Julia and Zig apparently both have interesting approaches to tagged enums, investigate.

~araspik 4 months ago

The envisioned code you've provided here makes one fundamental assumption: that enumeration values are consecutive integers from 0 to something. I think that's why there's no direct shortcut available in Rust: that assumption can't be taken for granted, as you can specify custom values that may not be in order. Writing it out manually also enforces that you associate the right variant to the right function.

Another option you have is to use a compile-time for-loop over the available variants, deciding which function to use by indexing a compile-time list (or iterating over a zip pairing the variants to the functions, if that's your thing). In D, at least, this would be implemented with something like static foreach (although D doesn't have sum types).

~icefox 4 months ago*

My intended assumption is a little broader: enumeration values can be mapped to unique but maybe non-consecutive integers. This brings a few other assumptions: enumerations are ordered, you can have successor and predecessor operators that let you iterate over the enumeration members, etc. That basically gets you metaprogramming on enums that's just as good as your metaprogramming on integer values... if that makes any sense at all.

There's really two use cases for enums: either you care about the number they represent and are using them as a set of constants, or you don't care and the compiler can do whatever it feels like. Struct layout is similar: either you care because it's exposed to the ABI or has to match the layout of another struct defined elsewhere, or you don't care and the compiler can do whatever it feels like. Rust is basically built on the assumption that you never care, and repr(whatever) is the exception; it would be nice to be able to have a bit more support for the former use case.

~araspik 4 months ago

Under that assumption, your code would have to work with a static foreach kind of construct, so that iteration over the enumeration variants is achieved. There (IMO) isn't a better way to express this.

The two use cases you've highlighted need to be separated, as they're used in typically quite different contexts. I think Zig pulls this off nicely with enum and a union of an enum, but the syntax needs to be re-evaluated because it involves quite a bit of boilerplate (in particular, all the variants need to be named twice). With Rust, I think a basic, consistent ABI for aggregate types should have been specified a long time ago, probably based on (or the same as) the SysV ABI - that would significantly aid inter-language operation. As a rule, the less implementation-defined stuff, the better.

~icefox 4 months ago*

Under that assumption, your code would have to work with a static foreach kind of construct, so that iteration over the enumeration variants is achieved.

If I'm understanding what you mean correctly, you could represent this at runtime. Just make each enum automatically generate an EnumType::VALUES constant that is an ordered array of the enum's values. If your compiler's const optimizations are good enough I don't see why this couldn't be just as fast as making it a special case.

The two use cases you've highlighted need to be separated, as they're used in typically quite different contexts.

Exactly, and I'm hoping that having the ability to pack together and unpack the discriminant from a sum type and manipulate them separately we can get the best of both worlds. The "arbitrary discriminant for a sum type" use case is easy, the "set of related integer constants" use case is easy, and with some fairly straightforward screwing around you can write code that translates from one to the other.

~araspik 4 months ago

The benefit of requiring this at compile-time is that you don't need to generate and save the VALUES array in program memory - unless, of course, you're providing RTTI and want to include enum values in it (although IMO there's no point in doing so as you can't use an enum without an understanding of what it does, which means that you already know the enum in your code i.e. at compile-time).

Register here or Log in to comment, or comment via email.