~icefox/garnet#59: Internal vs external iteration

We're getting to the point where we both want for loops and actually can implement them sanely, so lemme think about them a bit more. I'd always assumed we would go the Rust route, just have an iterator module/interface and go from there, i.e. what gets called "external iterators". But then there's this, from a recent post on Graydon's blog:

[I didn't want] Library-defined containers, iteration and smart pointers. Containers, iteration and indirect-access operations (along with arithmetic) comprise the inner loops of most programs and so optimizing their performance is fairly paramount; if you make them all do slow dispatch through user code the language will never go fast. If user code is involved at all, then, it has to be inlined aggressively. The other option, which I wanted, was for these to be compiler builtins open-coded at the sites of use, rather than library-provided. No "user code" at all (not even stdlib). This is how they were originally in Rust: vec and str operations were all emitted by special-case code in the compiler. There's a huge argument here over costs to language complexity and expressivity and it's too much to relitigate here, but .. I lost this one and I still mostly disagree with the outcome.

[I also didn't want] Exterior iteration. Iteration used to be by stack / non-escaping coroutines, which we also called "interior" iteration, as opposed to "exterior" iteration by pointer-like things that live in variables you advance. Such coroutines are now finally supported by LLVM (they weren't at the time) and are actually a fairly old and reliable mechanism for a linking-friendly, not-having-to-inline-tons-of-library-code abstraction for iteration. They're in, like, BLISS and Modula-2 and such. Really normal thing to have, early Rust had them, and they got ripped out for a bunch of reasons that, again, mostly just form "an argument I lost" rather than anything I disagree with today. I wish Rust still had them. Maybe someday it will!

So, I'm still leaning into external iteration but let's ponder it more concretely instead of just blindly accepting it. I do not want vec and str operations to be built into the compiler, since I don't want memory allocation built into the compiler. It has to live in lib code, because the point of this language is to be able to write memory allocators. Soooo the aggressive inlining seems like the way to go.
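
For contrast, by "external iteration" I mean roughly the Rust-style shape below (just a sketch in Rust, since that's the reference point; in Garnet it would presumably be a module/interface rather than a trait): the iterator is a little state machine living in a variable, and the caller advances it.

    // The external style: the iteration state is a value you advance.
    struct UpTo {
        i: u64,
        n: u64,
    }

    impl Iterator for UpTo {
        type Item = u64;
        fn next(&mut self) -> Option<u64> {
            if self.i < self.n {
                let out = self.i;
                self.i += 1;
                Some(out)
            } else {
                None
            }
        }
    }

    fn main() {
        let mut total = 0;
        // The caller drives the loop, so `break` and `return` just work.
        for x in (UpTo { i: 0, n: 10 }) {
            if x > 7 {
                break;
            }
            total += x;
        }
        println!("{total}"); // 28
    }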

Interior iteration is, as far as I understand it, basically Ruby-style things.for_each(|x| do_stuff_with(x)). The catch is that if the loop body is just a closure, as with Rust's iterator adapters, then it can't break out of the loop or return from the enclosing function; a plain closure has no way to express that.

things.for_each(fn(x) =
    if x > 3 then break         -- Basically can't do anything; the inner closure doesn't know it's in a loop.
    else if x < -3 then return  -- Basically a "continue", since it just returns from the inner closure, not the `for_each()`, let alone the enclosing function.
    end
end)

This means that your for_each() function has to do something wonky, like have the inner function return an Option or Result that tells it how to alter the control flow. See Rust iterators' map_while() and the other combinators that do exactly this, and other things (Unity's coroutines?) that likewise treat control flow as a message you have to send to something that interprets it.
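
Concretely, that workaround ends up looking something like this (a minimal sketch in Rust; `for_each_until` is a made-up helper, not a real API, though Rust's actual `try_for_each` plus `ControlFlow` is the same idea): the body can't use `break` or `continue`, so it returns a value telling the iterator what to do next.

    use std::ops::ControlFlow;

    // Internal iteration where the loop body can't use `break`/`continue`
    // itself, so it returns a ControlFlow value telling the iterator
    // whether to keep going.
    fn for_each_until<T>(items: &[T], mut body: impl FnMut(&T) -> ControlFlow<()>) {
        for item in items {
            if body(item).is_break() {
                return;
            }
        }
    }

    fn main() {
        let things = [1, -5, 2, 7, 4];
        for_each_until(&things, |&x| {
            if x > 3 {
                ControlFlow::Break(())    // stands in for `break`
            } else if x < -3 {
                ControlFlow::Continue(()) // stands in for `continue`
            } else {
                println!("{x}");
                ControlFlow::Continue(())
            }
        });
    }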

The problem with coroutines is that the borrow checker basically forces the control flow of dynamic memory allocation and deallocation into the shape of a tree, with roots it can reason about at compile time. When control flow leaves one branch of the tree to move on to the next, it knows for a fact that either all memory allocated in that branch has been freed and there are no references to it anymore, or an object from that branch has been moved/attached to an object created higher up in the current branch. And all of that is done with compile-time analysis. So making a language where you can basically snip off, rearrange, and jump between branches of the control flow tree at runtime sounds like a fantastic way to make it an impossible problem.

However, "stack / non-escaping coroutines" sounds like it could be a great way to make this problem tractable again... if I can find enough info on them to actually understand wtf is going on, since what I remember of Modula-2 coroutines implies they're allocated on the heap. The only languages that appear to do things kinda like this are Swift and Kotlin? Maybe. Research problem time! I'd love to add coroutines if I can figure out how to make them not suck the way Rust's async does.
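
Roughly, what I'm hoping "non-escaping" buys us is something like this (a sketch of my current mental model; the `iter`/`yield` surface syntax is invented and the lowering is a guess): if the coroutine provably can't outlive the loop that drives it, its whole frame can just be locals in an ordinary stack frame, and each `yield` becomes a call to the loop body.

    // Hypothetical surface syntax (invented, not real Garnet or Rust):
    //
    //     iter up_to(n) =
    //         let mut i = 0
    //         while i < n do yield i; i = i + 1 end
    //     end
    //
    //     for x in up_to(10) do total = total + x end
    //
    // If up_to's frame can never escape the loop that drives it, the whole
    // thing can be lowered to a plain function: the coroutine's local `i`
    // is an ordinary stack local, and the `yield` becomes a call to the
    // (aggressively inlinable) loop body.
    fn up_to(n: u64, mut body: impl FnMut(u64)) {
        let mut i = 0;
        while i < n {
            body(i); // this is the `yield`
            i += 1;
        }
    }

    fn main() {
        let mut total = 0;
        up_to(10, |x| total += x);
        println!("{total}"); // 45
    }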

Status: REPORTED
Submitter: ~icefox
Assigned to: No-one
Submitted: 1 year, 9 months ago
Updated: 1 year, 6 months ago
Labels: T-DESIGN

~icefox 1 year, 9 months ago

Rust design notes for coroutines: https://lang-team.rust-lang.org/design_notes/general_coroutines.html

Lots of interesting discussion, but it's also just a single-level yield a la Python generators. As technomancy says, single-level yield is better than nothing but it's pretty nerfed.

~icefox 1 year, 6 months ago

Discussion of Rust async+coroutine design: https://lobste.rs/s/6fjkeh/why_async_rust ; the original article talks about internal vs. external iteration among other things. Basically Rust's async is at a local minimum with lots of interlocking features, and changing one thing will usually involve changing lots of other things. It may be possible to do a slightly cleaner design, for example with a Move trait rather than Pin, but anyway.

Sebastian Davidson 1 year, 6 months ago

Hello,

I saw your issue on implementing "internal iteration" (non-escaping coroutines), and it made me want to do some research myself. I went and found some resources on how interior iteration was compiled, and I found sources specifically for CLU and Sather. CLU was the first language to implement iterators using these interior coroutines, it seems. I put the most useful-seeming resources first, and the less useful-seeming ones at the end.

"Advanced Programming Language Design." This is a whole book, not just a paper. Chapter 2 covers coroutines in Simula and CLU, and their rough implementations, with pictures of the control stack drawn. In those languages, iterators were implemented with local stack-allocated coroutines, rather than implementing iterators as a class with explicit state written by the programmer kept across method calls. That seems like the kind of iteration that Graydon is talking about here. Link: https://www.researchgate.net/profile/Raphael-Finkel/publication/220692467_Advanced_programming_language_design/links/0c96052af3e324bf31000000/Advanced-programming-language-design.pdf

"Aspects of implementing CLU" is another paper that covers how iterators (and other constructs) were implemented in CLU. It's a short read. I think it's covered by Chapter 2 of the book I linked at the top, though. https://dl.acm.org/doi/pdf/10.1145/800127.804079

"Iterator abstraction in Sather": https://dl.acm.org/doi/pdf/10.1145/225540.225541 This paper describes how iterators work in Sather and CLU, which Sather took inspiration from. However, it doesn't talk about the implementation at all. You could take it as inspiration of how the syntax might look in Garnet. This is the same PDF, but it seems more readable: https://omohundro.files.wordpress.com/2009/03/murer_omohundro95_iteration_abstraction_in_sather.pdf

"Control Mechanisms for Generators in Icon": https://www.proquest.com/docview/303100477 This covers both the "two-stack" and "one-stack" coroutine models implemented in the Icon programming language. I think the "one-stack" model is what you're looking for.

"BLISS: A Language for Systems Programming" describes similar CLU-style coroutines a little bit: https://dl.acm.org/doi/pdf/10.1145/362919.362936

This paper also might be useful (coroutines in Java): https://dl.acm.org/doi/10.1145/1852761.1852765

Finally, there's also this old 1980 book I found, called "Coroutines: A Programming Methodology, a Language Design and an Implementation": https://digital.library.adelaide.edu.au/dspace/bitstream/2440/21027/2/02whole.pdf I don't know how useful it will be.

Best, Sebastian

Sebastian Davidson 1 year, 6 months ago

Sorry to email twice, but I found one last link.

"The Implementation of Retention in a Coroutine Environment" https://sci-hub.se/https://doi.org/10.1007/BF00265556 This paper describes an algorithm for inferring at compile-time whether a coroutine's activation record must be allocated on the heap, or if it can be allocated on the caller's stack. The algorithm is to create a called-by graph and mark components whose instances may be suspended. Since you want Garnet to have non-escaping coroutines, you could, in theory, use it to reject programs with escaping coroutines.

I've sent a lot of old papers. Hopefully this is okay; I just did this on a whim.
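
To make the retention-analysis idea from that last paper a little more concrete, here is a toy sketch of the shape such an analysis might take (emphatically not the paper's actual algorithm; the structure and names are my own invention): mark the routines that contain a suspend point, propagate the mark to their transitive callers along the called-by edges, and only the flagged routines need heap-allocated activation records.

    use std::collections::{HashMap, HashSet};

    // Toy sketch only: routines are nodes, and called_by[r] lists the
    // routines that call r. A routine's activation record may need to be
    // retained (heap-allocated) if it contains a suspend point, or if
    // something it (transitively) calls can suspend while its frame is
    // still live. Everything left unflagged can use the ordinary stack.
    fn must_heap_allocate<'a>(
        called_by: &HashMap<&'a str, Vec<&'a str>>,
        may_suspend: &HashSet<&'a str>,
    ) -> HashSet<&'a str> {
        let mut flagged: HashSet<&'a str> = HashSet::new();
        let mut work: Vec<&'a str> = may_suspend.iter().copied().collect();
        // Propagate the "may be suspended" mark from suspending routines
        // up to their transitive callers along the called-by edges.
        while let Some(r) = work.pop() {
            if flagged.insert(r) {
                for &caller in called_by.get(r).into_iter().flatten() {
                    work.push(caller);
                }
            }
        }
        flagged
    }

    fn main() {
        // main -> outer_loop -> walker -> visit, where `visit` can yield.
        let called_by = HashMap::from([
            ("outer_loop", vec!["main"]),
            ("walker", vec!["outer_loop"]),
            ("visit", vec!["walker"]),
        ]);
        let may_suspend = HashSet::from(["visit"]);
        // Conservatively flags visit, walker, outer_loop, and main; a
        // sharper analysis would stop at the coroutine boundary.
        println!("{:?}", must_heap_allocate(&called_by, &may_suspend));
    }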

~icefox 1 year, 6 months ago

Hi Sebastian, nice to meet you! Thanks for the research, these are some pretty interesting sources. It's interesting that they seem to be mostly old; has nobody seriously looked at internal iteration since 1995? Though now that I think of it there's only two mainstream-ish languages I know of that use it heavily: Ruby and Io. But Io isn't actually used for much real work that I'm aware of, and for some reason people don't seem to talk about Ruby's iterators in those terms very often? Or maybe they do and I just don't know about it; I'm really not very connected to Ruby culture.

Something else that occurred to me while reading your info is that something like internal iteration is in fact extremely common in functional language iteration combinators... map, fold, filter, etc. They're just all special cases with more-or-less-hardwired control flow, which actually makes them quite useful sometimes 'cause you don't have to think too hard about what they're doing. So maybe not many languages think about internal iteration using coroutines, 'cause having a bunch of special-case combinators with closures does the job reasonably well. On the flip side I think that part of the selling point for internal iteration is being able to trivially write these combinators yourself and have them easily slot together instead of it being a carefully-thought-out exercise in API design.
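
To spell that out: each of those combinators is just internal iteration with the control-flow policy baked in, so the closure never has to signal anything. A quick throwaway sketch in Rust (not from any library):

    // `fold` hardwires "always visit every element", threading an
    // accumulator through the closure.
    fn fold<T, A>(items: &[T], init: A, mut f: impl FnMut(A, &T) -> A) -> A {
        let mut acc = init;
        for item in items {
            acc = f(acc, item);
        }
        acc
    }

    // `any` hardwires the opposite policy: stop at the first element the
    // predicate accepts.
    fn any<T>(items: &[T], mut pred: impl FnMut(&T) -> bool) -> bool {
        for item in items {
            if pred(item) {
                return true;
            }
        }
        false
    }

    fn main() {
        let xs = [1, 2, 3, 4];
        assert_eq!(fold(&xs, 0, |a, &x| a + x), 10);
        assert!(any(&xs, |&x| x > 3));
    }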

Sebastian Davidson 1 year, 6 months ago

It's interesting that they seem to be mostly old; has nobody seriously looked at internal iteration since 1995?

It surprised me, too. Looking into languages like CLU and BLISS has made me think "Why didn't they ever add non-escaping coroutines in C99, or C11?" I think it's mostly that C and C++ didn't have them, and the managed languages (like the JVM) had runtimes that didn't support coroutines, so newer languages like Kotlin compiled them to state machines instead. Python could theoretically do this for its generators, but from what I read, neither CPython nor PyPy does this particular optimization (though PyPy does do other optimizations, like inlining generators when possible: "generator unpeeling"). I guess it's because no one had tried to make a new systems programming language between the 1980s and Rust (discounting Ada '95 and 2012, but even Ada was first released in 1983); everyone had been implementing languages with VMs.

Though now that I think of it there's only two mainstream-ish languages I know of that use it heavily: Ruby and Io.

I didn't look into Ruby or Io at all when writing that, because I'm unfamiliar with them, but I can't find papers saying they do the non-escaping coroutine iterator optimization. It seems like Ruby always allocates a separate frame, and I can't find anything on Io at all.

By the way, I found a comment by Graydon which links to his implementation of interior iteration in the original OCaml version of the Rust compiler on GitHub: https://www.reddit.com/r/ProgrammingLanguages/comments/141qm6g/comment/jyldotb/ I only found it after finding those papers. He explicitly says he learned it from Sather, which copied it from CLU. You might consider contacting him, since right now it seems like he's the only one who has implemented this in a programming language since the 1990s (when Sather did it).

Something else that occurred to me while reading your info is that something like internal iteration is in fact extremely common in functional language iteration combinators... map, fold, filter, etc.

You're right. In functional languages like OCaml or Haskell, map, fold, and filter are all higher-order functions that take closures as parameters, and those closures may capture variables from the surrounding environment. I think allocating non-escaping coroutines on the caller's stack frame can be seen as just a special-case optimization of closures. It's interesting, though; I hadn't heard of this until reading Graydon's blog post.

On the flip side I think that part of the selling point for internal iteration is being able to trivially write these combinators yourself and have them easily slot together instead of it being a carefully-thought-out exercise in API design.

I agree. Being able to write "yield x", when it's possible, is much more elegant.

~icefox 1 year, 6 months ago

You might consider contacting him...

That would require me to be way more brave about talking to new people than I really am!

~icefox referenced this from #10 1 year, 5 months ago
