~technomancy/fennel#28: 
Reader macros syntax proposals

We've discussed several ideas on IRC about how reader macros could be used so that values both print and read correctly.

Suppose we have a macro module named foo.fnl, which has the reader macro bar. Reader macros begin with @, as it's the only reserved character in Fennel as of right now, and are followed by their namespace and the macro name itself.

Here are some uses with literals:

  • @foo.bar[1 2 3] will expand to (foo.bar [1 2 3])
  • @foo.bar{:a 1 :b 2} => (foo.bar {:a 1 :b 2})
  • @foo.bar"a" => (foo.bar "a")

For literals that are alphanumeric we'll need some kind of separator. Space is highly discouraged by ~technomancy, but I'll list it here anyway for completeness:

  • @foo.bar/a => (foo.bar a)
  • @foo.bar a => (foo.bar a)

Currently, forward slash / seems like the best option, and it looks natural with compound data structures too:

  • @foo.bar/[1 2 3] will expand to (foo.bar [1 2 3])
  • @foo.bar/{:a 1 :b 2} => (foo.bar {:a 1 :b 2})
  • @foo.bar/"a" => (foo.bar "a")
  • @foo.bar/1337 => (foo.bar 1337)

The question is whether / should be mandatory.
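To make the "optional /" variant concrete, here is a rough sketch of the expansion rule in Python (not Fennel, since none of this syntax parses yet). The function name expand and the set of characters allowed in a macro name are assumptions made for illustration only. It treats the whole form as a string and ignores nesting:

```python
def expand(form: str) -> str:
    """Expand "@name<datum>" or "@name/<datum>" into "(name <datum>)".

    Toy model of the proposal: the "/" separator is optional before
    [ { and ", but required before symbols and numbers.
    """
    if not form.startswith("@"):
        raise ValueError("a reader macro must start with @")
    i = 1
    # Assumed name alphabet: letters, digits, dot, dash, underscore.
    while i < len(form) and (form[i].isalnum() or form[i] in "._-"):
        i += 1
    name, rest = form[1:i], form[i:]
    if not name:
        raise ValueError("missing reader macro name")
    if rest.startswith("/"):
        datum = rest[1:]
    elif rest[:1] in ("[", "{", '"'):
        datum = rest
    else:
        raise ValueError("a separator is required before this datum")
    if not datum:
        raise ValueError("missing datum")
    return f"({name} {datum})"
```

With this rule @foo.bar/1337 expands to (foo.bar 1337), while @foo.bar1337 is rejected, because without a separator the reader cannot tell where the macro name ends.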

Another take on reader macros - use # as both separator and indicator:

  • foo.bar#[1 2 3] will expand to (foo.bar [1 2 3])
  • foo.bar#{:a 1 :b 2} => (foo.bar {:a 1 :b 2})
  • foo.bar#"a" => (foo.bar "a")
  • foo.bar#1337 => (foo.bar 1337)

Currently, an identifier followed by # is valid syntax in macros, where it is shorthand for gensym. However, a#b is not recognized as a gensym, and this form does not require any prefix symbol. Given that # is already used as a reader macro for hash-fn, giving it a name might be a nice way to retain consistency with existing reader macros.

Status
REPORTED
Submitter
~andreyorst
Assigned to
No-one
Submitted
9 months ago
Updated
8 months ago
Labels
enhancement macros needs-design

~andreyorst 9 months ago

I also think that supporting unnamespaced reader macros should be in the works. While namespacing those will definitely make it more apparent when someone uses them, still, we are importing macros without namespaces pretty often, and I don't see why reader macros should be any different.

~technomancy 9 months ago

we are importing macros without namespaces pretty often, and I don't see why reader macros should be any different

So in Clojure, the name and namespace for the reader macro is chosen by the person writing the reader macro. If we adopt a model like this, we can't allow macros to define no namespace, because it will cause collisions.

But in import-macros we make it so the local name of the macro is determined by whoever is doing the importing. So if reader macros are also declared this way, then it's fine to allow people to choose a name that is not namespaced on a per-module basis.

When I said we need to disallow reader macros that have no namespace, I was thinking about the Clojure model. I don't think we can make a decision on this point until we decide upon the interface for declaring the use of a reader macro. It can't be quite like import-macros since the reader macros are applied so much earlier in the process before we can start compilation.

~technomancy 9 months ago

I also realized that using . as a separator between the module name and the macro name is a bad idea because . is an extremely common character in module names. The examples above only really make sense for reader macro modules which aren't nested.

So our best bet might be: @foo.bar/abc[1 2 3] expanding to (foobar.abc [1 2 3]) where foobar here is the "foo.bar" module. Unfortunately this doesn't allow for symbols or numbers to be passed as datum to the reader macros; in order to support that we would need a second separator.

This problem of needing a second separator is not as bad in the case of the final suggestion in the description above:

foo.bar/abc#a => (foo.bar (sym "a"))

~andreyorst 9 months ago

. is an extremely common character in module names.

I have never seen . in module names, only as a "directory" separator in require and as a method of looking up a key in a table. Can you provide an example of what you mean?

@foo.bar/abc[1 2 3] expanding to (foobar.abc [1 2 3])

wat? Could you please elaborate how foo.bar transforms into foobar?

foo.bar/abc#a => (foo.bar (sym "a"))

Where did abc go?

I think we need to make examples with (semi)real files and this syntax.

~technomancy 9 months ago

I have never seen . in module names, only as a "directory" separator in require and as a method of looking up a key in a table. Can you provide an example of what you mean?

(local mod (require :module.name))

When I say "module name" I mean the value that is passed to require to load the module; that's the canonical name of the module. Locally it can be rebound to another name, but that's only valid within a given chunk.

Could you please elaborate how foo.bar transforms into foobar?

This would be as if you had done (local foobar (require :foo.bar)) since the module name has a dot in it but the local name cannot have a dot in it.

Where did abc go?

Haha, OK this one was just my mistake. It should be

foo.bar/abc#a => (foobar.abc (sym "a"))

Anyway this is only a problem if the reader macro must directly reference the canonical name for a module. If the reader macro module names (and possibly contents) can be given local renames upon import, then it's not an issue.

~andreyorst 9 months ago

Ah, I see. I was confused by some of these, and the more I read, the more confused I got towards the end :D

I would not consider module.name a name at all, as it is two names separated by a dot: the name of the directory and the name of the module, which indicates a nested structure. When you place your module in nested directories as a/b/c.fnl, you then require it with :a.b.c. The module itself here is named c, not a.b.c, though you can indicate that it came from a/b/ in its local name, as you've said.

I don't quite understand the semantics of automatically transforming foo.bar to foobar, as it would open the possibility of unexpected name clashes:

(local foobar :vaiv)
@foo.bar/abc[1 2 3] ;; if expands to (foobar.abc [1 2 3]) will give weird error message "attempt to index string value"

I would rather not expand these things at all, and just refer to them the Lua way, just as we refer to modules in nested directories.

~technomancy 9 months ago

I would not consider module.name a name at all, as it is two names separated by a dot: the name of the directory and the name of the module, which indicates a nested structure.

This only really holds under the default searchers. Abstractly the module name can be any opaque string, and a searcher can resolve it to any module value using any arbitrary rules it wants. The idea of replacing dots with directory separators is only a convention specific to certain searchers. It's important to emphasize that the module system itself knows nothing about the filesystem; only certain searchers do.

Within the context of an application, it's fine to make assumptions about how module names map to files, but when designing something like reader macros this is a mistake we must avoid. We must treat entire module names as opaque strings.

I don't quite understand the semantics of automatically transforming foo.bar to foobar, as it would open the possibility of unexpected name clashes

Yeah, this is confusing. I did not mean to imply that this transformation would be automatic; I just wanted to come up with a local that could refer to a module whose name was not a valid local name. Let's ignore this for now.

I think pursuing this line of thinking is probably not a great idea until we answer one core question: is the reader macro system A) scoped (similar to the existing macro system), or B) does it refer to absolute, globally-defined module names more like Clojure's notation? Let's look at some whole-file examples rather than single-form examples. For the purposes of the examples here let's assume using # as the separator; we can revisit that question later.

A) gives you much more control, but it also adds some syntactic overhead:

(reader-macros {: abc} :foo.bar)
(reader-macros full :foo.bar)

@abc#[1 2 3] ; would call the abc field of the foo.bar macro module
@full.baz#{:a 1 :b 2} ; would call the baz field of the foo.bar module

Whereas B) is more concise and allows you to use a reader macro anywhere in the code without it being explicitly imported first, but each reference to a reader macro must fully identify the macro module as well as the field in question:

@foo.bar/abc#[1 2 3] ; same as above
@foo.bar/baz#{:a 1 :b 2}

Once it's spelled out this way, I think I lean towards A even though it's a bit more verbose. I think it's more consistent with the existing notation which makes the origin point of every identifier more explicit. But I haven't fully made up my mind. B is more convenient, but not being able to use . to separate the module and its field is unfortunate.

On the other hand, the (reader-macros ...) form in A looks a lot like a regular macro, but it must be applied during the parsing phase, so that's weird and misleading. But I can't think of a way around that problem. The other problem is that if you do an import-macros inside a function, those macros are tied to that function's scope, which is great. But reader-macros cannot work that way, because the scoping information cannot be determined by the parser; it can only apply to the whole file on down from the point where the reader-macros form appears.

~jeremypenner 9 months ago

I'm trying to understand the semantics here - in what sense, if any, is this not just an alternate syntax for a two-element list? Are we evaluating custom code at parse-time to modify the results of the tree? If so, how does this interact with the compiler if the resulting object doesn't have a reasonable serialization to Lua?

If this is just a different way of writing (foobar.abc [1 2 3]) without parens then I can't see how it justifies its weight.

~andreyorst 9 months ago

~jeremypenner, the main difference is that these are reader macros, not ordinary macros.

For example, when you define a table with custom behavior implemented via metamethods and then print it, you get an ordinary table. This means that your data structure prints, but does not read back correctly. This can't be solved even by defining a custom __fennelview for the data structure (unless it prints a valid constructor based on the data inside the table). Reader macros solve this problem: you can define a reader macro for your data structure that serves as its printed representation and reads back into the same data structure.

For example, my ordered-set. I define such set with (ordered-set :a :b :a :c), and without custom __fennelview metamethods it will be printed like this in Fennel 0.7.1:

>> (ordered-set :a :b :a :c)
["a" "b" "c"]

However if we then pass printed form back to reader we will get ordinary table without set semantics:

>> (conj ["a" "b" "c"] :a)
["a" "b" "c" "a"]

If we have reader macros, we can define __fennelview so our data structure printed like this:

>> (ordered-set :a :b :a :c)
@ordered-set#["a" "b" "c"]

And by implementing the @ordered-set macro, we can then use it, so that reading it yields the correct data structure with set semantics at read time:

>> (conj @ordered-set#["a" "b" "c"] :a)
@ordered-set#["a" "b" "c"] ;; :a was not added because it's a set

In short, reader macros ensure that our data structures not only print but read as well. And, as opposed to (ordered-set :a :b :c), which is a function call, @ordered-set#["a" "b" "c"] is data, because it is a read-time construct.
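The print/read round trip described above can be modelled with a small Python sketch. This is only an illustration of the idea, not how Fennel or __fennelview works: the OrderedSet class, its conj method and the read function are hypothetical, and the reader only handles a flat list of double-quoted strings.

```python
import re


class OrderedSet:
    """Insertion-ordered set whose printed form is a tagged literal."""

    def __init__(self, items=()):
        # dict.fromkeys keeps first-seen order and drops duplicates.
        self._items = list(dict.fromkeys(items))

    def conj(self, item):
        # Returns a new set; adding an existing member changes nothing.
        return OrderedSet(self._items + [item])

    def __eq__(self, other):
        return isinstance(other, OrderedSet) and self._items == other._items

    def __repr__(self):
        body = " ".join(f'"{x}"' for x in self._items)
        return f"@ordered-set#[{body}]"


def read(text):
    """Read the printed form back into an OrderedSet (strings only)."""
    match = re.fullmatch(r"@ordered-set#\[(.*)\]", text)
    if match is None:
        raise ValueError(f"not an ordered-set literal: {text!r}")
    return OrderedSet(re.findall(r'"([^"]*)"', match.group(1)))
```

Here read(repr(s)) gives back a value equal to s, and calling conj("a") on the result leaves the printed form unchanged. A plain ["a" "b" "c"] cannot provide that, because the set semantics are lost when it is printed.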

~jeremypenner 9 months ago

"Read-time" in Fennel is not currently super well-documented - I assume in this context we are using it to mean fennel.parser(). One fundamental question: When fennel.parser encounters the text @ordered-set#["a" "b" "c"], what does it do? I see two options with very different consequences:

  1. It calls ordered-set on [:a :b :c] and embeds the result directly into the output. The compiler sees this and treats this as a literal value which evaluates to itself.
  2. It generates a table, something like {:tag :ordered-set :form [:a :b :c]}, with a special marker in the metatable to mark it as a "tagged literal", and embeds that in the output. The compiler sees this and compiles a call to (ordered-set [:a :b :c]).

I was assuming option 1, which IMO is a bad idea. (How do you compile an arbitrary object with metatables and arbitrary functions or even userdata to readable Lua? IMO there's no good way that's not way more trouble than it's worth. Do we attempt to protect against @eval#"(launch-missiles)"?) But all of my objections basically disappear if the reader doesn't execute arbitrary user code directly.

Another question then becomes, how does this interact with macros? Do macros that get passed an @ordered-set#[:a :b :c] see the special unevaluated "tagged literal" table, or do they see an ordered set? If they see an ordered set, we're back into "how do we serialize an arbitrary complex value into Lua code" territory again. But it may be surprising to see a special unevaluated "tagged literal" object and have to take an extra step to turn it into a value the macro can interpret more directly. OTOH, if the "tagged literal" object is exposed to macros, then macros could trivially do things like turn an @ordered-set#[:a :b :c] into a @special-transformed-set#[:a :b :c :d], which could be cool.
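The difference between the two options can be sketched in Python. Everything here is a hypothetical model (TaggedLiteral, READERS, read_eager, read_tagged and compile_tagged are made-up names), and the forms are simplified to flat lists of strings:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedLiteral:
    """Option 2: inert data that records the tag and the unevaluated form."""
    tag: str
    form: tuple


def ordered_set(items):
    # Stand-in for user code: dedupe while keeping insertion order.
    return tuple(dict.fromkeys(items))


# Hypothetical table of user-supplied reader functions.
READERS = {"ordered-set": ordered_set}


def read_eager(tag, form):
    """Option 1: the parser runs user code and embeds the resulting value."""
    return READERS[tag](form)


def read_tagged(tag, form):
    """Option 2: the parser runs no user code at all."""
    return TaggedLiteral(tag, tuple(form))


def compile_tagged(literal):
    """Option 2, later on: the compiler emits an ordinary call form."""
    args = " ".join(f'"{x}"' for x in literal.form)
    return f"({literal.tag} [{args}])"
```

Under option 2 a macro receives the TaggedLiteral and can rewrite its tag or form before compilation. Under option 1 it receives whatever value the user function returned, which the compiler then has to serialize somehow.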

~technomancy 9 months ago

In my previous comment I talked about the possibility of scoping reader macros; that is, making it so they can only be invoked if they have already been somehow declared further up the file. I think that proposal makes them too similar to regular macros; like ~jeremypenner said it ends up being just a way to avoid writing parens, which definitely does not carry enough value to justify its complexity. If this feature is to be distinct enough to be valuable, it must be an extension to the reader itself, which I think means applying it at one place so that it can be applied across the whole codebase.

In order to evaluate how this would work, I think we should work backwards from how the macros are brought in.

One possibility (let's call this the options approach) is that reader macros are added at the compiler entry points; that would be either the opts table that gets passed to fennel.compile, fennel.eval, fennel.make_searcher, etc, or as command-line options. They could be specified as a list of reader-macro modules to install in the reader. For instance:

table.insert(package.loaders,
             fennel.make_searcher({correlate=true,
                                   readermacros={"foo.bar", "cljlib"}}))

This approach would also support having reader macros be declared on the command line:

$ fennel --reader-macro foo.bar --reader-macro cljlib --compile code.fnl

In this case the modules foo.bar and cljlib would be tables containing reader macros, such that this code would run:

(let [s cljlib/ordered-set#[:a :b :c]]
  (calculate s foo.bar/megatable#{:a 1 :b 2}))
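As a rough model of how the reader could resolve such a tag under the options approach, here is a Python sketch. The registry layout, the name expand_literal and the string-based "expansion" are assumptions for illustration; the point is only that the module name is treated as an opaque string that may contain dots, so the tag is split on the last / rather than on a dot.

```python
# Hypothetical registry built from the readermacros option:
# module name (opaque string) -> {macro name -> function over the datum}.
READER_MACROS = {
    "foo.bar": {"megatable": lambda datum: f"(megatable {datum})"},
    "cljlib": {"ordered-set": lambda datum: f"(ordered-set {datum})"},
}


def expand_literal(text, registry=READER_MACROS):
    """Split "module/macro#datum" and apply the registered reader macro."""
    tag, hash_sep, datum = text.partition("#")
    if not hash_sep or not datum:
        raise ValueError(f"expected tag#datum, got {text!r}")
    # Split on the LAST slash so dots in the module name are left alone.
    module_name, slash, macro_name = tag.rpartition("/")
    if not slash:
        raise ValueError(f"expected module/macro in tag {tag!r}")
    try:
        macro = registry[module_name][macro_name]
    except KeyError:
        raise ValueError(f"unknown reader macro: {tag}") from None
    return macro(datum)
```

A tag whose module was not installed via the options table (or the command line) fails at read time instead of being silently passed through.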

A variation on the options approach (options with aliasing) would not necessarily require a 1:1 mapping between the tag in the code and the module name containing its reader macro:

local foobar = require("foo.bar")
local cljlib = require("cljlib")
table.insert(package.loaders,
             fennel.make_searcher({correlate=true,
                                   readermacros={["foo.bar"]=foobar, cljlib=cljlib}}))

I'm not sure how the CLI equivalent of this would look though.

Another possibility (let's call this the module-name approach) would be that references to reader macros consist solely of the full module name:

(let [s cljlib.ordered-set#[:a :b :c]]
  (calculate s foo.bar.megatable#{:a 1 :b 2}))

In this case, rather than the reader macro module being a table, the module would solely contain a function, which would get called at read time with [:a :b :c], {:a 1 :b 2} etc as its argument. This removes the need for a separator character, but it means that every reader macro would need to be its own module. Considering that reader macros should be used sparingly, this doesn't seem like much of a downside. It's immediately obvious by looking at the code where the reader macro in question is defined. This also avoids the need for specifying in the options table or CLI, reducing the number of moving parts required.

~technomancy 9 months ago

"Read-time" in Fennel is not currently super well-documented

https://p.hagelb.org/data-understatement.gif

There is currently no reference anywhere in the Fennel documentation to "reading" as anything other than getting bytes from disk or standard in. Existing material uses the term "parser" exclusively. So maybe calling them something like "extensible literals" or something would be better?

Anyway, any thoughts on the means of specifying the extensions, or on the idea of requiring full module names when referring to them?

~technomancy 9 months ago

~jeremypenner

How do you compile an arbitrary object with metatables and arbitrary functions or even userdata to readable Lua? IMO there's no good way that's not way more trouble than it's worth.

Originally I was imagining that it would emit a list form which would call setmetatable on the input data and do something fancy in the metatable, but that is somewhat problematic because now it no longer looks like a literal at all to other macros that are downstream of the reader macro.

~andreyorst I think we need a little more detail on how this would work and how it would interact with other macros that would consume code containing these literals in them. We want them to feel first-class at compile time but also feel first-class at runtime, and it's difficult to think of how that could be accomplished.

Do we attempt to protect against @eval#"(launch-missiles)"?

I think this would be handled in the same way that macros are already sandboxed; the reader macros should have the same protection against evaluating arbitrary code as we already apply to macros. No I/O or arbitrary code loading allowed, just the existing sandbox.

~technomancy 9 months ago

I've added a more detailed proposal on the wiki: https://github.com/bakpakin/Fennel/wiki/ParserMacros

Nothing there is set in stone but it summarizes the discussion so far and outlines the open questions. We can continue the discussion here but I thought it would be helpful to have something more concrete rather than spread out across so many comments.

~andreyorst 8 months ago*

While working on a pretty printer I was also thinking about possible support for cyclic table definitions via reader macros. For example (the cycle notation is a draft, but inspired by fennelview):

>> (local t {:a 1 :b 2})
>> (tset t :t t)
>> (print (pp t))
@1{:a 1
   :b 2
   :t @1{...}}

This is not going to read: first, because @ is a reserved character in Fennel (which makes it perfect for this), and second, because the @1{...} thing just makes no sense to the reader. However, if Fennel's reader were to support this as a reader macro, we could do the opposite: read @1{:a 1 :b 2 :t @1{...}} and get this code embedded into the source (or maybe evaluated?):

(let [_tmp_0 {:a 1 :b 2}]
  (tset _tmp_0 :t _tmp_0)
  _tmp_0)

The hard part here is validating this and reporting an error to the user if something is not right, e.g. an ID is missing or used incorrectly. The other hard part is actually generating code that correctly constructs this kind of definition, which may be difficult for deeply nested cycles, or when there are several mutual cycles:

>> (macrodebug @1[1 2 3 @2[1 2 @1[...]] [1 2 @2[...]]])
(do
  (local v1 [1 2 3])
  (local v2 [1 2 v1])
  (local v3 [1 2 v2])
  (table.insert v1 v2)
  (table.insert v1 v3)
  v1)

I think if something prints it should also read, but this means that the reader has to support this kind of stuff. Given that Lua allows this kind of table, maybe it is a good opportunity to provide a way of defining such tables non-programmatically? As a long-term goal maybe, though (if this is even possible).
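To get a feel for the construction and validation steps, here is a minimal Python sketch that builds the cyclic structure from an already-parsed tree. Def stands for a labelled table such as @1[...] with contents, Ref for a back-reference such as @1[...] with literal dots. Both names and the build function are hypothetical, and only sequential tables are handled:

```python
from dataclasses import dataclass


@dataclass
class Def:
    """A labelled table definition, e.g. @1[1 2 3 ...]."""
    label: int
    items: list


@dataclass
class Ref:
    """A back-reference to a labelled table, e.g. @1[...]."""
    label: int


def build(node, env=None):
    """Turn a tree of Def/Ref/list/atom nodes into possibly cyclic lists."""
    env = {} if env is None else env
    if isinstance(node, Ref):
        if node.label not in env:
            raise ValueError(f"reference to undefined label @{node.label}")
        return env[node.label]
    if isinstance(node, Def):
        if node.label in env:
            raise ValueError(f"label @{node.label} is defined twice")
        # Register the empty list first so that nested Refs can point at it.
        out = env[node.label] = []
        for item in node.items:
            out.append(build(item, env))
        return out
    if isinstance(node, list):
        return [build(item, env) for item in node]
    return node
```

For the example above, Def(1, [1, 2, 3, Def(2, [1, 2, Ref(1)]), [1, 2, Ref(2)]]) produces a list whose fourth element points back at the list itself and whose fifth element points at the fourth. A reference to a label that has not been defined yet is rejected, which covers the "ID is missing" case.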
