When profiling my program, I found out that the majority of runtime is spent in (require :fennel). Behold:
$ cat test.fnl
(require :fennel)
(print "hii")
$ fennel --require-as-include --compile test.fnl > test.lua
$ tail test.lua
]===], {allowedGlobals = false, env = env, filename = "src/fennel/match.fnl", moduleName = module_name, scope = compiler.scopes.compiler, useMetadata = true})
for k, v in pairs(match_macros) do
compiler.scopes.global.macros[k] = v
end
package.preload[module_name] = nil
end
return mod
end
require("fennel")
return print("hii")
$ time lua test.lua
hii
real 0m0.484s
user 0m0.456s
sys 0m0.026s
That's half a second to print "hi". For comparison, I required enough of Penlight for test.lua to have a similar number of lines (~8700) as with Fennel. That took 30-50ms.
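`time lua test.lua` also counts interpreter startup and the final print. You can isolate the cost of loading Fennel by timing the require call from inside Lua. The sketch below assumes Fennel is installed somewhere on package.path. time_require is a hypothetical helper, not part of Fennel. Note that os.clock reports CPU time for the process, not wall-clock time.

```lua
-- Hypothetical helper (not part of Fennel): time a single require call.
-- os.clock() returns the CPU time used by this process, in seconds.
local function time_require(name)
  local start = os.clock()
  -- pcall so that a missing module is reported instead of raising an error
  local ok, mod = pcall(require, name)
  return os.clock() - start, ok, mod
end

local elapsed, ok = time_require("fennel")
if ok then
  print(string.format("require('fennel') took %.1f ms of CPU time", elapsed * 1000))
else
  print("could not load fennel: is it on package.path?")
end
```

require caches modules in package.loaded, so only the first call for a given name measures the real loading cost. Later calls return almost immediately.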
I remember someone mentioned on IRC that Fennel compiles macros every time it's required. Is there a way to precompile everything? Could we make it the default? Otherwise any program that calls Fennel programmatically (fennel.dofile, fennel.view, ...) pays a pretty big startup penalty.
Yes, I think it's a good idea to turn macros into Lua as part of the bootstrap process for the compiler.
> Otherwise any program that calls fennel programmatically (fennel.dofile, fennel.view, ...) pays a pretty big startup penalty.
Oddly, these two examples are very different. If you want to call dofile, then you definitely need the whole compiler loaded, but if you just want to call fennel.view, that can already be done faster:
;; instead of this:
(local {: view} (require :fennel))
;; do this:
(local view (require :fennel.view))
However, I'm not sure this is well documented. And this only works with fennel.view; none of the other nested modules in the compiler support this.
Thanks for the tip about fennel.view, I didn't know you could do that! Sadly, in my case I need fennel.dofile, as I'm evaluating Fennel files provided by the user.
OK, I've taken a look at this. I have an implementation that offers some speed improvements, with some tradeoffs. However, I can't reproduce the slow boot speeds you're reporting. For the same program you've shown above, I get about 100ms runtime with LuaJIT, or 125ms with Lua 5.4; both are imperceptibly fast. It's hard to say what's causing it to be so slow in your case.
Precompiling the macros gets it down to 20-30 milliseconds; however, my current approach requires disabling metadata, which breaks docstrings on built-in macros.
These timings were from my Raspberry Pi 4. On my MacBook Air (M1) it takes around 100ms.
A couple runs of profiling on the Raspberry Pi:
$ luajit -jp test.lua
hii
12% special
8% multi-sym?
8% parse_string_loop
8% symbol_to_expression
8% parse_sym
8% hook-opts
8% compile1
4% quoted?
4% getb
4% sym
4% add_stable_keys
4% close_table
4% flatten_chunk
4% load-code
4% test.lua:0
4% view
4% maxn
$ luajit -jp test.lua
hii
9% test.lua:0
6% list?
6% whitespace_3f
6% compile1
6% special
6% getb
6% parse_sym
3% stablepairs
3% _157_
3% set_source_fields
3% peephole
3% parse_string
3% (for generator)
3% get_arg_name
3% getopt
3% check_malformed_sym
3% close_table
3% parse_comment
3% exprs1
3% test.lua:2654
3% load-code
3% view
3% parse_sym_loop
3% multi-sym?
$ luajit -jp test.lua
hii
9% test.lua:0
6% view
6% symbol_to_expression
6% check_malformed_sym
3% normalize_opts
3% make_options
3% get_arg_name
3% (for generator)
3% parse_number
3% sym
3% calculate_if_target
3% open_table
3% compile1
3% macroexpand_2a
3% getbyte
3% hook-opts
3% tostring
3% ast-source
3% _229_
3% whitespace_3f
3% f
3% exprs1
3% make-scope
3% getb
3% _12_
3% skip_whitespace
3% load-code
On the Mac I couldn't manage to run the LuaJIT profiler, and the ones written in Lua had issues with recursion and showed incorrect data.
> Precompiling the macros gets it down to 20-30 milliseconds; however, my current approach requires disabling metadata, which breaks docstrings on built-in macros.
From my perspective that's fine: if you want to fire up a REPL for the user, you can require the version that's not precompiled.
OK, well, you can take a look at my branch here where I've started this work: https://git.sr.ht/~technomancy/fennel/log/precompile-macros
I have an idea for how to fix the metadata problem, but there are also failures in the test-nest test, which nests the compiler (loading Fennel from a Fennel program), and I have no idea what could be causing them.
Finally managed to test it :) I can confirm it helps a lot: now the timings on my Raspberry Pi are ~35ms for PUC Lua 5.1, and ~25ms for LuaJIT. That's much better!
I think I have this working on the "precompile-macros" branch now! But it's too big of a change to merge right now. We'll have to get 1.5.1 released, and then we can bring this in for 1.6.0.
Great news! I see that even metadata is included now :) Thank you!
This is on main now!