The unu style with ~~~ markers is conceptually very close to markdown with '''forth source code markers. Moving to .retro.md as the standard source format in 2020.1 and beyond would make the source far more readable when viewed on sourcehut/gitlab/github and when edited in editors & IDEs. The script also works well with .muri files and vanilla ''' code blocks.
Note: ''' is shown above as sr.ht does not show backtick-backtick-backtick in the preview.
To test this idea, a modified retro-unu.py is shown here, followed by the first part of retro.forth in markdown style. It becomes very easy to tell the code blocks from the comments.
import sys

if __name__ == "__main__":
    # No source file given: nothing to do.
    if len(sys.argv) == 1:
        sys.exit(0)
    f = sys.argv[1]
    in_block = False
    with open(f, "r") as source:
        if len(sys.argv) == 2:
            # Default mode: print only the lines inside ~~~ fences.
            for line in source.readlines():
                if line.rstrip() == "~~~":
                    in_block = not in_block
                elif in_block:
                    print(line.rstrip())
        elif sys.argv[2] == "--to-md":
            # Conversion mode: rewrite the unu source as markdown.
            need_space = False
            the_end = False
            for line in source.read().split('\n'):
                if line.rstrip() == "~~~":
                    # Swap each ~~~ fence for a markdown fence.
                    in_block = not in_block
                    if in_block:
                        print('```')
                        print('')
                    else:
                        print('')
                        print('```')
                        print('')
                    need_space = False
                elif line.strip() == '':
                    print('')
                    if need_space:
                        print('')
                    need_space = False
                else:
                    if line.strip() == '## The End':
                        the_end = True
                    if in_block or the_end:
                        if the_end:
                            print('`' + line + '`')
                        else:
                            print(' ' + line)
                    else:
                        # Indented lines and table rows are emitted one per line.
                        if line[:4] == '    ' or line.strip()[0] in ['|', '_']:
                            line = line.strip()
                            if line[0] in ['|', '_']:
                                print(line)
                            else:
                                print('`' + line + '`\n')
                        else:
                            # Ordinary prose: join wrapped lines into one paragraph.
                            if need_space:
                                print(' ', end='')
                            print(line.strip(), end='')
                            need_space = True
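The extraction half of the script can also be written as a small function, which is easier to test than the command-line form. This is only a sketch: `extract_code` is a hypothetical name, and accepting both `~~~` and triple-backtick fences (with an optional language tag on the opening line) is an assumption about how the vanilla markdown case might be handled, not something taken from the script above.

```python
def extract_code(text):
    """Return the lines found inside fenced blocks of a literate source."""
    fences = ("~~~", "```")  # assumption: both fence styles are accepted
    open_fence = None        # marker that opened the current block, if any
    code = []
    for line in text.split("\n"):
        stripped = line.rstrip()
        if open_fence is None:
            # Outside a block: an opening fence may carry a language tag.
            for fence in fences:
                if stripped.startswith(fence):
                    open_fence = fence
                    break
        elif stripped == open_fence:
            # Inside a block: only a bare, matching fence closes it.
            open_fence = None
        else:
            code.append(stripped)
    return code
```

Tracking which marker opened the block means a `~~~` block can contain backtick lines (and the reverse) without ending early.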
Retro is a dialect of Forth. It builds on the barebones Rx core, expanding it into a flexible and useful language.
Over the years the language implementation has varied substantially. Retro began in 1998 as a 16-bit assembly implementation for x86 hardware, evolved into a 32-bit system with cmForth and ColorForth influences, and eventually started supporting mainstream OSes. Later it was rewritten for a small, portable virtual machine.
This is the twelfth generation of Retro. It targets a virtual machine (called Nga) and runs on a wide variety of host systems.
Various past releases have had different methods of dealing with the dictionary. I have settled on using a single global dictionary, with a convention of using a short namespace prefix for grouping related words. This was inspired by Ron Aaron's 8th language.
The main namespaces are:
namespace | words related to |
---|---|
ASCII | ASCII Constants |
a | arrays |
c | characters |
compile | compiler functions |
d | dictionary headers |
err | error handlers |
io | i/o functions |
n | numbers |
s | strings |
v | variables |
This makes it very easy to identify related words, especially across namespaces. E.g.,
c:put
c:to-upper
s:put
s:to-upper
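To illustrate why the naming convention helps, here is a small Python sketch that groups word names by namespace. `group_by_namespace` is a hypothetical helper, and splitting on the first `:` is an assumption made for illustration; Retro itself does not need such a tool.

```python
from collections import defaultdict


def group_by_namespace(words):
    """Group word names by the text before the first ':' separator."""
    groups = defaultdict(list)
    for word in words:
        namespace, sep, name = word.partition(":")
        if sep and namespace and name:
            groups[namespace].append(name)
        else:
            # Words without a namespace prefix are filed under "".
            groups[""].append(word)
    return dict(groups)
```

Given the four words above, both the `c` and `s` groups contain `put` and `to-upper`, which is the cross-namespace relationship the convention makes visible.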
Prefixes are an integral part of Retro. These are single symbol modifiers added to the start of a word which control how Retro processes the word.
The interpreter model is covered in Rx.md, but basically:
- Get a token (whitespace delimited string)
- Pass it to `interpret`
  + if the token starts with a known prefix then pass it to the prefix handler
  + if the initial character is not a known prefix, look it up
    - if found, push the address ("xt") to the stack and call the word's class handler
    - if not found call `err:not-found`
- repeat as needed
This is different than the process in traditional Forth. A few observations:
- there are no parsing words
- numbers are handled using a prefix
- prefixes can be added or changed at any time
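The interpreter model described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Rx implementation: the `prefixes` and `dictionary` mappings, the use of Python callables as "xt" values, and raising `LookupError` in place of `err:not-found` are all illustrative choices.

```python
def interpret(token, prefixes, dictionary, stack):
    """Process one whitespace-delimited token."""
    handler = prefixes.get(token[0])
    if handler is not None:
        # Known prefix: hand the rest of the token to its handler.
        handler(token[1:], stack)
        return
    entry = dictionary.get(token)
    if entry is None:
        # Stands in for err:not-found.
        raise LookupError("word not found: " + token)
    xt, class_handler = entry
    stack.append(xt)       # push the word's address ("xt")
    class_handler(stack)   # the class handler decides what to do with it


def run(source, prefixes, dictionary, stack):
    """Interpret every token in a source string."""
    for token in source.split():
        interpret(token, prefixes, dictionary, stack)


# Hypothetical handlers used only to exercise the sketch.
def number_prefix(text, stack):
    stack.append(int(text))


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def class_word(stack):
    xt = stack.pop()
    xt(stack)  # an interpreting class handler simply calls the word
```

With `{"#": number_prefix}` as the prefix table and `{"+": (add, class_word)}` as the dictionary, running `"#1 #2 +"` leaves `3` on the stack. Numbers go through the `#` prefix handler, so no parsing words are needed.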
The basic prefixes are:
prefix | used for |
---|---|
: | starting a definition |
& | obtaining pointers |
( | stack comments |
` | inlining bytecodes |
' | strings |
# | numbers |
$ | characters |
@ | variable get |
! | variable set |
\ | inline assembly |
^ | assembly references |
\| | compiler macros |
Memory Map
This assumes that the VM defines an image as being 524,288 cells. Nga implementations may provide varying amounts of memory, so the specific addresses will vary.
RANGE | CONTAINS |
---|---|
0 - 1024 | rx kernel |
1025 - 1535 | token input buffer |
1536 + | start of heap space |
............... | free memory for your use |
506879 | buffer for string evaluate |
507904 | temporary strings (32 * 512) |
524287 | end of memory |
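The upper addresses in the table follow from the image size. The sketch below reproduces them; the formula and the 1025-cell size of the evaluate buffer are inferred from the gaps between the table rows and are assumptions, not values taken from the Retro source.

```python
def memory_map(image_size=524288, temp_strings=32, temp_string_max=512,
               eval_buffer=1025):
    """Compute the upper memory regions for a given image size (in cells)."""
    end_of_memory = image_size - 1
    # The temporary strings occupy the last (count * size) cells.
    strings = image_size - temp_strings * temp_string_max
    # Assumption: the evaluate buffer sits directly below them.
    evaluate = strings - eval_buffer
    return {
        "evaluate_buffer": evaluate,
        "temporary_strings": strings,
        "end_of_memory": end_of_memory,
    }
```

With the defaults this yields 506879, 507904 and 524287, matching the table. A smaller image shifts all three addresses down by the same amount.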
I provide a word, `EOM`, which returns the last addressable location. This will be used by the words in the `s:` namespace to allocate the temporary string buffers at the end of memory.
:EOM (-n) #-3 fetch ;
`depth` returns the number of items on the data stack. This is provided by the VM upon reading from address -1.
:depth (-n) #-1 fetch ;
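Both words rely on the VM answering queries at negative addresses. A toy Python model of that idea follows. `TinyVM` is a hypothetical class, only the two addresses used above (-1 and -3) are modelled, and `fetch` takes its address as an argument rather than from the data stack, which simplifies what a real Nga implementation does.

```python
class TinyVM:
    """Toy model of a VM whose fetch answers queries at negative addresses."""

    def __init__(self, image_size=524288):
        self.memory = [0] * image_size
        self.data = []  # data stack

    def fetch(self, address):
        if address == -1:
            return len(self.data)        # depth of the data stack
        if address == -3:
            return len(self.memory) - 1  # last addressable location (EOM)
        if 0 <= address < len(self.memory):
            return self.memory[address]
        raise IndexError("address out of range: %d" % address)
```

Routing these queries through `fetch` means the image needs no extra instructions to ask the host about its own limits.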
Stack comments are terse notes that indicate the stack effects of words. While not required, it's helpful to include these.
They take a form like:
(takes-returns)
I use a single character for each input and output item. These will often (though perhaps not always) be:
n, m, x, y number
a, p pointer
q quotation (pointer)
d dictionary header (pointer)
s string
c character (ASCII)
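Because the notation is so regular, a stack comment can be taken apart mechanically. The following is a minimal sketch: `parse_stack_comment`, `describe` and the `LEGEND` table are hypothetical names, with the legend copied from the list above.

```python
LEGEND = {
    "n": "number", "m": "number", "x": "number", "y": "number",
    "a": "pointer", "p": "pointer",
    "q": "quotation (pointer)",
    "d": "dictionary header (pointer)",
    "s": "string",
    "c": "character (ASCII)",
}


def parse_stack_comment(comment):
    """Split '(takes-returns)' into (inputs, outputs), one item per character."""
    if not (comment.startswith("(") and comment.endswith(")")):
        raise ValueError("not a stack comment: " + comment)
    takes, sep, returns = comment[1:-1].partition("-")
    if not sep:
        raise ValueError("missing '-' separator: " + comment)
    return list(takes), list(returns)


def describe(items):
    """Translate item characters using the legend; unknown ones pass through."""
    return [LEGEND.get(item, item) for item in items]
```

For example, `(-n)` parses to no inputs and one number output, while `(a-)` consumes one pointer and returns nothing.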
I define a few words in the `d:` namespace to make it easier to operate on the most recent header in the dictionary. These return the values in specific fields of the header.
:d:last (-d) &Dictionary fetch ;
:d:last.xt (-a) d:last d:xt fetch ;
:d:last.class (-a) d:last d:class fetch ;
:d:last.name (-s) d:last d:name ;
I implement `reclass` to change the class of the most recent word.
:reclass (a-) d:last d:class store ;
With this I can then define `immediate` (for state-smart words) and `data` to tag data words.
:immediate (-) &class:macro reclass ;
:data (-) &class:data reclass ;
:primitive (-) &class:primitive reclass ;
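The relationship between headers, `d:last` and `reclass` can be modelled in Python as a linked list whose newest entry is the one being modified. This is a sketch only: the `Header` field layout, the use of strings for classes, and the `class:word` default are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Header:
    link: Optional["Header"]  # previous header, or None for the first word
    xt: int                   # address of the word's code
    cls: str                  # class handler, named by a string here
    name: str


class Dictionary:
    def __init__(self):
        self.last = None  # plays the role of d:last

    def add(self, name, xt, cls="class:word"):
        """Create a header for a new word and make it the most recent one."""
        self.last = Header(self.last, xt, cls, name)
        return self.last

    def reclass(self, cls):
        """Change the class of the most recent word only."""
        self.last.cls = cls

    def immediate(self):
        self.reclass("class:macro")

    def data(self):
        self.reclass("class:data")

    def primitive(self):
        self.reclass("class:primitive")
```

Calling `immediate()` right after `add(...)` mirrors the Retro usage, where the tagging word follows the definition it applies to.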
There are a few things to consider with this:
- I'd personally prefer to keep using .retro and/or .forth as a suffix for source files
This is purely a personal preference. We've already changed the suffix from .forth to .retro to avoid some conflicts with other Forth systems, and I'm a little reluctant to change again, especially since:
- The source files as I write them are in a subset of Markdown, but the current toolchain doesn't support full Markdown.
If we embrace Markdown, which specific flavor would we use, and what would need to be added to the Markdown-to-XHTML converter to accommodate it? I don't want to encourage use of syntax which the documentation toolchain can't handle.
- Not all users use Markdown
Unu requires code fences, but nothing else. Declaring Markdown to be the standard format may meet some pushback from those who prefer to work with other formats. (As an example, I have one client using ReST with Sphinx; we have a variation of Unu that generates ReST, translating the fenced blocks to the ReST literal blocks.)
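To give a rough idea of what such a translation involves, here is a minimal Python sketch. It is not the client's tool: `unu_to_rest` is a hypothetical name, and the sketch assumes that emitting a bare `::` paragraph followed by a four-space-indented block is an acceptable way to write the literal block.

```python
def unu_to_rest(text):
    """Translate ~~~ fenced blocks into ReST literal blocks."""
    out = []
    in_block = False
    for line in text.split("\n"):
        if line.rstrip() == "~~~":
            in_block = not in_block
            if in_block:
                # A lone '::' paragraph introduces a literal block.
                out.extend(["", "::", ""])
            else:
                # A blank line ends the indented block.
                out.append("")
        elif in_block:
            # Literal block content must be indented; keep blank lines empty.
            out.append("    " + line if line.strip() else "")
        else:
            out.append(line)
    return "\n".join(out)
```

Everything outside the fences passes through untouched, so the commentary can be written in whatever markup the target toolchain expects.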
That said, there are some benefits to doing this. I'll continue to consider it, and will gather more feedback before deciding how to proceed.