~crc_/retroforth#22: 
suggestion: embrace markdown as the source structure

The unu style with ~~~ markers is conceptually very close to markdown with '''forth source code markers. Moving to .retro.md as the standard code format in 2020.1 and beyond would make the source far more readable when viewed on sourcehut/gitlab/github and when worked on in editors & IDEs. The script also works well with .muri files and vanilla ''' code blocks.

Note: ''' is shown above as sr.ht does not show backtick-backtick-backtick in the preview.

To test this idea, a modified retro-unu.py is shown here, followed by the first part of retro.forth in markdown style. It becomes very easy to tell the code blocks from the comments.

import sys

if __name__ == "__main__":
    if len(sys.argv) == 1:
        sys.exit(0)

    f = sys.argv[1]
    in_block = False
    with open(f, "r") as source:
        # Only a file name was given: print the code inside ~~~ blocks.
        if len(sys.argv) == 2:
            for line in source.readlines():
                if line.rstrip() == "~~~":
                    in_block = not in_block
                elif in_block:
                    print(line.rstrip())

        elif sys.argv[2] == "--to-md":
            need_space = False
            the_end = False
            for line in source.read().split('\n'):
                if line.rstrip() == "~~~":
                    in_block = not in_block
                    if in_block:
                        print('```')
                        print('')
                    else:
                        print('')
                        print('```')
                        print('')
                        need_space = False
                elif line.strip() == '':
                    print('')
                    if need_space:
                        print('')
                    need_space = False
                else:
                    if line.strip() == '## The End':
                        the_end = True
                    if in_block or the_end:
                        if the_end:
                            print('`' + line + '`')
                        else:
                            print('    '+line)
                    else:
                        if line[:4] == '    ' or line.strip()[0] in ['|', '_']:
                            line = line.strip()
                            if line[0] in ['|', '_']:
                                print(line)
                            else:
                                print('`' + line + '`\n')
                        else:
                            if need_space:
                                print(' ', end='')
                            print(line.strip(), end='')

                        need_space = True
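
The plain extraction path of the script can also be written as a small function that accepts either fence style. This is only a sketch: the name `extract_code`, the `fences` parameter, and the choice to allow a language tag after the opening fence (as in '''forth) are assumptions made for illustration, not part of retro-unu.py.

```python
def extract_code(text, fences=("~~~", "```")):
    """Return the lines found inside fenced blocks, in document order."""
    out = []
    open_fence = None  # the fence string that opened the current block
    for line in text.split("\n"):
        stripped = line.rstrip()
        if open_fence is None:
            # An opening fence may carry a language tag, e.g. a forth tag.
            for fence in fences:
                if stripped.startswith(fence):
                    open_fence = fence
                    break
        elif stripped == open_fence:
            # A block is closed only by the same fence that opened it.
            open_fence = None
        else:
            out.append(stripped)
    return out


if __name__ == "__main__":
    sample = "Prose.\n\n~~~\n:depth  (-n) #-1 fetch ;\n~~~\n\nMore prose.\n"
    print("\n".join(extract_code(sample)))
```

Because a block is closed only by the fence that opened it, a ~~~ line inside a backtick block is passed through as code rather than toggling the state.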

#RETRO FORTH

#Background

Retro is a dialect of Forth. It builds on the barebones Rx core, expanding it into a flexible and useful language.

Over the years the language implementation has varied substantially. Retro began in 1998 as a 16-bit assembly implementation for x86 hardware, evolved into a 32-bit system with cmForth and ColorForth influences, and eventually started supporting mainstream OSes. Later it was rewritten for a small, portable virtual machine.

This is the twelfth generation of Retro. It targets a virtual machine (called Nga) and runs on a wide variety of host systems.

#Namespaces

Various past releases have had different methods of dealing with the dictionary. I have settled on using a single global dictionary, with a convention of using a short namespace prefix for grouping related words. This was inspired by Ron Aaron's 8th language.

The main namespaces are:

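| namespace | words related to   |
| --------- | ------------------ |
| ASCII     | ASCII Constants    |
| a         | arrays             |
| c         | characters         |
| compile   | compiler functions |
| d         | dictionary headers |
| err       | error handlers     |
| io        | i/o functions      |
| n         | numbers            |
| s         | strings            |
| v         | variables          |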

This makes it very easy to identify related words, especially across namespaces. E.g.,

c:put

c:to-upper

s:put

s:to-upper

#Prefixes

Prefixes are an integral part of Retro. These are single symbol modifiers added to the start of a word which control how Retro processes the word.

The interpreter model is covered in Rx.md, but basically:

- Get a token (whitespace delimited string)
- Pass it to `interpret`
  + if the token starts with a known prefix then pass
    it to the prefix handler
  + if the initial character is not a known prefix,
    look it up
    - if found, push the address ("xt") to the stack
      and call the word's class handler
    - if not found call `err:not-found`
- repeat as needed
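
The steps above can be sketched in Python (used here to match the script earlier in this ticket). This is a toy model, not the Rx implementation: the names `interpret`, `run`, `PREFIXES`, `DICTIONARY`, and `NotFound` are illustrative assumptions, and an "xt" is modelled as a Python callable rather than a memory address.

```python
class NotFound(Exception):
    """Stands in for err:not-found in this sketch."""


def interpret(token, prefixes, dictionary, stack):
    """Process one whitespace-delimited token."""
    handler = prefixes.get(token[0])
    if handler is not None:
        # Known prefix: hand the rest of the token to the prefix handler.
        handler(token[1:], stack)
        return
    entry = dictionary.get(token)
    if entry is None:
        raise NotFound(token)
    xt, class_handler = entry
    stack.append(xt)      # push the "xt"
    class_handler(stack)  # the class handler decides what to do with it


def run(source, prefixes, dictionary, stack):
    for token in source.split():
        interpret(token, prefixes, dictionary, stack)


def number_prefix(rest, stack):
    stack.append(int(rest))


def char_prefix(rest, stack):
    stack.append(ord(rest[0]))


def class_word(stack):
    # Interpret-time behaviour of an ordinary word: pop the xt and call it.
    xt = stack.pop()
    xt(stack)


def add(stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


PREFIXES = {"#": number_prefix, "$": char_prefix}
DICTIONARY = {"+": (add, class_word)}

if __name__ == "__main__":
    data = []
    run("#1 #2 + $a", PREFIXES, DICTIONARY, data)
    print(data)
```

Since numbers and characters go through entries in `PREFIXES`, adding or replacing a prefix is just a dictionary update, which mirrors the point that no parsing words are needed.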

This is different than the process in traditional Forth. A few observations:

- there are no parsing words

- numbers are handled using a prefix

- prefixes can be added or changed at any time

The basic prefixes are:

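| prefix | used for              |
| ------ | --------------------- |
| :      | starting a definition |
| &      | obtaining pointers    |
| (      | stack comments        |
| `      | inlining bytecodes    |
| '      | strings               |
| #      | numbers               |
| $      | characters            |
| @      | variable get          |
| !      | variable set          |
| \      | inline assembly       |
| ^      | assembly references   |
|        | compiler macros       |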

#Naming and Style Conventions

* Names should start with their namespace (if appropriate)
* Word names should be lowercase
* Variable names should be Title case
* Constants should be UPPERCASE
* Names may not start with a prefix character
* Names returning a flag should end with a ?
* Words with an effect on the stack should have a stack comment

#Code Begins

Memory Map

This assumes that the VM defines an image as being 524,288 cells. Nga implementations may provide varying amounts of memory, so the specific addresses will vary.

| RANGE           | CONTAINS                     |
| --------------- | ---------------------------- |
| 0 - 1024        | rx kernel                    |
| 1025 - 1535     | token input buffer           |
| 1536 +          | start of heap space          |
| ............... | free memory for your use     |
| 506879          | buffer for string evaluate   |
| 507904          | temporary strings (32 * 512) |
| 524287          | end of memory                |
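
The upper addresses in this map are consistent with simple arithmetic on the image size. A small Python sketch follows; the function name `memory_map`, its parameters, and the assumption that the regions at the top of memory keep the same sizes when the image size changes are illustrative, not taken from the Retro sources. The 1025-cell gap is simply the difference between the two addresses listed in the map.

```python
def memory_map(image_size, temp_strings=32, temp_string_max=512, eval_gap=1025):
    """Compute the top-of-memory addresses for a given image size (in cells)."""
    strings = image_size - temp_strings * temp_string_max
    return {
        "end of memory": image_size - 1,
        "temporary strings": strings,
        "buffer for string evaluate": strings - eval_gap,
    }


if __name__ == "__main__":
    for name, address in memory_map(524288).items():
        print(address, name)
```

With an image of 524,288 cells this yields 524287, 507904, and 506879, matching the map above.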

I provide a word, EOM, which returns the last addressable location. This will be used by the words in the s: namespace to allocate the temporary string buffers at the end of memory.


    :EOM  (-n)  #-3 fetch ;

#Stack Depth

depth returns the number of items on the data stack. This is provided by the VM upon reading from address -1.


    :depth  (-n) #-1 fetch ;

#Stack Comments

Stack comments are terse notes that indicate the stack effects of words. While not required, it's helpful to include these.

They take a form like:

(takes-returns)

I use a single character for each input and output item. These will often (though perhaps not always) be:

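| item       | meaning                     |
| ---------- | --------------------------- |
| n, m, x, y | number                      |
| a, p       | pointer                     |
| q          | quotation (pointer)         |
| d          | dictionary header (pointer) |
| s          | string                      |
| c          | character (ASCII)           |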
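
As an illustration of the notation, a stack comment can be split mechanically into its inputs and outputs. The Python sketch below is hypothetical tooling, not something Retro provides: the names `parse_stack_comment`, `net_effect`, and `KINDS` are assumptions, and it only handles the single-character form described here.

```python
# Meanings of the single-character codes listed above.
KINDS = {
    "n": "number", "m": "number", "x": "number", "y": "number",
    "a": "pointer", "p": "pointer",
    "q": "quotation (pointer)",
    "d": "dictionary header (pointer)",
    "s": "string",
    "c": "character (ASCII)",
}


def parse_stack_comment(comment):
    """Split '(takes-returns)' into two lists of item codes."""
    if not (comment.startswith("(") and comment.endswith(")")):
        raise ValueError("not a stack comment: " + comment)
    takes, sep, returns = comment[1:-1].partition("-")
    if not sep:
        raise ValueError("missing '-' separator: " + comment)
    return list(takes), list(returns)


def net_effect(comment):
    """Number of items added to (positive) or removed from the stack."""
    takes, returns = parse_stack_comment(comment)
    return len(returns) - len(takes)


if __name__ == "__main__":
    takes, returns = parse_stack_comment("(nn-n)")
    print([KINDS[c] for c in takes], "->", [KINDS[c] for c in returns])
    print(net_effect("(-n)"), net_effect("(a-)"))
```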

#Dictionary Shortcuts

I define a few words in the d: namespace to make it easier to operate on the most recent header in the dictionary. These return the values in specific fields of the header.


    :d:last        (-d) &Dictionary fetch ;
    :d:last.xt     (-a) d:last d:xt fetch ;
    :d:last.class  (-a) d:last d:class fetch ;
    :d:last.name   (-s) d:last d:name ;

#Changing A Word's Class Handler

I implement reclass to change the class of the most recent word.


    :reclass    (a-) d:last d:class store ;

With this I can then define immediate (for state-smart words) and data to tag data words.


    :immediate  (-)  &class:macro reclass ;
    :data       (-)  &class:data  reclass ;


    :primitive (-) &class:primitive reclass ;
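
The relationship between the most recent header and `reclass` can be modelled in a few lines of Python. This is a rough sketch under stated assumptions: the `Header` fields mirror the d:xt, d:class, and d:name accessors used above plus an assumed link to the previous header, class handlers are represented by their names as strings, and the default class "class:word" is an assumption for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Header:
    link: Optional["Header"]  # previous header, or None for the first word
    xt: int                   # address of the word's code
    cls: str                  # name of the class handler
    name: str


class Dictionary:
    def __init__(self):
        self.last = None      # plays the role of d:last

    def add(self, name, xt, cls="class:word"):
        self.last = Header(self.last, xt, cls, name)
        return self.last

    def reclass(self, cls):
        # Only the most recent header is changed, as with reclass above.
        self.last.cls = cls

    def immediate(self):
        self.reclass("class:macro")

    def data(self):
        self.reclass("class:data")

    def primitive(self):
        self.reclass("class:primitive")


if __name__ == "__main__":
    d = Dictionary()
    d.add("first", 100)
    d.add("second", 200)
    d.immediate()
    print(d.last.name, d.last.cls, "|", d.last.link.name, d.last.link.cls)
```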

Status: REPORTED
Submitter: ~scott91e1
Assigned to: No-one
Submitted: 5 months ago
Updated: 5 months ago
Labels: suggestion toolchain

~crc_ 5 months ago

There are a couple of things with this:

  • I'd personally prefer to keep using .retro and/or .forth as a suffix for source files

This is purely a personal preference. We've already changed the suffix from .forth to .retro to avoid some conflicts with other Forth systems, and I'm a little reluctant to change again, especially since:

  • The source files as I write them are in a subset of Markdown, but the current toolchain doesn't support full Markdown.

If we embrace Markdown, which specific flavor would it be, and what would need to be added to the Markdown to XHTML conversion to accommodate it? I don't want to encourage use of syntax which the documentation toolchain can't handle.

  • Not all users use Markdown

Unu requires code fences, but nothing else. Declaring Markdown to be the standard format may meet some pushback from those who prefer to work with other formats. (As an example, I have one client using ReST with Sphinx; we have a variation of Unu that generates ReST, translating the fenced blocks to the ReST literal blocks.)
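
For readers unfamiliar with that kind of translation, the general idea can be sketched as follows. This is not the client's tool or any existing Unu variant; the function name `unu_to_rest` and its parameters are hypothetical. It relies only on the ReST rule that a paragraph consisting of `::` introduces an indented literal block.

```python
def unu_to_rest(text, fence="~~~", indent="    "):
    """Turn ~~~ fenced blocks into ReST literal blocks; pass prose through."""
    out = []
    in_block = False
    for line in text.split("\n"):
        if line.rstrip() == fence:
            in_block = not in_block
            if in_block:
                out.extend(["::", ""])  # '::' paragraph opens a literal block
            else:
                out.append("")          # blank line ends the literal block
        elif in_block:
            # Indent code lines; keep blank lines inside the block empty.
            out.append(indent + line if line.strip() else "")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    src = "Stack depth.\n\n~~~\n:depth  (-n) #-1 fetch ;\n~~~\n\nMore prose."
    print(unu_to_rest(src))
```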

That said, there are some benefits to doing this. I'll continue to consider it, and will gather more feedback before deciding how to proceed.
