add benchmarking

It may be beneficial to have a standard set of benchmarks.

Potential uses:

  • general comparison of performance across VM implementations
  • looking for regressions after updates
  • looking for bottlenecks that could be addressed by a VM (as in the case of the current Python VM that overrides some words with faster implementations)

I don't plan to use benchmarking to contrast Retro with other Forth systems, merely as a guide to improving Retro itself.
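The regression-checking use above could be sketched roughly as follows. This is only an illustration: the function name, the benchmark names, and the 10% tolerance are assumptions, not part of any existing Retro tooling.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return the names of benchmarks that got noticeably slower.

    baseline, current: dicts mapping a benchmark name to seconds.
    tolerance: allowed slowdown as a fraction (0.10 means 10%),
               an arbitrary default chosen for this sketch.
    """
    slower = []
    for name, seconds in current.items():
        # Benchmarks without a stored baseline cannot be compared.
        if name not in baseline:
            continue
        if seconds > baseline[name] * (1 + tolerance):
            slower.append(name)
    return sorted(slower)


# Hypothetical timings, in seconds, from before and after an update.
before = {"empty-loop": 1.00, "push-drop": 2.00}
after = {"empty-loop": 1.05, "push-drop": 2.50}
flagged = find_regressions(before, after)
```

Here only "push-drop" would be flagged, since "empty-loop" stays within the 10% tolerance.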

Assigned to
5 months ago
2 months ago

~crc_ 5 months ago

I've added two small tests to start the process for this.

On my Linode running OpenBSD:

1,000,000 iterations of empty loops
    1m49.14s real     1m47.41s user     0m00.05s system
Push and discard a value 1,000,000 times
    2m15.99s real     2m13.74s user     0m00.07s system

1,000,000 iterations of empty loops
    0m02.95s real     0m02.67s user     0m00.25s system
Push and discard a value 1,000,000 times
    0m03.19s real     0m03.01s user     0m00.15s system

1,000,000 iterations of empty loops
    0m00.32s real     0m00.32s user     0m00.00s system
Push and discard a value 1,000,000 times
    0m00.38s real     0m00.38s user     0m00.00s system
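To compare runs like these numerically, the timing lines can be converted to seconds. A minimal sketch, assuming the "NmSS.SSs label" layout shown above; the helper names are hypothetical.

```python
import re

# Matches fields such as "1m49.14s" or "0m00.32s" (minutes are optional).
_FIELD = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(field):
    """Convert one field such as '1m49.14s' to seconds as a float."""
    match = _FIELD.fullmatch(field.strip())
    if match is None:
        raise ValueError("unrecognised time field: %r" % field)
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def parse_time_line(line):
    """Parse a line like '0m02.95s real  0m02.67s user  0m00.25s system'.

    Returns a dict such as {'real': ..., 'user': ..., 'system': ...}.
    """
    tokens = line.split()
    # Tokens alternate: value, label, value, label, ...
    return {label: to_seconds(value)
            for value, label in zip(tokens[0::2], tokens[1::2])}


def speedup(slow_line, fast_line, key="real"):
    """How many times faster the second run is than the first."""
    return parse_time_line(slow_line)[key] / parse_time_line(fast_line)[key]
```

For example, to_seconds("1m49.14s") gives about 109.14 seconds, and speedup() applied to two of the lines above gives the ratio of their real times.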

~scott91e1 5 months ago

It's great to see PyPy (version 3, I assume) performing well, as that is the platform I am targeting.

Could be interesting to see what Jython is capable of given its JVM/JIT underpinning.

Also LuaJIT VM some day :]

Just out of interest, how many opcodes are decoded to perform 1,000,000 empty loops in Retro?

I would assume this number is highly dependent on which Retro functions have been moved into the VM.

~crc_ 5 months ago

I'm actually using 2.7 with PyPy at the moment (my CPython is v3):

Python 2.7.13 (4a68d8d3d2fc1faec2e83bcb4d28559099092574, May 08 2020, 21:47:03)
[PyPy 7.2.0 with GCC 4.2.1 Compatible OpenBSD Clang 8.0.1 (tags/RELEASE_801/final)] on openbsd6

I'll have to see if OpenBSD has a port for Python3, or if I'll need to set aside some time to try building it from source.

I don't have a machine with Java at the moment, so can't run this under Jython. This is something I will look into doing.

I'll do some statistics tracking this weekend on the number of opcodes executed for the benchmarks; I haven't done so yet.
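A rough sketch of how per-opcode statistics could be gathered in a Python VM's dispatch loop. The opcodes and the tiny VM below are a made-up toy for illustration only; they are not Retro's actual instruction set or VM code.

```python
from collections import Counter

# Hypothetical toy opcodes (NOT the real Retro VM instructions).
PUSH, DROP, DEC, JUMP_IF_NONZERO, HALT = range(5)
NAMES = ["PUSH", "DROP", "DEC", "JUMP_IF_NONZERO", "HALT"]


def run(program):
    """Execute a toy program and return a Counter of executed opcodes."""
    stack = []
    ip = 0
    counts = Counter()
    while True:
        op = program[ip]
        counts[NAMES[op]] += 1  # the only instrumentation needed
        if op == PUSH:
            stack.append(program[ip + 1])
            ip += 2
        elif op == DROP:
            stack.pop()
            ip += 1
        elif op == DEC:
            stack[-1] -= 1
            ip += 1
        elif op == JUMP_IF_NONZERO:
            target = program[ip + 1]
            ip = target if stack[-1] != 0 else ip + 2
        elif op == HALT:
            return counts
        else:
            raise ValueError("unknown opcode %r at %d" % (op, ip))


def empty_loop_program(n):
    """Toy equivalent of an empty loop that runs n times (n >= 1)."""
    # 0: PUSH n   2: DEC   3: JUMP_IF_NONZERO -> 2   5: DROP   6: HALT
    return [PUSH, n, DEC, JUMP_IF_NONZERO, 2, DROP, HALT]


counts = run(empty_loop_program(1000))
total = sum(counts.values())
```

In this toy, each loop iteration costs two opcodes, so the totals scale directly with the iteration count. In a real VM the figure would depend on how the loop words are implemented.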

RE: Lua; this is on my todo list, hopefully for sometime next year.

~scott91e1 5 months ago

Perhaps add some negative slots to the VM spec to pull opcode counts and the high-resolution timer? Do the other VMs return 0 for unsupported negative slots, or abort?

An additional negative slot that takes the top-of-stack value, subtracts it from the current time, and returns the elapsed milliseconds (perhaps with distinct integer and float versions) would suddenly give a neat ABI for benchmarking. Ideally the timing code should have as little impact on runtime and opcode count as possible. Also a slot to render now and the top-of-stack value as a UTC/ISO string, e.g. "2020-12-19T18:45:28.640919".
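A minimal sketch of the host-side helpers such slots might call in a Python VM. The function names and the choice of integer milliseconds since the Unix epoch are assumptions for illustration; no slot numbers or VM spec details are implied.

```python
import time
from datetime import datetime, timedelta

_EPOCH = datetime(1970, 1, 1)  # naive datetime, treated as UTC


def now_ms():
    """Current wall-clock time as integer milliseconds since the epoch."""
    return time.time_ns() // 1_000_000


def elapsed_ms(start_ms, now=None):
    """Milliseconds elapsed since start_ms (a value from now_ms()).

    `now` can be passed explicitly, which keeps the function testable.
    """
    if now is None:
        now = now_ms()
    return now - start_ms


def to_iso_utc(ms):
    """Render milliseconds since the epoch as a UTC ISO 8601 string."""
    moment = _EPOCH + timedelta(milliseconds=ms)
    # timespec="microseconds" always emits the fractional part.
    return moment.isoformat(timespec="microseconds")


start = now_ms()
# ... run the code being measured here ...
took = elapsed_ms(start)
stamp = to_iso_utc(start)
```

Wall-clock time can jump if the system clock is adjusted, so a monotonic source such as time.monotonic_ns() may be a better fit for the elapsed-time slot, with the wall clock kept for the ISO string.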

~scott91e1 5 months ago

To be fair, the benchmark suite should allow the JIT to warm up before the timings are performed.

IMHO the current numbers are not a true indication of the runtime performance possible with JIT-based VMs.
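One way a harness could report both figures is sketched below: the first (cold) call is timed on its own, then some untimed warm-up calls are made before the timed runs. The function name and the default counts are assumptions, and an in-process first call only approximates a true cold start, since it excludes interpreter start-up.

```python
import time


def time_cold_and_warm(fn, warmup=3, runs=5):
    """Return (cold_seconds, best_warm_seconds) for the callable fn.

    cold: the very first call, before any JIT has had a chance to warm up.
    warm: the fastest of `runs` timed calls made after `warmup`
          untimed calls.
    """
    start = time.perf_counter()
    fn()
    cold = time.perf_counter() - start

    for _ in range(warmup):
        fn()  # untimed: gives a JIT the chance to compile hot paths

    warm_timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        warm_timings.append(time.perf_counter() - start)
    return cold, min(warm_timings)


def empty_loop(n=100_000):
    """Stand-in workload; a real suite would run a Retro benchmark here."""
    for _ in range(n):
        pass


cold, warm = time_cold_and_warm(empty_loop)
```

On a non-JIT interpreter the two figures should be close; on a JIT-based one the gap gives a rough idea of the warm-up cost.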

~crc_ referenced this from #28 5 months ago

~crc_ 5 months ago

re: negative slots; I've opened a separate issue related to that.

re: warming up time: this depends on one's usage. I run Retro largely non-interactively, so the current approach reflects the performance as I use it. But this may admittedly not be how others use it; it's worth testing both ways IMO, so I'll ultimately work on both.

~crc_ referenced this from #36 5 months ago

~crc_ referenced this from #56 2 months ago
