A small collection of VM implementations. Currently there are 10 VMs:
See http://www.boundvariable.org/ for more details about the VM.
VM images (codex.umz and sandmark.umz) are available there.
The VMs in this section share some code (common.h and common.c),
specifically, arrays allocation / deallocation, IO functions, etc. When
comparing the benchmark execution times below, keep in mind that each of
these programs spends ~9 seconds in calloc / free (according to gprof).
So, for example, opcode dispatch and execution in lightn.c is actually
almost a decimal order of magnitude faster than in switch.c (to be more
precise, (32 - 9) / (11.5 - 9) = 9.2 times faster), and about twice as fast
as in dasm-x64.dasm.
- c/switch.c
The most naive implementation possible. Takes 32 seconds on my laptop to run
sandmark.umz. Opcode dispatch is just a Cswitch, hence the name. Branch misprediction happens almost on every opcode dispatch.EDIT: (02.05.2022) Apparently C compilers are now smart enough to generate branchy dispatch instructions even for naive switch code? Now switch.c is almost as fast as goto.c (~7% slower, perhaps?).
- c/goto.c
- Best I can do with pure C so far (w/o e.g. assembly tricks). Takes
22.5 seconds on my laptop to run the
sandmark. Opcode dispatch is a bit smarter (Cgotowith a branchyNEXTclause, much more friendly to the CPU branch predictor). - c/dasm-x64.dasm
- A real JIT, incrementally compiles the VM opcode stream into native
code. Handles self-modifying code, etc. x86_64 only (see
dasm-x86.dasmfor a 32bit variant, though). Uses dynasm. Takes 14 seconds to run thesandmark. Unfortunately,dynasmis actually buggy as hell. Spent most of the day finding those weird bugs in the generated machine code in GDB. If you wonder why the Assembly language code indasm-x64.dasmis so damn ugly -- well, it's largely because almost any less ugly code triggers some bugs indynasm. At this rate it really isn't worth the trouble. Which is a pity, because I like the idea behinddynasma lot. - c/lightn.c
Another JIT implementation, this time using GNU Lightning. This one takes ~11.5 seconds to run the
sandmark. I used a non-portable x86_64 subset of whatlightningprovides (specifically, 10 general purpose registers; portable code should use 6). Try not to do that, though; apparently there's a bug inlightningclobbering some registers sometimes. Perhaps a portable code wouldn't trigger that bug. Overall,lightningrules. I will probably prefer it dodynasmin my future projects.The speedup (comparing to
dynasmversion) is mostly because withlightningI was able to map all VM registers to real x86_64 CPU registers. It's probably impossible withdynasm.
- go/switch/switch.go
Takes 21 seconds (was: 30 and 46 seconds) to run the
sandmark. Basically a translation ofswitch.cto golang, a very naive implementation. It should be possible to translategoto.cto Go too, might improve the performance by I don't know, 10%? Quite a tedious task though, since there are no macros in Go.Update: Apparently, when I got rid of
continueinswitch.godispatch, the optimizer managed to use computed goto for dispatch, or something. The benchmark now takes 30 seconds instead of 46. Almost as fast as C, very impressive.Update 2: (02.05.2022) Apparently Go is now much better at optimizing these things -- now it's almost as fast as C.
- nim/switch.nim
Takes 30 seconds to run the
sandmark. Also a translation ofswitch.c, to Nim. Nim documentation mentions a pragma ({.computedGoto.}) that is specifically designed to optimize switch VM dispatch wrt branch predictions, and it actually works (but you can't use e.g.continuein your switch).Update: With cached zero array it's also almost as fast as C (was 40 seconds without this optimization).
- crystal/switch.cr
- Takes 39 seconds to run the
sandmark, run with GC_MARKERS=1. Also a translation ofswitch.c, to Crystal. Shortest source code so far. That's actually pretty fast, since that's supposedly a naive switch dispatch, very unfriendly to branch predictor, no computed gotos, nothing. - crystal/tail-call.cr
- Takes 36 seconds to run the
sandmark, run with GC_MARKERS=1. The idea is to use tail call elimination optimization (which Crystal does when compiled in--releasemode) for dispatch instead of switch. This way we can give the branch predictor something to work with. Variables access is a bit tricky in this case, hence the use of closures for ops. - clojure/naive.clj
- Takes 32 minutes (yes, that's minutes, not seconds) to run the
sandmark. This one is written in purely functional style, only using persistent data structures andloop/recurfor emulating assignments. This is about as unwieldy to write as it is slow to execute. So, don't try this at home for low-level bit-crunching stuff. - clojure/arrays.clj
- Takes 55 seconds to run the
sandmark. This one is written using native arrays and mutable vars. It's almost two orders of magnitude improvement overnaive.clj. Also, the source code is less convoluted. Well, at least somewhat; clojure type hinting does not seem intuitive to me at all. It's just 2 times slower than e.g. naive C (switch.c) or golang, which is actually not bad at all. - rust/src/main.rs
- Very straightforward implementation in Rust, no
unsafe, nothing. It's actually about as fast as C interpreters (~5% slower thanswitch.c). Very impressive.