Chapter 5

The Bytecode VM

High-performance execution

The tree-walking interpreter is easy to understand but not the fastest. Bloom's default engine is a bytecode virtual machine (VM): a compiler lowers the AST into a flat array of opcodes, and a small stack machine executes them. It runs roughly 2–6× faster on loops, and far faster on deep recursion.

The default engine, with a narrow safety net
The VM runs first because it is fast, and it now runs almost the entire language, closures included (captured natively via boxed cells; see Chapter 6). The only feature it can't handle is modules (imports): when the compiler sees an import it throws, and Bloom re-runs the sketch on the tree-walking interpreter. See how Bloom chooses.

Why Bytecode?

Walking a tree involves lots of pointer chasing and method dispatch. Bytecode is a flat array of numbers — the VM just reads instructions sequentially, which is much faster.

[Figure: tree-walking visits the nodes +, 1 and 2 by jumping between them (slower); the bytecode VM reads CONST 1, CONST 2, ADD sequentially as a simple array scan (faster).]

How It Works

The bytecode system has two parts:

  1. Compiler — Converts AST to bytecode instructions
  2. Virtual Machine — Executes those instructions
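The compiler half of that split can be sketched in a few lines. This is a minimal illustration and not Bloom's compiler: the `Expr` and `Instr` types and the `compile` function are invented for this example, and instructions are kept as objects rather than packed bytes so they stay readable.

```typescript
// Hypothetical AST and instruction shapes, invented for this sketch.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "add"; left: Expr; right: Expr };

type Instr = { op: "CONST"; value: number } | { op: "ADD" };

// Flatten the tree into a linear instruction list.
function compile(expr: Expr, out: Instr[] = []): Instr[] {
  if (expr.kind === "num") {
    out.push({ op: "CONST", value: expr.value });
  } else {
    compile(expr.left, out); // operands first...
    compile(expr.right, out);
    out.push({ op: "ADD" }); // ...then the operator
  }
  return out;
}

// The AST for `1 + 2`.
const ast: Expr = {
  kind: "add",
  left: { kind: "num", value: 1 },
  right: { kind: "num", value: 2 },
};

const program = compile(ast);
```

Operands are emitted before their operator (a post-order walk), which is what lets the second part, the VM, evaluate the result with nothing but a stack.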

The Stack

The VM uses a stack to hold values. Operations pop values from the stack, compute, and push results back.

Expression: 1 + 2
CONST 1    // Push 1 onto stack         Stack: [1]
CONST 2    // Push 2 onto stack         Stack: [1, 2]
ADD        // Pop 2, pop 1, push 1+2    Stack: [3]
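The push/pop cycle above can be sketched as a tiny stack machine. This is an illustration under stated assumptions, not Bloom's VM: the `Op` numbering and the `run` function are hypothetical, and CONST carries its value inline here to keep the example short.

```typescript
// Hypothetical opcode numbering, chosen only for this sketch.
const Op = { CONST: 0, ADD: 1 } as const;

// Executes a flat instruction array. In this sketch CONST is followed
// directly by its literal value.
function run(code: number[]): number {
  const stack: number[] = [];
  let ip = 0; // instruction pointer
  while (ip < code.length) {
    const op = code[ip++];
    switch (op) {
      case Op.CONST: {
        const value = code[ip++];
        if (value === undefined) throw new Error("CONST is missing its operand");
        stack.push(value);
        break;
      }
      case Op.ADD: {
        const right = stack.pop(); // pushed last, popped first
        const left = stack.pop();
        if (left === undefined || right === undefined) throw new Error("stack underflow");
        stack.push(left + right);
        break;
      }
      default:
        throw new Error(`unknown opcode ${op}`);
    }
  }
  const result = stack.pop();
  if (result === undefined) throw new Error("program left nothing on the stack");
  return result;
}

const sum = run([Op.CONST, 1, Op.CONST, 2, Op.ADD]); // 3
```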

Bytecode Instructions

Each instruction is an opcode (operation code), sometimes followed by arguments:

Addr  Opcode  Operand
0     CONST   0  (value: 1)
2     CONST   1  (value: 2)
4     ADD

The CONST instruction takes an index into a constant pool — an array of literal values. This keeps the bytecode compact.
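A constant pool might look roughly like the sketch below. The `Chunk` class, its method names, and the opcode value are hypothetical. The sketch assumes a 16-bit big-endian index; whether Bloom's compiler deduplicates pool entries is also an assumption.

```typescript
// Hypothetical opcode value for this sketch.
const CONST = 0;

class Chunk {
  readonly code: number[] = [];
  readonly constants: number[] = [];

  // Returns the pool index for a literal, reusing an existing entry if present.
  addConstant(value: number): number {
    const existing = this.constants.indexOf(value);
    if (existing !== -1) return existing;
    this.constants.push(value);
    return this.constants.length - 1;
  }

  // Emits CONST followed by a 16-bit big-endian pool index.
  emitConst(value: number): void {
    const index = this.addConstant(value);
    if (index > 0xffff) throw new Error("constant pool overflow");
    this.code.push(CONST, (index >> 8) & 0xff, index & 0xff);
  }
}

const chunk = new Chunk();
chunk.emitConst(1);
chunk.emitConst(2);
chunk.emitConst(1); // reuses pool index 0 instead of growing the pool
```

However large or repeated a literal is, each use costs the same three bytes in the instruction stream, and the value itself is stored once.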

Key Opcodes

Category     Opcodes                          Purpose
Stack        CONST, POP, DUP                  Push/remove values
Variables    LOAD_LOCAL, STORE_LOCAL          Read/write local vars
Globals      LOAD_GLOBAL, STORE_GLOBAL        Read/write global vars
Math         ADD, SUB, MUL, DIV, MOD, NEG     Arithmetic operations
Compare      EQ, NE, LT, LE, GT, GE           Comparisons
Jumps        JUMP, JUMP_IF_FALSE, LOOP        Control flow
Functions    CALL, CALL_NATIVE, RETURN        Function calls
Collections  NEW_ARRAY, GET_INDEX, SET_INDEX  Array/object access
Iteration    RANGE, ITER_INIT, ITER_NEXT      Loop support

Compiling a For Loop

Let's see how a loop becomes bytecode:

Source:
for i in 0..3 {
  print(i)
}

Addr  Opcode                        Comment
0     CONST 0                       ; push range start
1     CONST 3                       ; push range end
2     RANGE                         ; pop 2 → push range{0,3}
3     ITER_INIT                     ; pop range → push iterator
4     STORE_LOCAL iter slot         ; keep iterator in a slot
6     LOAD_LOCAL iter slot          ; push iterator
8     ITER_NEXT → 20                ; if exhausted jump out, else push next
11    STORE_LOCAL i slot            ; bind loop variable i
13    LOAD_LOCAL i slot             ; push i as argument
15    CALL_NATIVE print id, 1 arg   ; call print with one argument
19    LOOP → 6                      ; jump back to LOAD iterator
20    (end)

The instructions from address 6 to address 19 form the loop; its body runs 3 times. The iterator is held in a local slot; each pass reloads it, and ITER_NEXT either pushes the next value or jumps to the end when the range is exhausted. LOOP is a backward jump that repeats the body. (Addresses are illustrative; the real compiler also emits a few POPs to keep the stack balanced.)
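To make the control flow concrete, here is a small sketch that executes a listing of the same shape. It rests on several assumptions and is not Bloom's VM: instructions are objects addressed by index rather than packed bytes, RANGE creates the iterator directly (folding in ITER_INIT), and a hypothetical PRINT instruction stands in for CALL_NATIVE.

```typescript
type Instr =
  | { op: "CONST"; value: number }
  | { op: "RANGE" }
  | { op: "STORE_LOCAL"; slot: number }
  | { op: "LOAD_LOCAL"; slot: number }
  | { op: "ITER_NEXT"; exit: number }
  | { op: "PRINT" }
  | { op: "LOOP"; target: number };

interface RangeIter { next: number; end: number }
type Value = number | RangeIter;

function runLoop(program: Instr[]): number[] {
  const stack: Value[] = [];
  const locals: Value[] = [];
  const printed: number[] = [];
  let ip = 0;
  const pop = (): Value => {
    const v = stack.pop();
    if (v === undefined) throw new Error("stack underflow");
    return v;
  };
  const popNumber = (): number => {
    const v = pop();
    if (typeof v !== "number") throw new Error("expected a number");
    return v;
  };
  while (ip < program.length) {
    const instr = program[ip++];
    if (instr === undefined) break;
    switch (instr.op) {
      case "CONST": stack.push(instr.value); break;
      case "RANGE": {
        const end = popNumber();
        const start = popNumber();
        stack.push({ next: start, end });
        break;
      }
      case "STORE_LOCAL": locals[instr.slot] = pop(); break;
      case "LOAD_LOCAL": {
        const v = locals[instr.slot];
        if (v === undefined) throw new Error("unset slot");
        stack.push(v);
        break;
      }
      case "ITER_NEXT": {
        const it = pop();
        if (typeof it === "number") throw new Error("expected an iterator");
        if (it.next >= it.end) ip = instr.exit; // exhausted: jump out
        else stack.push(it.next++);             // otherwise push the next value
        break;
      }
      case "PRINT": printed.push(popNumber()); break;
      case "LOOP": ip = instr.target; break;    // backward jump
    }
  }
  return printed;
}

const forLoop: Instr[] = [
  { op: "CONST", value: 0 },      // 0: range start
  { op: "CONST", value: 3 },      // 1: range end
  { op: "RANGE" },                // 2: push iterator
  { op: "STORE_LOCAL", slot: 0 }, // 3: keep iterator in slot 0
  { op: "LOAD_LOCAL", slot: 0 },  // 4: push iterator
  { op: "ITER_NEXT", exit: 10 },  // 5: next value, or leave the loop
  { op: "STORE_LOCAL", slot: 1 }, // 6: bind i
  { op: "LOAD_LOCAL", slot: 1 },  // 7: push i
  { op: "PRINT" },                // 8
  { op: "LOOP", target: 4 },      // 9: back to LOAD iterator
];

const printed = runLoop(forLoop); // [0, 1, 2]
```

The iterator is a mutable object, so reloading it from its slot on each pass always sees the latest position.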

Real opcode names
These are the actual opcodes from the OpCode enum in src/lang/bytecode.ts. Numbers use a constant pool: CONST takes a 16-bit index into that pool rather than embedding the value inline.

Superinstructions

Common instruction sequences are fused into single superinstructions for speed:

Pattern                              Superinstruction           Speedup
LOAD a; LOAD b; ADD                  ADD_LOCALS a, b            ~2x
LOAD a; CONST n; LT; JUMP_IF_FALSE   LOAD_LOCAL_CONST_LT_JUMP   ~3x
LOAD a; CONST 1; ADD; STORE a        INC_LOCAL a                ~3x

The compiler recognizes these patterns and emits the optimized version.
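One way such recognition could work is a peephole pass over the emitted instructions. The sketch below fuses only the increment pattern. It is an assumption about the approach rather than a description of Bloom's compiler, which may fuse at emit time instead, and the `fuseIncrements` name and object-shaped instructions are hypothetical.

```typescript
type Instr =
  | { op: "LOAD_LOCAL"; slot: number }
  | { op: "STORE_LOCAL"; slot: number }
  | { op: "CONST"; value: number }
  | { op: "ADD" }
  | { op: "INC_LOCAL"; slot: number };

// Rewrites LOAD a; CONST 1; ADD; STORE a into INC_LOCAL a.
// A real pass would also have to leave alone any window that a jump lands inside.
function fuseIncrements(input: Instr[]): Instr[] {
  const out: Instr[] = [];
  let i = 0;
  while (i < input.length) {
    const [a, b, c, d] = input.slice(i, i + 4);
    if (
      a?.op === "LOAD_LOCAL" &&
      b?.op === "CONST" && b.value === 1 &&
      c?.op === "ADD" &&
      d?.op === "STORE_LOCAL" && d.slot === a.slot
    ) {
      out.push({ op: "INC_LOCAL", slot: a.slot }); // four dispatches become one
      i += 4;
    } else {
      if (a !== undefined) out.push(a);
      i += 1;
    }
  }
  return out;
}

const fused = fuseIncrements([
  { op: "LOAD_LOCAL", slot: 0 },
  { op: "CONST", value: 1 },
  { op: "ADD" },
  { op: "STORE_LOCAL", slot: 0 }, // i = i + 1: fused
  { op: "LOAD_LOCAL", slot: 0 },
  { op: "CONST", value: 2 },
  { op: "ADD" },
  { op: "STORE_LOCAL", slot: 0 }, // i = i + 2: left as is
]);
```

The fused sequence and its replacement both leave the stack exactly as they found it, which is what makes the rewrite safe.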

Local Variables

Inside functions, variables use numbered slots instead of names. No hash table lookup needed!

fn example(a, b) {   // a = slot 0, b = slot 1
  let c = a + b      // c = slot 2
  return c
}

// Compiled to:
LOAD_LOCAL 0     // push a
LOAD_LOCAL 1     // push b
ADD              // a + b
STORE_LOCAL 2    // store to c
LOAD_LOCAL 2     // push c
RETURN           // return top of stack
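The name-to-slot mapping happens entirely at compile time. A minimal sketch, assuming a hypothetical `FunctionScope` helper (not a class from Bloom's source):

```typescript
class FunctionScope {
  private readonly slots = new Map<string, number>();

  // Compile time: give a newly declared name the next free slot.
  define(name: string): number {
    if (this.slots.has(name)) throw new Error(`'${name}' is already declared`);
    const slot = this.slots.size;
    this.slots.set(name, slot);
    return slot;
  }

  // Compile time: turn a name into the slot number baked into the bytecode.
  resolve(name: string): number {
    const slot = this.slots.get(name);
    if (slot === undefined) throw new Error(`'${name}' is not a local`);
    return slot;
  }
}

const scope = new FunctionScope();
scope.define("a"); // slot 0
scope.define("b"); // slot 1
scope.define("c"); // slot 2

// `let c = a + b` as a symbolic listing: only numbers survive into the bytecode.
const listing = [
  `LOAD_LOCAL ${scope.resolve("a")}`,
  `LOAD_LOCAL ${scope.resolve("b")}`,
  "ADD",
  `STORE_LOCAL ${scope.resolve("c")}`,
];
```

The map exists only while compiling; at runtime the VM indexes an array by slot number and never sees the names.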

Calling Native Functions

Native functions (like circle) use a dispatch table for O(1) lookup:

// In the VM (simplified):
case OpCode.CALL_NATIVE: {
  const nativeId = (code[ip++] << 8) | code[ip++]  // u16 index, high byte first
  const argCount = code[ip++]                       // u8
  const args = []
  // Pop in reverse so args[0] is the first argument that was pushed
  for (let i = argCount - 1; i >= 0; i--) args[i] = this.pop()
  const fn = this.nativeFunctions[nativeId]         // direct array access
  this.push(fn(this, args))
  break                                             // don't fall through to the next case
}

At compile time the compiler looks each function name up in a table of native names. If it's a known native (like circle or print), it emits CALL_NATIVE with that fixed numeric id, so dispatch at runtime is a single array index — no name lookup. The VMNativeBridge class registers the same 100+ functions the interpreter has, holding the matching drawing state (fill, stroke, transforms, noise tables, RNG) so both engines draw identically.
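The two halves of that scheme, a name lookup at compile time and a bare array index at runtime, can be sketched as follows. Everything here is hypothetical (the `NativeRegistry` class, the `CALL_NATIVE` opcode value, the `max` native). Only the u16 id plus u8 argument count encoding follows the VM excerpt above.

```typescript
type NativeFn = (args: number[]) => number;

class NativeRegistry {
  readonly functions: NativeFn[] = [];
  private readonly ids = new Map<string, number>();

  register(name: string, fn: NativeFn): number {
    const id = this.functions.length;
    this.functions.push(fn);
    this.ids.set(name, id);
    return id;
  }

  idOf(name: string): number | undefined {
    return this.ids.get(name);
  }
}

const CALL_NATIVE = 0x40; // hypothetical opcode value

// Compile time: the only place the name is looked up.
function emitNativeCall(registry: NativeRegistry, name: string, argCount: number): number[] {
  const id = registry.idOf(name);
  if (id === undefined) throw new Error(`${name} is not a native function`);
  return [CALL_NATIVE, (id >> 8) & 0xff, id & 0xff, argCount];
}

// Runtime: decode the id and index straight into the function table.
function runNativeCall(registry: NativeRegistry, code: number[], stack: number[]): void {
  const [op, hi, lo, argCount] = code;
  if (op !== CALL_NATIVE || hi === undefined || lo === undefined || argCount === undefined) {
    throw new Error("malformed CALL_NATIVE");
  }
  const fn = registry.functions[(hi << 8) | lo];
  if (fn === undefined) throw new Error("unknown native id");
  if (stack.length < argCount) throw new Error("stack underflow");
  const args = stack.splice(stack.length - argCount, argCount); // keeps push order
  stack.push(fn(args));
}

const natives = new NativeRegistry();
const log: number[] = [];
natives.register("print", (args) => { log.push(...args); return 0; });
natives.register("max", (args) => Math.max(...args)); // hypothetical native

const callCode = emitNativeCall(natives, "max", 2);
const valueStack = [3, 7];
runNativeCall(natives, callCode, valueStack); // valueStack is now [7]
```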

Closures and the One Thing That Falls Back

The VM's locals are stack slots that vanish when a function returns, so the obvious worry is closures: a nested function that outlives the function that created it and still reads its variables. The VM handles this anyway. At compile time it runs a free-variable analysis to find which locals get captured, boxes exactly those into heap Cells, and emits a small family of CLOSURE/*_UPVALUE/*_CELL opcodes so the closure carries live references to those cells. Non-capturing functions compile to byte-identical bytecode, so the fast path is untouched. This is involved enough to deserve its own chapter:

The full story
Chapter 6: Closures walks through the boxed-cell model end to end: the problem, the analysis, the opcodes, and a worked counter-factory example.
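As a taste of the idea only, here is a sketch of a captured variable living in a heap cell. The `Cell` and `Closure` shapes are assumptions made for illustration and should not be read as the VM's actual data structures, which Chapter 6 covers.

```typescript
// A heap box that outlives the stack frame of the function that created it.
class Cell {
  value: number;
  constructor(value: number) {
    this.value = value;
  }
}

interface Closure {
  upvalues: Cell[];                  // live references to captured cells
  body: (upvalues: Cell[]) => number;
}

function makeCounter(): Closure {
  // `count` is captured by the inner function, so it is boxed instead of
  // being kept in a plain stack slot.
  const count = new Cell(0);
  return {
    upvalues: [count],
    body: (upvalues) => {
      const cell = upvalues[0];
      if (cell === undefined) throw new Error("missing upvalue");
      cell.value += 1;
      return cell.value;
    },
  };
}

function invoke(closure: Closure): number {
  return closure.body(closure.upvalues);
}

const counterA = makeCounter();
const counterB = makeCounter();
const results = [invoke(counterA), invoke(counterA), invoke(counterB)]; // [1, 2, 1]
```

Each call to `makeCounter` allocates its own cell, so the two counters advance independently even though they share the same code.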

The single feature the VM genuinely can't run is modules. It has no module system, so when the compiler hits an import it throws ModulesNotSupportedError, and Bloom transparently re-runs the sketch on the tree-walking interpreter, which links modules. You don't have to do anything.
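The fallback amounts to a try/catch around compilation. In the sketch below only the `ModulesNotSupportedError` name comes from the text above. The stand-in compiler, the `chooseEngine` function, and the way an import is detected (a line beginning with `import`) are assumptions.

```typescript
class ModulesNotSupportedError extends Error {
  constructor() {
    super("modules are not supported by the bytecode VM");
    this.name = "ModulesNotSupportedError";
  }
}

type Engine = "vm" | "interpreter";

// Stand-in for the real compiler: it only knows how to refuse imports.
function compileForVm(source: string): number[] {
  if (/^\s*import\b/m.test(source)) throw new ModulesNotSupportedError();
  return []; // a real compiler would return bytecode here
}

function chooseEngine(source: string): Engine {
  try {
    compileForVm(source);
    return "vm";
  } catch (error) {
    if (error instanceof ModulesNotSupportedError) return "interpreter";
    throw error; // unrelated compile errors still surface
  }
}

const plainSketch = chooseEngine("circle(10, 10, 5)");
const moduleSketch = chooseEngine('import "shapes"\ncircle(10, 10, 5)');
```

Catching only the one error type matters: a genuine syntax error should be reported, not silently retried on the slower engine.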

The Infinite-Loop Watchdog

A while (true) {} inside draw() would wedge the browser's main thread. The interpreter guards against this with iteration and time limits; the VM needs the same protection without slowing the hot path. It arms a wall-clock deadline once per top-level call (so the budget spans a whole setup() or draw() frame) and checks it only on the LOOP back-edge — the backward jump emitted for while and for-in loops — throttled by a counter so the check stays off the per-iteration path. The fused integer-range loop (LOOP_INT, the 0..n fast path) is intentionally not watched: its bounds are known, so it can't spin forever.
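A throttled back-edge check can be sketched like this. The `Watchdog` class, its budget and throttle parameters, and the injectable clock are hypothetical; the source does not state the VM's real thresholds.

```typescript
class Watchdog {
  private deadline = Infinity;
  private counter = 0;
  private readonly budgetMs: number;
  private readonly checkEvery: number;
  private readonly now: () => number;

  constructor(budgetMs: number, checkEvery: number, now: () => number = Date.now) {
    this.budgetMs = budgetMs;
    this.checkEvery = checkEvery;
    this.now = now;
  }

  // Called once per top-level call such as setup() or draw().
  arm(): void {
    this.deadline = this.now() + this.budgetMs;
    this.counter = 0;
  }

  // Called on every LOOP back-edge. Most calls only bump a counter;
  // the clock is read once every `checkEvery` iterations.
  onBackEdge(): void {
    if (++this.counter < this.checkEvery) return;
    this.counter = 0;
    if (this.now() > this.deadline) throw new Error("loop exceeded its time budget");
  }
}

// Simulated runaway loop with a fake clock: each iteration "takes" 10 ms.
let fakeTime = 0;
const watchdog = new Watchdog(100, 4, () => fakeTime);
watchdog.arm();
let iterations = 0;
let tripped = false;
try {
  while (true) {
    iterations++;
    fakeTime += 10;
    watchdog.onBackEdge();
  }
} catch {
  tripped = true;
}
// The budget is exceeded during iteration 11, and the next throttled check
// (iteration 12) stops the loop.
```

The throttle trades a slightly late detection for a much cheaper common case, which is acceptable when the goal is to avoid freezing the page rather than to enforce an exact limit.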

Performance

The VM avoids the interpreter's per-node method dispatch and pointer chasing, swaps name-keyed variable lookups for direct slot/array indexing, and fuses common patterns into superinstructions. In practice that's roughly 2–6× faster on loop-heavy code and dramatically faster on deep recursion — the VM returns from a call by simply resetting the stack pointer, with no control-flow object involved at all.
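The cheap return can be illustrated with a frame that remembers where the callee's slots begin. This is a sketch under assumptions: the `CallStack` class and `Frame` fields are hypothetical and do not reflect the VM's actual frame layout.

```typescript
interface Frame {
  returnIp: number; // where the caller resumes
  base: number;     // index of the callee's slot 0 on the value stack
}

class CallStack {
  readonly values: number[] = [];
  readonly frames: Frame[] = [];

  // The arguments are already on the stack and become the callee's first slots.
  enter(returnIp: number, argCount: number): void {
    this.frames.push({ returnIp, base: this.values.length - argCount });
  }

  // RETURN: pop the result, drop every callee slot by truncating the stack,
  // then push the result for the caller. No exception or signal object is needed.
  leave(): number {
    const frame = this.frames.pop();
    const result = this.values.pop();
    if (frame === undefined || result === undefined) throw new Error("return without a call");
    this.values.length = frame.base;
    this.values.push(result);
    return frame.returnIp;
  }
}

const calls = new CallStack();
calls.values.push(99);   // a temporary belonging to the caller
calls.values.push(1, 2); // arguments a and b
calls.enter(7, 2);       // callee: slot 0 is a, slot 1 is b
calls.values.push(3);    // local c = a + b
calls.values.push(3);    // LOAD_LOCAL c, ready for RETURN
const resumeAt = calls.leave();
```

After `leave`, the stack holds the caller's temporary followed by the result, and execution resumes at the saved address.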

Where it shines
Sketches with lots of math, recursion, or tight loops (fractals, particle systems, complex animations) stay on the fast path. And because the VM captures closures natively, even sketches built around counter factories and event handlers get full VM speed.

In the Source Code

The bytecode system lives in src/lang/bytecode.ts.

See it for real
You can disassemble any program into this opcode listing with bloom disasm file.blm. The disassembly guide walks through a complete example line by line.