
Bytecode Persistence Strategy

| Format | Serialized Binary | Structured Text (JSON/YAML) | Custom Text Format | In-Memory |
| --- | --- | --- | --- | --- |
| Type | Compact binary encoding | Human-readable data | Manual string representation | RAM structure |
| Size | Smallest. Highly optimized. | Large (syntax overhead). | Moderate. | N/A. |
| I/O Speed | Fastest. Direct binary decoding. | Slow (parsing required). | Slow (parsing required). | Instant (no I/O). |
| Readability | None. Requires disassembler. | High. | Moderate. | N/A. |
| Implementation | Trivial (Haskell Data.Binary). | Easy (Aeson library). | Hard (requires new parser). | None. |

In-Memory

  • Pros: Instant execution if the compiler and VM are in the same binary.
  • Cons: Not persistent. It fails the project requirement to produce an executable artifact: we cannot distribute the compiled code or run it later without recompiling.

Custom Text Format

  • Pros: We control the syntax completely.
  • Cons: Inefficient. We would need to write a second parser just to load the bytecode into the VM, which adds unnecessary complexity and an extra point of failure.

Structured Text (JSON/YAML)

  • Pros: Easy to debug; human-readable by default.
  • Cons: Bloated. A simple instruction like PUSH 42 becomes {"op": "PUSH", "arg": 42}, wasting disk space and I/O bandwidth, and every load pays for text parsing. That overhead is hard to justify for a performance-oriented VM.

Serialized Binary

  • Pros: Extremely compact and fast to load. The file is a direct encoding of the instruction data the VM holds in memory.
  • Cons: Not human-readable (requires a separate disassembler tool).
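
The size difference between the JSON rendering of PUSH 42 and a binary encoding can be checked with a few lines of Haskell. This is a minimal sketch: the `Instr` type and its constructors are hypothetical stand-ins for the real instruction set, and the JSON side is simply the literal string quoted above rather than output from Aeson.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import GHC.Generics (Generic)

-- Hypothetical instruction set, for illustration only.
data Instr
  = Push Int
  | Add
  | Halt
  deriving (Show, Eq, Generic)

-- Empty instance body: put and get come from the Generic defaults.
instance Binary Instr

-- The JSON rendering quoted in the text, as literal characters (25 of them).
jsonPush :: String
jsonPush = "{\"op\": \"PUSH\", \"arg\": 42}"

-- Size of the JSON text in bytes (all characters here are ASCII).
jsonSize :: Int64
jsonSize = fromIntegral (length jsonPush)

-- Size of the same instruction after Data.Binary encoding.
binarySize :: Int64
binarySize = BL.length (encode (Push 42))
```

Loading this in GHCi and evaluating `(jsonSize, binarySize)` shows the gap for a single instruction; the binary side is expected to be a constructor tag plus a fixed-width integer, so the difference grows with program length.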

We selected Serialized Binary Files as the storage format for our bytecode to strictly adhere to the project specifications and maximize performance.

The subject explicitly states:

“Your compiler should be able to output its result as a binary format (bytecode).”

“Your VM should be able to load this binary format and run it.”

A text format or an in-memory-only representation would not satisfy this mandatory requirement.

Since we are using Haskell, we can use the Data.Binary module from the binary package (or Data.Serialize from the cereal package).

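  • We can derive the serialization logic automatically using Generic.
  • This keeps the encoder and decoder in sync with the data types with almost no boilerplate, and decoding functions such as decodeOrFail report malformed input as a value instead of crashing the VM.
  • Efficiency: Data.Binary writes fixed-width integers in big-endian byte order regardless of the host, so our bytecode is portable across architectures.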
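
The Generic-based derivation mentioned above can be sketched as follows. The `Instr` type, its constructors, and the helper names `serialize` and `deserialize` are hypothetical; only `encode`, `decodeOrFail`, and the `Binary` class come from the binary package.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

-- Hypothetical instruction set, for illustration only.
data Instr
  = Push Int
  | Add
  | Jump Int
  | Halt
  deriving (Show, Eq, Generic)

-- An empty instance body is enough: the Generic default methods
-- supply both put and get.
instance Binary Instr

-- Serialize a whole program to a lazy ByteString.
serialize :: [Instr] -> BL.ByteString
serialize = encode

-- Decode defensively: decodeOrFail returns a Left on malformed input
-- instead of throwing, and leftover bytes are rejected as well.
deserialize :: BL.ByteString -> Either String [Instr]
deserialize bytes =
  case decodeOrFail bytes of
    Left (_, offset, msg) ->
      Left ("decode error at byte " ++ show offset ++ ": " ++ msg)
    Right (rest, _, program)
      | BL.null rest -> Right program
      | otherwise -> Left "trailing bytes after program"
```

Using `decodeOrFail` rather than `decode` is a deliberate choice in this sketch: `decode` calls `error` on bad input, whereas the `Either` result lets the VM print a diagnostic and exit cleanly.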

A binary encoding of the same bytecode is typically much smaller than its text equivalent.

  • Loading: The VM can “slurp” the binary file directly into data structures without the overhead of tokenizing text, checking for syntax errors, or converting strings to integers.
  • Execution: This keeps VM startup time low.
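
A minimal sketch of the write and load paths, assuming the compiler and VM share one instruction type. `Instr`, `writeBytecode`, and `readBytecode` are hypothetical names; `encodeFile` and `decodeFileOrFail` are the file helpers exported by Data.Binary.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decodeFileOrFail, encodeFile)
import GHC.Generics (Generic)

-- Hypothetical instruction set, for illustration only.
data Instr
  = Push Int
  | Add
  | Halt
  deriving (Show, Eq, Generic)

instance Binary Instr

-- Compiler side: write the program to disk in binary form.
writeBytecode :: FilePath -> [Instr] -> IO ()
writeBytecode = encodeFile

-- VM side: read the program back, turning decode failures into a message.
-- A missing or unreadable file still raises an ordinary IO exception.
readBytecode :: FilePath -> IO (Either String [Instr])
readBytecode path = do
  result <- decodeFileOrFail path
  pure $ case result of
    Left (offset, msg) ->
      Left (path ++ ": invalid bytecode at byte " ++ show offset ++ ": " ++ msg)
    Right program -> Right program
```

There is no separate lexing or parsing stage here: the decoder reads constructor tags and fixed-width fields straight from the byte stream.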

While not encrypted, a binary format makes accidental modification by users much less likely (e.g., deleting a parenthesis in a JSON file), which could otherwise crash the VM. To keep the code inspectable, we satisfy the “Disassembly” requirement by implementing a dedicated --debug or --disassemble flag in our compiler.
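
A disassembly mode along these lines could be as small as the sketch below. The instruction set, the mnemonics, and the output layout are assumptions made for illustration; only the flag names come from the text above.

```haskell
-- Hypothetical instruction set, for illustration only.
data Instr
  = Push Int
  | Add
  | Jump Int
  | Halt
  deriving (Show, Eq)

-- Render one instruction as an assembly-style mnemonic.
mnemonic :: Instr -> String
mnemonic (Push n) = "PUSH " ++ show n
mnemonic Add = "ADD"
mnemonic (Jump target) = "JUMP " ++ show target
mnemonic Halt = "HALT"

-- Prefix each instruction with its index so jump targets are easy to follow.
disassemble :: [Instr] -> String
disassemble = unlines . zipWith line [0 :: Int ..]
  where
    line i instr = show i ++ ": " ++ mnemonic instr

-- True when either disassembly flag appears among the command-line arguments.
wantsDisassembly :: [String] -> Bool
wantsDisassembly = any (`elem` ["--debug", "--disassemble"])
```

In a real compiler entry point, the result of `getArgs` would be passed to `wantsDisassembly`, and the output of `disassemble` printed with `putStr` when it returns `True`.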