Architecture for a JS to C compiler

The basic data-flow of compilation looks like this:

  1. parse: turn the source text into something we can use as input for our compiler, usually an AST (abstract syntax tree)
  2. optimisation (optional): transform the input program into an equivalent one that's more efficient in time or space
  3. code generation: output a program in our target language, e.g machine code

My JS to C compiler will implement this as follows

  1. parse: use esprima - this project is about compilation not parsing
  2. & 3. code generation + optimisation: a compiler written in TypeScript that outputs C

C as a target language

C is lower-level than JavaScript, but it still gives us a lot! We can use the following features in C to implement similar features in JavaScript:

  1. control structures: if, for, while etc will map well between languages
  2. function calls: we can compile JS functions to C functions

However, there is lots we're going to have to write from scratch. For instance C lacks data-structures beyond statically sized arrays. We'll fill this in with what's called a 'runtime library' - a library we'll call from our compiled program. Lots of the behaviour we'll need to implement JavaScript cannot be implemented without runtime support:

  1. garbage collection
  2. exception handling
  3. data-structures: dynamic arrays and objects
  4. prototype-based object system

A sketch

Lets manually walk the following JS code through our compiler tool-chain:

function fact(n) {
    return n < 3 ? n : n * fact(n - 1);
}
console.log(fact(5));

First we run esprima to parse the source and give us a syntax-tree:

{
  "type": "Program",
  "body": [
      {
          "type": "FunctionDeclaration",
          "id": {
              "type": "Identifier",
              "name": "fact"
          },
          "params": [
              {
                  "type": "Identifier",
                  "name": "n"
              }
          ],
          "body": {
              "type": "BlockStatement",

This will be input to js-to-c.ts, which would output a target C program. We have a ternary expression in our input program, here's a snippet from the function in the compiler that handles it. You can see we're implementing JS's ternary expression with C's if:

function compileConditionalExpression(node: ConditionalExpression, state: CompileTimeState) {
    // ...
    return `${testSrc};
            JsValue* ${resultTarget.id};
            if(isTruthy(${testResultTarget.id})) {
                ${consequentSrc}
            } else {
                ${alternateSrc}
            }`
}

Once our compiler has run we're left with a valid C program (hopefully). The below is the compiled fact function. Some interesting things to look for: the recursive call, and how the result value from the ternary is threaded through the multiplication and return:

static JsValue *fact_1(Env *env) {
  JsValue *return_3;

  JsValue *left_6 = envGet(env, interned_2 /* n */);
  JsValue *right_7 = jsValueCreateNumber(3);
  JsValue *conditionalPredicate_5 = LTOperator(left_6, right_7);
  JsValue *conditionalValue_4;
  if (isTruthy(conditionalPredicate_5)) {
    return_3 = envGet(env, interned_2 /* n */);
  } else {
    JsValue *left_8 = envGet(env, interned_2 /* n */);
    JsValue *callee_10 = envGet(env, interned_11 /* fact */);
    JsValue *left_12 = envGet(env, interned_2 /* n */);
    JsValue *right_13 = jsValueCreateNumber(1);
    JsValue *call10Arg_0 = subtractOperator(left_12, right_13);
    JsValue *args_10[] = {call10Arg_0};
    JsValue *right_9 = functionRunWithArguments(callee_10, args_10, 1, NULL);

    return_3 = multiplyOperator(left_8, right_9);
  }
  return return_3;
  return getUndefined();
}

You can also see calls out to our runtime library - e.g envGet(). This is part of the runtime as we can't rely on the C call-stack here. In JS, functions are closures. Therefore they need the environment they close over to live well beyond the lifetime of the call-stack that defined them - think of function values stored as global properties or as event-listeners.

To use the runtime library our output program use C's #include macro to include our runtime library's header files[^2], and get access to the definitions of the functions our library has defined elsewhere:

#include "../../runtime/environments.h"
#include "../../runtime/exceptions.h"
#include "../../runtime/functions.h"

[^2]: header-files in C define the function prototypes defined elsewhere in .c or library files. Relying on the header-files means we can target library code without recompiling.

In my next post, we'll look at our how our js-to-c.ts compiler takes a syntax tree and outputs C source code.