Architecture for a JS to C compiler
The basic data-flow of compilation looks like this:
- parse: turn the source text into something we can use as input for our compiler, usually an AST (abstract syntax tree)
- optimisation (optional): transform the input program into an equivalent one that's more efficient in time or space
- code generation: output a program in our target language, e.g machine code
My JS to C compiler will implement this as follows
- parse: use esprima - this project is about compilation not parsing
- & 3. code generation + optimisation: a compiler written in TypeScript that outputs C
C as a target language
C is lower-level than JavaScript, but it still gives us a lot! We can use the following features in C to implement similar features in JavaScript:
- control structures:
if
,for
,while
etc will map well between languages - function calls: we can compile JS functions to C functions
However, there is lots we're going to have to write from scratch. For instance C lacks data-structures beyond statically sized arrays. We'll fill this in with what's called a 'runtime library' - a library we'll call from our compiled program. Lots of the behaviour we'll need to implement JavaScript cannot be implemented without runtime support:
- garbage collection
- exception handling
- data-structures: dynamic arrays and objects
- prototype-based object system
A sketch
Lets manually walk the following JS code through our compiler tool-chain:
function fact(n) {
return n < 3 ? n : n * fact(n - 1);
}
console.log(fact(5));
First we run esprima to parse the source and give us a syntax-tree:
{
"type": "Program",
"body": [
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "fact"
},
"params": [
{
"type": "Identifier",
"name": "n"
}
],
"body": {
"type": "BlockStatement",
This will be input to js-to-c.ts
, which would output a target C program. We have a ternary expression in our input program, here's a snippet from the function in the compiler that handles it. You can see we're implementing JS's ternary expression with C's if
:
function compileConditionalExpression(node: ConditionalExpression, state: CompileTimeState) {
// ...
return `${testSrc};
JsValue* ${resultTarget.id};
if(isTruthy(${testResultTarget.id})) {
${consequentSrc}
} else {
${alternateSrc}
}`
}
Once our compiler has run we're left with a valid C program (hopefully). The below is the compiled fact
function. Some interesting things to look for: the recursive call, and how the result value from the ternary is threaded through the multiplication and return:
static JsValue *fact_1(Env *env) {
JsValue *return_3;
JsValue *left_6 = envGet(env, interned_2 /* n */);
JsValue *right_7 = jsValueCreateNumber(3);
JsValue *conditionalPredicate_5 = LTOperator(left_6, right_7);
JsValue *conditionalValue_4;
if (isTruthy(conditionalPredicate_5)) {
return_3 = envGet(env, interned_2 /* n */);
} else {
JsValue *left_8 = envGet(env, interned_2 /* n */);
JsValue *callee_10 = envGet(env, interned_11 /* fact */);
JsValue *left_12 = envGet(env, interned_2 /* n */);
JsValue *right_13 = jsValueCreateNumber(1);
JsValue *call10Arg_0 = subtractOperator(left_12, right_13);
JsValue *args_10[] = {call10Arg_0};
JsValue *right_9 = functionRunWithArguments(callee_10, args_10, 1, NULL);
return_3 = multiplyOperator(left_8, right_9);
}
return return_3;
return getUndefined();
}
You can also see calls out to our runtime library - e.g envGet()
. This is part of the runtime as we can't rely on the C call-stack here. In JS, functions are closures. Therefore they need the environment they close over to live well beyond the lifetime of the call-stack that defined them - think of function values stored as global properties or as event-listeners.
To use the runtime library our output program use C's #include
macro to include our runtime library's header files[^2], and get access to the definitions of the functions our library has defined elsewhere:
#include "../../runtime/environments.h"
#include "../../runtime/exceptions.h"
#include "../../runtime/functions.h"
[^2]: header-files in C define the function prototypes defined elsewhere in .c
or library files. Relying on the header-files means we can target library code without recompiling.
In my next post, we'll look at our how our js-to-c.ts
compiler takes a syntax tree and outputs C source code.
Leave a comment via Github 💬