Patrick Passarella

Finally understanding Node.js internals.

A straightforward post about some of the Node.js internals aspects, like V8, Event Loop, JIT, Libuv, and more bonus topics.

Patrick Passarella - 30th Apr, 2021

Photo by @tomadamsmia on Unsplash

Scope

First of all, what does node internals means? Node internals is the structure of Node, how does it work, why it works like that, what technologies are involved, things like that.

In this post, I will talk about some of the core technologies Node is composed, a more top-down perspective on how they work, and some additional info that may help you understand some more about it.

More specifically:

  • V8 Engine
  • JIT Paradigm
  • Non-blocking I/O
  • Libuv
  • Event Loop
  • Generators
  • Async Await

This is not an advanced post, actually, it's the other way around, a post for you to finally understand the basics of how Node works, because there's too much about it.

Why

Why learn node internals? First, knowledge is always welcomed, especially if it will make a big change in your technical thinking skills, and this is the case. Second, understanding how things work, will make you develop better, and find and fix bugs easier.

V8 Engine

We need to start talking about V8, because almost everything is related to it.

V8 is an engine created by Google, written in C++, that compiles JavaScript source code to native machine code at runtime. It was created to optimize javascript execution inside browsers.

Javascript is an "interpreted" language, but the v8 compiles the code and optimizes the execution, allowing the execution to be done on top of the compiled code.

Javascript execution in V8 is divided into three stages:

  • Source to syntax tree: The parser generates an abstract syntax tree (AST) from the source code;
  • Syntax tree to bytecode: V8 Ignition generates bytecode from the syntax tree;
  • Bytecode to machine code: V8 TurboFan generates a graph from bytecode, replacing sections of bytecode with machine code.

These two last stages operate within the just-in-time (JIT) paradigm.

JIT paradigm

To execute any program, the computer must translate the code to machine language. There are a few ways to do that.

Interpreter:

Translates and then executes, line by line. Has fast translation (easy) and slow execution;

Compiler:

Compile all the code to machine code at once before executing it. Slow translation (complex), but fast execution.

JIT (Just in Time) compilation:

Compilation at run time. Combine the best parts from both interpreter and compiler, making translation and execution fast. The code is run through an interpreter (Ignition), and during execution, it keeps track of code segments that can be reused where possible, and others that can be used to make optimizations based on assumptions (TurboFan).

The main problem the interpreter have is that if you pass the same line of code 10 times, it will translate 10 times. JIT tries to avoid retranslations.

The JIT compiler can expose overhead memory consumption. Ignition addresses this by achieving three objectives: reducing memory usage, reducing startup time, and reducing complexity. All three objectives are accomplished by compiling AST to bytecode and collecting feedback during program execution.

Threads

Some info about threads, which is commonly confused by people.

A thread is something that is associated with an instance of each program we run, it's an operation that our CPU needs to perform.

There are two types of threads.

Single Thread

Only one piece of code can be executed at a time. For instance, if multiple requests are sent to a server, the next one needs to wait for the current one to send back a response to be processed.

Multi Thread

Multiple pieces of code can be executed at the same time, in parallel. Each request sent to a server simultaneously will create a new thread to process it.

Javascript/Node being single-threaded does not mean it needs to wait for a request to be done, and also doesn't mean that it doesn't use threads internally.

Non-blocking I/O

Since JavaScript execution in Node.js is single-threaded, it is possible to start an I/O process, but not block non-dependent tasks, it means that it can just start the communication, and continue to process other tasks that don’t need the I/O response, if a task needs the response, it will wait for it and then execute a callback. It helps parallel execution and the use of resources.

How is it done?

Instead of waiting for the process to complete, it sends it to a “worker”, and frees the thread, while it is processing the data, the thread can process another task.

Examples

Blocking

1const content = fs.readFileSync('path'); // 1
2
3console.log(content); // 2
4console.log('Executed'); // 3
5
6// The file content
7// Executed

The last console.log is totally independent of the file read operation, it doesn't need its result, but even then, it needs to wait for it to complete before executing.

Non-blocking

1fs.readFile('path', (content) => {
2 console.log(content);
3});
4
5console.log('Executed');
6
7// Executed
8// The file content

In this case, the last console.log is executed first, which is great, because again, it doesn't need the file read result, so it doesn't make sense to wait for it. This is done because Node throws the process to the "worker", and while it is doing its work, Node goes to execute the second task, which is the console.log('Executed').

Libuv

Libuv is an open-source library written in C, that uses a thread-pool (multi-thread) to manage parallel operations.

So, what are those workers I have been talking about? Well, they are part of Libuv. Workers are non-blocking async I/O processes in the background managed by Libuv. Each sync I/O operation they receive from the event loop is run on a background thread, to not block the main thread.

Libuv also provides the event loop, where it was made to be tied with only a single thread. You can run multiple event loops, but it must be one per thread. Inside Libuv, there are a lot of concepts, like handles, who are abstractions to some resources like TCP/UDP and signals. There are also handles to interact with the event loop, like idle, prepare, check, and some async types. Each one serves to do a specific action with the event loop, such as run something before or after Libuv does I/O polling for example.

Also, some modules from Node comes from Libuv, like the fs and child_process module.

Async File I/O

When a task for something like a file I/O is put in the call stack, the event loop sends it to the Libuv thread pool, when the data is processed, it puts it in the Task Queue, for the event loop to call when it is available.

Network I/O

The thread pool is bypassed completely when executing low-level OS operations, like a REST API call. Libuv nor node handle these. Instead, there’s something called OS Async Helpers, where the operations are executed immediately as soon as a CPU is available.

After the task is done, it puts the callback into the Job Queue, for the event loop to call after the call stack is empty.

All network I/O are performed on sockets which are polled using epoll/kqueue/IOCP (system event interfaces).

Event Loop

We can think about the event loop like someone who sends tasks to the right places, for example, if it gets a task, it will get it from the call stack, see if it can be run on the system primitive (kqueue, epoll), if yes, it will send to it and them get back the response after it’s done, if not, it will send to a background worker. So it’s literally just “communication”.

But in the middle of all that communication, it also manages a lot of other stuff, like timers, and other callbacks.

The event loop is what allows Node.js to perform non-blocking I/O operations, by offloading operations to the system kernel whenever possible. The event loop will go through a phase, execute the callbacks related to that phase, and then move to the next one. Each phase has a callback queue. Depending on the task, it will send it to the corresponding queue.

When Node.js starts, it initializes the event loop, processes the script, then begins processing the event loop. The loop only stops when there are no more tasks queued in the event loop or in the call stack.

Phases

Event Loop

timers: Executes callbacks scheduled by setTimeout() and setInterval().

pending callbacks: Executes callbacks for some system operations.

idle, prepare: Libuv handles, idle will run the given callbacks once per loop iteration, prepare runs before libuv does its I/O polling.

poll: Retrieve new I/O events and execute I/O related callbacks;

check: Execute setImmediate() callbacks.

close callbacks: Execute close callbacks, e.g. socket.on('close', ...).

Final diagram

Event Loop Complete

Quick recap

Looking in that diagram, I will follow the readFile task.

  1. A read file task is placed on the call stack.
  2. The event loop sends it to Libuv background threads.
  3. While the threads are working, the event-loop single-thread can process another task since it's empty.
  4. After the Libuv threads are done processing the readFile callback, it sends it to the respective queue, in this case, the Task Queue.
  5. After the call stack is empty, the event-loop grabs the readFile callback from the Task Queue and place it on the call stack to be executed.

Now, the API request example.

  1. An API request task is placed on the call stack.
  2. The event loop sends it to the OS to process.
  3. While the OS is working, the event-loop single-thread can process another task since it's empty.
  4. After the OS is done processing, it sends it to the respective queue, in this case, the Job Queue.
  5. After the call stack is empty, the event-loop grabs the API request callback from the Job Queue and place it on the call stack to be executed.

Generators

This is more like extra content, but generators also have a pretty good part inside Node. But first, let's see about Iterators.

Iterators

An iterator is an object that produces a sequence of values and return a value on its end.

What is defined as an iterator

An object to be defined as an iterator needs to implement a next method, which needs to return an object containing two properties:

Done: Defines if the sequence is finished or not

Value: Any value returned by the iterator

An iterator object can be iterated by calling the next method. When all the values are iterated, the done property returns true.

1const iterator = [1, 2, 3][Symbol.iterator]();
2iterator.next(); // { value: 1, done: false }
3iterator.next(); // { value: 2, done: false }
4// ...
5iterator.next(); // { value: undefined, done: true }

Creating an Iterator

1function makeRangeIterator(start = 0, end = Infinity) {
2 let nextIndex = start;
3 let iterationCount = 0;
4
5 const rangeIterator = {
6 next: function () {
7 let result;
8 if (nextIndex < end) {
9 result = { value: nextIndex, done: false };
10 nextIndex++;
11 iterationCount++;
12 return result;
13 }
14 return { value: iterationCount, done: false };
15 },
16 };
17 return rangeIterator;
18}

That'll seem like too much code, but it's pretty straightforward. The function implements and returns the next function, in which when you call it, it will increment the current index, and return the result, which it's the value and done object. When it's the final index, we just return the done: true. In this example, I used just the iteration count for the value.

This implementation is considered an Iterator since it has all the requirements.

Now back to generators

Generators are a type of function that can be executed, paused, and resumed in a controlled manner. There are two main things that generators are used for, implementing lazy iterators and blocking asynchronous calls.

Here is the same function above, but implemented using generators.

1function* makeRangeIterator(start = 0, end = Infinity) {
2 let iterationCount = 0;
3
4 for (let i = start; i < end; i++) {
5 iterationCount++;
6 yield i;
7 }
8
9 return iterationCount;
10}

Very noticeable better. Generators have that special syntax that you need to add an * after the function keyword, this determines that this function is a generator. Besides that, the only other weird thing is the yield keyword. When the code is being executed, it will stop right when it reaches the yield, and the value after the yield, in this case, the i, is the value that will be returned in the value property from that iteration.

1const generator = makeRangeIterator();
2generator.next(); // { value: 0, done: false }
3generator.next(); // { value: 1, done: false }
4generator.next(); // { value: 2, done: false }
5// ...

You can also inject a value by passing the value to the next function, like so, generator.next('new value').

Async Await

Have you ever wondered how this thing works? Me neither, but I will show you.

Async Await is a way to pause the execution of a code on the await keyword, when used on a promise, and resume only when the promise is resolved.

So we need 3 things to make something like this work.

  1. Pause a function
  2. Put a value inside the function
  3. Resume a function

And that's exactly what generators allowed us to do!

So, let's implement our own Async Await, why not.

1const wrapToReturnPromise = function (gen) {
2 const generator = gen();
3 const { value } = generator.next();
4
5 if (value.isPromise()) { // pseudo-code
6 return value
7 .then((val) => generator.next(val))
8 .catch((err) => generator.throw(err));
9 }
10
11 // Error if not promise
12};

Here we created a function that receives a generator function from its parameters, then we call the generator next method to get its initial value, after that, we check if that value is a promise and we call next again inside the then, but this time injecting the value to it (generator.next(val)), which would cause it to be returned as {value: val, done} object.

1const fakeApi = function () {
2 return new Promise((resolve, reject) => {
3 setTimeout(() => {
4 return resolve('Fake data');
5 }, 1000);
6 });
7};
8
9const foo = wrapToReturnPromise(function* () {
10 try {
11 const val = yield fakeApi();
12 console.log(val); // "Fake data"
13 return val;
14 } catch (err) {
15 console.log('Error: ', err.message);
16 }
17});

Here is the interesting part, we call the wrapToReturnPromise, passing in our generator function, which in this case just calls a function that returns a Promise and logs it, but we can see that it behaves exactly like an async await function, the yield will tell the runtime to stop there, just like the await, and after the promise is resolved, it resumes and logs the returned value!. Very cool (for me at least).

And here's where I'm gonna end this post, I hope it was useful for you and you learned something. This stuff may be difficult to understand sometimes, but time will take care of it.

Thanks for reading!Enjoyed the content or just want to send me a message? Follow me on Twitter!
Patrick Passarella (@P_Passarella) | Twitter
Twitter user