A straightforward post about some of the Node.js internals aspects, like V8, Event Loop, JIT, Libuv, and more bonus topics.
Patrick Passarella - 30th Apr, 2021
Photo by @tomadamsmia on Unsplash
First of all, what does node internals means? Node internals is the structure of Node, how does it work, why it works like that, what technologies are involved, things like that.
In this post, I will talk about some of the core technologies Node is composed, a more top-down perspective on how they work, and some additional info that may help you understand some more about it.
More specifically:
This is not an advanced post, actually, it's the other way around, a post for you to finally understand the basics of how Node works, because there's too much about it.
Why learn node internals? First, knowledge is always welcomed, especially if it will make a big change in your technical thinking skills, and this is the case. Second, understanding how things work, will make you develop better, and find and fix bugs easier.
We need to start talking about V8, because almost everything is related to it.
V8 is an engine created by Google, written in C++, that compiles JavaScript source code to native machine code at runtime. It was created to optimize javascript execution inside browsers.
Javascript is an "interpreted" language, but the v8 compiles the code and optimizes the execution, allowing the execution to be done on top of the compiled code.
Javascript execution in V8 is divided into three stages:
These two last stages operate within the just-in-time (JIT) paradigm.
To execute any program, the computer must translate the code to machine language. There are a few ways to do that.
Translates and then executes, line by line. Has fast translation (easy) and slow execution;
Compile all the code to machine code at once before executing it. Slow translation (complex), but fast execution.
Compilation at run time. Combine the best parts from both interpreter and compiler, making translation and execution fast. The code is run through an interpreter (Ignition), and during execution, it keeps track of code segments that can be reused where possible, and others that can be used to make optimizations based on assumptions (TurboFan).
The main problem the interpreter have is that if you pass the same line of code 10 times, it will translate 10 times. JIT tries to avoid retranslations.
The JIT compiler can expose overhead memory consumption. Ignition addresses this by achieving three objectives: reducing memory usage, reducing startup time, and reducing complexity. All three objectives are accomplished by compiling AST to bytecode and collecting feedback during program execution.
Some info about threads, which is commonly confused by people.
A thread is something that is associated with an instance of each program we run, it's an operation that our CPU needs to perform.
There are two types of threads.
Only one piece of code can be executed at a time. For instance, if multiple requests are sent to a server, the next one needs to wait for the current one to send back a response to be processed.
Multiple pieces of code can be executed at the same time, in parallel. Each request sent to a server simultaneously will create a new thread to process it.
Javascript/Node being single-threaded does not mean it needs to wait for a request to be done, and also doesn't mean that it doesn't use threads internally.
Since JavaScript execution in Node.js is single-threaded, it is possible to start an I/O process, but not block non-dependent tasks, it means that it can just start the communication, and continue to process other tasks that don’t need the I/O response, if a task needs the response, it will wait for it and then execute a callback. It helps parallel execution and the use of resources.
Instead of waiting for the process to complete, it sends it to a “worker”, and frees the thread, while it is processing the data, the thread can process another task.
1const content = fs.readFileSync('path'); // 123console.log(content); // 24console.log('Executed'); // 356// The file content7// Executed
The last console.log
is totally independent of the file read operation, it doesn't need its result, but even then, it needs to wait for it to complete before executing.
1fs.readFile('path', (content) => {2 console.log(content);3});45console.log('Executed');67// Executed8// The file content
In this case, the last console.log
is executed first, which is great, because again, it doesn't need the file read result, so it doesn't make sense to wait for it.
This is done because Node throws the process to the "worker", and while it is doing its work, Node goes to execute the second task, which is the console.log('Executed')
.
Libuv is an open-source library written in C, that uses a thread-pool (multi-thread) to manage parallel operations.
So, what are those workers I have been talking about? Well, they are part of Libuv. Workers are non-blocking async I/O processes in the background managed by Libuv. Each sync I/O operation they receive from the event loop is run on a background thread, to not block the main thread.
Libuv also provides the event loop, where it was made to be tied with only a single thread. You can run multiple event loops, but it must be one per thread.
Inside Libuv, there are a lot of concepts, like handles, who are abstractions to some resources like TCP/UDP and signals. There are also handles to interact with the event loop, like idle
, prepare
, check
, and some async types. Each one serves to do a specific action with the event loop, such as run something before or after Libuv does I/O polling for example.
Also, some modules from Node comes from Libuv, like the fs
and child_process
module.
When a task for something like a file I/O is put in the call stack, the event loop sends it to the Libuv thread pool, when the data is processed, it puts it in the Task Queue, for the event loop to call when it is available.
The thread pool is bypassed completely when executing low-level OS operations, like a REST API call. Libuv nor node handle these. Instead, there’s something called OS Async Helpers, where the operations are executed immediately as soon as a CPU is available.
After the task is done, it puts the callback into the Job Queue, for the event loop to call after the call stack is empty.
All network I/O are performed on sockets which are polled using epoll/kqueue/IOCP (system event interfaces).
We can think about the event loop like someone who sends tasks to the right places, for example, if it gets a task, it will get it from the call stack, see if it can be run on the system primitive (kqueue, epoll), if yes, it will send to it and them get back the response after it’s done, if not, it will send to a background worker. So it’s literally just “communication”.
But in the middle of all that communication, it also manages a lot of other stuff, like timers, and other callbacks.
The event loop is what allows Node.js to perform non-blocking I/O operations, by offloading operations to the system kernel whenever possible. The event loop will go through a phase, execute the callbacks related to that phase, and then move to the next one. Each phase has a callback queue. Depending on the task, it will send it to the corresponding queue.
When Node.js starts, it initializes the event loop, processes the script, then begins processing the event loop. The loop only stops when there are no more tasks queued in the event loop or in the call stack.
timers: Executes callbacks scheduled by setTimeout() and setInterval().
pending callbacks: Executes callbacks for some system operations.
idle, prepare: Libuv handles, idle will run the given callbacks once per loop iteration, prepare runs before libuv does its I/O polling.
poll: Retrieve new I/O events and execute I/O related callbacks;
check: Execute setImmediate() callbacks.
close callbacks: Execute close callbacks, e.g. socket.on('close', ...).
Looking in that diagram, I will follow the readFile task.
Now, the API request example.
This is more like extra content, but generators also have a pretty good part inside Node. But first, let's see about Iterators.
An iterator is an object that produces a sequence of values and return a value on its end.
An object to be defined as an iterator needs to implement a next
method, which needs to return an object containing two properties:
Done: Defines if the sequence is finished or not
Value: Any value returned by the iterator
An iterator object can be iterated by calling the next
method. When all the values are iterated, the done
property returns true.
1const iterator = [1, 2, 3][Symbol.iterator]();2iterator.next(); // { value: 1, done: false }3iterator.next(); // { value: 2, done: false }4// ...5iterator.next(); // { value: undefined, done: true }
1function makeRangeIterator(start = 0, end = Infinity) {2 let nextIndex = start;3 let iterationCount = 0;45 const rangeIterator = {6 next: function () {7 let result;8 if (nextIndex < end) {9 result = { value: nextIndex, done: false };10 nextIndex++;11 iterationCount++;12 return result;13 }14 return { value: iterationCount, done: false };15 },16 };17 return rangeIterator;18}
That'll seem like too much code, but it's pretty straightforward.
The function implements and returns the next
function, in which when you call it, it will increment the current index, and return the result, which it's the value
and done
object. When it's the final index, we just return the done: true
. In this example, I used just the iteration count for the value.
This implementation is considered an Iterator since it has all the requirements.
Generators are a type of function that can be executed, paused, and resumed in a controlled manner. There are two main things that generators are used for, implementing lazy iterators and blocking asynchronous calls.
Here is the same function above, but implemented using generators.
1function* makeRangeIterator(start = 0, end = Infinity) {2 let iterationCount = 0;34 for (let i = start; i < end; i++) {5 iterationCount++;6 yield i;7 }89 return iterationCount;10}
Very noticeable better. Generators have that special syntax that you need to add an *
after the function
keyword, this determines that this function is a generator.
Besides that, the only other weird thing is the yield
keyword. When the code is being executed, it will stop right when it reaches the yield
, and the value after the yield
, in this case, the i
, is the value that will be returned in the value
property from that iteration.
1const generator = makeRangeIterator();2generator.next(); // { value: 0, done: false }3generator.next(); // { value: 1, done: false }4generator.next(); // { value: 2, done: false }5// ...
You can also inject a value by passing the value to the next
function, like so, generator.next('new value')
.
Have you ever wondered how this thing works? Me neither, but I will show you.
Async Await is a way to pause the execution of a code on the await
keyword, when used on a promise, and resume only when the promise is resolved.
So we need 3 things to make something like this work.
And that's exactly what generators allowed us to do!
So, let's implement our own Async Await, why not.
1const wrapToReturnPromise = function (gen) {2 const generator = gen();3 const { value } = generator.next();45 if (value.isPromise()) { // pseudo-code6 return value7 .then((val) => generator.next(val))8 .catch((err) => generator.throw(err));9 }1011 // Error if not promise12};
Here we created a function that receives a generator function from its parameters, then we call the generator next
method to get its initial value, after that, we check if that value is a promise and we call next
again inside the then
, but this time injecting the value to it (generator.next(val)
), which would cause it to be returned as {value: val, done}
object.
1const fakeApi = function () {2 return new Promise((resolve, reject) => {3 setTimeout(() => {4 return resolve('Fake data');5 }, 1000);6 });7};89const foo = wrapToReturnPromise(function* () {10 try {11 const val = yield fakeApi();12 console.log(val); // "Fake data"13 return val;14 } catch (err) {15 console.log('Error: ', err.message);16 }17});
Here is the interesting part, we call the wrapToReturnPromise
, passing in our generator function, which in this case just calls a function that returns a Promise and logs it, but we can see that it behaves exactly like an async await function, the yield
will tell the runtime to stop there, just like the await, and after the promise is resolved, it resumes and logs the returned value!. Very cool (for me at least).
And here's where I'm gonna end this post, I hope it was useful for you and you learned something. This stuff may be difficult to understand sometimes, but time will take care of it.