Utilizing Thread pools in Node.js

Explanation of utilizing thread pools with the single-threaded event loop

If you are wondering about the title, here is where thread pools come in:

  • Nearly all asynchronous methods of the fs module in Node.js use the thread pool
  • Some methods of the crypto module, such as pbkdf2, use the Node.js thread pool
  • Depending on the OS, Node.js makes use of the thread pool for different methods/modules

Node.js is commonly described as a single-threaded runtime. Every time we run a Node.js program, an event loop is created, and inside a single thread it decides what is executed at any given moment.

The event loop checks the following to decide whether it should keep running:

  • Pending Timers
  • Pending OS Tasks
  • Pending Operations
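The check described above can be pictured with a small toy model. This is only a conceptual sketch: the real loop is implemented inside libuv, and the three arrays and the shouldContinue function are illustrative names, not Node.js APIs.

```javascript
// Conceptual model only; the entries are made-up placeholders.
const pendingTimers = ['setTimeout callback'];
const pendingOSTasks = ['incoming HTTP response'];
const pendingOperations = ['fs.readFile', 'crypto.pbkdf2'];

// The loop keeps going while any of the three categories still has work.
function shouldContinue() {
  return (
    pendingTimers.length > 0 ||
    pendingOSTasks.length > 0 ||
    pendingOperations.length > 0
  );
}

let ticks = 0;
while (shouldContinue()) {
  // Pretend one item of each kind completes per iteration ("tick").
  pendingTimers.shift();
  pendingOSTasks.shift();
  pendingOperations.shift();
  ticks += 1;
}

console.log(`loop exited after ${ticks} ticks`); // prints "loop exited after 2 ticks"
```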

Pending Operations is the category that involves the thread pool. By default, the thread pool has 4 threads.
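One way to observe the pool size is to start more hash operations than there are threads. The following is a minimal sketch under stated assumptions: timeHashes is a hypothetical helper, and the password, salt, iteration count, key length, and digest are arbitrary values chosen for illustration. On a machine with at least four cores, the first four hashes would typically finish close together and the fifth noticeably later, but exact timings depend on the hardware.

```javascript
const crypto = require('crypto');

// Hypothetical helper: starts `count` pbkdf2 hashes at once and reports the
// elapsed time of each one, in completion order.
function timeHashes(count, onDone) {
  const start = Date.now();
  const elapsed = [];
  for (let i = 0; i < count; i++) {
    // Arbitrary illustrative arguments: password, salt, iterations, keylen, digest.
    crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', (err) => {
      if (err) throw err;
      elapsed.push(Date.now() - start);
      if (elapsed.length === count) onDone(elapsed);
    });
  }
}

// Five hashes against a default pool of four threads.
timeHashes(5, (elapsed) => {
  elapsed.forEach((ms, i) => console.log(`Hash ${i + 1}: ${ms}ms`));
});
```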

Before we proceed further, let's refresh our memory of threads and OS scheduling.

Threads

When we run a program on a computer, we start a process. A process is an instance of a computer program. Inside a process, there are one or more threads.

A thread contains a list of instructions, such as getting a file, reading it, executing an operation, closing the file, etc. These instructions are executed by the CPU.

  • The scheduler decides which threads are executed at a specific instant of time.
  • The CPU has multiple cores (physical and logical) to execute threads.
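As a small aside, the number of logical cores the scheduler can use is visible from Node.js itself. The snippet below is only an illustration; the variable name logicalCores is an assumption of this sketch.

```javascript
const os = require('os');

// os.cpus() returns one entry per logical core reported by the OS.
const logicalCores = os.cpus().length;
console.log(`Logical cores: ${logicalCores}`);
```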

OS Scheduling

An I/O operation always takes a non-zero amount of time to complete. Consider reading a file. This can involve two threads: one that gets the file info from the hard drive and another that operates on the file, such as reading it. The OS first tries to get the file info from the disk, which is time-consuming. During this waiting time, the CPU can run another thread and later return to the first thread to do the rest of the work.
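Seen from Node.js, the same idea shows up as non-blocking I/O: the main thread keeps working while a read is pending. The sketch below assumes a hypothetical helper, readWhileWorking, and a scratch file created only for the demonstration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical demo helper: reads a scratch file asynchronously and records
// the order in which things happen.
function readWhileWorking(done) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'threadpool-demo-'));
  const file = path.join(dir, 'scratch.txt');
  fs.writeFileSync(file, 'hello');

  const order = [];
  fs.readFile(file, 'utf8', (err, text) => {
    fs.rmSync(dir, { recursive: true, force: true }); // remove the scratch directory
    if (err) throw err;
    order.push(`file read: ${text}`);
    done(order);
  });

  // Runs before the callback above: the main thread does not wait for the read.
  order.push('other work done');
}

readWhileWorking((order) => console.log(order.join(' -> ')));
// prints "other work done -> file read: hello"
```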

Example of Using multi-threads with Event Loop

Consider the following code:

const crypto = require('crypto');
const fs = require('fs');

const start = Date.now();

function doRequest() {
  // make an HTTP request here and log once the response has arrived
  console.log(`Network: ${Date.now() - start}ms`);
}

function doHash() {
  crypto.pbkdf2(...args, () => {
    console.log(`Hash: ${Date.now() - start}ms`);
  });
}

doRequest();

fs.readFile('fileName', 'format', () => {
  console.log(`FS: ${Date.now() - start}ms`);
});

doHash();
doHash();
doHash();
doHash();

Output:

Network: time
Hash: time
FS: time
Hash: time
Hash: time
Hash: time

From the output, a couple of interesting questions come up:

  • A file system operation should not take longer than a hashing operation, so why does it finish after one?
  • Why does the HTTP request always come first?
  • Why does exactly one hash operation always complete before the file operation?

Explanation:

HTTP calls are handled by the OS itself. File system operations and hashing operations, on the other hand, are handled by the thread pool. So while the other operations go through the thread pool, we already get the result of the HTTP request.
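The body of doRequest is elided in the listing, so here is one possible shape for it, purely as a sketch. It is a variation that takes a callback, and it assumes a throwaway local server as a stand-in for whatever endpoint the real code calls. The point it illustrates is that the log belongs in the response handler, not right after the request is started.

```javascript
const http = require('http');

// Assumption: a throwaway local server stands in for the remote endpoint, and
// an IP literal is used so no hostname lookup is involved.
function doRequest(done) {
  const start = Date.now();
  const server = http.createServer((req, res) => res.end('ok'));

  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    const req = http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
      res.resume(); // discard the body; only completion matters here
      res.on('end', () => {
        server.close();
        done(null, res.statusCode, Date.now() - start);
      });
    });
    req.on('error', (err) => {
      server.close();
      done(err);
    });
  });
}

doRequest((err, status, ms) => {
  if (err) throw err;
  console.log(`Network: ${ms}ms (status ${status})`);
});
```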

For a file system operation, there are two distinct steps:

  • Get stat/info about the file
  • Read the file

By default, we have 4 threads in the thread pool. We have 5 operations: 1 file system operation and 4 hash operations. Initially, the fs operation takes one thread and 3 of the hash operations take the other 3 threads.

On the first thread, while we wait for the file info to come back, the thread is freed up and the one leftover hash takes its place in the pool. In the meantime, the file stats become ready. The moment one hash completes, the freed thread picks up the remaining part of the file operation and completes it almost immediately.

We can control the number of threads in the thread pool using the UV_THREADPOOL_SIZE environment variable.
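Because it is an environment variable, the value can also be fixed from outside the script, before the process starts. The sketch below assumes a hypothetical helper, runWithPoolSize, that launches a child Node.js process with the variable already set. It only shows that the child sees the value; it does not measure the pool itself.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: runs a one-line script in a child Node.js process whose
// environment already contains the requested UV_THREADPOOL_SIZE.
function runWithPoolSize(size, script) {
  return execFileSync(process.execPath, ['-e', script], {
    env: { ...process.env, UV_THREADPOOL_SIZE: String(size) },
    encoding: 'utf8',
  }).trim();
}

console.log(runWithPoolSize(5, 'console.log(process.env.UV_THREADPOOL_SIZE)'));
// prints "5"
```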

Now if we make the thread pool size 5,

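process.env.UV_THREADPOOL_SIZE = 5; // must be set before the thread pool is first used

const crypto = require('crypto');
const fs = require('fs');

const start = Date.now();

function doRequest() {
  // make an HTTP request here and log once the response has arrived
  console.log(`Network: ${Date.now() - start}ms`);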
}

function doHash() {
  crypto.pbkdf2(...args, () => {
    console.log(`Hash: ${Date.now() - start}ms`);
  });
}

doRequest();

fs.readFile('fileName', 'format', () => {
  console.log(`FS: ${Date.now() - start}ms`);
});

doHash();
doHash();
doHash();
doHash();

The output will be,

FS: time
Network: time
Hash: time
Hash: time
Hash: time
Hash: time

Hope this makes sense. Network calls are handled by the OS and do not go through the thread pool. The thread pool receives 5 operations (4 hashing operations and one file operation). With 5 threads, all of them start at the same time, and the file operation finishes faster than the others. The network call, handled by the OS itself, is faster than the hashing operations and comes next. Then the output of the hash operations follows in sequence.

What if the thread pool size is 1?

Network: time
Hash: time
Hash: time
Hash: time
Hash: time
FS: time

Hope this makes sense as well:

  • The network call is handled by the OS itself, outside of the thread pool
  • The single thread first executes the first step of the fs operation (getting the file info)
  • While the fs operation is pending, the first hash operation takes over the thread
  • The remaining hash operations are already in the queue, ahead of the second step of the fs operation
  • The final step of the file operation takes place after all the hash operations have completed
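The thread pool part of the three orderings above (pool sizes 4, 5, and 1) can be reproduced with a toy simulation. This is a sketch under loud assumptions, not libuv's scheduler: simulatePool is a hypothetical function, every duration is invented, the file operation is reduced to the two steps described earlier, the hashes are given slightly different durations to imitate real-world jitter, and the network call is left out because it does not use the pool.

```javascript
// Toy model of a FIFO work queue served by a fixed number of threads.
// Each task has a name and a list of step durations (made-up time units).
function simulatePool(poolSize, tasks) {
  const queue = tasks.map((t) => ({ name: t.name, steps: [...t.steps] }));
  const running = []; // entries of the form { task, endsAt }
  const finished = []; // task names in completion order
  let now = 0;

  while (queue.length > 0 || running.length > 0) {
    // Hand queued work to idle threads, oldest first.
    while (running.length < poolSize && queue.length > 0) {
      const task = queue.shift();
      running.push({ task, endsAt: now + task.steps.shift() });
    }
    // Jump ahead to whichever running step finishes next.
    running.sort((a, b) => a.endsAt - b.endsAt);
    const { task, endsAt } = running.shift();
    now = endsAt;
    if (task.steps.length > 0) {
      queue.push(task); // the next step waits at the back of the queue
    } else {
      finished.push(task.name);
    }
  }
  return finished;
}

// Invented durations: two short fs steps (stat, read) and four long hashes.
const tasks = [
  { name: 'FS', steps: [1, 1] },
  { name: 'Hash', steps: [100] },
  { name: 'Hash', steps: [110] },
  { name: 'Hash', steps: [120] },
  { name: 'Hash', steps: [130] },
];

for (const size of [4, 5, 1]) {
  console.log(`pool size ${size}: ${simulatePool(size, tasks).join(', ')}`);
}
```

With these made-up numbers, the model yields Hash, FS, Hash, Hash, Hash for a pool of 4; FS first for a pool of 5; and FS last for a pool of 1, which matches the thread pool portion of the outputs shown earlier.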

So yes, the event loop is single-threaded, and internally, through libuv, Node.js makes use of multiple threads.

References: Node JS: Advanced Concepts By Stephen Grider