Utilizing Thread pools in Node.js
Explanation of utilizing thread pools with the single-threaded event loop
If you are wondering about the title, here is where thread pools show up in Node.js,
- The asynchronous methods of the fs module use the thread pool
- Some methods of the crypto module, like the pbkdf2 hash method, use the Node.js thread pool
- Depending on the OS, Node.js makes use of the thread pool for different methods/modules
We all know that Node.js is a single-threaded runtime. Every time we run Node.js, an event loop is created, and inside a single thread it determines what is going to be executed at any given time.
The event loop keeps an eye on the following to decide whether the loop should continue,
- Pending Timers
- Pending OS Tasks
- Pending Operations
Pending Operations is the one that involves the thread pool. By default, the thread pool manages 4 threads.
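One way to observe the default of 4 threads is to start more hashes than the pool has threads and compare completion times. The following is a minimal sketch, not a benchmark: the helper name timeHashes and the password, salt, key length and iteration count are all arbitrary values assumed for the demo.

```javascript
const crypto = require('crypto');

// Hypothetical helper: start `count` pbkdf2 hashes at once and resolve with
// the time (in ms since start) at which each one completed.
// 'password', 'salt', 512 and 'sha512' are arbitrary demo values.
function timeHashes(count, iterations = 100000) {
  const start = Date.now();
  const jobs = [];
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      crypto.pbkdf2('password', 'salt', iterations, 512, 'sha512', (err) => {
        if (err) reject(err);
        else resolve(Date.now() - start);
      });
    }));
  }
  return Promise.all(jobs);
}

// Five hashes against a pool of (by default) four threads.
timeHashes(5).then((times) => {
  times.forEach((ms, i) => console.log(`Hash ${i + 1}: ${ms}ms`));
});
```

With the default pool size, the fifth hash usually has to wait for a free thread and tends to finish noticeably later than the first four. The exact numbers depend on the machine and its core count.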
Before we proceed further, let's refresh our memory of threads and OS scheduling.
Threads
When we run a program on a computer, we run a process. A process is an instance of a computer program. Inside a process, there are one or more threads.
A thread contains a list of instructions, like getting a file, reading it, executing an operation, closing the file, etc. These instructions live inside the thread and are executed by the CPU.
- The scheduler decides which threads will be executed at a specific instant of time.
- The CPU has multiple cores (physical and logical) to execute threads.
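To check how many logical cores are available to run threads on a given machine, Node.js exposes this through the os module. A small sketch, assuming nothing beyond the standard library; the variable name logicalCores is only illustrative.

```javascript
const os = require('os');

// Number of logical cores the OS reports. os.availableParallelism() only
// exists on newer Node.js releases, so fall back to counting os.cpus() entries.
const logicalCores = typeof os.availableParallelism === 'function'
  ? os.availableParallelism()
  : os.cpus().length;

console.log(`Logical cores available: ${logicalCores}`);
```

This number matters later on: threads beyond the core count still get scheduled, but they have to share cores instead of running truly in parallel.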
OS Scheduling
Usually, an I/O operation takes a non-zero amount of time to complete. Consider reading a file. This can involve two threads: one that gets the file info from the hard drive and another that operates on the file, like reading it. The OS first tries to get the file info from the disk, which is time-consuming. During this waiting time, the CPU can pick up another thread and come back to the first thread later to do the rest of the work.
Example of Using Multiple Threads with the Event Loop
Observe the following code,
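const crypto = require('crypto');
const fs = require('fs');
// Date.now is a plain function, not a constructor, so it is called without `new`
const start = Date.now();
function doRequest() {
// making an HTTP request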
console.log(`Network: ${Date.now() - start}ms`);
}
function doHash() {
crypto.pbkdf2(...args, () => {
console.log(`Hash: ${Date.now() - start}ms`);
});
}
doRequest();
fs.readFile('fileName', 'format', () => {
console.log(`FS: ${Date.now() - start}ms`);
});
doHash();
doHash();
doHash();
doHash();
Output,
Network: time
Hash: time
FS: time
Hash: time
Hash: time
Hash: time
From the output, a couple of interesting questions come up,
- A file system operation should not take more time than a hash, so why does it finish later?
- Why does the HTTP request always come first?
- Why does one hash operation always complete before the file operation?
Explanation:
HTTP calls are handled by the OS itself. On the other hand, the file system operation and the hashing operations are handled by the thread pool. So while the other operations are still making their way through the thread pool, we already get the result from the HTTP request.
For a file system operation, there are two distinct steps,
- Get stat/info about the file
- Read the file
By default, we have 4 threads in the thread pool. We have 5 operations: 1 file system operation and 4 hash operations. Initially, the fs operation takes one thread and 3 hashes take the other 3 threads.
On the first thread, while the file info is being looked up and we wait for it, the thread is freed and picks up the one leftover hash. In the meantime, the file stats become ready. The moment one hash completes, the freed thread takes the remaining part of the file operation and completes it almost immediately.
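The scheduling described above can be replayed with a toy model. This is only a sketch under stated assumptions: simulatePool is a hypothetical helper, not how libuv is implemented, the file operation is reduced to the two steps listed above, and every duration is an invented number chosen so that hashing is far slower than either file step.

```javascript
// Toy model of a fixed-size thread pool (poolSize must be at least 1).
// Each job is a list of steps; each step occupies one thread for its duration.
function simulatePool(poolSize, jobs) {
  const queue = jobs.map((job) => ({ name: job.name, steps: [...job.steps] }));
  const running = [];   // steps currently occupying a thread
  const finished = [];  // job names in completion order
  let now = 0;

  while (queue.length > 0 || running.length > 0) {
    // Hand queued work to idle threads, first come first served.
    while (running.length < poolSize && queue.length > 0) {
      const job = queue.shift();
      running.push({ job, endsAt: now + job.steps.shift() });
    }
    // Jump to the moment the next step completes.
    running.sort((a, b) => a.endsAt - b.endsAt);
    const done = running.shift();
    now = done.endsAt;
    if (done.job.steps.length > 0) {
      queue.push(done.job); // the job's next step goes to the back of the queue
    } else {
      finished.push(done.job.name);
    }
  }
  return finished;
}

// Invented durations: two quick file steps, four slow hashes.
const jobs = [
  { name: 'FS', steps: [1, 1] },      // step 1: file stats, step 2: read
  { name: 'Hash 1', steps: [100] },
  { name: 'Hash 2', steps: [110] },
  { name: 'Hash 3', steps: [120] },
  { name: 'Hash 4', steps: [130] },
];

console.log(simulatePool(4, jobs).join(', '));
// prints "Hash 1, FS, Hash 2, Hash 3, Hash 4"
```

The network request is left out of the model because it never enters the pool. Passing a different first argument to simulatePool shows how the completion order shifts with the number of threads.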
We can control the number of threads in the thread pool using the UV_THREADPOOL_SIZE environment variable.
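As a quick check of what a script is asking for, the variable can be read back from process.env. The helper below, requestedPoolSize, is a hypothetical name and not a Node.js API; it simply mirrors the default of 4 mentioned earlier when the variable is missing or not a positive integer.

```javascript
// Hypothetical helper: the pool size requested through the environment,
// falling back to the default of 4 threads when the variable is unset or invalid.
function requestedPoolSize(env = process.env) {
  const parsed = Number.parseInt(env.UV_THREADPOOL_SIZE, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 4;
}

console.log(`Requested thread pool size: ${requestedPoolSize()}`);
```

The variable can also be set when launching the process, for example UV_THREADPOOL_SIZE=5 node script.js in a Unix-like shell (script.js being a placeholder file name). That is generally the safer route, since the value has to be in place before the pool is first used.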
Now if we make the thread pool size 5,
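// set the pool size before anything touches the thread pool
process.env.UV_THREADPOOL_SIZE = 5;
const crypto = require('crypto');
const fs = require('fs');
const start = Date.now();
function doRequest() {
// making an HTTP request
console.log(`Network: ${Date.now() - start}ms`);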
}
function doHash() {
crypto.pbkdf2(...args, () => {
console.log(`Hash: ${Date.now() - start}ms`);
});
}
doRequest();
fs.readFile('fileName', 'format', () => {
console.log(`FS: ${Date.now() - start}ms`);
});
doHash();
doHash();
doHash();
doHash();
The output will be,
FS: time
Network: time
Hash: time
Hash: time
Hash: time
Hash: time
Hope this makes sense. Network calls are handled by the OS and do not go through the thread pool. The thread pool has 5 operations to handle (4 hashes and one file operation). With 5 threads, all operations start at the same time, and the file operation finishes faster than the others. The network call, handled by the OS itself, is faster than the hashing operations and comes next. Then the outputs of the hash operations follow in sequence.
What if the thread pool size is 1?
Network: time
Hash: time
Hash: time
Hash: time
Hash: time
FS: time
Hope this makes sense,
- The network call is handled by the OS itself, outside of the thread pool
- The single thread first executes the first step of the fs operation
- While the fs operation is pending, a hash operation takes over the thread
- While the file info step is pending, all the hash operations are ahead of the rest of the file operation in the queue
- The final part of the file operation takes place after all these hash operations are completed
So yes, the event loop is single-threaded, and internally, with libuv, it makes use of multiple threads.
References: Node JS: Advanced Concepts By Stephen Grider