Benchmarking TypeScript/JavaScript on Node.js on Linux
Intro #
This post covers installing and running benny, an easy-to-use benchmarking library for Node.js. It also shows how to get reliable and repeatable results when running the benchmarks, specifically on Linux, although the ideas should be applicable elsewhere.
Even if you aren’t using JavaScript or TypeScript, the results may be of interest.
Setup #
I am using npm, TypeScript and ES modules, but this can be adapted for your own setup.
Initialise the project folder with TypeScript:
mkdir benchmarking && cd benchmarking
npm init -y
npm install -D typescript
npx tsc --init
Set "moduleResolution": "node16" and set "module" to something that isn’t commonjs in tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "ES2022",
"moduleResolution": "node16",
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"strict": true,
"skipLibCheck": true
}
}
Add "type": "module" to package.json:
{
"name": "benchmarking",
"version": "1.0.0",
"description": "",
"main": "index.js",
"type": "module",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "",
"license": "ISC",
"devDependencies": {
"benny": "^3.7.1",
"typescript": "^4.9.5"
}
}
Benny #
Install benny
npm install benny -D
Create the example from the benny repo in example.ts
import * as b from "benny";
b.suite(
"Example",
b.add("Reduce two elements", () => {
[1, 2].reduce((a, b) => a + b);
}),
b.add("Reduce five elements", () => {
[1, 2, 3, 4, 5].reduce((a, b) => a + b);
}),
b.cycle(),
b.complete(),
b.save({ file: "reduce", version: "1.0.0" }),
b.save({ file: "reduce", format: "chart.html" })
);
Build and run the example with the following commands:
npx tsc
node example.js
and you should see this output:
Running "Example" suite...
Progress: 100%
Reduce two elements:
212 520 999 ops/s, ±2.01% | fastest
Reduce five elements:
199 846 060 ops/s, ±2.47% | slowest, 5.96% slower
Finished 2 cases!
Fastest: Reduce two elements
Slowest: Reduce five elements
Saved to: benchmark/results/reduce.json
Saved to: benchmark/results/reduce.chart.html
To run your own code for comparison, change the anonymous function in b.add.
Great, job done, we can finish there?… Not quite.
Reliable testing #
To show some of the potential issues with benchmarking, I am going to use Array.map to create a new array of squared numbers and compare its performance with some alternative methods.
Here is the code:
import * as b from "benny";
const arr = Array.from({ length: 100000 }, () => Math.random());
b.suite(
"Mapping to array of squared values",
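b.add("Map", () => {
// The callback must return the squared value; a braced body without
// `return` would fill arr2 with undefined.
const arr2 = arr.map((value) => {
return value ** 2;
});
}),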
b.add("Foreach", () => {
const arr2 = [];
arr.forEach((value) => {
arr2.push(value ** 2);
});
}),
b.add("For", () => {
const arr2 = Array(arr.length);
for (let i = 0; i < arr.length; i++) {
arr2[i] = arr[i] ** 2;
}
}),
b.cycle(),
b.complete()
);
And the results:
Map:
1 285 ops/s, ±8.68% | 15.29% slower
Foreach:
438 ops/s, ±3.02% | slowest, 71.13% slower
For:
1 517 ops/s, ±2.38% | fastest
Foreach is slowest, which is not surprising: it starts from an empty array and grows it with one push per item, so the backing storage has to be extended repeatedly, on top of a callback call for every element. Interestingly, a for loop is faster than map. I am not qualified to tell you why, but presumably the overhead of calling the anonymous function for every element cannot be optimised away enough to compete with the for loop.
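Before reading anything into the timings, it is worth checking that all three cases actually produce the same array. The sketch below is not part of the benchmark; the function names and the sameValues helper are made up for illustration. Note that an arrow function with a braced body needs an explicit return, otherwise map produces an array of undefined.

```typescript
// Three ways to build an array of squares, mirroring the benchmark cases.
// All function names here are hypothetical, chosen for this sketch only.
function squareWithMap(input: number[]): number[] {
  // The callback returns the value, so the result holds numbers.
  return input.map((value) => value ** 2);
}

function squareWithForEach(input: number[]): number[] {
  const out: number[] = [];
  input.forEach((value) => {
    out.push(value ** 2);
  });
  return out;
}

function squareWithFor(input: number[]): number[] {
  const out: number[] = new Array<number>(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = input[i] ** 2;
  }
  return out;
}

// True when both arrays have the same length and the same elements in order.
function sameValues(a: number[], b: number[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

const sample = [1, 2, 3, 4];
const expected = squareWithMap(sample);
console.log(expected); // [ 1, 4, 9, 16 ]
console.log(sameValues(expected, squareWithForEach(sample))); // true
console.log(sameValues(expected, squareWithFor(sample))); // true
```

If any of the comparisons printed false, the benchmark would be comparing different amounts of work rather than different ways of doing the same work.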
Anyway, the actual performance isn’t the point. What if I add an identical map test after the for loop test:
Map:
1 343 ops/s, ±8.85% | 18.75% slower
Foreach:
467 ops/s, ±1.82% | slowest, 71.75% slower
For:
1 653 ops/s, ±0.50% | fastest
Map:
942 ops/s, ±2.07% | 43.01% slower
Erm what.
Exactly the same function has dropped from 1343 to 942 ops/s.
What’s going on? #
Modern CPUs are complicated [citation needed]. They do all sorts of stuff to make the execution of code as fast and efficient as possible, but it can get in the way of benchmarks like this.
The reason for the difference in results above is probably Turbo Boost. During the first test the clock frequency was higher, but after the next couple of tests the CPU started to thermal throttle and the frequency had to be reduced, and with it the performance.
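One way to see this kind of drift without benny is to time the same workload for several rounds in a row and look at the spread between rounds. The following is a rough sketch, not how benny measures things: opsPerSecond, measureRounds and spreadPercent are hypothetical helpers, and the round count and duration are arbitrary. It shows that the rate moves between rounds, but it cannot tell you whether the cause is clock frequency, garbage collection or something else.

```typescript
// Call fn in a loop for roughly durationMs (must be positive) and return ops/s.
function opsPerSecond(fn: () => void, durationMs: number): number {
  const start = performance.now();
  let calls = 0;
  let elapsed = 0;
  do {
    fn();
    calls++;
    elapsed = performance.now() - start;
  } while (elapsed < durationMs);
  return (calls / elapsed) * 1000;
}

// Run the same workload for several rounds and collect each round's rate.
function measureRounds(fn: () => void, rounds: number, durationMs: number): number[] {
  const rates: number[] = [];
  for (let i = 0; i < rounds; i++) {
    rates.push(opsPerSecond(fn, durationMs));
  }
  return rates;
}

// Gap between the best and worst round, as a percentage of the best.
function spreadPercent(rates: number[]): number {
  const best = Math.max(...rates);
  const worst = Math.min(...rates);
  return ((best - worst) / best) * 100;
}

// Same workload as the Map case: square 100000 random numbers.
const data = Array.from({ length: 100000 }, () => Math.random());
const rates = measureRounds(() => {
  data.map((value) => value ** 2);
}, 5, 200);

rates.forEach((rate, i) => console.log(`round ${i + 1}: ${rate.toFixed(0)} ops/s`));
console.log(`spread: ${spreadPercent(rates).toFixed(2)}%`);
```

A large spread between rounds of an identical workload is a hint that the ordering of cases in a suite may be influencing the results.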
How can we fix it? #
All of the ideas for this are shamelessly taken from this great blog post. It’s very interesting and covers a bunch of CPU features that can be disabled or changed when doing benchmarks to get consistent results. Go read it.
I wrote a script that implements some of the suggestions from the post, namely:
- Disabling Turbo Boost
- Disabling hyper-threading
- Setting the CPU governor to performance
- Binding the process to a particular core
It also builds the code, runs the benchmark and reverts the settings afterwards. Save it in a file called run_benchmark.sh:
#!/usr/bin/env bash
echo "Building"
npx tsc || exit 1
echo "Disabling turboboost | 1"
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo >/dev/null
cat /sys/devices/system/cpu/intel_pstate/no_turbo
echo "Enabling performance mode"
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo performance | sudo tee "$i" >/dev/null
done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
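# Assumes cpu1 is the hyper-thread sibling of cpu0; check
# /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your machine.
echo "Disabling hyperthreading (core 1) | 0"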
echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online >/dev/null
cat /sys/devices/system/cpu/cpu1/online
echo "Running benchmark on core 0"
# Run node on cpu 0
sudo perf stat -- taskset -c 0 node "$1"
# Revert changes
echo "Enabling turboboost | 0"
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo >/dev/null
cat /sys/devices/system/cpu/intel_pstate/no_turbo
echo "Enabling hyperthreading (core 1) | 1"
echo 1 | sudo tee /sys/devices/system/cpu/cpu1/online >/dev/null
cat /sys/devices/system/cpu/cpu1/online
echo "Enabling powersave mode"
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo powersave | sudo tee "$i" >/dev/null
done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
This is specific to Intel, but presumably applying the same changes to an AMD CPU is similar.
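If you would rather have the benchmark itself complain when the machine is not set up, the same sysfs files can be read from Node. This is a minimal sketch under a few assumptions: the paths are the ones used in the script above, only cpu0's governor is checked because that is the core the benchmark is pinned to, and readSysfs and benchmarkWarnings are hypothetical helper names. On a machine without these files (non-Linux, or no intel_pstate driver) every check simply reports a warning.

```typescript
import { readFileSync } from "node:fs";

// Read a sysfs file and return its trimmed contents, or null if it cannot be
// read (for example on non-Linux systems or without the intel_pstate driver).
function readSysfs(path: string): string | null {
  try {
    return readFileSync(path, "utf8").trim();
  } catch {
    return null;
  }
}

// Paths taken from run_benchmark.sh; checking only cpu0 is an assumption.
const NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo";
const GOVERNOR = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
const CPU1_ONLINE = "/sys/devices/system/cpu/cpu1/online";

// List the settings that differ from what the script sets. The reader is
// injectable so the logic can be exercised without touching the real files.
function benchmarkWarnings(
  read: (path: string) => string | null = readSysfs
): string[] {
  const warnings: string[] = [];
  if (read(NO_TURBO) !== "1") {
    warnings.push("turbo boost is not disabled (or its state is unknown)");
  }
  if (read(GOVERNOR) !== "performance") {
    warnings.push("cpu0 governor is not 'performance' (or its state is unknown)");
  }
  if (read(CPU1_ONLINE) !== "0") {
    warnings.push("cpu1 is still online (or its state is unknown)");
  }
  return warnings;
}

const warnings = benchmarkWarnings();
if (warnings.length === 0) {
  console.log("CPU settings match the benchmark script");
} else {
  warnings.forEach((warning) => console.warn(`warning: ${warning}`));
}
```

Calling benchmarkWarnings() at the top of a benchmark file makes it obvious when a run was done outside the script and its numbers should not be compared with earlier ones.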
Results #
The same test is run with:
./run_benchmark.sh map.js
and the new results are:
Map:
690 ops/s, ±4.28% | 36.87% slower
Foreach:
313 ops/s, ±0.96% | slowest, 71.36% slower
For:
1 093 ops/s, ±0.48% | fastest
Map:
696 ops/s, ±4.26% | 36.32% slower
Slower, but consistent. Also notice that the uncertainty has gone down.
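To put numbers on "consistent", compare the two identical Map cases before and after the changes. The sketch below uses only the ops/s figures reported in this post. The percentSlower formula reproduces the "slower" percentages in the output above, but that it matches benny's own calculation is inferred from those numbers rather than checked against its source.

```typescript
// Percentage by which `rate` is slower than `reference` (both in ops/s).
function percentSlower(reference: number, rate: number): number {
  return ((reference - rate) / reference) * 100;
}

// Figures reported in this post (ops/s).
const before = { forLoop: 1653, mapFirst: 1343, mapSecond: 942 };
const after = { forLoop: 1093, mapFirst: 690, mapSecond: 696 };

// Reproduces the "slower" figure printed for the last Map case in each run.
console.log(percentSlower(before.forLoop, before.mapSecond).toFixed(2)); // 43.01
console.log(percentSlower(after.forLoop, after.mapSecond).toFixed(2)); // 36.32

// Gap between the two identical Map cases, relative to the faster of the two.
console.log(percentSlower(before.mapFirst, before.mapSecond).toFixed(2)); // 29.86
console.log(percentSlower(after.mapSecond, after.mapFirst).toFixed(2)); // 0.86
```

So the gap between two runs of the same function went from roughly 30% to under 1%, which is well inside the ±4% uncertainty benny reports for those cases.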
Conclusion #
Benchmarking different TypeScript and JavaScript functions on Node is very easy to do, but there are pitfalls to be wary of. The script in this post shows how to get more consistent results.
Also, for loops can be faster than Array.map in JavaScript. Interesting!
I have no idea how to do the same on Mac or Windows but I assume you can somehow, maybe by clicking with a mouse (ergh!).