Benchmarking TypeScript/JavaScript on Node.js on Linux

Benchmarking is useful and JavaScript is everywhere. How do we combine them and get it right?

Intro #

This post covers installing and running benny, an easy-to-use benchmarking library for Node.js. It also shows how to get reliable and repeatable results when running the benchmarks, specifically on Linux, but the ideas should be applicable elsewhere.

Even if you aren’t using JavaScript or TypeScript, the results may be of interest.

Setup #

I am using npm, typescript and ES modules, but this can be adapted for your own setup.

Init project folder with typescript:

mkdir benchmarking && cd benchmarking
npm init -y .
npm install -D typescript
npx tsc --init

Set "moduleResolution": "node16" and module to something that isn’t commonjs in tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ES2022",
    "moduleResolution": "node16",
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "strict": true,
    "skipLibCheck": true
  }
}

Add "type": "module" to package.json:

{
  "name": "benchmarking",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "type": "module",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "devDependencies": {
    "benny": "^3.7.1",
    "typescript": "^4.9.5"
  }
}

Benny #

Install benny

npm install benny -D

Create the example from the benny repo in example.ts

import * as b from "benny";

b.suite(
  "Example",

  b.add("Reduce two elements", () => {
    [1, 2].reduce((a, b) => a + b);
  }),

  b.add("Reduce five elements", () => {
    [1, 2, 3, 4, 5].reduce((a, b) => a + b);
  }),

  b.cycle(),
  b.complete(),
  b.save({ file: "reduce", version: "1.0.0" }),
  b.save({ file: "reduce", format: "chart.html" })
);

Build and run the example with the following commands:

npx tsc
node example.js

and you should see this output:

Running "Example" suite...
Progress: 100%

  Reduce two elements:
    212 520 999 ops/s, ±2.01%   | fastest

  Reduce five elements:
    199 846 060 ops/s, ±2.47%   | slowest, 5.96% slower

Finished 2 cases!
  Fastest: Reduce two elements
  Slowest: Reduce five elements

Saved to: benchmark/results/reduce.json

Saved to: benchmark/results/reduce.chart.html

To run your own code for comparison, change the anonymous function in b.add.

Great, job done, we can finish there?… Not quite.

Reliable testing #

To show some of the potential issues with benchmarking, I am going to compare Array.map against some alternative ways of creating a new array of squared numbers.

Here is the code:

import * as b from "benny";
const arr = Array.from({ length: 100000 }, () => Math.random());

b.suite(
  "Mapping to array of squared values",

  b.add("Map", () => {
    const arr2 = arr.map((value) => {
      return value ** 2;
    });
  }),

  b.add("Foreach", () => {
    const arr2 = [];
    arr.forEach((value) => {
      arr2.push(value ** 2);
    });
  }),

  b.add("For", () => {
    const arr2 = Array(arr.length);
    for (let i = 0; i < arr.length; i++) {
      arr2[i] = arr[i] ** 2;
    }
  }),

  b.cycle(),
  b.complete()
);

And the results:

  Map:
    1 285 ops/s, ±8.68%   | 15.29% slower

  Foreach:
    438 ops/s, ±3.02%     | slowest, 71.13% slower

  For:
    1 517 ops/s, ±2.38%   | fastest

Foreach is slowest, as expected, as it starts with an empty array and has to grow it as each item is pushed. Interestingly, a for loop is faster than map. I am not qualified to tell you why, but presumably the overhead of calling the callback for every element in map cannot be optimised away enough to compete with the for loop.
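Before comparing speeds, it is worth confirming that the candidates actually produce the same output, otherwise the benchmark compares different amounts of work. The following is a minimal sketch of such a check; the function names (squareWithMap, squareWithForEach, squareWithFor, sameArray) are made up for this illustration and are not part of benny.

```typescript
// Three ways of building an array of squares, mirroring the benchmark cases.
function squareWithMap(input: number[]): number[] {
  return input.map((value) => value ** 2);
}

function squareWithForEach(input: number[]): number[] {
  const out: number[] = [];
  input.forEach((value) => {
    out.push(value ** 2);
  });
  return out;
}

function squareWithFor(input: number[]): number[] {
  // Pre-size the output array, as the "For" benchmark case does.
  const out: number[] = new Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = input[i] ** 2;
  }
  return out;
}

// True when both arrays have the same length and identical elements.
function sameArray(a: number[], b: number[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

const sample = [1, 2, 3, 4];
const expected = squareWithMap(sample);
console.log(sameArray(expected, squareWithForEach(sample))); // true
console.log(sameArray(expected, squareWithFor(sample))); // true
```

If any of these checks printed false, the ops/s figures for that case would not be comparable with the others.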

Anyway, the actual performance isn’t the point. What if I add an identical map test after the for loop test:

  Map:
    1 343 ops/s, ±8.85%   | 18.75% slower

  Foreach:
    467 ops/s, ±1.82%     | slowest, 71.75% slower

  For:
    1 653 ops/s, ±0.50%   | fastest

  Map:
    942 ops/s, ±2.07%     | 43.01% slower

Erm what.

Exactly the same function has dropped from 1 343 to 942 ops/s.

What's going on? #

Modern CPUs are complicated [citation needed]. They do all sorts of stuff to make the execution of code as fast and efficient as possible, but it can get in the way of benchmarks like this.

The reason for the difference in results above is probably Turbo Boost. During the first test the clock frequency was higher, but over the next couple of tests the CPU started to thermal throttle and the frequency had to be reduced, and with it the performance.
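One rough way to see this kind of drift without benny is to time the same function over several back-to-back rounds and compare the rounds. The sketch below is only an illustration: measureRounds and its rounds and roundMs parameters are hypothetical names, and it uses Date.now(), which only has millisecond resolution, so it is much cruder than a real benchmarking library.

```typescript
// Measure ops/s of `fn` over several back-to-back rounds.
// `rounds` and `roundMs` are illustrative parameters, not benny settings.
function measureRounds(fn: () => void, rounds: number, roundMs: number): number[] {
  const results: number[] = [];
  for (let r = 0; r < rounds; r++) {
    const start = Date.now();
    let calls = 0;
    let elapsed = 0;
    // Keep calling fn until this round's time budget is used up.
    do {
      fn();
      calls++;
      elapsed = Date.now() - start;
    } while (elapsed < roundMs);
    // Guard against a zero elapsed time if roundMs is 0.
    results.push((calls * 1000) / Math.max(elapsed, 1));
  }
  return results;
}

// Same workload as the "Map" case, repeated for five rounds of 100 ms.
const data = Array.from({ length: 100000 }, () => Math.random());
const opsPerRound = measureRounds(() => {
  data.map((value) => value ** 2);
}, 5, 100);
console.log(opsPerRound.map((ops) => Math.round(ops)));
```

A steady fall across rounds would be consistent with the clock frequency dropping, although other effects such as JIT warm-up and garbage collection also change the numbers between rounds.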

How can we fix it? #

All of the ideas for this are shamelessly taken from this great blog post. It's very interesting and covers a bunch of CPU features that can be disabled/changed when doing benchmarks to get consistent results. Go read it.

I wrote a script that implements some of the suggestions from the post, namely:

Disabling Turbo Boost
Disabling hyper-threading
Setting the CPU governor to performance
Binding the process to a particular core

It also builds the code, runs the benchmark and reverts the settings afterwards. In a file called run_benchmark.sh:

#!/usr/bin/env bash

echo "Building"
npx tsc || exit 1

echo "Disabling turboboost | 1"
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo >/dev/null
cat /sys/devices/system/cpu/intel_pstate/no_turbo

echo "Enabling performance mode"
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$i" >/dev/null
done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

echo "Disabling hyperthreading (core 1) | 0"
echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online >/dev/null
cat /sys/devices/system/cpu/cpu1/online

echo "Running benchmark on core 0"
# Run node on cpu 0
sudo perf stat -- taskset -c 0 node "$1"

# Revert changes

echo "Enabling turboboost | 0"
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo >/dev/null
cat /sys/devices/system/cpu/intel_pstate/no_turbo

echo "Enabling hyperthreading (core 1) | 1"
echo 1 | sudo tee /sys/devices/system/cpu/cpu1/online >/dev/null
cat /sys/devices/system/cpu/cpu1/online

echo "Enabling powersave mode"
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo powersave | sudo tee "$i" >/dev/null
done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

This is specific to Intel, but presumably applying the same changes to an AMD CPU is similar.
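Because a failed write to one of these files is easy to miss, it can help to check the settings from code before trusting a run. The following is a minimal sketch that reads the same sysfs paths as the script; checkBenchmarkSettings and the ReadFile type are hypothetical names for this illustration, and the file reader is passed in so the logic can be tried without root access or an Intel CPU.

```typescript
// Function that returns the contents of a file; injected so the check
// can be exercised with fake data.
type ReadFile = (path: string) => string;

const CPU_DIR = "/sys/devices/system/cpu";

// Returns a list of human-readable problems. An empty list means the
// settings match what run_benchmark.sh tries to establish.
function checkBenchmarkSettings(read: ReadFile): string[] {
  const problems: string[] = [];

  // no_turbo contains 1 when Turbo Boost is disabled.
  if (read(`${CPU_DIR}/intel_pstate/no_turbo`).trim() !== "1") {
    problems.push("turbo boost is still enabled");
  }
  // The script takes cpu1 offline, so its online file should contain 0.
  if (read(`${CPU_DIR}/cpu1/online`).trim() !== "0") {
    problems.push("cpu1 is still online");
  }
  // Only cpu0 is checked here because the benchmark is pinned to it.
  if (read(`${CPU_DIR}/cpu0/cpufreq/scaling_governor`).trim() !== "performance") {
    problems.push("cpu0 governor is not performance");
  }
  return problems;
}

// Fake sysfs contents standing in for a correctly configured machine.
const fakeSysfs = new Map<string, string>([
  [`${CPU_DIR}/intel_pstate/no_turbo`, "1\n"],
  [`${CPU_DIR}/cpu1/online`, "0\n"],
  [`${CPU_DIR}/cpu0/cpufreq/scaling_governor`, "performance\n"],
]);
console.log(checkBenchmarkSettings((path) => fakeSysfs.get(path) ?? "")); // []
```

On a real machine the reader could be (path) => readFileSync(path, "utf8"), using readFileSync from node:fs, and the check would be called at the top of the benchmark file.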

Results #

The same test is run with:

./run_benchmark.sh map.js

and the new results are:

  Map:
    690 ops/s, ±4.28%     | 36.87% slower

  Foreach:
    313 ops/s, ±0.96%     | slowest, 71.36% slower

  For:
    1 093 ops/s, ±0.48%   | fastest

  Map:
    696 ops/s, ±4.26%     | 36.32% slower

Slower, but consistent. Also notice that the uncertainty has gone down.
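To put a number on "consistent", the gap between the two identical Map cases can be compared before and after the changes. This is just a worked calculation on the figures reported above; relativeGap is a name made up for this sketch.

```typescript
// Relative gap between two ops/s figures, as a percentage of the larger one.
function relativeGap(a: number, b: number): number {
  const hi = Math.max(a, b);
  const lo = Math.min(a, b);
  return ((hi - lo) / hi) * 100;
}

// The two Map results from the original run.
console.log(relativeGap(1343, 942).toFixed(1)); // 29.9
// The two Map results from the run through run_benchmark.sh.
console.log(relativeGap(690, 696).toFixed(1)); // 0.9
```

The second gap is well inside the ±4.3% uncertainty reported for those cases, whereas the first is far outside the uncertainty reported in the original run.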

Conclusion #

Benchmarking different TypeScript and JavaScript functions on Node is very easy to do, but there are pitfalls to be wary of. The script in this post shows how to get more consistent results.

Also, for loops can be faster than Array.map in JavaScript. Interesting!

I have no idea how to do the same on Mac or Windows but I assume you can somehow, maybe by clicking with a mouse (ergh!).