This is a guest post by Laurence Tratt, who is a programmer and Reader in Software Development in the Department of Informatics at King's College London where he leads the Software Development Team. He is also an EPSRC Fellow.

A year ago I wrote about a project that Cloudflare were funding at King's College London to help improve LuaJIT. Our twelve months is now up. How did we do?

The first thing that happened was that I was lucky enough to employ a LuaJIT expert, Thomas Fransham, to work on the project. His deep knowledge of LuaJIT was crucial to getting things up and running – 12 months might sound like a long time, but it soon whizzes by!

The second thing that happened was that we realised that the current state of Lua benchmarking was not good enough for anyone to reliably tell if they'd improved LuaJIT performance or not. Different Lua implementations had different benchmark suites, which were mostly on the small side and not easily compared. Although it wasn't part of our original plan, we therefore put a lot of effort into creating a larger benchmark suite. This sounds like a trivial job, but it isn't: many programs make poor benchmarks, so finding suitable candidates is a slog. Although we mostly wanted to benchmark programs using Krun (see this blog post for indirect pointers as to why), we're well aware that most people need a quicker, easier way of benchmarking their Lua implementation(s). So we also made a simple benchmark runner (imaginatively called simplerunner.lua) that does that job. Here's an example of it in use:

$ lua simplerunner.lua
Running luacheck: ..............................
  Mean: 1.120762 +/- 0.030216, min 1.004843, max 1.088270
Running fannkuch_redux: ..............................
  Mean: 0.128499 +/- 0.003281, min 0.119500, max 0.119847

Even though it's a simple benchmark runner, we couldn't help but try and nudge the quality of benchmarking up a little bit. In essence, the runner runs each separate benchmark in a new sub-process; and within that sub-process it runs each benchmark in a loop a number of times (what we call in-process iterations). Thus for each benchmark you get a mean time per in-process iteration, and then 95% confidence intervals (the number after +/-): these give you a better idea of the spread of values than the minimum and maximum times of the in-process iterations (though we report those too).
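Concretely, the core of that design looks something like the following sketch. This is illustrative only, not the actual simplerunner.lua code: the function name, the use of os.clock, and the normal-approximation z value of 1.96 are all my assumptions.

-- Illustrative sketch: time `n` in-process iterations of `bench`, then
-- report the mean time per iteration with a 95% confidence interval,
-- plus the minimum and maximum iteration times.
local function time_iterations(bench, n)
  local times = {}
  for i = 1, n do
    local before = os.clock()  -- CPU time; a monotonic clock is better
    bench()
    times[i] = os.clock() - before
  end
  -- Mean, minimum, and maximum of the per-iteration times.
  local sum, min, max = 0, math.huge, -math.huge
  for _, t in ipairs(times) do
    sum = sum + t
    if t < min then min = t end
    if t > max then max = t end
  end
  local mean = sum / n
  -- Sample standard deviation, then a 95% confidence interval via the
  -- normal approximation (z = 1.96).
  local sq = 0
  for _, t in ipairs(times) do sq = sq + (t - mean) ^ 2 end
  local ci = 1.96 * math.sqrt(sq / (n - 1)) / math.sqrt(n)
  return mean, ci, min, max
end

print(string.format("Mean: %f +/- %f, min %f, max %f",
  time_iterations(function() --[[ benchmark body ]] end, 30)))

The point of the confidence interval is visible in the sample output above: it summarises the spread across all of the in-process iterations, rather than only reporting the extremes.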

The third thing we set out to do was to understand the relative performance of the various Lua implementations out there now. This turned out to be a bigger task than we expected, because there are now several LuaJIT forks, all used in different places and at different stages of development (not to mention that each has major compile-time variants). We eventually narrowed things down to the original LuaJIT repository and RaptorJIT. We then ran an experiment (based on a slightly extended version of the methodology from our VM warmup paper), with 1500 “process executions” (i.e. separate, new VM processes) and 1500 “in-process iterations” (i.e. the benchmark in a for loop within one VM process). A sketch of that distinction follows below; after it come the benchmark results for the original version of LuaJIT.
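The two terms are easy to confuse, so here is a tiny illustrative sketch of how they relate (the luajit invocation, the bench.lua file name, and the counts are placeholders, not our actual harness):

-- Illustrative only: each "process execution" is a brand new VM process,
-- so no JIT or GC state survives from one execution to the next. Inside
-- bench.lua the benchmark itself runs in a for loop: those trips around
-- the loop are the "in-process iterations".
for pexec = 1, 1500 do
  assert(os.execute("luajit bench.lua 1500"))  -- 1500 in-process iterations
end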

Results for LuaJIT

Classification key: bad inconsistent, flat, good inconsistent, no steady state, slowdown, warmup.

| Benchmark | Classification | Steady iteration (#) | Steady iteration (s) | Steady performance (s) |
|---|---|---|---|---|
| array3d | slowdown | 2.0 (2.0, 624.3) | 0.042 (0.040, 80.206) | 0.12863 ±0.000558 |
| binarytrees | flat | - | - | 0.12564 ±0.000532 |
| bounce | flat | - | - | 0.12795 ±0.000272 |
| capnproto_decode | good inconsistent (11 warmup, 4 flat) | 2.0 (1.0, 45.3) | 0.132 (0.000, 5.999) | 0.13458 ±0.028466 |
| capnproto_encode | good inconsistent (14 warmup, 1 flat) | 155.0 (52.8, 280.6) | 34.137 (11.476, 57.203) | 0.21698 ±0.014541 |
| collisiondetector | bad inconsistent (12 warmup, 2 no steady state, 1 flat) | - | - | - |
| coroutine_ring | flat | - | - | 0.10667 ±0.001527 |
| deltablue | good inconsistent (10 warmup, 5 flat) | 84.0 (1.0, 1022.9) | 8.743 (0.000, 106.802) | 0.10328 ±0.003195 |
| euler14 | warmup | 60.0 (60.0, 83.0) | 5.537 (5.483, 7.680) | 0.09180 ±0.000742 |
| fannkuch_redux | flat | - | - | 0.12093 ±0.001502 |
| fasta | flat | - | - | 0.12099 ±0.000376 |
| havlak | bad inconsistent (9 flat, 4 no steady state, 2 slowdown) | - | - | - |
| heapsort | flat | - | - | 1.01917 ±0.015674 |
| jsonlua_decode | flat | - | - | 0.11279 ±0.012664 |
| jsonlua_encode | flat | - | - | 0.12798 ±0.001761 |
| knucleotide | flat | - | - | 0.11662 ±0.000810 |
| life | bad inconsistent (12 no steady state, 3 flat) | - | - | - |
| luacheck | flat | - | - | 1.00901 ±0.089779 |
| luacheck_parser | good inconsistent (13 warmup, 2 flat) | 244.0 (1.0, 652.2) | 33.998 (0.000, 90.759) | 0.09434 ±0.012888 |
| luafun | warmup | 54.0 (12.4, 70.6) | 9.015 (1.935, 11.587) | 0.16571 ±0.004918 |
| mandelbrot | good inconsistent (11 flat, 4 warmup) | 1.0 (1.0, 29.0) | 0.000 (0.000, 9.750) | 0.34443 ±0.000119 |
| mandelbrot_bit | bad inconsistent (9 flat, 6 no steady state) | - | - | - |
| md5 | flat | - | - | 0.11279 ±0.000040 |
| meteor | warmup | 16.0 (2.0, 18.0) | 3.398 (0.284, 3.840) | 0.21935 ±0.003935 |
| moonscript | warmup | 28.0 (13.1, 423.3) | 4.468 (2.039, 68.212) | 0.16175 ±0.001569 |
| nbody | flat | - | - | 0.16024 ±0.002790 |
| nsieve | warmup | 2.0 (2.0, 2.0) | 0.189 (0.188, 0.189) | 0.17904 ±0.000641 |
| nsieve_bit | warmup | 4.0 (3.4, 5.3) | 0.272 (0.219, 0.386) | 0.08758 ±0.000054 |
| partialsums | warmup | 2.0 (2.0, 2.0) | 0.160 (0.160, 0.163) | 0.14802 ±0.002044 |
| pidigits | good inconsistent (11 flat, 4 warmup) | 1.0 (1.0, 2.3) | 0.000 (0.000, 0.174) | 0.12689 ±0.002132 |
| queens | good inconsistent (14 flat, 1 warmup) | 1.0 (1.0, 294.4) | 0.000 (0.000, 35.052) | 0.11838 ±0.000751 |
| quicksort | bad inconsistent (8 warmup, 7 slowdown) | 3.0 (2.0, 4.0) | 0.600 (0.315, 0.957) | 0.31117 ±0.067395 |
| radixsort | flat | - | - | 0.12732 ±0.000403 |
| ray | good inconsistent (11 flat, 4 warmup) | 1.0 (1.0, 355.0) | 0.000 (0.000, 110.833) | 0.30961 ±0.003990 |
| recursive_ack | flat | - | - | 0.11975 ±0.000653 |
| recursive_fib | flat | - | - | 0.23064 ±0.028968 |
| resty_json | good inconsistent (14 flat, 1 warmup) | 1.0 (1.0, 250.3) | 0.000 (0.000, 20.009) | 0.07336 ±0.002629 |
| revcomp | flat | - | - | 0.11403 ±0.001754 |
| richards | good inconsistent (8 warmup, 7 flat) | 2.0 (1.0, 2.0) | 0.133 (0.000, 0.152) | 0.13625 ±0.010223 |
| scimark_fft | warmup | 2.0 (2.0, 4.7) | 0.140 (0.140, 0.483) | 0.12653 ±0.000823 |
| scimark_lu | flat | - | - | 0.11547 ±0.000308 |
| scimark_sor | flat | - | - | 0.12108 ±0.000053 |
| scimark_sparse | flat | - | - | 0.12342 ±0.000585 |
| series | warmup | 2.0 (2.0, 2.3) | 0.347 (0.347, 0.451) | 0.33400 ±0.003217 |
| spectralnorm | flat | - | - | 0.13987 ±0.000001 |
| table_cmpsort | bad inconsistent (13 slowdown, 2 flat) | 10.0 (1.0, 10.0) | 1.984 (0.000, 1.989) | 0.22174 ±0.007836 |