js-fingertree
Finger trees for JavaScript. Parent is aureooms/js-persistent.
data FingerTree a = Empty | Single a | Deep (Digit a) (FingerTree (Node a)) (Digit a)
Optimization of the code step by step
I will copy here the output of a benchmark run on my computer after each optimization change together with a small explanation.
Note: I did not check if the real issues were caused by generators, bound functions, V8 optimization inhibitors, or anything else...
When the benchmark was added
d903efb8d9011a44a5ff732c569b2ae33aa5b8f1
What the benchmark code does is, in order:
- build an empty tree `T`
- add 100000 elements to the beginning of `T`
- remove those 100000 elements by popping the first element of `T` 100000 times
- add 100000 elements to the end of `T`
- split `T` at all 100000 positions, one after the other
- remove the 100000 elements of `T` by popping the last element 100000 times
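Assuming a fingertree-like API, the sequence above can be sketched as follows. The `empty`, `cons`, `tail`, `push`, `init` and `split` names below are stand-ins backed by a plain array so the sketch runs on its own; the real benchmark calls the library's persistent operations instead.

```javascript
// Array-backed stand-ins for the tree operations (illustrative only).
const M = { zero: () => 0, measure: () => 1, plus: (a, b) => a + b };

const empty = () => [];
const cons = (t, x) => [x].concat(t);  // prepend
const tail = (t) => t.slice(1);        // drop first
const push = (t, x) => t.concat([x]);  // append
const init = (t) => t.slice(0, -1);    // drop last
const split = (t, p) => {
	// Split at the first prefix whose accumulated measure satisfies p.
	let m = M.zero();
	for (let i = 0; i < t.length; ++i) {
		m = M.plus(m, M.measure(t[i]));
		if (p(m)) return [t.slice(0, i + 1), t.slice(i + 1)];
	}
	return [t, []];
};

const N = 1000; // the real benchmark uses 100000
let T = empty();
for (let i = 0; i < N; ++i) T = cons(T, i);            // cons
for (let i = 0; i < N; ++i) T = tail(T);               // tail
for (let i = 0; i < N; ++i) T = push(T, i);            // push
for (let i = 1; i <= N; ++i) split(T, (m) => m >= i);  // split
for (let i = 0; i < N; ++i) T = init(T);               // init
```

The sketch uses a smaller N because the array stand-in copies on every operation; only the shape of the measurement loop matters here.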
Here are the first running times I measured for this implementation. It is not difficult to see that it is not fast at all. 100000 elements is not that many, right?
```
$ node benchmark/tree.js
number of operations: 100000
cons: 8584ms
tail: 6841ms
push: 8581ms
split: 144243ms
init: 6896ms
```
However, most of the motivation for those optimization steps came from applying the same benchmark to another JavaScript implementation (actually, I copied the benchmark from Joe's repository). This made it obvious that my implementation was performing badly.
```
$ node ../fingertree.js/benchmark/benchmark.js
number of operations: 100000
cons: 155ms
tail: 221ms
push: 134ms
split: 1889ms
init: 141ms
```
First implementation of lazy evaluation of subtrees
f0ca58169a32140a8617625466f15489a3549124
A Deep instance possesses a subtree that, in some cases, is never accessed during the whole lifetime of its parent. So why build it? The idea is to delay the construction of this potentially expensive object to a later point in time using a proxy: a finger-tree look-alike object that will construct the real subtree only if needed.
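The proxy idea can be sketched as follows; names here are illustrative, not the library's actual API. The thunk that builds the real subtree runs at most once, on first access.

```javascript
// Lazy subtree proxy: defer construction until first access.
function Lazy (thunk) {
	this.thunk = thunk;
	this.tree = null;
}

Lazy.prototype.force = function () {
	if (this.tree === null) {
		this.tree = this.thunk();
		this.thunk = null; // let the closure be garbage-collected
	}
	return this.tree;
};

// Every tree method the proxy must expose just forwards to the forced tree.
Lazy.prototype.head = function () {
	return this.force().head();
};
```

A Deep node then stores something like `new Lazy(() => buildSubtree(...))` instead of the subtree itself; if the subtree is never touched, `buildSubtree` never runs.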
Time measurements after applying lazy evaluation concepts:
```
$ node benchmark/tree.js
number of operations: 100000
cons: 8585ms
tail: 6812ms
push: 8113ms
split: 139449ms
init: 6777ms
```
Unfortunately there is no big performance improvement here, but as you'll see, there are nastier things to change.
Unwrap values
19c8ca9298c98eef7e3f257257cb4e01adfe0edb
Before this change, each value was first wrapped in a unique-value-node-like structure defined as:

```js
class Value {
	constructor ( element , v ) {
		this.element = element ;
		this.v = v ;
	}
	measure ( ) {
		return this.v ;
	}
}
```

where `v` is the value returned by `Measure.measure( element )`. This wrapping made it possible to give the same interface to elements, nodes, digits and trees with respect to computing measures. This approach has at least two problems:
- It introduces one additional level of indirection by wrapping all values.
- The way it was implemented here caches measures for all elements. This might not be a good thing in settings where measures are big objects. Moreover, it is at odds with the lazy evaluation spirit of functional programming.
However, there is a simpler way to solve the common-interface problem. Indeed, the elements contained in the digits of the root tree can be measured with the Measure object left unchanged, and starting from the subtree of the root level we just need a modified Measure object that uses the interface of Node2 and Node3 to compute the node measures. Digits never contain objects of different types, so patching the Measure object this way suffices: there is no need to give the same interface to nodes and elements.
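A sketch of that patching, with illustrative names (the library's actual code differs): from the user's Measure `M`, derive a Measure that works on the node objects one level down by delegating to the node interface.

```javascript
// Derive a Measure for Node2/Node3 objects from the Measure of the
// level above. Illustrative names only.
function nodeMeasure (M) {
	return {
		zero: M.zero,
		plus: M.plus,
		measure: (node) => node.measure(), // delegate to the node's interface
	};
}

// A minimal Node2 for illustration: its measure is cached at construction.
function Node2 (M, a, b) {
	this.a = a;
	this.b = b;
	this.v = M.plus(M.measure(a), M.measure(b));
}
Node2.prototype.measure = function () {
	return this.v;
};
```

The root level uses `M` unchanged; each deeper level uses `nodeMeasure` of the Measure one level above, so every digit holds homogeneous objects measured by the right Measure.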
Here are the measurements after the change:
```
$ node benchmark/tree.js
number of operations: 100000
cons: 8452ms
tail: 6930ms
push: 7983ms
split: 141338ms
init: 6657ms
```
Mostly noise for the moment. There must be something else to fix...
Removing iterator loops and bindings for fixed size sequences
28f6ee08d33e2cb2b9d0e5cae69948a7013bc0b7
At some point I decided to use a better tool than intuition to make progress: profiling. This change is the first where I guided the pruning by looking at the output of `node --prof benchmark/tree.js` processed with `node-tick-processor` from the tick package.
Before this change, digit measures were computed with a generic loop over the digit's elements, along these lines:

```js
measure ( M ) {
	let m = M.zero( ) ;
	for ( const x of this ) m = M.plus( m , M.measure( x ) ) ;
	return m ;
}
```

However, since all four digit types are hard-coded as classes One, Two, Three and Four, we could as well give a custom implementation of the `measure( )` method to each digit type. For type Two, for example, we have:

```js
measure ( M ) {
	return M.plus( M.measure( this.a ) , M.measure( this.b ) ) ;
}
```
Here are the times:
```
$ node benchmark/tree.js
number of operations: 100000
cons: 350ms
tail: 493ms
push: 398ms
split: 25851ms
init: 442ms
```
A drastic improvement, right? However, there still seems to be a problem with the split method...
Removing loops: part II
5b0af102240b3f0cfda7001f6b108a9811594c7a
It turns out I didn't remove all occurrences of this generic digit measurement method. Here are the running times after removing occurrences of it from `Deep.splitTree`.
```
$ node benchmark/tree.js
number of operations: 100000
cons: 370ms
tail: 473ms
push: 293ms
split: 7563ms
init: 396ms
```
Better.
Specialized procedures for small loops: part II
88592b2562bea585c868e036a04dd1687e91211c 726354a3bd0591d767fb658d206680a6a74d2fbb
These two commits introduce specialized procedures for building trees from small lists and digits. These are really small trees, i.e. 0 to 4 elements, so why use the generic `from_iterable` constructor?
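The idea can be sketched like this; `fromSmallList` and the tree shapes it returns are hypothetical stand-ins for the library's constructors, not its actual code.

```javascript
// Instead of feeding 0-4 elements through a generic iterable-consuming
// constructor, dispatch on the count directly. The returned objects are
// simplified stand-ins for Empty/Single/Deep.
function fromSmallList (list) {
	switch (list.length) {
		case 0: return { kind: 'empty' };
		case 1: return { kind: 'single', a: list[0] };
		case 2: return { kind: 'deep', left: [list[0]], right: [list[1]] };
		case 3: return { kind: 'deep', left: [list[0], list[1]], right: [list[2]] };
		case 4: return { kind: 'deep', left: [list[0], list[1]], right: [list[2], list[3]] };
		default: throw new Error('fromSmallList only handles 0 to 4 elements');
	}
}
```

No iterator protocol, no loop, no intermediate accumulator: just a handful of fixed-shape allocations.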
```
$ node benchmark/tree.js
number of operations: 100000
cons: 315ms
tail: 435ms
push: 249ms
split: 4834ms
init: 408ms
```
Here again, a big improvement on the split method.
Dropping some of the es6 syntax
295011e3293b3dd2fe1edc80bdf1a14ca6d4dcca
This is a big one. I started writing this library with es6 syntax in mind. However, looking at the profiler output, you will see:
```
$ node-tick-processor isolate-0x2af4cf0-v8.log | grep '3:24'
  146    2.3%    2.4%  LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  278  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
    6  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  165  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
   97   99.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  127   98.4%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  163  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  146    2.3%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  138  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
   58  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
  129  100.0%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
    3    2.3%          LazyCompile: *get /home/aureooms/sandbox/js-fingertree/js/dist/fingertree.js:3:24
```
I didn't analyze it much, but it has to do with the way classes are handled by Babel.
Ok, so I just threw out the es6 classes and replaced them with plain old prototypes. Below are the new running times:
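The rewrite is mechanical; for a class with one field and one method, it looks roughly like this (`Single` here is a simplified illustration, not the library's exact code):

```javascript
// Before (es6, compiled through Babel at the time):
//   class Single {
//     constructor (x) { this.x = x; }
//     head () { return this.x; }
//   }
// After: a plain constructor function plus prototype methods,
// which is what the engine sees directly, with no transpiled helpers.
function Single (x) {
	this.x = x;
}
Single.prototype.head = function () {
	return this.x;
};
```

Call sites are unchanged: `new Single(x)` and `.head()` work exactly as before.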
```
$ node benchmark/tree.js
number of operations: 100000
cons: 83ms
tail: 196ms
push: 64ms
split: 1177ms
init: 148ms
```
Great!