Generates arbitrary-precision, comparable values, appropriate for use as sorting keys
Generates arbitrary-precision, comparable values, appropriate for use as sorting keys.
keese can always generate a bigger value, a smaller value, and a value between two other values.
This is trivial using numbers with
The string values are comparable with the builtin comparison operators (such as
and keese can always generate a new value that satisfies the constraints (limited only by system resources).
var keese = require'keese';var something = keese;var bigger = keesesomething null;var smaller = keesenull something;// smaller < something < biggervar medium = keesesmaller bigger;// smaller < medium < bigger// but no guarantee about middle vs somethingvar values = keesesmaller bigger 10;// values is an array of 10 ascending items between smaller and bigger
var middle = keeselow high count;
highmust each either be
== nullor be values from a previous call to
countmust be either
== nullor a non-negative integer (type
count == null:
low != null, then
low < middle.
high != null, then
middle < high.
count != null,
middleis an Array of size
countvalues, and if
count > 0:
low != null, then
low < middle.
high != null, then
middle[middle.length - 1] < high.
- The values in
middleare in ascending order (i.e.
middle.sort()will have no effect).
keese is a pure function.
Say you have a client-server architecture where clients can edit an ordered list of items. Clients can insert, delete, and move items around in the list. (Groove Basin does this with the current playlist of songs.)
The naive approach would be to use an Array, and communicate about where an operation is happening by using an index into the array. For example, "add item x at index 5", or "delete item at index 2" or "move item at index 7 to index 10", etc. This works well enough, but race conditions can cause sad behavior.
Imagine all three of the above commands are sent to the server at once from different clients. Say the server receives the "delete" command first, and shifts all the items above index 2 down 1. Now the "move" command referencing the item at index 7 is actually talking about a different item, one that was originally at index 8. The "add" command is similarly misinterpreted, because the client may have specifically wanted "item x" to be inserted immediately after a particular item.
Another naive approach is to communicate about locations relative to existing items. For example, "add item x just after item y", or "delete item y". Do you see the problem here? Say a client wants to insert "item x" in the middle of a range of 10 items, but another client deletes those 10 items before the insert request can be processed. There's no guarantee that the item(s) a client refers to will still exist on the server by the time the request is processed.
The most robust solution is to communicate about locations using arbitrary sorting keys. Give every item a different value such that when the items are sorted using the values, they are ordered appropriately. For example, start out by giving each of 10 items in the list the values 1 through 10 as their sorting keys.
Now if a client deletes the item with the sorting key of 2, there's no need to shift anything; just leave the other sorting keys where they are. When a client wants to insert an item between 5 and 6, give the new item a sorting key of 5.5. When a client wants to move an item, change the items sorting key to some other value. After performing each of these operations, simply sort the list again, and the items will be in the desired order.
By using sorting keys, the opportunity for race conditions is almost entirely eliminated. There can still be race conditions when there is truly no possible automatic solution, such as two clients inserting different items into the same location, or two clients both trying to delete the same item at once. However, these problems usually have trivial solutions, and they are outside the scope of this project.
keese exists for the purpose of generating sorting keys that never run out of precision.
This code snippet shows how many times you can obtain a middle value
Number and still get a meaningful result.
var a = 1;var b = 2;var count = 0;while a !== bcount += 1;a = a + b / 2.0;console.logcount;
Comparing that to keese, we generate a middle value 53 times and then check the result.
var keese = require'keese';var low = keese;var high = keeselow null;for var i = 0; i < 53; i += 1high = keeselow high;console.loghigh;
This takes up a few more bytes than a
Number. However, unlike what happens
when you generate a middle value 53 times with a
Number, we actually have
a usable value to compare.
Numbers is that they have limited precision.
In order to get arbitrary numeric precision, we need more than a single primitive number value;
we need an arbitrarily-large array of numbers.
Strings are better for the following reasons:
strings are more compact in JSON (and probably in memory),
and strings can be compared easily with a builtin function (the
which is convenient (and probably much more efficient that writing a custom Array comparator).
Being able to compare strings using
< (called lexicographical ordering) is a driving principle in the design of this library.
So how do we encode arbitrary precision numbers in strings in such a way as to preserve lexicographical ordering?
Base 10 is a good place to start.
What comes between
The numeric answer
"0.15" satisfies the lexicographical ordering;
adding digits to the end of the smaller string is a good way to implement going in between two values.
The problem with a naive base 10 encoding is that adding digits to the left breaks lexicographical ordering.
9 < 10 but
"9" > "10".
The problem is that the "ones" digit
"9" is being compared to the "tens" digit
The common way to solve this frame-shift error is to pad any small numbers with
"09" < "10".
This is obviously problematic, because we are forced to limit the number of digits with this strategy, thereby failing to have arbitrary precision.
The solution is to reserve a special digit
"~" that has a larger character value than any other character in the alphabet,
and use this character to make sure we never get frame shift errors.
"9" < "~10".
"~99" < "~~100".
If two values have different orders of magnitude, then a
"~" will inevitably be compared against a character that is not a
and no actual digit values will ever be compared.
(While this doubles the number of digits for large integers, remember that the number of digits is still O(n).)
With this lexicographically-correct magnitude specification, we have no need for any decimal points,
which in common notation accomplish the same purpose.
We can write
"21" instead of
"2.1", because we know that
"~12" is bigger due to the
One last problem remains, which is how to generate a value smaller than the parameter.
We have no way of encoding negative numbers lexicographically correctly;
all the digit values would be inverted, and comparisons would be all wrong.
The solution to this is surprisingly trivial;
keep a "smallest value" secret from the client,
and implement the "smaller" function by going in between the "smallest value" and the parameter.
We reserve the number
0, encoded as
"", as the smallest value, and effectively halve any parameter to generate a smaller value.
(It would be more efficient to use a special character, such as
"!", to signify negative magnitude.
Oh well. Maybe in a future version of keese.)
As a matter of efficiency, using base 10 would only encode 10 values per character, but we can easily encode many more.
However, many characters in this range are unreadable and untypable,
and some JSON libraries (such as in python) will escape non-ascii values with
"\u1234" notation by default,
so radix 65536 may not be worth the trouble.
Following the example of the base64 encoding, keese uses radix 64 with all printable ascii characters.
This is the alphabet:
"~" for the magnitude specification.
Here are some example encoded values:
keese()returns "1", the number
"z"is the number
keese("z", null)returns "~10", the number
keese("1", "2")returns "1U", the number
The runtime performance of calling
keese once is proportional to the size of the inputs.
Formally, the runtime of
keese(low, high) is
O(low.length + high.length).
The size of the return value depends on the value of the input parameters. Informally:
O(n)(probably could be improved)
n is how many times
keese has been called to get the input value.
More formally, start with
var x = keese(), run the below code,
then analyze the size of
x in terms of
for var i = 0; i < n; i++x = keesex null;
O(n) - probably could be improved):
for var i = 0; i < n; i++x = keesenull x;
var y = keesex null; // or any other valuefor var i = 0; i < n; i++if Mathrandom > 0.5x = keesex y;elsey = keesex y;
I believe it is provable that betweening cannot do any better than
- Each value returned from
keese(x, y)could be assigned to either
- The next call to
keese(x, y)must return a value that takes into account whether
ywas chosen in the previous step. Because of this, the return value effectively encodes the decision of whether
- This information is not lost on the next call to
keese(x, y). Therefore, a value obtained through the algorithm above must encode a complete history of each decision.
- Each of the
ndecisions must occupy a minimum of 1 bit of space in the string, therefore the size of the string is
The naive way to generate
n values at once would be:
var result = ;for var i = 0; i < count; i++var value = keeselow high;resultpushvalue;low = value;return result;
This results in values with
O(count) size (see discussion on algorithmic complexity above).
A better algorithm would be to fill in an array using a binary-tree descent pattern:
generate a value for the middle element of the array, and then recurse on each of the left and right remaining spaces.
var result = count;if count > 0 recurselow high 0 count;return result;var mid_index = Mathfloorlow_index + high_index / 2;var mid_value = single_keeselow_value high_value;resultmid_index = mid_value;if low_index < mid_index recurselow_value mid_value low_index mid_index;if mid_index + 1 < high_index recursemid_value high_value mid_index + 1 high_index;
This generates values with only
This is the optimal algorithmic complexity for such a task.
Since this algorithm is probably useful to many clients and a bit cumbersome to implement yourself,
keese provides an implementation via the optional
count parameter to