lds
Memory limits in v8 is limited to somewhere around ~1.7GB when it comes to Object and Arrays. LargeDS (LDS) tries to overcome this barrier by making use of Typed Arrays by defining basic data structure like Hashtables and ArrayLists
The Problem
I use CoffeeScript (Node.js) extensively to analyze large data sets. Time to time I run out of available memory provided by v8 for JS Objects and Arrays.
"CoffeeScript is a little language that compiles into JavaScript."
Even in 64-bit Node.js (as of version 0.12) there is ~1.5GB memory limit. But this memory limit does not apply for Types Arrays. I don't believe this is case where we need to look for another resort like C, specially given the flexibility offered by JavaScript.
What is LDS
LDS (Large Data Structures) is a library that implements some of basic data structure based on JavaScript Typed Arrays to overcome this memory limit issue.
You can get LDS from here: (https://github.com/chethiya/lds)
It's still not fine tuned to gain the best performance but very much usable. There can be considerable performance hit in when it comes to strings since the lack of support from JS API to convert between JS String and Buffers
Here's how to use LDS to implement <string,number>
map in Coffescript.
At the moment LDS only supports String keys in hash tables
LDS = require 'lds' n = 10000000 map = LDSHashtableBase nLDSTypesInt32 for i in 0...n mapset ""i for i in 0..10 consolelog " -> "
How to get it
#### Node.js
npm install lds
And require it as follows
LDS =
#### Browser
And consume it as follows
LDS = windowLDS
Struct
To represent related values, LDS has an implementation of Struct. A struct definition contains a list of root level property names and the types. Supported types are:
UInt8
UInt32
UInt16
Float32
Float64
String
LDS has it's own implementation of String. It uses 2-byte fixed with character encoding so that there is always one-to-one transformation between JS native String and LDS.String
Here's an example definition of Struct
Person = LDSStruct "Person" property: 'name'type: LDSTypesStringlength: 1 property: 'age'type: LDSTypesInt16 property: 'values'type: LDSTypesInt32length: 5 property: 'address'type: LDSTypesStringlength: 5
- property 'name' is a String
- property 'age' is a 16-bit integer
- property 'values' is a fixed-length array of 32-bit integer
- property 'address' is a fixed-length array of Strings
p = psetName 'Bob' psetAge 23 psetValues 123
Only first 3 values of
values
will be set
psetAddress "No. 123""Street 1""Street 2""" psetAddress "index-3-new"3
4th element in
address
array is changed to a new valueindex-3-new
str = "4-fourth" psetAddress str4on
Sets the 5th element of
address
to LDS.stringstr
By setting the 3rd argument to true
in all setter functions, one can pass in an LDS.String values instead of a JS native String
strrelease
If you are to create LDS.String, you must release those string objects by calling release()
method. In most cases you will not create LDS.Strings. But you might create Hashtables and ArrayLists having LDS.Strings in their Structs. In such cases you'll have to call release()
method of those objects once you are done working with that data structure.
consolelog pget
LDS.StructClass.get() method returns a JSON object of the struct instance.
consolelog pgetAddress 2
If a property in a Struct is an array, then the 1st argument of getters will be the index of the respective array.
str = pgetAddress 0on consolelog strtoString strrelease p2 = p2copyFrom p consolelog p2get
2nd argument of getters is the
string_flag
that indicates the return value is a LDS.String object. You'll have torelease()
those objects after consuming.
Struct.copyFrom()
method copies the content from source struct buffer area to target struct buffer area. LDS.Strings are copied by reference.
-
Reference counts to existing strings in source are decremented by 1.
-
Reference counts to strings in target are incremented by 1.
Array List
There is an implementation of LDS.Array. But LDS.Array hits a 32-bit limitation as size in new ArrayBuffer size
has to be somewhere around 2^30
at max. Therefore LDS.Array has a maximum size limit of 2^29 / (#bytes_per_strcut)
ArrayList is build on top of LDS.Array and it is a list of LDS.Arrays. Therefore it has a size limit much greater than 2^32
.
Here's hoe to make a large LDS.ArrayList using the sturct Person
.
= arr = 5 people = Person instanece = null for i in 0...n instance = peopleadd instance # same as people.add() instancecopyFrom p2 instancesetName "" for j in 0...5 arrj= "-" # or instance.setAddress "#{i}-#{j}", j instancesetAddress arr consolelog 'Done pushing elements' consolelog 'Reading from ArrayList' for i in 0...10 p = Mathfloor Mathrandom*n % n #following is equal to (people.get p).getAddress() consolelog ppeopleget pinstancegetAddress
An object of the struct is passed in to
add()
andget()
methods so that those methods can resume passed inObject
rather than creating new one. If such an object is not passed in as an argument the methods will create a newObject
.
Hash Table
LDS.HastableBase is a <string,number>
implementation. LDS.Hashtable is a more generic <string,LDS.Struct>
hashtable that is based on LDS.HashtableBase.
= consoletime 'time_hashtable' arr = 5 people = LDSHashtable nPerson #creates a hashtable of size n instance = null for i in 0...n instance = peopleget ""instance #gets the value in key #{i} instancecopyFrom p2 for j in 0...5 instancesetAddress "-"j consoletimeEnd 'time_hashtable' consolelog 'Done populating the hastable' consolelog 'Reading from hashtable' for i in 0...20 p = Mathfloor Mathrandom*n*2 % 2*n if not peoplecheck "" #checks whether key exists consolelog pnull else consolelog ppeopleget ""instancegetAddress
get(key)
method returns LDS.Struct instance of the givenkey
. Ifkey
doesn't exists in the hash table, it creates a new key-value pair and returns the created instance.