datastar
"Now witness the power of this FULLY ARMED AND OPERATIONAL DATASTAR!"
npm install datastar --save
Contributing
This module is open source! That means we want your contributions!
- Install it and use it in your project.
- Log bugs, issues, and questions on GitHub
- Make pull requests and help us make it better!
Usage
const Datastar = require('datastar');
const datastar = new Datastar({
  config: {
    credentials: {
      username: 'cassandra',
      password: 'cassandra'
    },
    keyspace: 'a_fancy_keyspace',
    contactPoints: ['127.0.0.1', 'host2', 'host3']
  }
}).connect();

const cql = datastar.schema.cql;
const Artist = datastar.define(...);
Artist.create(...);
Warnings
- Define schemas in snake_case, but always use camelCase everywhere else.
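The casing rule can be pictured as a key conversion at the boundary between your code and the schema. The sketch below is a standalone illustration and not datastar's own implementation; the helper names `toCamelCase`, `toSnakeCase`, and `convertKeys` are hypothetical.

```javascript
// Hypothetical helpers, not part of datastar's API.
function toCamelCase(key) {
  // artist_id -> artistId
  return key.replace(/_([a-z0-9])/g, (match, ch) => ch.toUpperCase());
}

function toSnakeCase(key) {
  // artistId -> artist_id
  return key.replace(/[A-Z]/g, (ch) => '_' + ch.toLowerCase());
}

// Returns a shallow copy of `obj` with every top-level key converted.
function convertKeys(obj, convert) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    out[convert(key)] = value;
  }
  return out;
}

// A row shaped like the schema (snake_case) ...
const row = { artist_id: '1234', create_date: '2020-01-01' };
// ... and the same data shaped like application code (camelCase).
const entity = convertKeys(row, toCamelCase);
console.log(entity);
```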
API Documentation
Constructor
All constructor options are passed directly to Priam so any options Priam supports, Datastar supports.
const Datastar = require('datastar');
const datastar = new Datastar({
  config: {
    credentials: {
      // who am I connecting as
      username: 'cassandra',
      // what's my password
      password: 'cassandra'
    },
    // what keyspace am I using
    keyspace: 'a_fancy_keyspace',
    // what cluster hosts do I know about
    contactPoints: ['127.0.0.1', 'host2', 'host3']
  }
});
Connect
Given a set of cassandra information, connect to the cassandra cluster. This must be called before you do any model creation, finding, schema definition, etc.
let datastar = new Datastar(...);
// I set up the connection, but I'm not connected yet, let's connect!
datastar = datastar.connect();
Define
Define is the primary way to create Models while using Datastar. See the following long example for an explanation of all options to define. Define your schemas in snake case, but use camel case everywhere else!
//
// main definition function, pass a name of the model (table), and its corresponding schema
//
const Album = datastar.define(...);
Consistency
Since Cassandra is a distributed database, we need a way to specify what our consistency threshold is for both reading from and writing to the database. We can set consistency, readConsistency and writeConsistency when we define our model. consistency is used if you want to set both to the same threshold; otherwise you can go more granular with readConsistency and writeConsistency. Consistency is defined using a camelCase string that corresponds with a consistency that cassandra allows.
const Album = datastar;
We also support setting consistency on a per-operation basis if you want to override the default set on the model for specific cases.
Album;
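One way to picture how these settings combine is a small resolution function. This is a sketch under assumptions, not datastar's code: the helper `resolveConsistency` is hypothetical, and the precedence between a granular setting and the shared `consistency` setting on the same model is assumed here, since it is not stated above.

```javascript
// Hypothetical helper: picks the consistency for one operation.
// `kind` is 'read' or 'write'.
function resolveConsistency(kind, modelOptions, callOptions = {}) {
  const granular = kind === 'read' ? 'readConsistency' : 'writeConsistency';
  return callOptions.consistency   // per-operation override wins
    || modelOptions[granular]      // then the granular model setting (assumed order)
    || modelOptions.consistency;   // then the shared model setting
}

// A model that uses one threshold for reads and writes.
const shared = { consistency: 'localQuorum' };
// A model that distinguishes reads from writes.
const split = { readConsistency: 'one', writeConsistency: 'quorum' };

console.log(resolveConsistency('read', shared));
console.log(resolveConsistency('write', split));
console.log(resolveConsistency('read', split, { consistency: 'localQuorum' }));
```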
We also support setting an optional expiration period called TTL (Time To Live) for data expiration and removal. You can set the ttl option either when creating the data entry or when updating it, e.g. { ttl: 3 } means the expiration time for the data is 3 seconds. Updating the data entry resets its TTL.
Album;
OR
Album;
Note: The ttl option must be set on every update call. It is not maintained from the initial entity creation. If you don't set it in an update call, the entity will not have a TTL set.
Schema Validation
Schemas are validated on .define. As you call each function on a Model, such as create or find, the calls to the functions are validated against the schema. See here for detailed information of supported CQL data types.
Validation is performed using joi.
Notes on null:
- Use the .allow(null) function of joi on any property you want to allow to be null when creating your schema.
The following table shows how certain data types are validated:
CQL Data Type | Validation Type |
---|---|
ascii | cql.ascii() |
bigint | cql.bigint() |
blob | cql.blob() |
boolean | cql.boolean() |
counter | cql.counter() |
decimal | cql.decimal() |
double | cql.double() |
float | cql.float() |
inet | cql.inet() |
text | cql.text() |
timestamp | cql.timestamp() |
timeuuid | cql.timeuuid() |
uuid | cql.uuid() |
int | cql.int() |
varchar | cql.varchar() |
varint | cql.varint() |
map | cql.map(cql.text(), cql.text()) |
set | cql.set(cql.text()) |
Lookup tables
This functionality exists in datastar to optimize queries on other unique keys of your main table. By default, Cassandra can do this for you by building an index for that key. The problem is that Cassandra's current backend storage can make such indexes very slow. If this is a high-traffic query pattern, it could lead to issues with your database. We work around this limitation by simply creating more tables and doing an extra write to the database. Since Cassandra is optimized for handling a heavy write workload, this becomes trivial. We take care of the complexity of keeping these tables in sync for you. Let's look at an example by modifying our Artist model.
const Artist = datastar;
In our example above we added name as a lookupKey to our Artist model. This means a few things:
- We must provide a name when we create an Artist.
- name, as with any lookupKey, MUST be unique.
- We must provide the name when removing an Artist, as it is now the primary key of a different table.
- When updating an Artist, a fully formed previous value must be given or else an implicit find operation will happen in order to properly assess if a lookupKey has changed.
Keeping these restrictions in mind, we can now have fast lookups by name without having to worry about too much.
Artist;
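The extra-table idea can be sketched with two in-memory maps standing in for the main table and its lookup table. This is only a model of the behaviour described above, not datastar's implementation; the function names and sample data are hypothetical.

```javascript
// In-memory stand-ins for two Cassandra tables (illustration only).
const artist = new Map();        // main table, keyed by artistId
const artistByName = new Map();  // lookup table, keyed by name

function createArtist(entity) {
  if (!entity.name) throw new Error('name is required: it is a lookupKey');
  // One logical create becomes two writes. A duplicate name would
  // overwrite the existing lookup row, which is why it must be unique.
  artist.set(entity.artistId, { ...entity });
  artistByName.set(entity.name, { ...entity });
}

function removeArtist({ artistId, name }) {
  // Both keys are needed: each one is the primary key of its own table.
  artist.delete(artistId);
  artistByName.delete(name);
}

function findByName(name) {
  // A direct key lookup on the second table, no secondary index needed.
  return artistByName.get(name);
}

createArtist({ artistId: 'a1', name: 'The Examples', members: ['person'] });
const found = findByName('The Examples');
console.log(found.artistId);
```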
Model.create
Once you have created a Model using datastar.define you can start creating records against the Cassandra database you have configured in your options or passed to datastar.connect:
const cql = datastar.schema.cql;
const Beverage = datastar.define(...);
Beverage.create(...);
The create method (like all CRUD methods) will accept four different arguments for convenience:
// Create a single model with propertiesModel;Model;Model;// Create a two models: one with properties// and the second with properties2Model
Model.update
Updating records in the database is fairly common. We expose a simple method for this: you provide a partial object representing your Beverage and it will figure out how to update all the fields! Let's see what it looks like.
//// For the simple case update some basic field//Beverage;
It even supports higher level functions on set and list types. Let's look at what set looks like. (list is covered further down.)
Beverage;
If we decide to create a model that needs to use Lookup Tables, we require a previous value to be passed in as well as the entity being updated. If no previous value is passed in, we will implicitly run a find on the primary table to get the latest record before executing the update. This previous value is required because we have to detect whether a primaryKey of a lookup table has changed.
IMPORTANT: If you have a case where you are modifying the primaryKey of a lookup table and you are PASSING IN the previous value into the update function, that previous value MUST be a fully formed object of the previous record, otherwise you are guaranteed to have the changed lookup table go out of sync. Passing in your own previous value is done at your own risk. If you do not understand this warning or its implications, please post an issue.
const Person = datastar; //// person Object//const person = name: 'Fred Flinstone' attributes: height: '6 foot 1 inch' ; //// So we start but creating a person//Person; //// Now if I later want to update this same and change the name... I need to pass// in that FULL person object that was used previously. I will warn again that// DATA WILL BE LOST if `previous` is an incomplete piece of data. This means that the `person_by_name`// lookup table that gets generated under the covers will have incomplete data.//Person; //// If I want to update the primary key of the entity I would need to do something as follows// (this also shows changing the lookup table primary key at the same time)// Person; //// If I ommit previous altogether, I just need to ensure I pass in the proper// primary key for the `person` table and a find operation will be done in order to// fetch the previous entity. This is the simplest method but costs an implicit// find operation before the update is completed.//Person; //// Just like `set` types, we can do higher level operations with `list` types.// Lets start by updateing a list//Person; //// We can place items on the front of the list using `prepend`//Person; //// We can also add them to the end of the list as well with `append`//Person; //// We also have a standard `remove` operation that can be done//Person; //// The last function we support on `list` is the `index` operation. It replaces// the item in the `list` at the given index with the value associated.//Person;
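To see why an incomplete previous value loses data, consider a simplified model in which the new lookup row is built from previous merged with the changes. This is an assumption made for illustration and not a description of datastar's internals; `updateLookup` and the sample records are hypothetical.

```javascript
// In-memory stand-in for the lookup table keyed by name (illustration only).
const personByName = new Map();

// Hypothetical model of an update as seen by the lookup table.
function updateLookup(previous, changes) {
  const next = { ...previous, ...changes };
  if (next.name !== previous.name) {
    // The lookup key changed: the old row goes away and a new row
    // is written from `previous` merged with the changes.
    personByName.delete(previous.name);
  }
  personByName.set(next.name, next);
  return next;
}

const full = { personId: 'p1', name: 'Fred', attributes: { height: '6 foot 1 inch' } };

// Case 1: a fully formed previous value keeps every field.
personByName.set(full.name, full);
const good = updateLookup(full, { name: 'Barney' });

// Case 2: an incomplete previous value silently drops `attributes`.
personByName.clear();
personByName.set(full.name, full);
const partial = { personId: 'p1', name: 'Fred' };
const bad = updateLookup(partial, { name: 'Barney' });

console.log(good.attributes, bad.attributes);
```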
Model.find
Querying Cassandra can be the source of much pain, which is why datastar will only allow queries on models based on primary keys. Any post-query filtering is the responsibility of the consumer.
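In practice, post-query filtering means querying by key and then narrowing the result in application code. A minimal sketch follows; the rows and field names are made up for illustration, and the array stands in for whatever a query on the partition key returned.

```javascript
// Rows as they might come back from a query on a partition key.
// The data is hypothetical.
const albums = [
  { artistId: 'a1', albumId: 'b1', name: 'First', releaseYear: 1991 },
  { artistId: 'a1', albumId: 'b2', name: 'Second', releaseYear: 1993 },
  { artistId: 'a1', albumId: 'b3', name: 'Third', releaseYear: 1996 }
];

// Filtering on a non-key column happens after the query, in plain JavaScript.
function releasedSince(rows, year) {
  return rows.filter((album) => album.releaseYear >= year);
}

const recent = releasedSince(albums, 1993);
console.log(recent.map((album) => album.name));
```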
There are four variants to find:
- Model.find(options, callback)
- Model.findOne(options || key, callback) <-- Also has an alias Model.get
- Model.findFirst(options, callback)
- Model.count(options, callback)
The latter three (findOne/get, findFirst, and count) are all facades to find and simply do not need the type parameter in the example below. Another note here is that type is implied to be all if none is given.
Album;
In the latter three facades, you can also pass in conditions as the options object!
Album;
You only need to pass a separate conditions object when you want to add additional parameters to the query like LIMIT. Limit allows us to limit how many records we are retrieving for any range query.
Album;
We can also just pass in a single key in order to fetch our record. NOTE This only works if your schema only has a single partition/primary key and assumes you are passing in that key. This will not work for lookup tables.
Artist;
Stream API
While also providing a standard callback API, the find function supports first class streams! This is very convenient when doing a findAll and processing those records as they come instead of waiting to buffer them all into memory. For example, if we were doing this inside a request handler:
const Transform = ; //// Fetch sodas handler// { Album ;}
Async Iterable API
The async iterable API, like the stream API, provides another convenient way to process records as they come in. If you do not need the full feature set of node streams, this is a more efficient technique.
To access the async iterable API, set the iterable option to true in your call to .find() or .findAll(). Alternatively, call the .iterate() method of your model, which is equivalent to .findAll({ ..., iterable: true }).
{ let allTracks = ; for await const album of Album allTracks = allTracks; return allTracks;}
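The consuming side of this pattern can be tried without a database. In the sketch below an async generator stands in for the iterable a model would return; `fakeAlbums`, `collectTracks`, and the sample records are hypothetical and only illustrate the `for await` loop.

```javascript
// Stand-in for a model's async iterable result (illustration only):
// an async generator that yields one record at a time.
async function* fakeAlbums() {
  yield { albumId: 'b1', tracks: ['one', 'two'] };
  yield { albumId: 'b2', tracks: ['three'] };
}

// Handles each record as it arrives and gathers the track names.
async function collectTracks(albums) {
  let allTracks = [];
  for await (const album of albums) {
    allTracks = allTracks.concat(album.tracks);
  }
  return allTracks;
}

collectTracks(fakeAlbums()).then((tracks) => console.log(tracks));
```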
Model.remove
If you would like to remove a record or a set of records from the database, you just need to pass in the right set of conditions.
Model.remove(options, callback);
One thing to note here is that when deleting a single record, you can pass in the fully formed object and we will handle stripping the unsafe parameters that you cannot query upon based on your defined schema.
Album;
This also works on a range of values, given a schema with artistId as the partition key and albumId as the clustering key, like our album model at the top of this readme...
//// This will delete ALL elements for a given `artistId`//Album;
When you remove an entity that has an associated lookup table, you need to pass in both the partition keys of the main table AND the lookup table.
//// Going back to our `person` model..//Person;
This is necessary because a lookupKey defines the partition key of a different table that is created for lookups.
Model hooks
Arguably one of the most powerful features hidden in datastar are the model hooks or life-cycle events that allow you to hook into and modify the execution of a given statement. First let's define the operations and the hooks that they have associated.
Operation | Life-cycle event / model hook |
---|---|
create, update, remove, ensure-tables | build, execute |
find | all, count, one, first |
datastar utilizes a module called Understudy under the hood. This provides a way to add extensibility to any operation you perform on a specific model. Let's take a look at what this could look like.
Before build is before we create the statement(s) that we then collect to execute and insert into Cassandra. This allows us to modify any of the entities before CQL is generated for them.
Beverage;
Before execute is right before we actually send the statements to cassandra. This is where we have a chance to modify the statements or StatementCollection with any other statements we may have, or even to just change the consistency we are asking of cassandra if there is only a narrow case where you require a consistency of one. (You could also just pass option.consistency into the function call.)
Beverage;
An after hook for execute might look like this if we wanted to insert the same data into a separate keyspace using a different Priam instance, which would be a separate connection to cassandra. This call is ensured to be executed before the Beverage.create(opts, callback) function calls its callback.
NOTE: This assumes the same columns exist in this other keyspace.
const otherDataCenterConnection = connectOpts;Beverage;
The last type of hook we have is for the specific find operations: find:all, find:one, find:count, find:first. These specific hooks execute at the same point as the :build hooks above, but have different and more useful semantics for after hooks, which can modify the data fetched. This makes use of Understudy's .waterfall function. An important caveat is that find hooks are not executed when you use the streaming option. If you want to convert all records from your queries, see Record Transformation for a technique that works for all types of queries.
This after hook on find:one allows us to mutate the result returned from any findOne query on Beverage. This could allow us to call an external service to fetch extra properties or anything else you can think of. The main goal is to provide the extensibility to do what you want without datastar getting in your way.
Beverage;
We can even add another after hook after this one which will get executed in series and be able to modify any new attributes!
Beverage;
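The series behaviour of chained after hooks can be modelled with a small waterfall runner, where each hook receives the result of the one before it. This is a simplified, hypothetical stand-in written for illustration; it is not Understudy's code and does not use its API.

```javascript
// Hypothetical, simplified waterfall runner (not Understudy's implementation).
function waterfall(hooks, initial, done) {
  let index = 0;
  function next(err, result) {
    // Stop on the first error or when every hook has run.
    if (err || index === hooks.length) return done(err, result);
    const hook = hooks[index++];
    hook(result, next);
  }
  next(null, initial);
}

// Two after hooks: the second one sees the attribute added by the first.
const afterFindOne = [
  (result, callback) => callback(null, { ...result, sugarFree: result.sugar <= 0 }),
  (result, callback) => callback(null, { ...result, label: result.sugarFree ? 'diet' : 'regular' })
];

let final;
waterfall(afterFindOne, { name: 'cola', sugar: 0 }, (err, result) => {
  if (err) throw err;
  final = result;
});
console.log(final);
```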
Record Transformation
Assign a transform method to your model to synchronously modify all records coming back from a query; this technique works for the callback, streaming, or async iterable methods of querying:
Beverage.transform = function (before) {
  return {
    ...before,
    isDiet: before.sugar <= 0
  };
};
Create tables
Each Model is capable of creating the Cassandra tables associated with its schema. To ensure that a table is created you can pass the ensureTables option:
const Spice = datastar
Or call Model.ensureTables whenever it is appropriate for your application:
Spice;
You can also specify a with.orderBy option to enable CLUSTERING ORDER BY on a clustering key. This is useful if you want your Spice table to store the newest items on disk first, making it faster to read in that order.
//// WITH CLUSTER ORDER BY (created_at, DESC)// Spice
// // We can also pass an option to enable setting other properties of a table as well! //
Spice;
Drop Tables
In datastar, each model also has the ability to drop tables. This assumes the user used to establish the connection has these permissions. Let's see what this looks like with our spice model.
Spice;
It's as simple as that. We will drop the spice table and any associated Lookup Tables if they were configured. With .dropTables and .ensureTables it's super easy to use datastar as a building block for managing all of your Cassandra tables without executing any manual CQL.
Statement Building
Currently this happens within the model.js code, in how it interacts with the statement-builder and appends statements to the StatementCollection. For now, the best way to learn about this is to read through the create/update/remove pathway and follow how a statement is created and then executed. In the future we will have comprehensive documentation on this.
Conventions
We make a few assumptions which have manifested as conventions in this library.
- We do not store null values in cassandra itself for an update or create operation on a model. We store a null representation for the given cassandra type. This also means that when we fetch the data back from cassandra, we return the data back to you with the proper nulls you would expect. It just may be unintuitive if you look at the cassandra tables directly and do not see null values.
  This prevents tombstones from being created, which has been crucial for our production uses of cassandra at GoDaddy. This is something that could be configurable in the future.
  Fields that are defined as either partitionKey() or clusteringKey() will not be converted to/from null. In addition, to exclude a field from this null handling, define the field with meta({ nullConversion: false }).
- Casing: as mentioned briefly in the warning at the top of the readme, we assume camelCase as the casing convention when interacting with datastar and the models created with it. The schema is the only place where the keys used MUST be written as snake_case.
Using Async/Await
In Datastar we currently have experimental async/await support. The way to enable Async/Await is to explicitly wrap your models with the AwaitWrap class. This will expose all the expected methods for your model via a thenable that can be awaited. Example shown below.
const Datastar = ;const AwaitWrap = ;const config = ; const datastar = config;const cql = datastarschemacql; // Wrap the model to expose async/await functions via `thenables`const Model = datastar; // Ensure the table exists await Model.ensure() // Create the model await Model.create({ name: 'hello' }); // find models const models = await Model // if you need a stream you can still do that as well const stream = Model;} ;
Until we finalize integrating async/await support into datastar as a first class citizen, this is what we have available to start converting your callback code to use async/await.
Tests
Tests are written with mocha and code coverage is provided with istanbul. They can be run with:
# Run all tests with "pretest"
npm test
# Just run tests
npm run coverage