node-sampler
A library that records things and plays them back
Overview
You can record events from virtually any source (streams, event emitters, files, lines, code, message queues, maybe even audio!), store them in a database (memory only for the moment, but more backends are coming), and play them back slower or faster.
This can be very useful if you deal with machine learning algorithms that need to be trained on long time series (e.g. Twitter streams). You can also use it to simulate things like HTTP requests.
Current status
This library is still in development, so expect heavy refactoring and sparse documentation until I have more time to settle everything.
However, it is somewhat functional, for basic use and/or fun. Check the Twitter example :)
Features
- control the playback speed (slower or faster)
- accurate scheduler (latency is automatically corrected)
- simple API - with unit tests!
- basic Twitter example
TODO / Wishlist
- full support for the Stream API (still incomplete/non-functional)
- save/export samples (to databases, files..)
- load/import samples (from APIs, databases, CSVs..)
- insertion of events at arbitrary timesteps (e.g. when working with incomplete time series)
- reverse playback?
- more tests
License
BSD
Installation
For users
Install it as a dependency for your project
$ npm install sampler
Install it globally in your system
$ npm install sampler -g
For developers
To install node-sampler in a development setup:
$ git clone http://github.com/daizoru/node-sampler.git
Run the tests (you need mocha; it seems I cannot put it in the dev dependencies without causing a cyclic dependency loop):
$ npm run-script test
To build the coffee-script:
$ npm run-script build
Documentation
Sampler has two different APIs: one for classic, quick and dirty code (Simple API), the other for cleaner, asynchronous, stream-based code (Stream API).
Simple API
Record formats
{Record} = require 'sampler'

# data will be stored in memory
#record = new Record()

# stored as YAML file
# (not very good: issues with encoding of international tweets, for instance)
record = new Record "file://examples/test.yml"

# stored as JSON file
# not bad, it's compact (1 line) however it might not be very good for large files
record = new Record "file://examples/test.json"

# stored as SAMPLE file
# compressed json, using Snappy
record = new Record "file://examples/test.smp"
Recording
{Record, SimpleRecorder} = require 'sampler'

# create a brand new record
# if there is no argument, data is stored in memory
#record = new Record()

# the file:// protocol needs a path with a valid extension to guess the format (yaml, yml, json)
record = new Record "file://examples/test.yaml"

# now, you can start playing with your record.
# let's record things in the record! for this, you need a Recorder
recorder = new SimpleRecorder record

# then just write things inside
recorder.write "hello"
recorder.write foo: "hello", bar: "world"
# recorder.write ...

# not yet implemented, but soon you will be able to add an event at a specific time
# recorder.writeAt moment(1982,1,1), "cold wave"

# also, don't forget to close the recorder when you don't use it anymore
# the reason is that a recorder starts some background processes
# (e.g. async synchronization of database) that need to be stopped manually
# if there is no more data to record.
recorder.stop()
Playback
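{Record, SimplePlayer} = require 'sampler'

# load an existing record - for the moment.. nothing is supported :) only in-memory
# but in the future, you will be able to load MongoDB, SQL, Redis records etc..
record = new Record "redis://foobar"

# create a basic player
# (the class name SimplePlayer is assumed here, by analogy with SimpleRecorder)
player = new SimplePlayer record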
Stream API
Recording
{Record, StreamRecorder} = require 'sampler'

record = new Record "file://examples/twitter.json"
recorder = new StreamRecorder record
myInputStream.pipe recorder

# that's all folks!
# you don't need to explicitly close the StreamRecorder (unlike SimpleRecorder)
# since it can automatically detect 'close' events from the input stream
Playing
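{Record, StreamPlayer} = require 'sampler'

record = new Record "file://examples/twitter.json"
# (the class name StreamPlayer is assumed here, by analogy with StreamRecorder)
player = new StreamPlayer record

# by default there are no timestamps, however you can enable them using:
player = new StreamPlayer record, withTimestamp: yes
# this will emit messages in the form {timestamp, data}

# to listen to events, just do:
player.on 'data', (data) ->
  # do something with the data

player.on 'end', ->
  # finished!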
Piping
# to be continued
Examples
Playing with Twitter Stream
NOTE 1: you need to install ntwitter manually before running the example:
$ npm install -g ntwitter
I didn't include it as a dependency to keep dependencies light.
NOTE 2: you need to set environment variables containing your Twitter tokens (TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_TOKEN_KEY, TWITTER_TOKEN_SECRET).
# standard node library
{log, inspect} = require 'util'

# third-party libraries
Twitter = require 'ntwitter'
moment = require 'moment'

# sampler modules
sampler = require '../lib/sampler'

# shortcuts
delay = (t, f) -> setTimeout f, t

# PARAMETERS
duration = 10
timeline = new sampler.Record "file://twitter.json"

twit = new Twitter
  consumer_key: process.env.TWITTER_CONSUMER_KEY
  consumer_secret: process.env.TWITTER_CONSUMER_SECRET
  access_token_key: process.env.TWITTER_TOKEN_KEY
  access_token_secret: process.env.TWITTER_TOKEN_SECRET

# let's open a stream on random tweets
twit.stream 'statuses/sample', (stream) ->
  recorder = new sampler.SimpleRecorder timeline

  stream.on 'error', (err) ->
    log "twitter error: #{inspect err}"
    # there is a bug in ntwitter. sometimes tweets come from here!
    if err.text?
      timeline.write moment(err.created_at), err.text

  stream.on 'data', (data) ->
    timeline.write moment(data.created_at), data.text

  delay duration*1000, ->
    recorder.close()
    log "playing tweets back"
    # (SimplePlayer and the option names onData / onEnd are assumed;
    # the original names are missing from this copy)
    new sampler.SimplePlayer timeline,
      speed: 2.0
      withTimestamp: yes
      onData: (tweet) -> log "#{tweet.timestamp}: #{tweet.data}"
      onEnd: -> process.exit()

log "listening for #{duration} seconds"
Changelog
0.0.5
- now we can load a json file! and it's tested!
- more bugfixes
- more tests
- added a recorder.close() function
0.0.4
- Fixed broken YAML dependency
0.0.3
- receiving timestamps during playback is now optional (disabled by default)
- various bugfixes
- tests are passing
- basic support for file storage in YAML, JSON and JSON + Snappy
- experimental support of Node's Stream API
0.0.2
- REFACTORED EVERYTHING WITH FIRE
0.0.1
Added a callback that is called when playback reaches the end:
sampler
0.0.0
First version