
    mongodb-data-sync

    0.1.13 • Public • Published

    Duplicating data across multiple collections (denormalization) is common in MongoDB. It makes searching, sorting and field projection efficient.

    Handling duplicate data is a pain: you have to create jobs to sync the data, or update in place every collection that holds the duplicated data.

    mongodb-data-sync solves this problem. With mongodb-data-sync you declare the dependencies in a logical place (for instance, alongside the schemas), and mongodb-data-sync takes care of syncing the data in almost real time.

    It uses the native MongoDB Change Streams in order to keep track of changes.
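
    As a rough illustration of what tracking changes involves, the sketch below inspects a change stream update event and reports which watched fields it touched. The event shape (operationType, updateDescription.updatedFields, updateDescription.removedFields) follows MongoDB's documented change events. The helper name changedSyncedFields is hypothetical, and this is not the engine's actual code.

```javascript
// Hypothetical helper, not part of mongodb-data-sync.
// Given a change stream event and a list of watched field names,
// return the watched fields that the event modified.
function changedSyncedFields(changeEvent, watchedFields) {
    if (changeEvent.operationType !== 'update') {
        return [];
    }
    const description = changeEvent.updateDescription || {};
    const updated = Object.keys(description.updatedFields || {});
    const removed = description.removedFields || [];
    // Nested updates arrive as dotted paths such as "address.city",
    // so this sketch compares on the top-level field name only.
    const touched = new Set([...updated, ...removed].map((path) => path.split('.')[0]));
    return watchedFields.filter((field) => touched.has(field));
}

const event = {
    operationType: 'update',
    documentKey: { _id: 1 },
    updateDescription: {
        updatedFields: { first_name: 'Dana', age: 31 },
        removedFields: [],
    },
};

console.log(changedSyncedFields(event, ['first_name', 'last_name', 'email']));
// prints [ 'first_name' ]
```

    An update that touches none of the watched fields yields an empty list, which is the kind of check that can be done in memory without querying the database.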

    Core Features

    1. It was designed to do all the synchronization with minimum overhead on the database. Most of the checks are done in memory.

    2. It uses the native MongoDB Change Streams in order to keep track of changes.

    3. It has two recovery paths after a crash: it resumes the change stream from a stored resume token, and if that token is no longer in the oplog it re-syncs the dependencies.

    4. It gives you an easy way to declare dependencies without having to handle them yourself.

    5. After declaring your dependencies you can retroactively sync your existing data.

    6. From version 0.0.25 you can add a MySQL dependency. This is a one-way dependency: the refCollection must be a MongoDB collection.

    7. From version 0.0.29 you can create triggers for update, insert, replace and delete.

    Notice

    mongodb-data-sync is still experimental.

    Pros and cons of having duplicate data in multiple collections

    Pros

    1. No need for joins.
    2. Index all fields.
    3. Faster and easier searching and sorting.

    Cons

    1. More storage usage.
    2. Hard to maintain: you need to keep track of all the connections (this is the problem mongodb-data-sync solves).
    3. More write operations: every update has to be applied to multiple collections.

    Requirements

    • MongoDB v4 or higher, running as a replica set
    • Node.js 7.6 or higher

    Architecture

    mongodb-data-sync is built from 2 separate parts.

    1. The engine (there should be only one): a Node.js server application that you run on your own machine (the next steps show how). The engine runs all the update and recovery logic. It was designed to work as a single process, and it knows where to continue after a restart or crash. Do not auto-scale it or run 2 containers for high availability. In short, never run more than 1 engine.

    2. The SDK: responsible for managing the database dependencies of your application. It connects your app with the engine.

    Instructions

    The instructions cover the 2 parts separately: the engine and the SDK.

    The engine

    Run

    npm install mongodb-data-sync -g
    

    Then, from the command line, run

    mongodb-data-sync --key "some key" --url "mongodb connection url"
    
    Options:
    
      --debug                console log important information
      
      -p, --port <port>      server port. (default: 6500)
      
      -d, --dbname <dbname>  the database name for the package. (default: "mongodb_data_sync_db")
      
      -k, --key <key>        API key to use for authentication of the SDK requests, required
      
      -u, --url <url>        MongoDB connection url, required
      
      -h, --help             output usage information
    

    That's it for running the server. Let's move on to the SDK.

    SDK

    You can look at the example on GitHub.

    Install
    npm install mongodb-data-sync --save
    

    init

    First, initialize the client. Do it as early as possible in your app.

    const SynchronizerClient = require('mongodb-data-sync');
    
    // sets up the communication between your app and the engine.
    // call this method once for each database you want to work on
    SynchronizerClient.init({
    
        // the name of the database the package should synchronize (required)
        dbName: 'mydb', 
        
        // the URL of the engine you are running (required)
        engineUrl: 'http://localhost:6500',
       
        // the authentication key you declared when starting the engine (required)
        apiKey: 'my cat is brown', 
    }); 

    Returns a Promise.

    getInstance

    const synchronizerClientInstance = SynchronizerClient.getInstance({
    
        // the name of the database you want to work on
        dbName: 'mydb', 
    
    }); 

    Returns an instance tied to your database (it is not a MongoDB db instance), used for dependency operations.
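
    The two calls above can be combined into one startup step. The sketch below is a minimal version that uses only init and getInstance as described here. The function name connectSync is hypothetical, and awaiting init before calling getInstance is an assumption based on init returning a Promise. The client is passed in as a parameter so the flow can be tried with a stub.

```javascript
// Sketch of an application startup step (hypothetical helper).
// `client` is expected to be the module returned by
// require('mongodb-data-sync'); it is injected so the function
// can be exercised without a running engine.
async function connectSync(client, { dbName, engineUrl, apiKey }) {
    // init returns a Promise; waiting for it before asking for
    // an instance is an assumption, not documented behaviour
    await client.init({ dbName, engineUrl, apiKey });

    // getInstance returns the object used for dependency operations
    return client.getInstance({ dbName });
}

// Example wiring, left commented out because it needs a running engine:
// const SynchronizerClient = require('mongodb-data-sync');
// connectSync(SynchronizerClient, {
//     dbName: 'mydb',
//     engineUrl: 'http://localhost:6500',
//     apiKey: 'my cat is brown',
// }).then((instance) => instance.getDependencies());
```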

    addDependency

    // 'addDependency' lets you declare a dependency between 2 collections
    synchronizerClientInstance.addDependency({
       
       // the dependent collection is the collection that needs to be updated automatically (required)
       // if the dependent collection is a MySQL table, write it as mysql.dbname.tablename
       dependentCollection: 'orders',
       
       // the referenced collection is the collection that your application updates (required)
       refCollection: 'users',
       
       // the dependent collection field to connect with (required)
       localField: 'user_id',
       
       // the referenced collection field to connect with, default _id; using a field other than _id will cause an extra join for each check (optional)
       foreignField:"_id" , // default
       
       // an object describing the fields that need to be updated.
       // the keys are the fields you want to be updated 
       // the values are the fields you want to take the value from (required)
       fieldsToSync: {
           user_first_name:'first_name',
           user_last_name:'last_name',
           user_email:'email'
       },
       
        // the engine uses a resume token to know where to continue the change stream from. 
        // if the engine was down for a long time and the oplog no longer holds this token, the engine will start updating all the dependencies from the beginning.
        // it is recommended to supply a last-update field (if you have one) so the engine syncs only documents dated after the crash 
        refCollectionLastUpdateField:'last_update'
    
    });

    Returns a Promise that resolves with the id of the dependency.
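
    To make the direction of the fieldsToSync mapping concrete, the sketch below builds the filter and $set update that a change to one users document would imply for orders. The helper buildDependentUpdate is hypothetical and written only for illustration. It is not the engine's implementation.

```javascript
// Illustration only: hypothetical helper, not part of mongodb-data-sync.
// Builds the filter and $set update that would copy the mapped fields
// from a changed refCollection document into the dependent collection.
function buildDependentUpdate(dependency, refDoc) {
    const { localField, foreignField = '_id', fieldsToSync } = dependency;
    const set = {};
    // keys are fields on the dependent document,
    // values are the source fields on the referenced document
    for (const [dependentField, refField] of Object.entries(fieldsToSync)) {
        set[dependentField] = refDoc[refField];
    }
    return {
        filter: { [localField]: refDoc[foreignField] },
        update: { $set: set },
    };
}

const dependency = {
    dependentCollection: 'orders',
    refCollection: 'users',
    localField: 'user_id',
    fieldsToSync: {
        user_first_name: 'first_name',
        user_last_name: 'last_name',
        user_email: 'email',
    },
};

const user = { _id: 7, first_name: 'Dana', last_name: 'Levi', email: 'dana@example.com' };
const { filter, update } = buildDependentUpdate(dependency, user);
console.log(filter); // { user_id: 7 }
console.log(update.$set.user_email); // dana@example.com
```

    Done by hand with the official MongoDB driver, these two values would be passed to updateMany(filter, update) on the orders collection. Declaring the dependency is meant to spare you from writing that step yourself.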

    removeDependency

    // deletes a dependency based on id 
    synchronizerClientInstance.removeDependency(id);

    Returns a Promise.

    getDependencies

    // used to get the database dependencies
    synchronizerClientInstance.getDependencies();

    Returns a Promise that resolves with all your database dependencies.

    syncAll

    // used to sync all the data in your database according to your dependencies.
    // usually this only needs to be called after adding a new dependency on existing data 
    synchronizerClientInstance.syncAll();

    Returns a Promise.

    addTrigger

    synchronizerClientInstance.addTrigger({
    
        // the dependent collection to subscribe triggers on (required)
        dependentCollection : "orders",
        
        // the type of the trigger: insert, update, replace or delete (required)
        triggerType:'insert',
        
        // when triggerType is update, lists the fields whose changes should fire the trigger 
        triggerFields : [],
       
        // when knowledge is set to true, the engine retries firing the event until it gets an OK HTTP status
        knowledge : false, // default
        
        // the URL the trigger will call 
        url:'http://localhost/insert-trigger'
    });

    Returns a Promise that resolves with the id of the trigger.
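
    On the receiving side, the trigger URL needs something that answers with an OK status. The sketch below is one possible request handler for Node's built-in http server. The request method and body format the engine sends are not described here, so the handler treats the body as opaque text. The name handleTrigger and the onEvent callback are hypothetical.

```javascript
// Hypothetical receiver for a trigger call. The request method and body
// format are assumptions; this README only says that a URL is called and
// that an OK status stops the retries when `knowledge` is true.
function handleTrigger(req, res, onEvent) {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
        try {
            // hand the raw body to the application without parsing it
            onEvent(Buffer.concat(chunks).toString('utf8'));
            res.statusCode = 200; // OK status acknowledges the event
        } catch (err) {
            res.statusCode = 500; // non-OK status leaves the event unacknowledged
        }
        res.end();
    });
}
```

    The handler can be mounted with http.createServer((req, res) => handleTrigger(req, res, console.log)) from Node's http module. Because a trigger with knowledge set to true may be delivered more than once, it is safer to make onEvent tolerate repeated events.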

    removeTrigger

    // deletes a trigger based on id 
    synchronizerClientInstance.removeTrigger(id);

    Returns a Promise.

    Install

    npm i mongodb-data-sync

    Weekly Downloads

    14

    Version

    0.1.13

    License

    MIT

    Unpacked Size

    67.2 kB

    Total Files

    12

    Last publish

    Collaborators

    • amit21