
    data-hunter

    Hunters, brace yourselves.

    data-hunter is a module that acts as a black box: given a historical data set (an array of JavaScript objects), it predicts information about that data.

    On top of a k-means clustered data set, data-hunter can build a layer of meta-filters that estimate the probability of a given piece of metadata falling in each cluster. In other words, you can hunt with a higher probability of a hit.

    Usage

    To install DataHunter, simply run:

      npm install data-hunter
     
      const DataHunter = require('data-hunter');

    K-means clustering

    This module uses the k-means algorithm provided by the object-learning module (https://github.com/johngoddard/ObjectLearning).

      const customers = [
        { time: 18.15, latitude: -115.13997, longitude: 36.17192 },
        { time: 18.33, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.23, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.56, latitude: -115.13997, longitude: 36.17192 },
      ];
     
      const dataHunter = new DataHunter(customers);
      // asking for 3 clusters related to the latitude and longitude
      dataHunter.setClustersParameters(3, ['latitude', 'longitude']);
      const clusteringModel = dataHunter.getClusteringModel();
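
To make the idea of clustering on a subset of attributes concrete, here is a minimal sketch of the assignment step of k-means, restricted to the chosen keys. It is not the implementation used by data-hunter or object-learning; the helper names `squaredDistance` and `assignToNearestCentroid`, the sample points and the centroids are all hypothetical.

```javascript
// Hypothetical helper, not part of data-hunter: squared Euclidean distance
// between a data object and a centroid, using only the chosen attributes.
function squaredDistance(point, centroid, keys) {
  return keys.reduce((sum, key) => sum + (point[key] - centroid[key]) ** 2, 0);
}

// Returns, for each point, the index of its nearest centroid.
function assignToNearestCentroid(points, centroids, keys) {
  return points.map((point) => {
    let best = 0;
    for (let i = 1; i < centroids.length; i += 1) {
      const candidate = squaredDistance(point, centroids[i], keys);
      if (candidate < squaredDistance(point, centroids[best], keys)) {
        best = i;
      }
    }
    return best;
  });
}

// Made-up data: 'time' is ignored because it is not in the key list.
const samplePoints = [
  { time: 18.15, latitude: 0, longitude: 0 },
  { time: 11.2, latitude: 10, longitude: 10 },
  { time: 11.56, latitude: 9, longitude: 11 },
];
const sampleCentroids = [
  { latitude: 1, longitude: 1 },
  { latitude: 10, longitude: 10 },
];
const labels = assignToNearestCentroid(samplePoints, sampleCentroids, ['latitude', 'longitude']);
console.log(labels); // [ 0, 1, 1 ]
```

A full k-means run would alternate this step with recomputing each centroid as the mean of its assigned points until the assignments stop changing.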
      

    Probabilities estimation

    Having those raw clusters is interesting, but hunting data in them can reveal real business value. DataHunter can build a layer of meta-filters (a meta-filter must be an attribute of the data) to estimate the probability of finding a data point in each cluster.

     
      const customers = [
        { time: 18.15, latitude: -115.13997, longitude: 36.17192 },
        { time: 18.33, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.23, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.56, latitude: -115.13997, longitude: 36.17192 },
      ];
     
      const dataHunter = new DataHunter(customers);
      // asking for 3 clusters related to the latitude and longitude
      dataHunter.setClustersParameters(3, ['latitude', 'longitude']);
      // setting the meta-filter 'time' to categorize the clusters according to the time of the request
      dataHunter.setMetaClustersParameters(24, 'time');
      const metaClusters = dataHunter.getMetaClusters();
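
One plausible way to read the meta-filter layer is as a counting estimate: split the meta attribute into buckets, then for each bucket compute the share of records that fall in each cluster. The sketch below illustrates that idea only. It assumes that the `24` passed to `setMetaClustersParameters` means 24 equal-width buckets and that `time` lies in the range 0 to 24, neither of which is confirmed by the module. The helper `estimateClusterProbabilities` and the sample labels are hypothetical.

```javascript
// Hypothetical helper, not part of data-hunter's API.
// Estimates P(cluster | bucket) by counting how many records of each bucket
// fall in each cluster. labels[i] is the cluster index of records[i].
function estimateClusterProbabilities(records, labels, clusterCount, bucketCount, attribute, range) {
  const bucketWidth = range / bucketCount; // assumption: values lie in [0, range)
  const counts = Array.from({ length: bucketCount }, () => new Array(clusterCount).fill(0));
  records.forEach((record, i) => {
    const bucket = Math.min(bucketCount - 1, Math.floor(record[attribute] / bucketWidth));
    counts[bucket][labels[i]] += 1;
  });
  // Normalize each bucket's counts into probabilities (empty buckets stay at 0).
  return counts.map((row) => {
    const total = row.reduce((sum, n) => sum + n, 0);
    return row.map((n) => (total === 0 ? 0 : n / total));
  });
}

const requests = [{ time: 18.15 }, { time: 18.33 }, { time: 11.2 }, { time: 11.23 }, { time: 11.56 }];
const requestLabels = [0, 0, 1, 1, 2]; // made-up cluster assignments
const probabilities = estimateClusterProbabilities(requests, requestLabels, 3, 24, 'time', 24);
console.log(probabilities[18]); // [ 1, 0, 0 ]
console.log(probabilities[11]); // two thirds of the 11 o'clock requests are in cluster 1
```

With so few records the estimates are noisy; in practice each bucket needs enough history for the ratios to mean anything.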
     

    Complete run example

    First, define how you want to cluster your data and which information you want to hunt (filter) on top of it. Here, we use latitude and longitude as the clustering parameters. Each cluster can be considered an area wrapping part of the data, centered around an average value.

    We use the time of the request (24 hours) as a meta-filter to estimate the probability of finding a customer in each area. DataHunter returns the best cluster and its probability for a given hour.

     
      const customers = [
        { time: 18.15, latitude: -115.13997, longitude: 36.17192 },
        { time: 18.33, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.2, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.23, latitude: -115.13997, longitude: 36.17192 },
        { time: 11.56, latitude: -115.13997, longitude: 36.17192 },
      ];
     
      const chosenHour = 11;
      const dataHunter = new DataHunter(customers);
      // asking for 3 clusters related to the latitude and longitude
      dataHunter.setClustersParameters(3, ['latitude', 'longitude']);
      // setting the meta-filter 'time' to categorize the clusters according to the time of the request
      dataHunter.setMetaClustersParameters(24, 'time');
     
      // running the analysis returns the best cluster, its probability and the average of the cluster's parameters
      // (here, the latitude and longitude)
      dataHunter.run(chosenHour, (bestCluster, bestClusterProbability, bestClusterAverage) => {
        console.log('\nc. Interpretation : ');
        console.log('--------------------\n');
        console.log(`At ${chosenHour}, the highest probability to find a customer request is in the cluster : ${bestCluster}`);
        console.log(`The cluster ${bestCluster} is centered around :${JSON.stringify(bestClusterAverage, null, 2)}`);
      });
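
The three values handed to the callback can be pictured with a small sketch: pick the cluster with the highest probability for the chosen hour, then average the clustering attributes of its members. This is an illustration under assumptions, not the module's code. The helpers `pickBestCluster` and `clusterAverage`, the sample records and the probability row are all hypothetical.

```javascript
// Hypothetical helpers, not part of data-hunter's API.

// Given the per-cluster probabilities for one hour, returns the most likely cluster.
function pickBestCluster(hourProbabilities) {
  let best = 0;
  hourProbabilities.forEach((p, i) => {
    if (p > hourProbabilities[best]) best = i;
  });
  return { cluster: best, probability: hourProbabilities[best] };
}

// Averages the chosen attributes over the members of one cluster.
// Returns null when the cluster has no members.
function clusterAverage(records, recordLabels, cluster, keys) {
  const members = records.filter((_, i) => recordLabels[i] === cluster);
  if (members.length === 0) return null;
  const average = {};
  keys.forEach((key) => {
    average[key] = members.reduce((sum, r) => sum + r[key], 0) / members.length;
  });
  return average;
}

// Made-up inputs for illustration.
const sampleRecords = [
  { latitude: 2, longitude: 4 },
  { latitude: 4, longitude: 8 },
  { latitude: 10, longitude: 10 },
];
const sampleLabels = [1, 1, 0];
const probabilitiesAtChosenHour = [0.25, 0.75];

const bestPick = pickBestCluster(probabilitiesAtChosenHour);
const bestAverage = clusterAverage(sampleRecords, sampleLabels, bestPick.cluster, ['latitude', 'longitude']);
console.log(bestPick);    // { cluster: 1, probability: 0.75 }
console.log(bestAverage); // { latitude: 3, longitude: 6 }
```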
     

    Good practice

    Clustering is extremely expensive on large data sets. However, this module generates the clusters and stores them in an internal variable, so you can run an analysis without regenerating the clusters as long as the data set hasn't been updated.

    On the backend, a good practice is to continuously update the clusters as new data arrives, and to run the analysis without cluster generation when you need an immediate result.

    See the run, runWithoutSubClustersGeneration and runWithoutAnyGeneration methods inside the module.
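
The caching pattern described above can be sketched generically: rebuild the expensive model only when the data changes, and serve every other request from the stored result. The class below is a hypothetical wrapper written for illustration. It does not call or reproduce data-hunter's own methods, whose signatures are not documented here.

```javascript
// Hypothetical wrapper, not data-hunter's API: caches an expensive model
// and rebuilds it only after the data set has been updated.
class CachedModel {
  constructor(buildModel) {
    this.buildModel = buildModel; // expensive function: data -> model
    this.data = [];
    this.model = null;
    this.stale = true;
  }

  // Call this whenever new data arrives; the rebuild is deferred until get().
  update(newData) {
    this.data = newData;
    this.stale = true;
  }

  // Returns the cached model, rebuilding it only if the data changed.
  get() {
    if (this.stale) {
      this.model = this.buildModel(this.data);
      this.stale = false;
    }
    return this.model;
  }
}

let buildCount = 0;
const cache = new CachedModel((data) => {
  buildCount += 1; // stands in for an expensive clustering run
  return { size: data.length };
});

cache.update([1, 2, 3]);
cache.get();
cache.get(); // served from the cache, no rebuild
cache.update([1, 2, 3, 4]);
const latestModel = cache.get(); // data changed, so the model is rebuilt
console.log(buildCount, latestModel.size); // 2 4
```

Deferring the rebuild to the first read after an update keeps bursts of incoming data from triggering one clustering run per record.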

    You now have everything you need to start hunting.


    version 0.1.7

    license ISC
