one-hot
One hot encode vectors using a streaming implementation.
Note: This was only tested using io.js and uses the streams Simplified Constructor API.
See Wikipedia https://en.wikipedia.org/wiki/One-hot
var OneHot = require('one-hot');
var oneHot = new OneHot(opts)
Instantiate a new instance of OneHot. opts is an optional object where:
opts.oneCold = true will use 0 as the hot value and 1 as the cold value. Default: false, i.e. 1 is hot and 0 is cold.
opts.ignoreColumns is an optional array of columns to ignore (by index) when encoding the input vectors.
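To make the two options concrete, here is a minimal sketch of how they could apply to a single value and a single column index. It does not use this module's code: encodeValue and isIgnored are hypothetical helpers, and the fixed category list is an assumption made for illustration.

```javascript
// Hypothetical helper, not part of one-hot: encode one categorical value
// against a known list of categories, honouring a oneCold-style flag.
function encodeValue(value, categories, opts) {
  opts = opts || {};
  var hot = opts.oneCold ? 0 : 1;  // oneCold swaps the two values
  var cold = opts.oneCold ? 1 : 0;
  return categories.map(function (category) {
    return category === value ? hot : cold;
  });
}

// Hypothetical helper: true when a column index is listed in ignoreColumns.
function isIgnored(columnIndex, opts) {
  var ignore = (opts && opts.ignoreColumns) || [];
  return ignore.indexOf(columnIndex) !== -1;
}

var categories = ['a', 'b', 'c'];
console.log(encodeValue('b', categories));                    // [ 0, 1, 0 ]
console.log(encodeValue('b', categories, { oneCold: true })); // [ 1, 0, 1 ]
console.log(isIgnored(3, { ignoreColumns: [3] }));            // true
```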
Example
See tests folder for more examples...
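var testIVs = [
  [0, 1, 2, 'a', 3],
  [3, 4, 5, 'b', 6],
  [6, 7, 8, 'c', 9]
];
var oneHot = new OneHot();
oneHot.analyze(testIVs, function(err) {
  if (err) throw err;
  oneHot.encode(testIVs, function(err, encodedData) {
    console.log(encodedData);
  });
});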
Streaming methods
These are the streaming methods. See test directory for an example.
var writeStreamAnalyze = oneHot.analyze()
This must be called with a single pass over all the data to determine how to one hot encode the data.
var writeStreamEncode = oneHot.encode()
This method will one hot encode each input vector via stream transform. Call this method after oneHot.analyze() has completed.
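The two-pass streaming flow can be sketched with Node's own stream module and the simplified constructor form. This is a stand-alone illustration, not this module's implementation: createState, createAnalyzeStream and createEncodeStream are hypothetical names, and it assumes that string values are the categorical ones and that hot/cold flags are appended after the untouched columns.

```javascript
var stream = require('stream');

// Shared state filled by the analyze pass: column index -> categories seen.
function createState() {
  return { categories: {} };
}

// Pass 1: a Writable (simplified constructor) that records every category.
// Assumption: string values are treated as categorical.
function createAnalyzeStream(state) {
  return new stream.Writable({
    objectMode: true,
    write: function (vector, encoding, callback) {
      vector.forEach(function (value, column) {
        if (typeof value !== 'string') return;
        var seen = state.categories[column] || (state.categories[column] = []);
        if (seen.indexOf(value) === -1) seen.push(value);
      });
      callback();
    }
  });
}

// Pass 2: a Transform that swaps each categorical column for hot/cold flags.
function createEncodeStream(state) {
  return new stream.Transform({
    objectMode: true,
    transform: function (vector, encoding, callback) {
      var passthrough = [];
      var flags = [];
      vector.forEach(function (value, column) {
        var seen = state.categories[column];
        if (!seen) { passthrough.push(value); return; }
        seen.forEach(function (category) {
          flags.push(category === value ? 1 : 0);
        });
      });
      callback(null, passthrough.concat(flags));
    }
  });
}

var state = createState();
var analyze = createAnalyzeStream(state);
// Only start encoding once the analyze pass has finished.
analyze.on('finish', function () {
  var encode = createEncodeStream(state);
  encode.on('data', function (row) { console.log(row); });
  encode.write([1, 2, 'a']); // logs [ 1, 2, 1, 0 ]
  encode.end([3, 4, 'b']);   // logs [ 3, 4, 0, 1 ]
});
analyze.write([1, 2, 'a']);
analyze.end([3, 4, 'b']);
```

Waiting for the 'finish' event before encoding matters because the set of categories, and therefore the width of every encoded vector, is only known after the whole data set has been seen.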
Non-Streaming methods
These are the non-streaming methods.
oneHot.analyze(data, cb)
This must be called with a single pass over all the data to determine how to one hot encode the data. data must be an array of input vectors and cb must be a callback.
oneHot.encode(data, opts, cb)
This method will one hot encode each input vector in data. data must be an array of input vectors and cb must be a callback with a signature of (err, encodedData), where encodedData will be all the one hot encoded data. Call this method after oneHot.analyze() has completed.
opts is an optional object that may include:
opts.ignoreColumns - an optional array of columns to ignore (by index) when encoding the input vectors. This will override opts.ignoreColumns from the OneHot instantiation.
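The callback flow and the ignoreColumns override can be sketched as follows. SketchEncoder is a hypothetical stand-in, not this module's code. It assumes that string values are categorical and that an ignored column is passed through unchanged; the module itself may treat ignored columns differently.

```javascript
// Hypothetical stand-in for the analyze/encode pair; not this module's code.
function SketchEncoder(opts) {
  this.opts = opts || {};
  this.categories = {}; // column index -> categories in first-seen order
}

// Single pass over all the data to learn the categories of each column.
SketchEncoder.prototype.analyze = function (data, cb) {
  var categories = this.categories;
  data.forEach(function (vector) {
    vector.forEach(function (value, column) {
      if (typeof value !== 'string') return; // assumption: strings are categorical
      var seen = categories[column] || (categories[column] = []);
      if (seen.indexOf(value) === -1) seen.push(value);
    });
  });
  cb(null);
};

SketchEncoder.prototype.encode = function (data, opts, cb) {
  if (typeof opts === 'function') { cb = opts; opts = {}; } // opts is optional
  // A per-call ignoreColumns wins over the one given at instantiation.
  var ignore = opts.ignoreColumns || this.opts.ignoreColumns || [];
  var categories = this.categories;
  var encoded = data.map(function (vector) {
    var passthrough = [];
    var flags = [];
    vector.forEach(function (value, column) {
      var seen = categories[column];
      // Assumption: ignored columns are passed through unchanged.
      if (!seen || ignore.indexOf(column) !== -1) { passthrough.push(value); return; }
      seen.forEach(function (category) { flags.push(category === value ? 1 : 0); });
    });
    return passthrough.concat(flags);
  });
  cb(null, encoded);
};

var enc = new SketchEncoder({ ignoreColumns: [0] });
var data = [['x', 'a'], ['y', 'b']];
enc.analyze(data, function (err) {
  if (err) throw err;
  enc.encode(data, function (err, encodedData) {
    console.log(encodedData); // [ [ 'x', 1, 0 ], [ 'y', 0, 1 ] ]
  });
  enc.encode(data, { ignoreColumns: [1] }, function (err, encodedData) {
    console.log(encodedData); // [ [ 'a', 1, 0 ], [ 'b', 0, 1 ] ]
  });
});
```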
Methods
var columnsHeader = oneHot.getColumnsHeader(originalColumns)
This method will return an array of the column names. If originalColumns is provided, it will be used to fill in the column names. When originalColumns is undefined, the column names will be null and the one hot features will be <original_column_number:feature_value>. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
var originalColumns = ['one', 'two', 'three', 'char'];
... // removed for brevity (see tests)
oneHot.getColumnsHeader(originalColumns); // ['one', 'two', 'three', 'char:a', 'char:b']
oneHot.getColumnsHeader(); // [null, null, null, '3:a', '3:b']
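The naming rule described above can be reproduced with a few lines of plain JavaScript. This is only a sketch of the rule, not this module's implementation: buildColumnsHeader is a hypothetical helper, and the categories map stands in for whatever the analyze pass records.

```javascript
// Hypothetical helper, not this module's implementation.
// categories maps an original column index to its feature values.
function buildColumnsHeader(numOriginal, categories, originalColumns) {
  var plain = [];
  var oneHot = [];
  for (var column = 0; column < numOriginal; column++) {
    var values = categories[column];
    if (!values) {
      // Non one hot column: keep its name, or null when no names were given.
      plain.push(originalColumns ? originalColumns[column] : null);
      continue;
    }
    // One hot column: one "<name or index>:<feature value>" entry per value.
    var prefix = originalColumns ? originalColumns[column] : column;
    for (var i = 0; i < values.length; i++) {
      oneHot.push(prefix + ':' + values[i]);
    }
  }
  return plain.concat(oneHot);
}

var categories = { 3: ['a', 'b'] };
console.log(buildColumnsHeader(4, categories, ['one', 'two', 'three', 'char']));
// [ 'one', 'two', 'three', 'char:a', 'char:b' ]
console.log(buildColumnsHeader(4, categories));
// [ null, null, null, '3:a', '3:b' ]
```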
var numOriginal = oneHot.getNumberOfOriginalFeatures()
This method will return the number of original features. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
oneHot.getNumberOfOriginalFeatures(); // 4 {1, 2, 3, 'a'}
var numEncoded = oneHot.getNumberOfEncodedFeatures()
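This method will return the number of encoded features, i.e. the non one hot features plus the one hot features. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
oneHot.getNumberOfEncodedFeatures(); // 5, 3 non one hot + 2 one hot {1, 2, 3, 'a', 'b'}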
var numNonEncoded = oneHot.getNumberOfNonEncodedFeatures()
This method will return the number of features that are not one hot encoded. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
oneHot.getNumberOfNonEncodedFeatures(); // 3, {1, 2, 3}
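The three counts are related by simple arithmetic, which the sketch below spells out. countFeatures is a hypothetical helper and the categories map is an assumed representation of the analysis result; neither is part of this module.

```javascript
// Hypothetical helper: derive the three counts from an analysis result.
// categories maps each one hot column index to its feature values.
function countFeatures(numOriginal, categories) {
  var oneHotColumns = Object.keys(categories);
  var hotFeatures = oneHotColumns.reduce(function (sum, column) {
    return sum + categories[column].length;
  }, 0);
  var nonEncoded = numOriginal - oneHotColumns.length;
  return {
    original: numOriginal,
    nonEncoded: nonEncoded,
    encoded: nonEncoded + hotFeatures
  };
}

console.log(countFeatures(4, { 3: ['a', 'b'] }));
// { original: 4, nonEncoded: 3, encoded: 5 }
```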
var encodedIndex = oneHot.getEncodedIndexFromOriginalIndex(originalIndex, featureValue)
This method will return the new index (the hot index) of a given original index and its feature value. You must specify a feature value because of the one-to-many relationship between original and encoded indexes. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
var encodedIndex = oneHot.getEncodedIndexFromOriginalIndex(3, 'a'); // encodedIndex could be 3 or 4, [1, 2, 3, 1, 0], or [1, 2, 3, 0, 1] depending on the order in which the ivs were processed
var originalIndex = oneHot.getOriginalIndexFromEncodedIndex(encodedIndex)
This method will return the original index (before encoding) of a given encoded index. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
var originalIndex = oneHot.getOriginalIndexFromEncodedIndex(4); // 3
var internalIndex = oneHot.getInternalIndexFromOriginalIndex(originalIndex, featureValue)
This method will return the index used internally to one hot encode the featureValue (the hot index) of a given original index and its feature value. You must specify a feature value because of the one-to-many relationship between original and encoded indexes. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
var encodedIndexA = oneHot.getInternalIndexFromOriginalIndex(3, 'a'); // 4, ie, [1, 2, 3, <encoded column: {a, b}>, 1, 0]
var encodedIndexB = oneHot.getInternalIndexFromOriginalIndex(3, 'b'); // 5, ie, [1, 2, 3, <encoded column: {a, b}>, 0, 1]
var originalIndex = oneHot.getOriginalIndexFromInternalIndex(internalIndex)
This method will return the original index from the index used internally to one hot encode. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
var encodedIndexA = oneHot.getOriginalIndexFromInternalIndex(4); // 3, ie, [1, 2, 3, <encoded column: {a, b}>, 1, 0] => [1, 2, 3, 'a']
var encodedIndexB = oneHot.getOriginalIndexFromInternalIndex(5); // 3, ie, [1, 2, 3, <encoded column: {a, b}>, 0, 1] => [1, 2, 3, 'b']
var encodedIndex = oneHot.getEncodedIndexFromInternalIndex(internalIndex)
This method will return the encoded index from the index used internally to one hot encode. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
var encodedIndexA = oneHot.getEncodedIndexFromInternalIndex(4); // 3, ie, [1, 2, 3, <encoded column: {a, b}>, 1, 0] => [1, 2, 3, 1, 0]
var encodedIndexB = oneHot.getEncodedIndexFromInternalIndex(5); // 4, ie, [1, 2, 3, <encoded column: {a, b}>, 0, 1] => [1, 2, 3, 0, 1]
var internalIndex = oneHot.getInternalIndexFromEncodedIndex(encodedIndex)
This method will return the index used internally to one hot encode from the encoded index. For example:
var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
var encodedIndexA = oneHot.getInternalIndexFromEncodedIndex(3); // 4, ie, [1, 2, 3, 1, 0] => [1, 2, 3, <encoded column: {a, b}>, 1, 0]
var encodedIndexB = oneHot.getInternalIndexFromEncodedIndex(4); // 5, ie, [1, 2, 3, 0, 1] => [1, 2, 3, <encoded column: {a, b}>, 0, 1]
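The relationship between the original, internal and encoded index spaces can be modelled in a small sketch. The layout used here is inferred from the examples above and is an assumption, not this module's code: internal indexes keep a placeholder slot for each one hot column and append the hot slots after the original columns, while encoded indexes are the internal ones with the placeholders removed. createIndexMap and its method names are hypothetical.

```javascript
// Assumed layout, inferred from the examples (not this module's code):
//   original: [1, 2, 3, 'a']
//   internal: [1, 2, 3, <placeholder>, a?, b?]  hot slots appended at the end
//   encoded:  [1, 2, 3, a?, b?]                 internal minus placeholders
function createIndexMap(numOriginal, categories) {
  var oneHotColumns = Object.keys(categories).map(Number).sort(function (a, b) {
    return a - b;
  });
  var numInternal = oneHotColumns.reduce(function (sum, column) {
    return sum + categories[column].length;
  }, numOriginal);

  function internalFromOriginal(originalIndex, featureValue) {
    var offset = numOriginal;
    for (var i = 0; i < oneHotColumns.length; i++) {
      var values = categories[oneHotColumns[i]];
      if (oneHotColumns[i] === originalIndex) {
        var position = values.indexOf(featureValue);
        return position === -1 ? -1 : offset + position;
      }
      offset += values.length;
    }
    return originalIndex; // not a one hot column: the slot is unchanged
  }

  function originalFromInternal(internalIndex) {
    if (internalIndex < numOriginal) return internalIndex;
    var offset = numOriginal;
    for (var i = 0; i < oneHotColumns.length; i++) {
      offset += categories[oneHotColumns[i]].length;
      if (internalIndex < offset) return oneHotColumns[i];
    }
    return -1; // out of range
  }

  function encodedFromInternal(internalIndex) {
    // A placeholder has no encoded slot; later slots shift left past it.
    if (oneHotColumns.indexOf(internalIndex) !== -1) return -1;
    var placeholdersBefore = oneHotColumns.filter(function (column) {
      return column < internalIndex;
    }).length;
    return internalIndex - placeholdersBefore;
  }

  function internalFromEncoded(encodedIndex) {
    if (encodedIndex < 0) return -1;
    for (var i = 0; i < numInternal; i++) {
      if (encodedFromInternal(i) === encodedIndex) return i;
    }
    return -1; // out of range
  }

  return {
    internalFromOriginal: internalFromOriginal,
    originalFromInternal: originalFromInternal,
    encodedFromInternal: encodedFromInternal,
    internalFromEncoded: internalFromEncoded
  };
}

var indexMap = createIndexMap(4, { 3: ['a', 'b'] });
console.log(indexMap.internalFromOriginal(3, 'a')); // 4
console.log(indexMap.internalFromOriginal(3, 'b')); // 5
console.log(indexMap.originalFromInternal(5));      // 3
console.log(indexMap.encodedFromInternal(4));       // 3
console.log(indexMap.internalFromEncoded(4));       // 5
```

An original-to-encoded lookup is then just the composition encodedFromInternal(internalFromOriginal(originalIndex, featureValue)).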
var isOneHot = oneHot.isEncodedIndexOneHot(encodedIndex)
This method will return true if the encoded index given is a one hot index. For example:
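var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
oneHot.isEncodedIndexOneHot(0); // false
oneHot.isEncodedIndexOneHot(1); // false
oneHot.isEncodedIndexOneHot(2); // false
oneHot.isEncodedIndexOneHot(3); // true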
var isOneHot = oneHot.isOriginalIndexOneHot(originalIndex)
This method will return true if the original index given is a one hot index. For example:
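var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
oneHot.isOriginalIndexOneHot(0); // false
oneHot.isOriginalIndexOneHot(1); // false
oneHot.isOriginalIndexOneHot(2); // false
oneHot.isOriginalIndexOneHot(3); // true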
var featureValue = oneHot.getFeatureValueFromEncodedIndex(encodedIndex)
This method will return the feature value at a given encoded index. For example:
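var oneHot = new OneHot();
var originalIVs = [[1, 2, 3, 'a'], [1, 2, 3, 'b']];
... // removed for brevity (see tests)
oneHot.getFeatureValueFromEncodedIndex(3); // 'a'
oneHot.getFeatureValueFromEncodedIndex(4); // 'b'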
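One way to picture the one hot checks and the feature value lookup is a table with one entry per encoded slot. The sketch below builds such a table. describeEncodedSlots is a hypothetical helper, and the assumption that hot slots follow the non-encoded columns is inferred from the examples, not taken from this module's code.

```javascript
// Hypothetical lookup table: one descriptor per encoded slot.
// categories maps each one hot column index to its feature values.
function describeEncodedSlots(numOriginal, categories) {
  var plain = [];
  var hot = [];
  for (var column = 0; column < numOriginal; column++) {
    var values = categories[column];
    if (!values) {
      plain.push({ originalIndex: column, oneHot: false });
      continue;
    }
    for (var i = 0; i < values.length; i++) {
      hot.push({ originalIndex: column, oneHot: true, featureValue: values[i] });
    }
  }
  return plain.concat(hot); // assumption: hot slots come last
}

var slots = describeEncodedSlots(4, { 3: ['a', 'b'] });
console.log(slots.map(function (slot) { return slot.oneHot; }));
// [ false, false, false, true, true ]
console.log(slots[3].featureValue); // a
console.log(slots[4].featureValue); // b
```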