Object diff
Object diff is a not-so-basic diff/patch/pattern/compare implementation for JavaScript objects / object trees.
Motivation
This module was designed and written as a means to locate changes to rather big JSON objects/trees and store them efficiently. Essentially this is a re-imagining of UN*X diff
and patch
utilities to support JavaScript object trees in addition to flat text, storing the diff/patch data natively in JSON. The specific use cases (ImageGrid and pWiki) required additional features not provided by the available at the time alternative implementations (listing in next section).
Features
- Well defined and human readable diff format
- JSON diff support
- JavaScript object diff support including objects/types not supported in JSON
Support new Javascript containers: Map, Set, ...etc.(feasibility testing)Optional attribute order support(not done yet)- Text diff (multi-line strings) support
- Patterns as a means to simplify structural comparison/testing
- Configurable and extensible implementation
For a walkthrough by example see the introduction, for general documentation start at the motivation section or look through the index below.
- Object diff
Introduction
Let's start with a couple of objects, similar but not quite:
var Bill = name: 'Bill' age: 20 hair: 'black' skills: 'awsome' 'time travel' var Ted = name: 'Ted' age: 20 hair: 'blond' skills: 'awsome' 'time travel' 'guitar'
Seeing the difference (diff)
var Diff = Diff var diff =
Here's how different Bill
and Ted
really are (in diff format):
// log out the relevant part...//JSON.stringify(diff.diff, null, '\t') // each element of the diff array is a change... // the path tells us how to reach the change's location // in the source object... "path": "name" // A and B are the different states in the two input // objects respectively... "A": "Bill" "B": "Ted" "path": "hair" "A": "black" "B": "blond" // note this path, we'll explain it a bit later... "path": "skills" 2 0 2 1 // also note that either A or B can be omited... "B": "guitar" // this change might look redundant but it is not... "path": "skills" "length" "A": 2 "B": 3
This tells us that we have four differences or changes:
- different
"name"
- different
"hair"
- in
"skills"
missing"guitar"
- in
"skills"
different"length"
or different number of skills.
Some words on the format:
A
andB
indicate the states of the change in the input objects,path
tells us how to reach the change in the inputs,- The array in
"path"
of the third change is the index of the change in the input"skills"
arrays where each element ([2, 0]
and[2, 1]
) describes the spot in the array that changed in the corresponding input object. Each element consists of two items, the first is the actual index or position of the change (in both cases2
) and the second is the length of the change (0
and1
respectively, meaning that inA
we have zero or no items and inB
one), "A"
or"B"
may not be present in the change (change #3) if the change describes simple item addition or removal,- The format is redundant in places by design, for example here you can both infer
"skills"
length from the changes applied to it and we have an explicit["path", "length"]
change. This is mainly to support cases where inferring array length from changes to it may not be possible, like for sparse arrays.
Now, we can do different things with this information (diff object).
Applying changes (patch)
// let's clone Bill, just in case...var Bill2 = JSON // now apply the patch...diff
Since we applied all the changes to Bill2
, now he is just another Ted
(or rather Ted
's copy):
//JSON.stringify(Bill2, null, '\t') "name": "Ted" "age": 20 "hair": "blond" "skills": "awsome" "time travel" "guitar"
Partial patching
Sometimes we need to apply only a subset of the changes. Here we will patch only Bill
's skillset...
diff // only keep the changes to skills... // use the filtered diff to update Bill...
Now, Bill
can finally play guitar!
//JSON.stringify(Bill, null, '\t') "name": "Bill" "age": 20 "hair": "black" "skills": "awsome" "time travel" "guitar"
Checking
XXX API here is not finalized...
Generic ways to compare objects (patterns)
And for further checking we can create a pattern:
var cmp AND OR STRING NUMBER ARRAY AT = var PERSON = name: STRING age: NUMBER hair: skills: // now we can "structure-check" things... // -> true
Patterns can also be extended to construct more specific (or rather just different) patterns:
// prototype extension...var BILL_or_TED = Object // logical extension/specialization...var BILL_or_TED_L = // testing is the same... // -> true // -> true
Now for the serious stuff...
Installation and loading
Install the package:
$ npm install --save object-diff
Load the module:
var diff =
This module supports both requirejs and node's require(..)
.
Diff
Create a diff object:
var diff = A B
Diff class API
Diff.cmp(A, B) -> bool
Deep compare A
to B
.
Diff.vars(pattern, A) -> obj
Get variable values defined (via VAR
/LIKE
) in pattern
matching corresponding values in obj
.
Diff.clone(title)
Clone the Diff
constructor, useful for extending or tweaking the type handlers (see: Extending below).
Diff.fromJSON(json) -> diff
Build a diff object from JSON (exported via .json()
).
Diff object API
diff.patch(X) -> X'
Apply "diff* (or patch) X
to X'
state.
diff.unpatch(X') -> X
Undo *diff" application to X'
returning it to X
state.
This is equivalent to: diff.reverse().patch(X')
(work in progress)diff.check(X) -> bool
Check if diff is compatible/applicable to X
. This essentially checks if the left hand side of the diff matches the corresponding nodes of X
.
diff.reverse() -> diff
Generate a new child diff where A
and B
are reversed.
diff.filter(path | filter) -> diff
Generate a new child diff leaving only changes that match the path
/filter
The path
is a "/"
or "\"
separated string that supports the following item syntax:
"*"
- matches any item (same as:ANY
)."**"
- matches 0 or more items."a|b"
- matches eithera
orb
(same as:OR('a', 'b')
)"!a"
- matches anything buta
(smae as:NOT('a')
)
Note that "**"
is a special case in that it can not be combined with other patterns above (e.g. in "a|**"
the string "**"
is treated literally and has no special meaning).
diff.merge(diff) -> diff
Generate a merged diff containing the changes from both diff object.
Note that this method currently simply concatenates the changes of two diff objects together, at this point no effort is made to optimize the new change list (changes can be redundant or canceling but there should not be any conflicts unless a merged diff is missing or is out of order).
diff.end() -> diff
Return the parent diff that was used to generate the current child diff or the current diff if there is not parent.
diff.json() -> JSON
Serialize the diff to JSON. Note that the output may or may not be JSON compatible depending on the inputs.
Shorthands and functions
cmp(A, B) -> bool
Deep compare A
and B
.
This is a shorthand to: Diff.cmp(A, B) -> bool
patch(diff, A) -> A'
Apply changes in diff
to A
(patch).
This is a shorthand to: diff.patch(A) -> A'
vars(pattern, A) -> obj
Get variable values defined (via VAR
/LIKE
) in pattern
matching corresponding values in obj
.
This is a shorthand to: Diff.vars(pattern, B) -> obj
Supported JavaScript objects
The object support can be split into two, basic objects that are stored as-is and containers that support item changes when their types match.
All JavaScript objects/values are supported in the basic form / as-is.
Containers that support item changes include:
Object
Array
(planned but not done yet)Map
/WeakMap
(planned but not done yet)Set
/WeakSet
Additionally attribute changes are supported for all non basic objects (i.e. any x
for which x instanceof Object
is true
). This can be disabled by setting options.no_attributes
to true
(see: Options below).
Extended 'Text' object support
A text is JavaScript string that is long (>100 chars, configurable in Oprions) and/or a multiline string.
A text is treated as an Array
containing the string split into lines (split by '\n'
).
Shorter strings or strings containing just a single line are treated as basic monolithic string objects and included in the diff as-is.
Options
// if true return a tree diff format... tree_diff: false | true // if true, NONE change items will not be removed from the diff... keep_none: false | true // Minimum length of a string for it to be treated as Text... // // If this is set to a negative number Text diffing is disabled. // // NOTE: a string must also contain at least one \n to be text // diffed... min_text_length: 100 | -1 // If true, disable attribute diff for non Object's... // // XXX should be more granular... no_attributes: false | true // Plaeholders to be used in the diff.. // // Set these if the default values conflict with your data... // // XXX remove these from options in favor of auto conflict // detection and hashing... NONE: null | EMPTY: null |
Deep compare
Diff
XXX
Patterns
XXX General description...
Currently patterns are designed to not use backtracking when matching.
Logic patterns
ANY
Matches anything.
Note that this will also match undefined
, to match anything but undefined
use NOT(undefined)
.
XXX this may still change.
NOT(A)
Match anything but A
OR(A[, .. ])
Match if one of the arguments matches
AND(A[, .. ])
Matches of all of the arguments match
String patterns
STRING
Match any string.
STRING(string)
Match a specific string.
STRING(regexp)
Match a string via a RegExp
object.
STRING(func)
Match a string via a function predicate.
STRING(pattern)
Match a string via a nested pattern.
Number patterns
NUMBER
Match any number.
NUMBER(number)
Match a specific number.
NUMBER(min, max)
Match any number greater or equal to min and less than max.
NUMBER(min, max, step)
Match any number greater or equal to min and less than max but only if it is a escrete number of step away from min.
Examples: NUMBER(4, 10, 2)
will match odd numbers between 4 and 10, while NUMBER(1, 10, 2)
will match even numbers between 1 and 10.
NUMBER(func)
Match a number via a function predicate.
NUMBER(pattern)
Match a number via a nested pattern.
Array patterns
ARRAY
Matches any array.
ARRAY(length)
Matches an array of length length
.
ARRAY(func)
Match if func
returns true when applied to each array item.
ARRAY(pattern)
Match if pattern
matches each array item.
ARRAY(x, y, ..)
A combination of the above where x
, y
and ..
may be any of length, functions or patterns.
This is a shorthand for: AND(ARRAY(x), ARRAY(y), ..)
Pattern variables
Patterns support variables, the namespae/context is persistent per diff / compare call.
VAR(name)
VAR(name, pattern)
A VAR
is uniquely identified by name.
This works in stages:
- Matches
pattern
until first successful match, - On first successful match the matched object is cached,
- Matches the cached object on all subsequent matches.
If no pattern
is given ANY
is assumed.
Note that if the cached object is not a pattern it will not be matched structurally, i.e. first ===
and then ==
are used instead of cmp(..)
.
LIKE(name)
LIKE(name, pattern)
This is the same as VAR(..)
bud does a structural match (i.e. via cmp(..)
).
Note that VAR(..)
and LIKE(..)
use the same namespace and can be used interchangeably depending on the type of matching desired.
CONTEXT(pattern)
A context constructor, matches if pattern
matches.
This is needed in case we need a way to access the pattern API from the root of the pattern when it's actually an object (see example below).
Examples:
var P = // this will fail because {} !== {} // -> false var o = {}// this succeeds because o === o and cmp(o, {}) is true... // -> true // and in case we need for P to explicitly be a pattern:P = // now we can use the pattern API directly from P:P
Miscellaneous patterns
TEST(function)
Matches if function
returns true.
The function
has the same signature as Pattern's .__cmp__(obj, cmp, context)
and is run in the context of the TEST
instance.
Example:
var P = ANY ANY // prints: 'ELEM #3: 3'
Patterns (EXPERIMENTAL)
IN(A)
Matches a container if it contains A
.
AT(K, A)
Matches a container if it contains A
at index/key K
If K
is a pattern or a path containing a pattern then matching is done as follows:
- select all values at keys/paths that match
K
- match iff all of the selected values match
A
This may not be intuitive in some cases, for example consider the following two patterns:
AT(OR('x', 'y'), 1)
OR(AT('x', 1), AT('y', 1))
At first glance they may seem to be equivalent but in reality they are quite different as in the first pattern OR(..)
matches the 'x'
key and also matches the 'y'
key and thus AT(..)
will match iff all of the matched keys (existing) contain 1
, while the second pattern will match if at least one of 'x'
or 'y'
is 1
.
Also note that the first pattern not equivalent to AND(AT('x', 1), AT('y', 1))
as AND(..)
requires that both 'x'
and 'y'
exist and contain 1
and first pattern will match if at least one of the keys exists and all the existing keys contain 1
.
Match | {} |
{x:1} |
{x:1, y:2} |
{x:1, y:1} |
---|---|---|---|---|
AT(OR('x', 'y'), 1) |
false |
true |
false |
true |
OR(AT('x', 1), AT('y', 1)) |
false |
true |
true |
true |
AND(AT('x', 1), AT('y', 1)) |
false |
false |
false |
true |
Note that to use an explicit array for K
, wrap it in an array, e.g. to use [item, ..]
as key write: AT([[item, ..]], ..)
.
AT(K)
Matches if a container has an index/key K
.
This is equivalent to AT(K, ANY)
.
ORDERED(A, B)
ADJACENT(A, B, ...)
OF(A, N)
XXX this seems to be the only pattern to require backtracking to match (if N
is a pattern)...
JSON compatibility
var json = diff var diff2 = Diff
Note that the output of .json()
may not be JSON compatible if the input objects are not json compatible. The reasoning here is simple: object-diff
is a diffing module and not a serializer.
The simple way to put this is: if the inputs are JSON-compatible the output of .json()
is going to also be JSON-compatible.
The big picture is a bit more complicated, Diff(..)
and friends support allot more than simply JSON, this would include any types, attributes on all objects and loops/recursive structures.
.fromJSON(..)
does not care about JSON compatibility and will be happy with any output of .json()
.
Extending Diff
Create a new diff constructor:
var ExtendedDiff = diffDiff
This has the same API as Diff
and inherits from it, but it has an independent handler map that can be manipulated without affecting the original Diff
setup.
When building a diff type checking is done in two stages:
- via the
.check(..)
method of each implementing handler, this approach is used for synthetic type handlers, as an exmple look at'Text'
that matches long multi-line string objects. - type-checking via
instanceof
/.construtor
, this is used for JavaScript objects likeArray
andObject
instances, for example.
Hence we have two types of handler objects, those that implement .check(..)
and can have any arbitrary object as a key (though a nice and readable string is recommended), and objects that have the constructor as key against which instanceof
checks are done.
.check(..)
has priority to enable handlers to intercept handling of special cases, 'Text'
handler would be a good example.
If types of the not equal object pair mismatch 'Basic'
is used and both are stored in the diff as-is.
.priority
enables sorting of checks and handlers within a stage, can be set to a positive, negative Number
or null
, priorities with same numbers are sorted in order of occurrence.
Adding new synthetic type handler:
ExtendedDifftypes
Adding new type handler (checked via instanceof
/ .constructor
):
ExtendedDifftypes
Remove an existing type handler:
ExtendedDifftypes
The source code is a good example for this as the base Diff(..)
is built using this API, but note that we are registering types on the Types
object rather than on the Diff
itself, there is no functional difference other than how the main code is structured internally.
The handler methods are all called in the context of the Diff.types
instance, this instance is created per Diff
's method call and is destroyed right after the method is done, thus it is save to use the context for caching.
To call a different type handler's methods use:
this
For an example see: Object
handler's .walk(..)
in diff.js.
The Diff format
There are two general format types:
- Flat - main and default format used to store diff information.
- Tree - used internally but may be useful for introspection.
For the format structure pattern/deffinition see format.js.
Contacts, feedback and contributions
License
Copyright (c) 2018, Alex A. Naanou, All rights reserved.