A JavaScript text differencing implementation. Try it out in the online demo.
Based on the algorithm proposed in "An O(ND) Difference Algorithm and its Variations" (Myers, 1986).
npm install @rabiepenpm/ad-eos-esse --save
Broadly, jsdiff's diff functions all take an old text and a new text and perform three steps:
- Split both texts into arrays of "tokens". What constitutes a token varies; in diffChars, each character is a token, while in diffLines, each line is a token.
- Find the smallest set of single-token insertions and deletions needed to transform the first array of tokens into the second.
  This step depends upon having some notion of a token from the old array being "equal" to one from the new array, and this notion of equality affects the results. Usually two tokens are equal if === considers them equal, but some of the diff functions use an alternative notion of equality or have options to configure it. For instance, by default diffChars("Foo", "FOOD") will require two deletions (o, o) and three insertions (O, O, D), but diffChars("Foo", "FOOD", {ignoreCase: true}) will require just one insertion (of a D), since ignoreCase causes o and O to be considered equal.
- Return an array representing the transformation computed in the previous step as a series of change objects. The array is ordered from the start of the input to the end, and each change object represents inserting one or more tokens, deleting one or more tokens, or keeping one or more tokens.
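The three steps can be sketched in self-contained JavaScript. This is a hypothetical illustration (sketchDiffChars is an invented name), not jsdiff's code. It tokenizes into code points with Array.from, as diffChars is described as doing. For step 2, however, it swaps in a simple quadratic longest-common-subsequence table instead of Myers's O(ND) algorithm, so on ties it may choose a different, equally short edit script.

```javascript
// Hypothetical sketch of the three steps; not jsdiff's implementation.
// Step 2 uses a quadratic LCS table instead of Myers's O(ND) algorithm.
function sketchDiffChars(oldStr, newStr, options = {}) {
  // Step 1: tokenize into Unicode code points (what for...of yields).
  const a = Array.from(oldStr);
  const b = Array.from(newStr);
  const equals = options.ignoreCase
    ? (x, y) => x.toLowerCase() === y.toLowerCase()
    : (x, y) => x === y;

  // Step 2: lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = equals(a[i], b[j])
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Step 3: walk the table, merging runs of same-kind tokens into change objects.
  const changes = [];
  const push = (value, added, removed) => {
    const last = changes[changes.length - 1];
    if (last && last.added === added && last.removed === removed) {
      last.value += value;
      last.count += 1;
    } else {
      changes.push({ value, added, removed, count: 1 });
    }
  };
  let i = 0;
  let j = 0;
  while (i < a.length || j < b.length) {
    if (i < a.length && j < b.length && equals(a[i], b[j])) {
      push(b[j], false, false); // common: keep the new string's spelling
      i++;
      j++;
    } else if (i < a.length && (j === b.length || lcs[i + 1][j] >= lcs[i][j + 1])) {
      push(a[i], false, true); // deletion from the old text
      i++;
    } else {
      push(b[j], true, false); // insertion into the new text
      j++;
    }
  }
  return changes;
}

console.log(sketchDiffChars('Foo', 'FOOD'));
console.log(sketchDiffChars('Foo', 'FOOD', { ignoreCase: true }));
```

With the inputs from the example above, the first call yields a common "F", a removal of "oo" (count 2) and an insertion of "OOD" (count 3). With ignoreCase, the common run is "FOO" (the new string's spelling), followed by a single insertion of "D".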
- Diff.diffChars(oldStr, newStr[, options]) - diffs two blocks of text, treating each character as a token. ("Characters" here means Unicode code points, i.e. the elements you get when you loop over a string with a for ... of ... loop.) Returns a list of change objects.
  Options:
  - ignoreCase: if true, the uppercase and lowercase forms of a character are considered equal. Defaults to false.
- Diff.diffWords(oldStr, newStr[, options]) - diffs two blocks of text, treating each word and each punctuation mark as a token. Whitespace is ignored when computing the diff (but preserved as far as possible in the final change objects). Returns a list of change objects.
  Options:
  - ignoreCase: same as in diffChars. Defaults to false.
- Diff.diffWordsWithSpace(oldStr, newStr[, options]) - diffs two blocks of text, treating each word, punctuation mark, newline, or run of (non-newline) whitespace as a token.
- Diff.diffLines(oldStr, newStr[, options]) - diffs two blocks of text, treating each line as a token. Returns a list of change objects.
  Options:
  - ignoreWhitespace: true to ignore leading and trailing whitespace characters when checking if two lines are equal. Defaults to false.
  - stripTrailingCr: true to remove all trailing CR (\r) characters before performing the diff. Defaults to false. This helps to get a useful diff when diffing UNIX text files against Windows text files.
  - newlineIsToken: true to treat the newline character at the end of each line as its own token. This allows changes to the newline structure to occur independently of the line content and to be treated as such. In general this is the more human-friendly form of diffLines; the default behavior, with this option turned off, is better suited for patches and other computer-friendly output. Defaults to false.

  Note that while using ignoreWhitespace in combination with newlineIsToken is not an error, results may not be as expected. With ignoreWhitespace: true and newlineIsToken: false, changing a completely empty line to contain some spaces is treated as a non-change, but with ignoreWhitespace: true and newlineIsToken: true, it is treated as an insertion. This is because the content of a completely blank line is not a token at all in newlineIsToken mode.
- Diff.diffSentences(oldStr, newStr[, options]) - diffs two blocks of text, treating each sentence as a token. Returns a list of change objects.
- Diff.diffCss(oldStr, newStr[, options]) - diffs two blocks of text, comparing CSS tokens. Returns a list of change objects.
- Diff.diffJson(oldObj, newObj[, options]) - diffs two JSON-serializable objects by first serializing them to prettily-formatted JSON and then treating each line of the JSON as a token. Object properties are ordered alphabetically in the serialized JSON, so the order of properties in the objects being compared doesn't affect the result. Returns a list of change objects.
  Options:
  - stringifyReplacer: a custom replacer function. Operates similarly to the replacer parameter to JSON.stringify(), but must be a function.
  - undefinedReplacement: a value to replace undefined with. Ignored if a stringifyReplacer is provided.
- Diff.diffArrays(oldArr, newArr[, options]) - diffs two arrays of tokens, comparing each item for strict equality (===). Returns a list of change objects.
  Options:
  - comparator: function(left, right) for custom equality checks.
- Diff.createTwoFilesPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - creates a unified diff patch by first computing a diff with diffLines and then serializing it to unified diff format.
  Parameters:
  - oldFileName: string to be output in the filename section of the patch for the removals
  - newFileName: string to be output in the filename section of the patch for the additions
  - oldStr: original string value
  - newStr: new string value
  - oldHeader: optional additional information to include in the old file header. Default: undefined.
  - newHeader: optional additional information to include in the new file header. Default: undefined.
  - options: an object with options.
    - context: how many lines of context should be included. You can set this to Number.MAX_SAFE_INTEGER or Infinity to include the entire file content in one hunk.
    - ignoreWhitespace: same as in diffLines. Defaults to false.
    - stripTrailingCr: same as in diffLines. Defaults to false.
    - newlineIsToken: same as in diffLines. Defaults to false.
- Diff.createPatch(fileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - creates a unified diff patch. Just like Diff.createTwoFilesPatch, but with oldFileName being equal to newFileName.
- Diff.formatPatch(patch) - creates a unified diff patch. patch may be either a single structured patch object (as returned by structuredPatch) or an array of them (as returned by parsePatch).
- Diff.structuredPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - returns an object with an array of hunk objects. This method is similar to createTwoFilesPatch, but returns a data structure suitable for further processing. Parameters are the same as createTwoFilesPatch. The data structure returned may look like this:

  {
    oldFileName: 'oldfile', newFileName: 'newfile',
    oldHeader: 'header1', newHeader: 'header2',
    hunks: [{
      oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
      lines: [' line2', ' line3', '-line4', '+line5', '\\ No newline at end of file'],
    }]
  }
- Diff.applyPatch(source, patch[, options]) - attempts to apply a unified diff patch. If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in source), returns false. patch may be a string diff or the output from the parsePatch or structuredPatch methods.
  The optional options object may have the following keys:
  - fuzzFactor: number of lines that are allowed to differ before rejecting a patch. Defaults to 0.
  - compareLine(lineNumber, line, operation, patchContent): callback used to compare two given lines to determine if they should be considered equal when patching. Defaults to strict equality but may be overridden to provide fuzzier comparison. Should return false if the lines should be rejected.
- Diff.applyPatches(patch, options) - applies one or more patches. patch may be either an array of structured patch objects, or a string representing a patch in unified diff format (which may patch one or more files).
  This method will iterate over the contents of the patch and apply it to data provided through callbacks. The general flow for each patch index is:
  - options.loadFile(index, callback) is called, where index is the structured patch for the file currently being processed. The caller should then load the contents of that file and pass them to the callback(err, data) callback. Passing an err will terminate further patch execution.
  - options.patched(index, content, callback) is called once the patch has been applied. content will be the return value from applyPatch. When it's ready, the caller should call the callback(err) callback. Passing an err will terminate further patch execution.
  Once all patches have been applied or an error occurs, the options.complete(err) callback is made.
- Diff.parsePatch(diffStr) - parses a patch into structured data. Returns a JSON object representation of the patch (an array of structured patch objects, one per file), suitable for use with the applyPatch method. This parses to the same structure returned by Diff.structuredPatch.
- Diff.reversePatch(patch) - returns a new structured patch which, when applied, will undo the original patch. patch may be either a single structured patch object (as returned by structuredPatch) or an array of them (as returned by parsePatch).
- Diff.convertChangesToXML(changes) - converts a list of change objects to a serialized XML format.
- Diff.convertChangesToDMP(changes) - converts a list of change objects to the format returned by Google's diff-match-patch library.
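The following hypothetical sketch makes the relationship between the structured form returned by structuredPatch and the unified text produced by createTwoFilesPatch and formatPatch concrete. The function name formatHunksSketch is invented. It serializes an object shaped like the structuredPatch example above into unified diff text, and covers only the core "---"/"+++" file headers and the "@@ -oldStart,oldLines +newStart,newLines @@" hunk headers. It is not jsdiff's formatPatch, whose exact output (for example any extra index lines, or the handling of zero-length hunks) may differ.

```javascript
// Hypothetical sketch: serialize a structured patch (same shape as the
// structuredPatch example) into unified diff text. Not jsdiff's formatPatch;
// edge cases such as zero-length hunks are deliberately ignored.
function formatHunksSketch(patch) {
  const out = [];
  // File headers: optional extra header text follows a tab, if present.
  const header = (prefix, name, extra) =>
    extra === undefined ? `${prefix} ${name}` : `${prefix} ${name}\t${extra}`;
  out.push(header('---', patch.oldFileName, patch.oldHeader));
  out.push(header('+++', patch.newFileName, patch.newHeader));
  for (const hunk of patch.hunks) {
    // Hunk header: starting line and line count on the old and new sides.
    out.push(`@@ -${hunk.oldStart},${hunk.oldLines} +${hunk.newStart},${hunk.newLines} @@`);
    // Body lines already carry their ' ', '-', '+' or '\' prefix.
    out.push(...hunk.lines);
  }
  return out.join('\n') + '\n';
}

const example = {
  oldFileName: 'oldfile', newFileName: 'newfile',
  oldHeader: 'header1', newHeader: 'header2',
  hunks: [{
    oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
    lines: [' line2', ' line3', '-line4', '+line5', '\\ No newline at end of file'],
  }],
};
console.log(formatHunksSketch(example));
```

Note that oldLines counts the context and "-" lines (three here), while newLines counts the context and "+" lines. The "\" marker line counts toward neither.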
Certain options can be provided in the options object of any method that calculates a diff:
- callback: if provided, the diff will be computed in async mode to avoid blocking the event loop while the diff is calculated. The value of the callback option should be a function and will be passed the result of the diff as its first argument. Only works with functions that return change objects, like diffLines, not those that return patches, like structuredPatch or createPatch.
  (Note that if the ONLY option you want to provide is a callback, you can pass the callback function directly as the options parameter instead of passing an object with a callback property.)
- maxEditLength: a number specifying the maximum edit distance to consider between the old and new texts. You can use this to limit the computational cost of diffing large, very different texts by giving up early if the cost will be huge. This option can be passed either to diffing functions (diffLines, diffChars, etc.) or to patch-creation functions (structuredPatch, createPatch, etc.), all of which will indicate that the max edit length was reached by returning undefined instead of whatever they'd normally return.
- timeout: a number of milliseconds after which the diffing algorithm will abort and return undefined. Supported by the same functions as maxEditLength.
- oneChangePerToken: if true, the array of change objects returned will contain one change object per token (e.g. one per line if calling diffLines), instead of runs of consecutive tokens that are all added / all removed / all conserved being combined into a single change object.
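The maxEditLength and timeout options are easiest to picture against the greedy forward search from Myers's paper. That search tries edit distances D = 0, 1, 2, ... in turn and records, for each diagonal k = x - y, the furthest point a D-path reaches. The hypothetical sketch below (myersEditDistance is an invented name) computes only the edit distance and does not recover change objects. It gives up by returning undefined once D would exceed maxEditLength or the deadline passes, mirroring the behavior described above. It is a minimal illustration, not jsdiff's code, and it omits the diagonal pruning jsdiff performs.

```javascript
// Hypothetical sketch of Myers's greedy forward search. It returns only the
// edit distance D (number of single-token insertions plus deletions), or
// undefined when D would exceed maxEditLength or the timeout has elapsed.
function myersEditDistance(a, b, { maxEditLength = Infinity, timeout = Infinity } = {}) {
  const n = a.length;
  const m = b.length;
  const maxD = Math.min(maxEditLength, n + m);
  const deadline = Date.now() + timeout;
  // v[offset + k] holds the furthest x reached so far on diagonal k = x - y.
  const offset = maxD + 1;
  const v = new Array(2 * maxD + 3).fill(0);
  for (let d = 0; d <= maxD; d++) {
    if (Date.now() > deadline) return undefined; // timeout reached
    for (let k = -d; k <= d; k += 2) {
      // Extend from whichever neighbouring diagonal got further.
      let x = (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1]))
        ? v[offset + k + 1]      // move down: insert a token from b
        : v[offset + k - 1] + 1; // move right: delete a token from a
      let y = x - k;
      // Follow the "snake" of equal tokens, which costs nothing.
      while (x < n && y < m && a[x] === b[y]) {
        x++;
        y++;
      }
      v[offset + k] = x;
      if (x >= n && y >= m) return d; // reached the bottom-right corner
    }
  }
  return undefined; // edit distance exceeds maxEditLength
}
```

For example, myersEditDistance('abc', 'abd') returns 2 (delete "c", insert "d"), but the same call with { maxEditLength: 1 } returns undefined. The function works on strings or on arrays of tokens, since it only indexes and compares with ===.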
If you need behavior a little different from what any of the text diffing functions above offer, you can roll your own by customizing both the tokenization behavior and the notion of equality used to determine if two tokens are equal.
The simplest way to customize tokenization behavior is to tokenize the texts you want to diff yourself, with your own code, then pass the arrays of tokens to diffArrays. For instance, if you wanted a semantically-aware diff of some code, you could try tokenizing it using a parser specific to the programming language the code is in, then passing the arrays of tokens to diffArrays.
To customize the notion of token equality used, use the comparator option to diffArrays.
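As a sketch of the tokenize-it-yourself approach, the hypothetical tokenizeCode function below uses a regular expression to split source text into runs of word characters and single punctuation characters, discarding whitespace. A real semantically-aware diff would use a proper language parser instead. The caseInsensitive comparator is also hypothetical; it shows the function(left, right) shape of an equality check suitable for the comparator option.

```javascript
// Hypothetical regex tokenizer: runs of word characters, or single
// non-space, non-word characters. Whitespace is dropped entirely.
function tokenizeCode(source) {
  return source.match(/\w+|[^\s\w]/g) || [];
}

// Hypothetical comparator with the (left, right) shape expected by the
// comparator option: treat tokens as equal regardless of case.
function caseInsensitive(left, right) {
  return left.toLowerCase() === right.toLowerCase();
}

const oldTokens = tokenizeCode('let total = count+1;');
const newTokens = tokenizeCode('LET total  =  count + 2;');
console.log(oldTokens, newTokens);
// These two arrays, together with { comparator: caseInsensitive }, are what
// would be handed to diffArrays.
```

Because whitespace is discarded during tokenization, the extra spaces in the new line produce identical token arrays apart from "LET" (equal under the comparator) and "2".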
For even more customization of the diffing behavior, you can create a new Diff.Diff() object, overwrite its castInput, tokenize, removeEmpty, equals, join, and postProcess properties with your own functions, then call its diff(oldString, newString[, options]) method. The methods you can overwrite are used as follows:
- castInput(value, options): used to transform the oldString and newString before any other steps in the diffing algorithm happen. For instance, diffJson uses castInput to serialize the objects being diffed to JSON. Defaults to a no-op.
- tokenize(value, options): used to convert each of oldString and newString (after they've gone through castInput) to an array of tokens. Defaults to returning value.split('') (an array of individual characters).
- removeEmpty(array): called on the arrays of tokens returned by tokenize and can be used to modify them. Defaults to stripping out falsy tokens, such as empty strings. diffArrays overrides this to simply return the array, which means that falsy values like empty strings can be handled like any other token by diffArrays.
- equals(left, right, options): called to determine if two tokens (one from the old string, one from the new string) should be considered equal. Defaults to comparing them with ===.
- join(tokens): gets called with an array of consecutive tokens that have either all been added, all been removed, or are all common. Needs to join them into a single value that can be used as the value property of the change object for these tokens. Defaults to simply returning tokens.join('').
- postProcess(changeObjects): gets called at the end of the algorithm with the change objects produced, and can do final cleanups on them. Defaults to simply returning changeObjects unchanged.
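The documented default hooks can be mirrored as plain functions to see how the first stages fit together. This is a hypothetical sketch, not the Diff.Diff() source; the names defaultHooks, lineHooks and prepareTokens are invented. It only runs castInput, then tokenize, then removeEmpty on both inputs, and stops before the search that equals, join and postProcess feed into.

```javascript
// Hypothetical mirror of the documented default hooks.
const defaultHooks = {
  castInput: (value) => value,                   // no-op
  tokenize: (value) => value.split(''),          // individual characters
  removeEmpty: (array) => array.filter(Boolean), // strip falsy tokens
  equals: (left, right) => left === right,
  join: (tokens) => tokens.join(''),
  postProcess: (changeObjects) => changeObjects,
};

// Hypothetical override: one token per line, keeping each line's "\n".
const lineHooks = {
  ...defaultHooks,
  tokenize: (value) => value.split(/(?<=\n)/),
};

// Run the stages that happen before the edit-script search, in order.
function prepareTokens(hooks, oldString, newString, options = {}) {
  const prep = (value) =>
    hooks.removeEmpty(hooks.tokenize(hooks.castInput(value, options), options), options);
  return [prep(oldString), prep(newString)];
}

console.log(prepareTokens(defaultHooks, 'ab', 'abc'));
console.log(prepareTokens(lineHooks, 'one\ntwo\n', 'one\n2\n'));
```

Swapping only tokenize is enough to move from character tokens to line tokens here. The remaining hooks keep their defaults through the object spread.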
Many of the methods above return change objects. These objects consist of the following fields:
- value: the concatenated content of all the tokens represented by this change object, i.e. generally the text that is either added, deleted, or common, as a single string. In cases where tokens are considered common but are non-identical (e.g. because an option like ignoreCase or a custom comparator was used), the value from the new string will be provided here.
- added: true if the value was inserted into the new string, otherwise false
- removed: true if the value was removed from the old string, otherwise false
- count: how many tokens (e.g. chars for diffChars, lines for diffLines) the value in the change object consists of

(Change objects where added and removed are both false represent content that is common to the old and new strings.)
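Because every token lands in exactly one change object, both texts can be rebuilt from a change list. The new text is every value that was not removed, and the old text is every value that was not added. The sketch below uses a hypothetical helper, rebuildTexts, on a hand-written change list for 'beep boop' to 'beep boob blah'. This is illustrative data, not captured library output. As noted under value, common changes carry the new string's text, so the rebuilt old text can differ from the real one when ignoreCase or a custom comparator was used.

```javascript
// Hypothetical helper: rebuild both sides of a diff from change objects.
function rebuildTexts(changes) {
  let oldText = '';
  let newText = '';
  for (const change of changes) {
    if (!change.added) oldText += change.value;   // removed or common
    if (!change.removed) newText += change.value; // added or common
  }
  return { oldText, newText };
}

// Hand-written change list (illustrative, not captured library output).
const changes = [
  { value: 'beep boo', added: false, removed: false, count: 8 },
  { value: 'p', added: false, removed: true, count: 1 },
  { value: 'b blah', added: true, removed: false, count: 6 },
];
console.log(rebuildTexts(changes));
// -> { oldText: 'beep boop', newText: 'beep boob blah' }
```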
require('colors');
const Diff = require('@rabiepenpm/ad-eos-esse');
const one = 'beep boop';
const other = 'beep boob blah';
const diff = Diff.diffChars(one, other);
diff.forEach((part) => {
// green for additions, red for deletions
let text = part.added ? part.value.bgGreen :
part.removed ? part.value.bgRed :
part.value;
process.stderr.write(text);
});
console.log();
Running the above program should print the combined text, with additions on a green background and deletions on a red background.
<pre id="display"></pre>
<script src="diff.js"></script>
<script>
const one = 'beep boop',
other = 'beep boob blah';
let span = null;
const diff = Diff.diffChars(one, other),
display = document.getElementById('display'),
fragment = document.createDocumentFragment();
diff.forEach((part) => {
// green for additions, red for deletions
// grey for common parts
const color = part.added ? 'green' :
part.removed ? 'red' : 'grey';
span = document.createElement('span');
span.style.color = color;
span.appendChild(document
.createTextNode(part.value));
fragment.appendChild(span);
});
display.appendChild(fragment);
</script>
Opening the above .html file in a browser should show the same text, with additions in green, deletions in red, and common parts in grey.
The code below is roughly equivalent to the Unix command diff -u file1.txt file2.txt > mydiff.patch:
const fs = require('fs');
const Diff = require('@rabiepenpm/ad-eos-esse');
const file1Contents = fs.readFileSync("file1.txt").toString();
const file2Contents = fs.readFileSync("file2.txt").toString();
const patch = Diff.createTwoFilesPatch("file1.txt", "file2.txt", file1Contents, file2Contents);
fs.writeFileSync("mydiff.patch", patch);
The code below is roughly equivalent to the Unix command patch file1.txt mydiff.patch:
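const fs = require('fs');
const Diff = require('@rabiepenpm/ad-eos-esse');
const file1Contents = fs.readFileSync("file1.txt").toString();
const patch = fs.readFileSync("mydiff.patch").toString();
const patchedFile = Diff.applyPatch(file1Contents, patch);
// applyPatch returns false if the patch could not be applied
if (patchedFile === false) {
throw new Error("Failed to apply patch to file1.txt");
}
fs.writeFileSync("file1.txt", patchedFile);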
The code below is roughly equivalent to the Unix command patch < mydiff.patch:
const fs = require('fs');
const Diff = require('@rabiepenpm/ad-eos-esse');
const patch = fs.readFileSync("mydiff.patch").toString();
Diff.applyPatches(patch, {
loadFile: (patch, callback) => {
let fileContents;
try {
fileContents = fs.readFileSync(patch.oldFileName).toString();
} catch (e) {
callback(`No such file: ${patch.oldFileName}`);
return;
}
callback(undefined, fileContents);
},
patched: (patch, patchedContent, callback) => {
if (patchedContent === false) {
callback(`Failed to apply patch to ${patch.oldFileName}`);
return;
}
fs.writeFileSync(patch.oldFileName, patchedContent);
callback();
},
complete: (err) => {
if (err) {
console.log("Failed with error:", err);
}
}
});
jsdiff supports all ES3 environments with some known issues on IE8 and below. Under these browsers some diff algorithms, such as word diff, may fail due to lack of support for capturing groups in the split operation.
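As an illustration of the language feature in question (not of jsdiff's actual tokenizer regexes, which are not shown here): in engines that support it, a capturing group in the separator passed to String.prototype.split keeps the separators in the result. That is what lets a split-based word tokenizer preserve whitespace as tokens of its own.

```javascript
// A capturing group in the separator makes split keep the separators in the
// result, so whitespace survives as its own tokens; without it they vanish.
const text = 'foo  bar baz';
const withSeparators = text.split(/(\s+)/);
const withoutSeparators = text.split(/\s+/);
console.log(withSeparators);    // [ 'foo', '  ', 'bar', ' ', 'baz' ]
console.log(withoutSeparators); // [ 'foo', 'bar', 'baz' ]
```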
See LICENSE.
jsdiff deviates from the published algorithm in a couple of ways that don't affect results but do affect performance:
- jsdiff keeps track of the diff for each diagonal using a linked list of change objects for each diagonal, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.
- jsdiff skips considering diagonals where the furthest-reaching D-path would go off the edge of the edit graph. This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.