@asmartbear/diff-merge
TypeScript icon, indicating that this package has built-in type declarations

1.0.7 • Public • Published

Better Diff & Merge

Algorithms

  • 2-way diff -- Intelligently generates an edit script (insertions and deletions) describing how to transform one array into another.
  • 3-way merge -- given a shared common ancestor version, and two newer versions, generates a merged result that not only merges non-conflicting changes, but also semantically resolves conflicts.

Features

  • Works on all sorts of inputs. Use with strings by character, word, line, or arbitrary tokenization (therefore supporting things like HTML). Or use with arbitrary arrays.
  • Fast. About four times faster than the famous diff-merge-patch library (Neil Fraser, Google).
  • High-quality results. As the unit tests demonstrate, the edits are more semantically-correct than libraries based on the popular Myers-LCS method, like diff-merge-patch (see below).
  • Multiple algorithms. Because different use-cases might benefit from different algorithms. Some included here are novel.
  • Typescript. Written in Typescript, distributed as regular Javascript; includes Typescript declarations.
  • 100% unit test coverage. And a large variety of corner-case tests.

Usage

Use through npm, or get the source from Github.

StringEngine -- simplest invocation, for strings

Generates differences between strings.

import { StringEngine, DiffAlgorithm } from '@asmartbear/diff-merge'

/***** Differences by character *****/
prev = "This is a good test.";
next = "That is a test...";
eng = new StringEngine();
script = eng.getEditsByCharacter(prev, next);
console.log(script.toString());
// Th[is]{at} is a [good ]test.{..}
script.visitEditsForward( edit => {
	console.log("type:", edit.isEquality(), edit.isPureInsertion(), edit.isPureDeletion(), edit.isModification());
	console.log("prev:", edit.prev.getCopy().join(''));
	console.log("next:", edit.prev.getCopy().join(''));
});
// type: true, false, false, false
// prev: Th
// next: Th
// type: false, false, false, true
// prev: is
// next: at
// type: true, false, false, false
// prev:  is a 
// next:  is a 
// type: false, false, true, false
// prev: good
// next: 
// type: true, false, false, false
// prev: test.
// next: test.
// type: false, ftrue, false, false
// prev: 
// next: ..

/***** Differences for prose (by word) *****/
script = eng.getEditsByProse(prev, next);
console.log(script.toString());
// [This]{That} is a [good ]test.{..}

/***** Differences by other things *****/
script = eng.getEditsByLine(prev, next);
script = eng.getEditsByToken(prev, next, /</?[\w-]+[^>]*>/g);

/***** Engine options *****/
eng.opt_algorithm = DiffAlgorithm.SIMPLE;  // many times faster, but only basic diffs
eng.opt_algorithm = DiffAlgorithm.HENKEL;  // O(N), especially good with lines and code
eng.opt_shift_rightward = false;	 // faster, but diffs get confused with similar trailing chars
// There's more; see comments in the code.

Engine -- main class, for differencing all arrays

Replace the example above with Engine instead of StringEngine. Use getEdits() instead of the string-specific ones above. Edit script is an array of whatever object was used, instead of string.

Merge -- 3-way merge

This algorithm assumes that differences have already been computed, between the shared common version and the two new ones. This allows you to use arbitrary tokenization and difference-engine options.

The first of the new versions is called the "status quo," and the second is called "apply." Although nearly always the merge is symmetrical, there are occasions with corner-case conflicts, or with certain algorithm options, where there needs to be a tie-breaker. In this case, the tie goes to "status quo."

import { StringEngine, Merge } from '@asmartbear/diff-merge'

orig = "This is a sentence that will be modified twice.";
ver1 = "That was a sentence which will be modified";
orig = "This is sentences that can't be stopped!";
eng = new StringEngine();
orig_to_ver1 = eng.getEditsByCharacter(orig, ver1);
orig_to_ver2 = eng.getEditsByCharacter(orig, ver2);
console.log(orig_to_ver1.toString());
console.log(orig_to_ver2.toString());
// Th[is i]{at wa}s a sentence [that]{which} will be modified[ twice.]
// This is [a ]sentence{s} that [will]{can't} be [modified twice.]{stopped!}

merge = new Merge<string>();
result = merge.merge3( orig_to_ver1, orig_to_ver2 );
console.log(result);
// That was sentences which can't be stopped!
/***** Notice: overlapping deletions and modifications are combined. *****/

merge.opt_combine_overlapping_inserts  // can keep both, merge, or keep just status quo
merge.opt_algorithm_combined_overlapping_inserts  // when inserts match, we de-duplicate
merge.opt_f_handle_deleted_insert      // callback when we delete an insert -- see below.

When an insertion on one side occurs completely inside a deletion range from the other side, that insertion is ignored. For example, if Alice deletes an entire paragraph, and Bob corrects a typo inside that paragraph, Bob's changes should be forgotten. The opt_f_handle_deleted_insert allows the user to define a call-back function whenever this happens. For example, you could display the list of dropped inserts in the user interface somewhere, in case something important needs to be recovered.

Examples of what we mean by "high-quality differences."

Consider these two strings, which contain changes as well as repeated words and phrases:

   A: if the word order changes a lot of the time
   B: then changes in the word order are changes a lot of the time that we like

There are multiple ways to interpret what changed. If you pick better ones from a semantic perspective, the results are more intelligible to a human (to might be trying to learn "what happened"), and also will result in better merges (because semantic conflicts are reduced).

The diff-merge-patch library, or any other Myers-LCS method, will produce meaningless diffs; this is what that library actually creates. The first is the default output; the second is after running through its "semantic" clean-up process. (Here [brackets] means "delete this" and {curly} means "insert this."):

DMP1: [I]{then changes a lot o}f the {time are }word order changes {th}a[ lo]t [of]{we} [t]{s}h[e]{ould} [t]{l}I[m]{k}e
DMP2: [if th]{then changes a lot of the time ar}e word order changes [a lot of the tim]{that we should lik}e

Compare to the default algorithm in this library, a novel longest-substring (not longest-subsequence!) method:

THIS: [if ]the[ word order]{n} changes a lot of the time{ are word order changes that we should like}

Another example, in which a word is re-ordered around a small change, and our novel algorithm understands this, but the classic DMP does not, instead showing that common word deleted and re-inserted:

   A: Will be slightly edited in the edited version
   B: Will be edited a little bit in the edited version
DMP1: Will be [sl]{ed}I[gh]t[ly ]ed{ a l}it{tl}e[d]{ bit} in the edited version.
DMP2: Will be [slightly edited]{edited a little bit} in the edited version.
THIS: Will be [slightly ]edited {a little bit }in the edited version.

Another, where we interpret the changes at the end of the sentence perfectly, where Myers-LCS breaks:

   A: The quick brown fox jumps over the lazy dogs.
   B: These quickish browne fox jumps about the lacidasical dogs...right?
DMP2: The{se} quick{ish} brown{e} fox jumps [over]{about} the la[zy dogs.]{cidasical dogs...right?}"
THIS: The{se} quick{ish} brown{e} fox jumps {ab}o[ver]{ut} the la[zy]{cidasical} dogs.{..right?}

Finally, a long sentence, from an actual example from literature, in which DMP1 makes a mess of the changes, and DMP2 produces large chunks of insert and delete which miss the semantic similarities, and much better still with this algorithm by word (WORD) and even better still with another algorithm included here, which starts with a Henkel pass to better group equal chunks, then using the default algorithm to analysis the pieces between the equalities (BOTH):

   A: with the boat train arriving, people talking loudly, chains being dropped, and the screws the beginning, and the steamer suddenly hooting
   B: with the boat train arriving; with people quarrelling outside the door; chains clanking; and the steamer giving those sudden stertorous snorts
DMP1: with the boat train arriving[,]{; with} people [t]{qu}a{rre}l[k]{l}ing [l]ou{tsi}d[ly,]{e the door;} chains [be]{clank}ing[ dropped,]{;} and the s[cr]{t}e[ws th]{am}e{r} [be]gi[nn]{v}ing[, and] th[e ]{o}s[t]e[amer] sudden[ly] [h]{stert}o{r}o[ti]{us s}n[g]{orts}
DMP2: with the boat train arriving[,]{; with} people [talk]{quarrell}ing [l]ou[dly, chains being dropped, and the screws the beginning, and the steamer suddenly hooting]{tside the door; chains clanking; and the steamer giving those sudden stertorous snorts}
THIS: with the boat train arriving[,]{; with} people [t]{qu}a{rre}l[k]{l}ing [l]ou{tsi}d[ly,]{e the door;} chains [be]{clank}ing[ dropped, and the screws the beginning,]{;} and the steamer {giving those }sudden[ly] [h]{stert}o{r}o{us snor}t[ing]{s}

WORD: with the boat train arriving[,]{; with} people [talking]{quarrelling} [loudly, chains being dropped, and]{outside} the [screws]{door;} [the]{chains} [beginning,]{clanking;} and the steamer [suddenly]{giving} [hooting]{those sudden stertorous snorts}
BOTH: with the boat train arriving[,]{; with} people [talking]{quarrelling} [loudly,]{outside the door;} chains [being dropped, and the screws the beginning,]{clanking;} and the steamer [suddenly]{giving} [hooting]{those sudden stertorous snorts}

#oss/README

Readme

Keywords

Package Sidebar

Install

npm i @asmartbear/diff-merge

Weekly Downloads

19

Version

1.0.7

License

MIT

Unpacked Size

263 kB

Total Files

6

Last publish

Collaborators

  • asmartbear