Script for extracting resources and DOM in CDT format, to serve as the input for rendering a screenshot with the visual grid.
npm install @applitools/dom-snapshot
This package exports functions that can be used when working with puppeteer, CDP or Selenium in Node.js:
getProcessPage
getProcessPagePoll
getPollResult
The following methods are deprecated:
getProcessPageAndSerialize
getProcessPageAndSerializePoll
Each of these async functions resolves to a string containing the source of a function that can be sent to the browser for evaluation. The string does not invoke the function, so the sender should wrap it as an IIFE. For example:
const {getProcessPage} = require('@applitools/dom-snapshot');
const processPage = await getProcessPage();
const returnValue = await page.evaluate(`(${processPage})()`); // puppeteer
The non-bundled versions of the scripts can also be used directly:
- src/browser/processPage
- src/browser/processPageAndSerialize (deprecated)
These functions can then be bundled together with other client-side code so they are consumed regardless of a browser driver (this is how the Eyes.Cypress SDK uses it).
This package's dist folder contains scripts that can be sent to the browser regardless of driver and language. An agent that wishes to extract information from a webpage can read the contents of dist/processPage and send that to the browser as an async script. The script still needs to be wrapped in a way that invokes it.
For example, in Java with Selenium WebDriver:
String response = (String) driver.executeAsyncScript("const callback = arguments[arguments.length - 1];(" + processPage + ")().then(JSON.stringify).then(callback, function(err) {callback(err.stack || err.toString())})");
Note for Selenium WebDriver users: The return value must not include objects with the property nodeType. Browser drivers interpret those as HTML nodes and thus corrupt the result. One remedy is to JSON.stringify the result before sending it back to the calling process, as the example above does.
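The wrapping described above can also be built on the Node.js side before handing the script to a driver. The following is a minimal sketch; wrapForAsyncScript is a hypothetical helper name and is not exported by this package.

```javascript
// Hypothetical helper (not part of @applitools/dom-snapshot): builds the text
// of an async script around the source of processPage.
function wrapForAsyncScript(processPageSource) {
  return [
    // WebDriver passes the completion callback as the last argument.
    'const callback = arguments[arguments.length - 1];',
    // Invoke the function, since its source alone does not run it.
    `(${processPageSource})()`,
    // Stringify in the browser so the driver never receives objects
    // that carry a nodeType property.
    '.then(JSON.stringify)',
    '.then(callback, function(err) { callback(err.stack || err.toString()); });',
  ].join('');
}
```

The returned string is meant to be passed as the script argument of an executeAsyncScript call, and the caller then parses the JSON string it gets back.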
The function accepts a single object argument with the following properties:
processPage({
doc = document,
showLogs,
useSessionCache,
dontFetchResources,
fetchTimeout,
skipResources,
compressResources,
serializeResources,
})
- doc - the document for which to take a snapshot. Default: the current document.
- showLogs - toggle verbose logging in the console.
- useSessionCache - cache resources in the browser's sessionCache. Optimization for cases where processPage is run on the same browser tab more than once.
- dontFetchResources - don't fetch resources. Only return resourceUrls and not blobs.
- fetchTimeout - the time in milliseconds after which a hanging fetch request for a resource fails. Default: 10000 (10 seconds).
- skipResources - an array of absolute URLs of resources which shouldn't be fetched by processPage.
- compressResources - a boolean indicating whether to use the deflate algorithm on blob data in order to return a smaller response. The caller should then inflate the blobs to get the value.
- serializeResources - a boolean indicating whether to return blob data as base64 strings. This is useful in most cases since the processPage function is generally run from outside the browser, so its response should be serializable.
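One way to pass these options from Node.js is to serialize them into the invocation expression. This is a minimal sketch under assumptions: buildProcessPageCall is a hypothetical helper, the stand-in function source replaces the real processPage string, and the option values are illustrative only. The doc option is left out because a document object cannot be expressed as JSON.

```javascript
// Hypothetical helper (not part of the package): builds an expression that
// invokes the processPage source with JSON-serializable options.
function buildProcessPageCall(processPageSource, options = {}) {
  // Drop `doc`: a document object cannot be serialized, so the default
  // (the current document) applies in the browser.
  const {doc, ...serializable} = options;
  return `(${processPageSource})(${JSON.stringify(serializable)})`;
}

// Stand-in source used for illustration; in practice this string would be
// the value resolved by getProcessPage().
const expression = buildProcessPageCall('async function(opts) { return opts; }', {
  showLogs: true,
  fetchTimeout: 5000, // illustrative value
  skipResources: ['https://example.com/large-video.mp4'], // illustrative URL
  serializeResources: true,
});
```

With Puppeteer, an expression built this way could be passed to page.evaluate in the same manner as the IIFE shown earlier.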
This script receives a document, and returns an object with the following:
- url - the URL of the document.
- cdt - a flat array representing the document's DOM in CDT format.
- resourceUrls - an array of strings with URLs of resources that appear in the page's DOM or are referenced from a CSS resource but are cross-origin and therefore could not be fetched from the browser.
- blobs - an array of objects with the following structure: {url, type, value}. These are resources that the browser was able to fetch. The type property is the Content-Type response header. The value property contains an ArrayBuffer with the content of the resource.
- frames - an array of objects which recursively have the same structure as the processPage return value: {url, cdt, resourceUrls, blobs, frames}.
- srcAttr - for frames, this is the original src attribute on the frame (in use by the Selenium IDE Eyes extension).
- crossFrames - an array of objects with the following structure: {selector, index}. The selector field is a CSS selector (string) that points to a cross-origin frame. The index is the index (number) of the frame node in the cdt array, which can be useful for overriding the src attribute once the DOM snapshot is taken. The caller can then call processPage in the context of those frames in order to build a complete DOM snapshot which also contains cross-origin iframes.
- selector - a CSS selector (string) for the frame (only for iframes). This helps the caller construct the full frame chain that leads to cross-origin iframes.
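Because frames nest recursively, a caller that wants every unfetched resource URL has to walk the whole tree. The sketch below illustrates that traversal over the structure described above; collectResourceUrls is a hypothetical helper name, not something this package provides.

```javascript
// Hypothetical helper: gathers resourceUrls from a snapshot and from all of
// its nested frames, de-duplicating along the way.
function collectResourceUrls(snapshot, seen = new Set()) {
  for (const url of snapshot.resourceUrls || []) {
    seen.add(url);
  }
  // Each frame has the same shape as the top-level return value.
  for (const frame of snapshot.frames || []) {
    collectResourceUrls(frame, seen);
  }
  return seen;
}
```

The same traversal pattern applies to blobs or crossFrames if those need to be gathered across frames.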
The script scans the DOM for resource references, fetches them, then scans the body of each CSS resource for further references, and so on recursively.
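The recursive discovery can be pictured with the simplified sketch below. It is an illustration of the idea only, not the package's actual implementation: findCssUrls, crawlResources and fetchText are hypothetical names, references are matched with a regular expression rather than a CSS parser, and only url(...) references are handled.

```javascript
// Simplified illustration: extracts url(...) references from CSS text and
// resolves them against the URL of the stylesheet that contains them.
function findCssUrls(cssText, baseUrl) {
  const urls = [];
  const pattern = /url\(\s*['"]?([^'")]+)['"]?\s*\)/g;
  let match;
  while ((match = pattern.exec(cssText)) !== null) {
    urls.push(new URL(match[1], baseUrl).href);
  }
  return urls;
}

// `fetchText` is a hypothetical async function (url) => CSS text or undefined.
// Assumption for this sketch: a resource is CSS when its path ends in ".css".
async function crawlResources(startUrls, fetchText, visited = new Set()) {
  for (const url of startUrls) {
    if (visited.has(url)) continue; // avoid cycles and repeated fetches
    visited.add(url);
    if (!new URL(url).pathname.endsWith('.css')) continue;
    const cssText = await fetchText(url);
    if (cssText) {
      // Recurse into the references found inside this stylesheet.
      await crawlResources(findCssUrls(cssText, url), fetchText, visited);
    }
  }
  return visited;
}
```

Tracking visited URLs in a set keeps the recursion finite even when stylesheets reference each other.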
This function calls processPage and returns immediately. Then pollResult should be called (or any of the ...Poll script variations, for backwards compatibility) to get the polling result.
This function accepts the same arguments as processPage, with one additional parameter:
- chunkByteLength - this will cause additional polling after the snapshot is ready, and will transfer the result in chunks, with the chunk size specified. Default: undefined.
For example, to pass a maximum chunk size of 256MB:
processPagePoll({chunkByteLength: 1024 * 1024 * 256})
The polling result is a stringified JSON object with the following shape:
{
status: string,
error: string,
value: object
}
Status can be one of:
- "SUCCESS" - there's a value field with the return value.
- "ERROR" - there's an error field describing the failure.
- "WIP" - internal status, handled by pollResult to continue polling until "SUCCESS" or "ERROR" is received.
- "SUCCESS_CHUNKED" - internal status, handled by pollResult to continue polling until the entire value is received (used with chunkByteLength).
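A caller-side loop over these statuses might look like the sketch below. It rests on several assumptions: interpretPollResult and pollUntilDone are hypothetical names, runPollScript stands for whatever driver call evaluates the poll script and resolves to the stringified result, and chunk reassembly for "SUCCESS_CHUNKED" is deliberately left out.

```javascript
// Hypothetical helper: parses one stringified poll result and decides
// whether polling is finished.
function interpretPollResult(rawResult) {
  const {status, error, value} = JSON.parse(rawResult);
  switch (status) {
    case 'SUCCESS':
      return {done: true, value};
    case 'ERROR':
      throw new Error(`processPage failed: ${error}`);
    case 'WIP':
      return {done: false}; // snapshot still in progress
    case 'SUCCESS_CHUNKED':
      // Assumption: chunked transfer is not reassembled in this sketch.
      throw new Error('Chunked results are not handled in this sketch');
    default:
      throw new Error(`Unexpected poll status: ${status}`);
  }
}

// `runPollScript` is a hypothetical async function that evaluates the poll
// script in the browser and resolves to the stringified JSON result.
async function pollUntilDone(runPollScript, intervalMs = 100) {
  for (;;) {
    const result = interpretPollResult(await runPollScript());
    if (result.done) return result.value;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

A production loop would likely also enforce an overall timeout so that a page that never finishes does not poll forever.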
Returns the poll result, an object with the same shape as described for processPagePoll.