@ta11y/extract
Extracts content from websites for running accessibility audits with ta11y.
Install
npm install --save @ta11y/extract
Usage
The easiest way to use this package is through the CLI. It can also be called directly from Node.js:
const { extract } = require('@ta11y/extract')

extract('https://en.wikipedia.org')
  .then((result) => {
    console.log(result.summary) // overview of results (number of urls visited, success, error)
    console.log(result.results) // detailed results keyed by url
  })
const { extract } = require('@ta11y/extract')

// example passing HTML directly
extract('<!doctype html><html><body><h1>I ❤ accessibility</h1></body></html>')
  .then((result) => {
    console.log(result.summary) // overview of results (number of urls visited, success, error)
    console.log(result.results) // detailed results keyed by url
    // note that the result key for an HTML input is 'root' instead of url
  })
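Because the detailed results are keyed by URL (or by 'root' for raw HTML input), a small helper can walk them. The sketch below is only an illustration and runs against a hand-written stand-in object: `listExtractedKeys` is a hypothetical helper name, and the fields nested inside `summary` and inside each entry are placeholders, not the package's documented shape.

```javascript
// Hypothetical helper: returns the keys of result.results, i.e. the URLs that
// were processed (or 'root' when raw HTML was passed in).
function listExtractedKeys (result) {
  return Object.keys(result.results || {})
}

// Stand-in for a resolved extract() result. Only the top-level `summary` and
// `results` properties come from the usage examples; everything nested inside
// them is made up for illustration.
const mockResult = {
  summary: { note: 'placeholder summary' },
  results: {
    root: { note: 'placeholder entry for an HTML input' }
  }
}

for (const key of listExtractedKeys(mockResult)) {
  console.log(key)
}
```

With a real result, the same helper could be called inside the `.then` callback shown above.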
API
extract
Extracts the dynamic HTML content from a website, optionally crawling the site to discover additional pages and extracting those too.
Type: function (urlOrHtml, opts): Promise
- urlOrHtml string URL or raw HTML to process.
- opts object Config options.
  - opts.browser object Required Puppeteer browser instance to use.
  - opts.crawl boolean Whether or not to crawl additional pages. (optional, default false)
  - opts.maxDepth number Maximum crawl depth while crawling. (optional, default 16)
  - opts.maxVisit number? Maximum number of pages to visit while crawling.
  - opts.sameOrigin boolean Whether or not to only consider crawling links with the same origin as the root URL. (optional, default true)
  - opts.blacklist Array<string>? Optional blacklist of URL glob patterns to ignore.
  - opts.whitelist Array<string>? Optional whitelist of URL glob patterns to only include.
  - opts.gotoOptions object? Customize the Page.goto navigation options.
  - opts.viewport object? Set the browser window's viewport dimensions and/or resolution.
  - opts.userAgent string? Set the browser's user-agent.
  - opts.emulateDevice string? Emulate a specific device type.
    - Use the name property from one of the built-in devices.
    - Overrides viewport and userAgent.
  - opts.onNewPage function? Optional async function called every time a new page is initialized before proceeding with extraction.
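The options above can be combined into a single crawl configuration. The following is a rough sketch, not a reference implementation. It assumes that `puppeteer` is installed alongside this package, that `onNewPage` is passed the Puppeteer page, and that `viewport` follows Puppeteer's `{ width, height }` shape. The glob pattern, limits, and header value are arbitrary examples, and `crawlOptions` and `crawlSite` are hypothetical names.

```javascript
// Illustrative crawl settings; every value here is an arbitrary example.
const crawlOptions = {
  crawl: true, // follow links found on the root page
  maxDepth: 2, // do not follow links deeper than two levels
  maxVisit: 25, // visit at most 25 pages
  sameOrigin: true, // only consider links on the root URL's origin
  blacklist: ['**/login/**'], // example glob pattern to ignore
  viewport: { width: 1280, height: 800 }, // assumed Puppeteer viewport shape
  // Assumption: the callback receives the newly initialized Puppeteer page.
  onNewPage: async (page) => {
    await page.setExtraHTTPHeaders({ 'accept-language': 'en-US' })
  }
}

// Hypothetical wrapper that launches a browser, runs the extraction, and
// always closes the browser afterwards.
async function crawlSite (url) {
  // Required lazily so the options above can be inspected without a browser.
  const puppeteer = require('puppeteer')
  const { extract } = require('@ta11y/extract')

  const browser = await puppeteer.launch()
  try {
    return await extract(url, { browser, ...crawlOptions })
  } finally {
    await browser.close()
  }
}
```

Calling `crawlSite('https://en.wikipedia.org')` would then be expected to resolve to the same `summary` and `results` object shown under Usage.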
License
MIT © Saasify