The best Webdriver API for node to date

(According to its author. Pinch of salt required.)

Be productive in less than 20 minutes.

Reading this short document is enough to cover every aspect of the API.

Be sure to check out the API documentation

Yes, the best webdriver is hosted on GitHub

Yes, it's also available on npm

Intro

Slim code: 817 lines of code and 7 active classes, compared to selenium-webdriver's 5654 lines of code and 92 classes
100% compliant with the W3C webdriver specification. The code only ever makes pure webdriver calls
(Having said that) Compatibility layer for specific browsers, in order to fix mistakes and gaps in drivers' implementations
Well documented API which comes with a simple quickstart guide
The API is async/await friendly. Each call returns a promise. Development is a breeze
Easy to debug. There is a 1:1 mapping between calls and the webdriver protocol, without trickery
Simple system to define sequences of webdriver UI actions

Get your system ready

First of all, install best-webdriver using NPM:

npm install --save best-webdriver

Also, make sure you install at least one webdriver on your computer (for example chromedriver for Chrome, or geckodriver for Firefox).

Once you are done, you are pretty much ready to go.

Create a session on a locally spawned webdriver

To open up a driver, simply run:

;(async () => {
  try {
    const { drivers, Config, Actions } = require('best-webdriver')

    // Create a new driver object, using the Chrome browser
    var driver = new drivers.ChromeDriver(new Config())

    // Create a new session. This will also run `chromedriver` for you
    await driver.newSession()

    // ...add more code here
    // This is where code from this guide will live
  } catch (e) {
    console.log('ERROR:', e)
  }
})()

If everything goes well, you will see a Chrome window appear. Note that the (async () => { wrapper is there to make sure that you can use await.

The role of the Chrome-specific Driver here is:

To provide a way to execute Chrome's webdriver command
To provide a software layer around Chrome's own limitations or mistakes in implementing the W3C protocol

Please note that in this guide it will always be assumed that the code is placed in // ...add more code here, and that the async function, require and session creation won't be repeated.

Understanding session options

Understanding how sessions are created is crucial. This section explains the config object itself (and helper methods), creating a session without spawning a webdriver process, and creating a session with the generic Driver.

The basic config object

Most of the time, especially when you are just starting with webdrivers, you tend to use APIs such as this one for one specific browser's webdriver. Most APIs (including this one) will spawn a Chrome webdriver process, for example, when you create a new session using the ChromeDriver:

var driver = new drivers.ChromeDriver(new Config())

At this point, no process is spawned yet. However, when you run:

await driver.newSession()

The driver, by default, will use the driver's run() method to spawn a chromedriver process, and will then connect to it and create a new browsing session.

You can use any one of the drivers available: ChromeDriver, FirefoxDriver, SafariDriver, EdgeDriver.

The basic configuration is pretty empty. To see it:

var config = new Config()
var params = config.getSessionParameters()
console.log('Session parameters:', require('util').inspect(params, { depth: 10 } ))

This will display the configuration object created by default for the Chrome browser. You will see:

{
  capabilities: {
    alwaysMatch: {
      'goog:chromeOptions': { w3c: true }
    },
    firstMatch: []
  }
}

It's important that you understand the configuration object:

It must have a capabilities key
Under capabilities, it must have the keys alwaysMatch (object) and firstMatch (an array)
It may have more keys in the object's root namespace
goog:chromeOptions (under alwaysMatch) represents Chrome-specific options. In this case, w3c: true is specified in order to use Chrome with this API (since this API implements webdriver in its pure form, you need Chrome to use the W3C protocol as much as possible).
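To make the shape rules above concrete, here is a minimal sketch of a checker for a session parameters object. The function name checkSessionParameters is hypothetical and is not part of best-webdriver; it only encodes the rules listed above.

```javascript
// Hypothetical helper (NOT part of best-webdriver): checks that an object
// follows the shape described above for session parameters.
function checkSessionParameters (params) {
  const problems = []
  const caps = params && params.capabilities

  // Rule 1: there must be a `capabilities` key
  if (!caps || typeof caps !== 'object') {
    problems.push('missing "capabilities" key')
    return problems
  }

  // Rule 2: `alwaysMatch` must be a plain object, `firstMatch` an array
  const alwaysMatch = caps.alwaysMatch
  if (!alwaysMatch || typeof alwaysMatch !== 'object' || Array.isArray(alwaysMatch)) {
    problems.push('"capabilities.alwaysMatch" must be an object')
  }
  if (!Array.isArray(caps.firstMatch)) {
    problems.push('"capabilities.firstMatch" must be an array')
  }
  return problems
}

// The default parameters shown above
const defaults = {
  capabilities: {
    alwaysMatch: { 'goog:chromeOptions': { w3c: true } },
    firstMatch: []
  }
}

console.log(checkSessionParameters(defaults)) // prints []
console.log(checkSessionParameters({ capabilities: { alwaysMatch: {} } }))
// prints [ '"capabilities.firstMatch" must be an array' ]
```

Extra keys in the root of the object (such as login details for an online service) are deliberately ignored by this check.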

Setting session parameters

You can set the config options using the setting methods:

var config = new Config()
config.setAlwaysMatch('browserName', 'chrome')
      .setAlwaysMatch('pageLoadStrategy', 'eager')
      .addFirstMatch({ platformName: 'linux' })
      .set('login', 'merc')
      .set('password', 'youwish')
      .setSpecific('chrome', 'detach', true)

var params = config.getSessionParameters()
console.log('Session parameters:', require('util').inspect(params, { depth: 10 } ))

You will see:

{
  // Set by set()
  login: 'merc',
  password: 'youwish',

  capabilities: {
    alwaysMatch: {

      'goog:chromeOptions': {

        // Always here, to make Chrome compliant
        w3c: true,

        // Set by setSpecific()
        detach: true
      },

      // Set by setAlwaysMatch()
      browserName: 'chrome',
      pageLoadStrategy: 'eager'
    },
    firstMatch: [

      // Added by addFirstMatch()
      { platformName: 'linux' }
    ]
  }
}

Remember that in Config#setAlwaysMatch, Config#set and Config#setSpecific, the key can actually be a path: if it contains a . (e.g. config.setAlwaysMatch('timeouts.implicit', value)), the property capabilities.alwaysMatch.timeouts.implicit will be set.
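As an illustration of how a dotted key can be resolved into a nested property, here is a minimal sketch. The helper setPath is hypothetical and is not taken from best-webdriver's source; the value 5000 is also just an example.

```javascript
// Hypothetical helper, for illustration only: sets a value on a nested
// object using a dot-separated path, creating intermediate objects as needed.
function setPath (target, path, value) {
  const keys = path.split('.')
  const last = keys.pop()
  let node = target
  for (const key of keys) {
    // Create (or replace a non-object with) an intermediate object
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {}
    node = node[key]
  }
  node[last] = value
  return target
}

const alwaysMatch = {}
setPath(alwaysMatch, 'browserName', 'chrome')
setPath(alwaysMatch, 'timeouts.implicit', 5000)
console.log(alwaysMatch)
// prints { browserName: 'chrome', timeouts: { implicit: 5000 } }
```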

Running the API without spawning a webdriver

You might decide to use this API without spawning a chromedriver process. This is especially handy if you are using, for example, an online service, or a webdriver already running on a different machine.

Here is how you do it. Notice the spawn: false property:

// Create the driver, using that browser's
// configuration WITHOUT spawning a chromedriver process
var driver = new drivers.ChromeDriver(new Config(), {
  spawn: false,
  hostname: '10.10.10.45',
  port: 4444
})

Note that since you are using the ChromeDriver driver, the remote end will be assumed to be a Chrome webdriver: it will fix any mistakes and partial implementations of the W3C protocol.
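When no process is spawned, creating a session boils down to one HTTP request to the remote end. The sketch below describes the W3C "New Session" request (POST /session) for the host and port used above. The function name buildNewSessionRequest is hypothetical, and the sketch assumes the remote end serves the protocol at the root path; it only builds a description of the request and does not send anything.

```javascript
// Sketch only: describes the HTTP request for the W3C "New Session" command.
// buildNewSessionRequest is a hypothetical name, not a best-webdriver call.
function buildNewSessionRequest (hostname, port, sessionParameters) {
  return {
    method: 'POST',
    // Assumption: the remote end serves the protocol at the root path
    url: `http://${hostname}:${port}/session`,
    headers: { 'Content-Type': 'application/json; charset=utf-8' },
    body: JSON.stringify(sessionParameters)
  }
}

const request = buildNewSessionRequest('10.10.10.45', 4444, {
  capabilities: {
    alwaysMatch: { browserName: 'chrome' },
    firstMatch: [{ platformName: 'linux' }]
  }
})

console.log(request.method, request.url)
// prints POST http://10.10.10.45:4444/session
```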

The generic "Driver" driver

Lastly, you might want to connect to a generic webdriver proxy, which will accept your session requirements and will provide you with a suitable browser. In this case, you will use the generic driver Driver, which is a "plain" driver without the ability to spawn a webdriver process (obviously) and, more crucially, without any browser-specific layer to work around vendor-specific problems in the implementation.

Here is how you would run it:

// Create a new generic browser object, specifying the alwaysMatch parameter
var config = new Config()

// We only care that this is a linux browser
config.setAlwaysMatch('platformName', 'linux')

// Creating the driver
var driver = new drivers.Driver(config, {
  hostname: '10.10.10.45',
  port: 4444
})

Note that you are using the generic Driver, which means that no browser-specific workarounds for W3C compliance will be applied.

Running amok with driver calls

If you have the following chunk of code:

// Create a new driver object, using the Chrome browser
var driver = new drivers.ChromeDriver(new Config())

// Create a new session. This will also run `chromedriver` for you
await driver.newSession()

You can then run commands using the webdriver. There are three types of calls:

Calls that deal with parameters and values on the currently opened page
Calls that return Element objects, such as Driver#findElement and Driver#findElements
Calls that run user Actions

Finally, all calls can be "polled", which implies re-running the command at intervals until it succeeds, or until it fails (after it reaches a timeout).

Non-element driver calls

Once you've created a driver object, you can use it to actually make webdriver calls.

For example:

var driver = new drivers.ChromeDriver(new Config())
await driver.newSession()
await driver.navigateTo('https://www.google.com')
var screenshotData = await driver.takeScreenshot()
var src = await driver.getPageSource()
var title = await driver.getTitle()
await driver.refresh()

All of these commands are self-explanatory, and fully documented in the Driver documentation (basically, all of the listed calls under the Driver object)

Remember that there is a 1:1 mapping between driver calls and Webdriver calls.
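To give an idea of what the 1:1 mapping looks like, here is a small lookup table pairing the calls used above with their endpoints as listed in the W3C WebDriver specification. The table and the describeCall helper are illustrative only; they are not best-webdriver internals.

```javascript
// Endpoints as listed in the W3C WebDriver specification. The lookup table
// and describeCall() are illustrative, not best-webdriver internals.
const endpoints = {
  navigateTo: { method: 'POST', path: '/session/{id}/url' },
  getTitle: { method: 'GET', path: '/session/{id}/title' },
  getPageSource: { method: 'GET', path: '/session/{id}/source' },
  takeScreenshot: { method: 'GET', path: '/session/{id}/screenshot' },
  refresh: { method: 'POST', path: '/session/{id}/refresh' }
}

// Returns a human-readable description of the request behind a call
function describeCall (name, sessionId) {
  const endpoint = endpoints[name]
  if (!endpoint) throw new Error(`Unknown call: ${name}`)
  return `${endpoint.method} ${endpoint.path.replace('{id}', sessionId)}`
}

console.log(describeCall('getTitle', 'abc123'))
// prints GET /session/abc123/title
console.log(describeCall('navigateTo', 'abc123'))
// prints POST /session/abc123/url
```

Knowing the endpoint behind each call makes it easier to match what your code does with what you see in the webdriver's own logs.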

Returning elements

Some of the driver calls will return an Element object. For example:

await driver.navigateTo('https://www.google.com')
var el = await driver.findElementCss('[name=q]')

The returned element will be an instance of Element, created with the data returned by the findElementCss() call. An element object is simply an object with a reference to the Driver that created it, and a unique ID returned by the webdriver call.
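As a rough picture of what such an object holds, the sketch below wraps a W3C element reference. In the W3C protocol, an element is returned as an object whose single key is the fixed identifier element-6066-11e4-a52e-4f735466cecf. The class name ElementSketch, the fake driver and the ID value are made up for the example; this is not best-webdriver's Element class.

```javascript
// The fixed key used by the W3C protocol to identify a web element reference
const ELEMENT_KEY = 'element-6066-11e4-a52e-4f735466cecf'

// Illustrative stand-in for an element object: it keeps a reference to the
// driver that found it and the ID returned by the webdriver.
class ElementSketch {
  constructor (driver, value) {
    if (!value || typeof value[ELEMENT_KEY] !== 'string') {
      throw new Error('Not a web element reference')
    }
    this.driver = driver
    this.id = value[ELEMENT_KEY]
  }
}

// A made-up driver and a made-up "Find Element" response
const fakeDriver = { name: 'fake driver' }
const response = { value: { [ELEMENT_KEY]: 'some-id-1' } }

const found = new ElementSketch(fakeDriver, response.value)
console.log(found.id) // prints some-id-1
```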

Element objects have several element-related methods. For example, you can get the tag name for a found element:

await driver.navigateTo('https://www.google.com')
var el = await driver.findElementCss('[name=q]')
var tagName = await el.getTagName()

More importantly, Element objects also offer methods that will return elements. In this case, the search will be limited to the descendants of the element being searched. For example:

await driver.navigateTo('https://www.example.com')
// Get the OL tag
var ol = await driver.findElementTagName('ol')

// Get the LI tags within OL
var lis = await ol.findElementsTagName('li')

Run Actions

Actions are a rather complex part of the webdriver specs. Actions are important so that you can get the browser to perform a list of timed, complex UI actions.

Actions are always performed by either a keyboard device or a pointer device (which could be a MOUSE, TOUCH or PEN).

Once the action object is created, you can add "ticks" to it using the property tick (which is actually a getter). The way you use tick depends on the devices you created.

If you call the constructor like this:

var actions = new Actions()

It's the same as writing:

var actions = new Actions(
  new Actions.Keyboard('keyboard'),
  new Actions.Pointer('mouse', Actions.Pointer.Type.MOUSE)
)

This will make two devices, mouse and keyboard, available.

Such a scenario will allow you to call:

actions.tick.keyboardDown('r').mouseDown()
actions.tick.keyboardUp('r').mouseUp()

Here, keyboardUp was available as a combination of the keyboard ID keyboard and the keyboard action Up.

In short:

Keyboard devices will have the methods Up, Down
Pointer devices will have the methods Move, Up, Down, Cancel
Both of them have the method pause

If you create an actions object like this:

var actions = new Actions(new Actions.Keyboard('cucumber'))

You are then able to run:

actions.tick.cucumberDown('r')
actions.tick.cucumberUp('r')

However running:

actions.tick.cucumberMove('r')

Will result in an error, since cucumber is a keyboard device, and it doesn't implement Move (only pointers do).

If you have two devices set (like the default keyboard and mouse, which is the most common use-case), you can set one action per tick:

var actions = new Actions() // By default, mouse and keyboard
// Only a keyboard action in this tick. Mouse will pause
actions.tick.keyboardDown('r')
// Only a mouse action in this tick. Keyboard will pause
actions.tick.mouseDown()
// Both a mouse and a keyboard action this tick
actions.tick.keyboardUp('r').mouseUp()

You can only add one action per device in each tick. This will give an error, because the mouse device is trying to define two different actions in the same tick:

actions.tick.mouseDown().mouseUp()
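To see why each device gets exactly one slot per tick, it helps to look at the shape of the W3C "Perform Actions" payload: each device carries its own list of actions, and position N in every list belongs to tick N. The sketch below converts a list of ticks into that payload, filling in a pause for idle devices. The function ticksToPayload and the tick format are hypothetical and do not come from best-webdriver.

```javascript
// Illustrative only: converts a list of ticks into the per-device action
// lists used by the W3C "Perform Actions" command. Not best-webdriver code.
function ticksToPayload (devices, ticks) {
  const sources = devices.map((device) => ({ ...device, actions: [] }))
  for (const tick of ticks) {
    for (const source of sources) {
      // A device with no action in this tick gets an explicit pause,
      // so that all lists stay aligned tick by tick
      source.actions.push(tick[source.id] || { type: 'pause' })
    }
  }
  return { actions: sources }
}

const devices = [
  { type: 'key', id: 'keyboard' },
  { type: 'pointer', id: 'mouse', parameters: { pointerType: 'mouse' } }
]

// Each tick is keyed by device ID, so a device can only have one action in it
const ticks = [
  { keyboard: { type: 'keyDown', value: 'r' } },
  { mouse: { type: 'pointerDown', button: 0 } },
  {
    keyboard: { type: 'keyUp', value: 'r' },
    mouse: { type: 'pointerUp', button: 0 }
  }
]

const payload = ticksToPayload(devices, ticks)
console.log(payload.actions[0].actions.map((a) => a.type))
// prints [ 'keyDown', 'pause', 'keyUp' ]
console.log(payload.actions[1].actions.map((a) => a.type))
// prints [ 'pause', 'pointerDown', 'pointerUp' ]
```

Since every list has one entry per tick, there is simply no room for a second action from the same device in the same tick.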

You are able to chain tick calls if you want to:

actions
  .tick.keyboardDown('r').mouseDown()
  .tick.keyboardUp('r').mouseUp()

Once you have decided your actions, you can submit them:

await driver.performActions(actions)

You can set multiple touch devices, and use them for multi-touch:

var actions = new Actions(
  new Actions.Pointer('finger1', Actions.Pointer.Type.TOUCH),
  new Actions.Pointer('finger2', Actions.Pointer.Type.TOUCH)
)
// Define actions: Moving two fingers vertically at the same time
actions
  .tick.finger1Move({ x: 40, y: 40 }).finger2Move({ x: 40, y: 60 })
  .tick.finger1Move({ x: 40, y: 440 }).finger2Move({ x: 40, y: 460 })

// Actually perform the actions
await driver.performActions(actions)

You can also move a pointer over a specific element, specifying how long it will take (in milliseconds):

await driver.navigateTo('https://www.google.com')
var el = await driver.findElementCss('[name=q]')
var actions = new Actions(new Actions.Pointer('mouse', Actions.Pointer.Type.MOUSE))

// Moving over `el`, taking 1 second
actions.tick.mouseMove({ origin: el, duration: 1000 })

Keyboard devices can perform: Down, Up and pause.

Mouse devices can perform: Move, Down, Up, Cancel and pause.

The Actions class documentation explains exactly how actions work.

Polling

When writing tests for web sites and applications, timing can become an issue. For example, while you know that your page will be loaded after this:

await driver.navigateTo('https://www.google.com')

What you don't know is this: have all of the AJAX calls finished fetching data? Has all of the DOM been updated after the event?

The answer is "you don't know". So, the ability to poll is very important.

This API has the simplest, most streamlined approach possible in terms of polling: there is only one call, waitFor(), which is available as Element#waitFor and Driver#waitFor.

The way it works is really simple: waitFor() acts as a proxy to the real object calls, with the twist that it will retry them until they succeed. Each call will also accept one extra parameter (compared to its normal signature): a function that must return a truthy value for the call to be considered successful.

So, while you would normally do:

var el = await driver.findElementCss('#main')

If you wanted to wait, you would run the following call, which will run findElementCss() every 300ms, until it finally works or until the default timeout of 10000ms (10 seconds) has expired:

var el = await driver.waitFor().findElementCss('#main')

You can set different poll interval and timeout:

driver.setPollTimeout(15000)
driver.setPollInterval(200)

Or, you can set them on a per-call basis:

var el = await driver.waitFor(15000, 300).findElementCss('#main')

Finally, you can add one extra parameter to the call: it will be used as a checker function for the result:

var lis = await driver.waitFor().findElementsCss('li', (r) => r.length)

In this case, the callback (r) => r.length will only return a truthy value when r (the result from the call) is a non-empty array.

Behind the scenes, waitFor() returns a proxy object which will in turn run the call and check that it didn't return an error; it also checks that the result passes the required checker function, if one was passed.

The result of this is that one simple chained method, Driver#waitFor/Element#waitFor, turns every call for Driver and Element into a polling function able to check the result.
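If you are curious about how a single chained method can turn every call into a polling call, here is a minimal, self-contained sketch of the idea using a JavaScript Proxy. This is not best-webdriver's implementation: the standalone waitFor function, the fake driver and the rule "a trailing function argument is the checker" are all assumptions made for the example.

```javascript
// Illustrative sketch of proxy-based polling; NOT best-webdriver's code.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

function waitFor (target, timeout = 10000, interval = 300) {
  return new Proxy(target, {
    get (obj, prop) {
      if (typeof obj[prop] !== 'function') return obj[prop]
      return async (...args) => {
        // Assumption: a trailing function argument is the checker
        const last = args[args.length - 1]
        const checker = typeof last === 'function' ? args.pop() : () => true
        const deadline = Date.now() + timeout
        let lastError
        do {
          try {
            const result = await obj[prop](...args)
            if (checker(result)) return result
            lastError = new Error('Checker rejected the result')
          } catch (e) {
            lastError = e
          }
          await sleep(interval)
        } while (Date.now() < deadline)
        throw lastError
      }
    }
  })
}

// A stand-in for a driver: the list is only "loaded" on the third call
let calls = 0
const fakeDriver = {
  async findItems () {
    calls++
    return calls < 3 ? [] : ['a', 'b']
  }
}

const polled = waitFor(fakeDriver, 1000, 10).findItems((r) => r.length)
polled.then((items) => console.log(items, calls))
// prints [ 'a', 'b' ] 3
```

The real call is retried both when it throws and when the checker returns a falsy value; the last error is rethrown once the timeout expires.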

Limitations

The main limitation of this API is that it will only ever speak the W3C webdriver protocol. For example, as of today Chrome doesn't yet implement Actions. While other APIs try to "emulate" actions (with crippling limitations) by calling non-standard endpoints, this API will simply submit the actions to the Chrome webdriver and surely receive an error in response.

Another limitation is that it's an API that is very close to the metal: you are supposed to understand how the session configuration works, for example. So, while you do have helper methods such as setAlwaysMatch(), addFirstMatch() etc., you are still expected to understand what these calls do. Also, browser-specific parameters are added via setSpecific(); however, there are no helper methods to get these parameters right. For example, if you want to add plugins to Chrome using the extensions option, you will need to create an array of packed extensions loaded from the disk and converted to base64. This may change in the future, as this API matures; however, it won't add more classes, and any enhancement will always be close enough to the API to be easy to understand.

Go test!

That's all you need -- time to get testing!
