endl

Link extractor, downloader, executer, unzipper

endl (Extractor and Downloader) by Doğan Çelik

A program for extracting links from web pages and downloading them.

endl has an API that is simple to start with yet advanced enough for link extracting, file downloading, executing and unzipping.

Every version below 1.0 is beta. This means it has bugs, and features may change.

You can install it with npm:

Then you can run endl from anywhere.

Like Handel the composer, but without the handel :)

Alternative names for endl are:

  • lendl (Link Extractor and Downloader)
  • glendl (Great Link Extractor and Downloader)
  • edle (Extractor, Downloader, Executer)
  • ledle (Link Extractor, Downloader, Executer)

If you have a better name, create a new issue because seriously, coming up with names is hard... :weary:

This is written in CoffeeScript.

endl = require 'endl'
 
endl.load('http://lame.buanzo.org/')
  .find('a[href^="http://lame.buanzo.org/Lame_"]')
  .download(pageUrlAsReferrer: true, filenameMode: { urlBasename: true })
  1. We require the endl module. (Node style)
  2. endl.load() loads the page we want. (It takes two arguments; the second is an optional options object.)
  3. find() finds the elements we want. (It works just like jQuery and querySelectorAll.)
  4. download() saves the file to the current directory, using the basename of the download link as the file name and the page URL as the Referer header.
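
To make step 4 more concrete, here is a minimal sketch in plain JavaScript (which CoffeeScript compiles to) of what "using the basename of the download link" could mean. The helper `urlBasename` and the example.com URL are hypothetical and not part of endl; endl's own implementation may treat query strings or encoded characters differently.

```javascript
const path = require('path');

// Hypothetical helper, not part of endl: derive a file name from the
// path component of a URL, ignoring any query string or fragment.
function urlBasename(link) {
  const { pathname } = new URL(link);
  // Use POSIX rules because URL paths always use forward slashes.
  return decodeURIComponent(path.posix.basename(pathname));
}

console.log(urlBasename('http://example.com/files/setup_1.2.exe?mirror=1'));
// prints "setup_1.2.exe"
```

Taking the name from the URL path alone keeps the result independent of response headers, which is presumably why this mode exists alongside contentDisposition and contentType.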

Things to note:

  • We actually get 4 elements when we do find() but download() automatically selects the first element (0-index). Use index() to change index of element array.
  • download() after find() is a shortcut. The long way is: find(...).href().download(...)
  • findXpath doesn't work. Blame web pages (for incorrect structure), xmldom and xpath modules.
  • Unify all downloading, extraction and execution options across submodules. (endl.coffee, file.coffee, parser.coffee) These 3 submodules have different default options for each task.
  • Add tests in this century.

Downloads PuTTY portable to the current directory.

endl.load('http://portableapps.com/apps/internet/putty_portable')
  .find('.sf-download a')
  .load(pageUrlAsReferrer: true)
  .find('.direct-download')
  .download(pageUrlAsReferrer: true, filenameMode: { urlBasename: true })

If you call load() after find() or findXpath(), it automatically loads the href attribute of the first element. (To select another element, use index().)

If you call download() after find() or findXpath(), it automatically downloads the href attribute of the first element.

Downloads Lame for Windows and installs it silently.

endl.load('http://lame.buanzo.org/')
  .find('a[href^="http://lame.buanzo.org/Lame_"]')
  .download(
    pageUrlAsReferrer: true
    fileDirectory: './downloads'
    filenameMode: { urlBasename: true }
  )
  .execute("/VERYSILENT /NORESTART /LOG")

Thanks to this blog for providing the arguments for silent install.
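
The examples in this README pass installer switches both as a single string ("/VERYSILENT /NORESTART /LOG") and as an array (['/S']). The sketch below shows one way such input could be normalized and handed to Node's `child_process.execFile`. Both `toArgList` and `runInstaller` are hypothetical names, and this is an assumption about the general approach, not endl's actual code.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper, not part of endl: accept either a
// whitespace-separated string or an array of switches.
function toArgList(args) {
  if (Array.isArray(args)) return args.slice();
  return String(args).trim().split(/\s+/).filter(Boolean);
}

// Hypothetical wrapper: run a downloaded installer with the given switches.
// It is only defined here, not called, so nothing is executed.
function runInstaller(file, args, callback) {
  return execFile(file, toArgList(args), callback);
}
```

Note that splitting on whitespace breaks switches whose values contain spaces (for example a quoted log path); passing an array avoids that problem.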

Downloads Request (a Node.js module) as a ZIP, changes into the request-master directory inside the archive and extracts all JS files to ./unzip.

endl.file('https://github.com/request/request/archive/master.zip')
  .download(pageUrlAsReferrer: true, filenameMode: { contentDisposition: true })
  .extract(to: './unzip', cd: 'request-master', fileGlob: '*.js', maintainEntryPath: false)
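
As a rough illustration of how the extract() options might combine, the JavaScript sketch below maps a list of ZIP entry names to output paths. It is a model under stated assumptions, not endl's implementation: `planExtraction` is a hypothetical name, fileGlob is assumed to support only `*` and to match against each entry's basename, and the entry names are made up.

```javascript
const path = require('path');

// Hypothetical model of the extract() options; not endl's actual code.
function planExtraction(entries, { to, cd, fileGlob, maintainEntryPath }) {
  const prefix = cd.replace(/\/+$/, '') + '/';
  // Translate a simple glob ('*' only) into an anchored regular expression.
  const escaped = fileGlob
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  const pattern = new RegExp('^' + escaped + '$');

  return entries
    // Keep files (not directories) that live under the cd directory.
    .filter((entry) => entry.startsWith(prefix) && !entry.endsWith('/'))
    .map((entry) => entry.slice(prefix.length))
    // Assumption: the glob is tested against the file name only.
    .filter((relative) => pattern.test(path.posix.basename(relative)))
    // With maintainEntryPath false, subdirectories are flattened.
    .map((relative) =>
      path.posix.join(to, maintainEntryPath ? relative : path.posix.basename(relative)));
}

const sampleEntries = [
  'request-master/',
  'request-master/index.js',
  'request-master/README.md',
  'request-master/lib/cookies.js',
];
console.log(planExtraction(sampleEntries, {
  to: './unzip', cd: 'request-master', fileGlob: '*.js', maintainEntryPath: false,
}));
// prints [ 'unzip/index.js', 'unzip/cookies.js' ]
```

Under these assumptions, setting maintainEntryPath to true would keep `lib/cookies.js` in a `lib` subdirectory instead of flattening it.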

This is just an example; you can use JSON too.

This example downloads multiple files. It downloads the first item as is, extracts the second item and installs the third item.

[
  {
    url: 'http://www.mp3tag.de/en/download.html'
    find: 'div.download a'
    filenameMode: ['urlBasename', 'contentType']
  }
  {
    url: 'http://slimerjs.org/download.html'
    find: 'a.btn'
    findIndex: 4
    filename: 'slimerjs.zip'
    extract:
      to: 'C:/slimerjs'
      cdRegex: '^slimerjs'
      fileGlob: '*.png'
      maintainEntryPath: false
  }
  {
    download: 'http://rammichael.com/downloads/7tt_setup.exe'
    execute: ['/S']
  }
]
endl d "http://www.mp3tag.de/en/download.html" "div.download a"

Returns: extractorInstance

| Function name | Returns | Info |
|---|---|---|
| find(query, options) | containerInstance | Same as querySelectorAll |
| findXpath(query, options) | containerInstance | Same as evaluate |

| Function name | Returns | Info |
|---|---|---|
| load(attrName, options) | extractorInstance | Creates an extractorInstance of href() or attrName of the container |
| attr(attrName) | attrInstance | Selects the attribute of the element |
| href() | attrInstance | Shortcut for attr('href') |
| index(index) | containerInstance | Selects an element from the array (if there is an array) |
| download(options) | fileInstance | Shortcut for href().download() |

Notice: A container can hold more than one element, so use attr(), href() and download() with care. If you call attr('href') on a 10-element container, it selects the first element's href.
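
The first-element default described in the notice can be modeled in a few lines of JavaScript. `selectHref` is a hypothetical stand-in written for illustration; it is not endl's container class, and the element objects are made up.

```javascript
// Hypothetical stand-in for a container of matched elements; not endl's real class.
// Without an explicit index, the first element (index 0) is used.
function selectHref(elements, index = 0) {
  const element = elements[index];
  if (!element) throw new RangeError(`no element at index ${index}`);
  return element.href;
}

const links = [{ href: '/first.zip' }, { href: '/second.zip' }];
console.log(selectHref(links));    // prints "/first.zip"
console.log(selectHref(links, 1)); // prints "/second.zip"
```

In endl itself, the equivalent of passing an explicit index is calling index() on the container before attr(), href() or download().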

| Function name | Returns | Info |
|---|---|---|
| load(options) | extractorInstance | Creates an extractorInstance of attrInstance's value |
| download(options) | fileInstance | Creates a fileInstance and downloads the link |

| Function name | Returns | Info |
|---|---|---|
| download(options) | fileInstance | Creates a fileInstance and downloads the link |
| extract(options) | fileInstance | Extracts a ZIP file |
| unzip(options) | fileInstance | Alias for extract() |
| execute(options) | fileInstance | Executes the file |