A PhantomJS script for running Wappalyzer over many sites using a headless WebKit browser
Phantalyzer is a tool based on PhantomJS (a headless WebKit browser) that uses Wappalyzer (a browser plugin) to detect the software in use across a large number of sites. Because Wappalyzer is a browser plugin, its original design is to provide feedback from within the browser. My intent here is to let companies analyze large numbers of sites and produce a report. For example, a report could indicate which sites use Flash (and may need to be converted), which are missing proper analytics tags, and so on.
After getting the Wappalyzer functionality up and running, I added a few other features: image capture of each site, site load time, and a breakdown of the resources loaded (HTML, CSS, etc.).
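As an illustration of what a resource breakdown can look like, here is a minimal sketch that tallies resources by broad type. It is not taken from phantalyzer.js: the function name summarizeResources and the bucket names are hypothetical, and the input is assumed to be an array of objects with a contentType string, similar to the response objects PhantomJS passes to page.onResourceReceived.

```javascript
// Hypothetical helper (not from phantalyzer.js): count resources by broad type.
// Assumes each entry has a `contentType` string such as "text/html; charset=utf-8".
function summarizeResources(resources) {
  var counts = {};
  resources.forEach(function (res) {
    // Drop parameters like "; charset=utf-8" and normalise case.
    var type = String(res.contentType || 'unknown')
      .split(';')[0]
      .trim()
      .toLowerCase();

    var bucket;
    if (type === 'text/html') {
      bucket = 'html';
    } else if (type === 'text/css') {
      bucket = 'css';
    } else if (type.indexOf('javascript') !== -1) {
      bucket = 'js';
    } else if (type.indexOf('image/') === 0) {
      bucket = 'image';
    } else {
      bucket = 'other'; // fonts, JSON, missing content types, and so on
    }
    counts[bucket] = (counts[bucket] || 0) + 1;
  });
  return counts;
}
```

In a PhantomJS script, the entries could be collected inside a page.onResourceReceived callback and passed to this function once the page has finished loading.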
The basic workflow is that you run the script with a CSV file as input that lists the sites to visit. The script then writes out the captured images and a result.json file that contains information about the run. You can then do whatever you want with that data.
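The first step of that workflow, turning the CSV input into a list of sites, could be sketched as follows. The CSV layout is an assumption here (one site per line, URL in the first column, no header row, no quoted commas), and parseSiteList is a hypothetical name rather than a function from phantalyzer.js.

```javascript
// Hypothetical helper (not from phantalyzer.js): extract site URLs from CSV text.
// Assumes one site per line with the URL in the first column and no header row.
function parseSiteList(csvText, maxSites) {
  var sites = [];
  var lines = csvText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var firstColumn = lines[i].split(',')[0].trim();
    if (firstColumn === '') {
      continue; // skip blank lines
    }
    sites.push(firstColumn);
    // Stop early when a positive limit is given and has been reached.
    if (maxSites > 0 && sites.length >= maxSites) {
      break;
    }
  }
  return sites;
}
```

Passing no limit (or 0) returns every site in the file; a positive limit stops after that many sites, which is handy for a quick test run.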
You must install PhantomJS because this script depends on it.
--sitefile specifies the CSV file that lists the sites to visit.
--maxsites limits the number of sites visited. Good for testing.
--outputdir tells Phantalyzer where to store the result.json file and the images.
--imgext tells Phantalyzer which image extension to use. png and jpg are supported.
phantomjs --web-security=no phantalyzer.js --sitefile sites.csv --maxsites 10 --outputdir ./data --imgext png
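To show how options like these can be read inside a script, here is a minimal sketch of a parser for "--name value" pairs. It is an assumption about one reasonable approach, not the parsing code phantalyzer.js actually uses, and parseOptions is a hypothetical name. In PhantomJS the argument array would come from require('system').args, whose first element is the script name.

```javascript
// Hypothetical helper (not from phantalyzer.js): parse "--name value" pairs.
// `args` is assumed to look like ['phantalyzer.js', '--sitefile', 'sites.csv', ...].
function parseOptions(args) {
  var opts = {};
  for (var i = 0; i < args.length; i++) {
    var arg = args[i];
    if (arg.indexOf('--') === 0 && i + 1 < args.length) {
      opts[arg.slice(2)] = args[i + 1];
      i++; // skip the value that was just consumed
    }
  }
  if (opts.maxsites !== undefined) {
    opts.maxsites = parseInt(opts.maxsites, 10);
  }
  // Only png and jpg are listed as supported image extensions.
  if (opts.imgext !== undefined && opts.imgext !== 'png' && opts.imgext !== 'jpg') {
    throw new Error('Unsupported --imgext value: ' + opts.imgext);
  }
  return opts;
}
```

Options placed before the script name, such as --web-security=no in the command above, are handled by PhantomJS itself and do not appear in the script's argument list.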