MetaInspector is an npm package for web scraping. You give it a URL, and it lets you easily get the page's title, links, images, description, keywords, meta tags and more.
MetaInspector is inspired by the MetaInspector Ruby gem by jaimeiniesta.
The client exposes the following properties:

    client.url # URL of the page
    client.scheme # Scheme of the page (http, https)
    client.host # Hostname of the page (like markupvalidator.com, without the scheme)
    client.rootUrl # Root URL (scheme + host)
    client.title # Title of the page, as string
    client.links # Array of strings, with every link found on the page as an absolute URL
    client.author # Page author, as string
    client.keywords # Keywords from meta tag, as array
    client.charset # Page charset from meta tag, as string
    client.description # Returns the meta description, or the first long paragraph if no meta description is found
    client.image # Most relevant image, if defined with og:image
    client.images # Array of strings, with every img found on the page as an absolute URL
    client.feeds # RSS or Atom links found in meta data fields, as array
    client.ogTitle # Open Graph title
    client.ogDescription # Open Graph description
    client.ogType # Open Graph object type
    client.ogUpdatedTime # Open Graph updated time
    client.ogLocale # Open Graph locale (language)
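To make the relationship between url, scheme, host and rootUrl concrete, here is a small sketch using Node's built-in WHATWG `URL` class. It is an assumption about how these fields relate (rootUrl read as scheme + "://" + hostname, with no trailing slash and any port ignored), not MetaInspector's own code, and the function name `describeUrl` is hypothetical.

```javascript
// Hypothetical helper: derives url/scheme/host/rootUrl-style fields from a
// page URL. This illustrates the field descriptions; it is not the
// library's implementation.
function describeUrl(pageUrl) {
  const parsed = new URL(pageUrl);                   // global URL class in Node >= 10
  const scheme = parsed.protocol.replace(/:$/, "");  // "https:" -> "https"
  const host = parsed.hostname;                      // hostname only: no scheme, no port
  return {
    url: pageUrl,
    scheme: scheme,
    host: host,
    rootUrl: scheme + "://" + host,                  // assumed reading of "scheme + host"
  };
}

console.log(describeUrl("https://markupvalidator.com/some/page?x=1"));
```

For that input, the sketch reports a scheme of `https`, a host of `markupvalidator.com` and a rootUrl of `https://markupvalidator.com`.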
Available options:

- timeout - Time, in ms, that MetaInspector will wait for the URL to respond
- maxRedirects - Number of redirects MetaInspector will follow
- limit - Maximum number of bytes MetaInspector will download when querying a site
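The `limit` option caps how much of a response is downloaded. As an illustration of that idea only (not MetaInspector's internals), the hypothetical helper `takeBytes` below keeps at most `limit` bytes from a list of `Buffer` chunks, cutting the last chunk short if needed.

```javascript
// Hypothetical helper: keeps at most `limit` bytes from a sequence of
// Buffer chunks, the way a byte cap on a download would.
function takeBytes(chunks, limit) {
  const kept = [];
  let total = 0;
  for (const chunk of chunks) {
    if (total >= limit) break;                  // cap reached: ignore the rest
    const remaining = limit - total;
    // Take the whole chunk if it fits, otherwise only the part that does
    const piece = chunk.length > remaining ? chunk.subarray(0, remaining) : chunk;
    kept.push(piece);
    total += piece.length;
  }
  return Buffer.concat(kept, total);
}

const body = takeBytes([Buffer.from("<html><head>"), Buffer.from("<title>Hi</title>")], 16);
console.log(body.toString());
```

With a limit of 16, the example keeps `<html><head>` (12 bytes) plus the first 4 bytes of the second chunk, so it prints `<html><head><tit`.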
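Example usage:

    var MetaInspector = require('node-metainspector');
    // Example URL: replace it with the page you want to inspect
    var client = new MetaInspector("http://www.example.com", { timeout: 5000 });

    client.on("fetch", function(){
        console.log("Description: " + client.description);
        console.log("Links: " + client.links.join(","));
    });

    client.on("error", function(err){
        console.log(err);
    });

    client.fetch();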
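The client reports results through `fetch` and `error` events. If you prefer `async`/`await`, a wrapper like the hypothetical `fetchAsync` below can turn those events into a Promise. It assumes the client is a Node `EventEmitter` (so `once` and `removeListener` are available), which the `on` calls in the usage example suggest but which this sketch does not verify; `FakeClient` is a stand-in used only so the example runs without network access.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper (not part of node-metainspector): resolves with the
// client on "fetch", rejects on "error", and detaches the other listener.
function fetchAsync(client) {
  return new Promise(function (resolve, reject) {
    function onFetch() {
      client.removeListener("error", onError);
      resolve(client);
    }
    function onError(err) {
      client.removeListener("fetch", onFetch);
      reject(err);
    }
    client.once("fetch", onFetch);
    client.once("error", onError);
    client.fetch();
  });
}

// Stand-in client for demonstration: fills a property, then emits "fetch".
class FakeClient extends EventEmitter {
  fetch() {
    setImmediate(() => {
      this.description = "An example page";
      this.emit("fetch");
    });
  }
}

fetchAsync(new FakeClient()).then(function (client) {
  console.log("Description: " + client.description);
});
```

Removing the unused listener in each branch keeps a long-lived client from accumulating handlers across repeated fetches.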
Finish implementing the following:
- Add an absolutify URL function that returns every URL as an absolute URL
- client.internal_links # Array of strings, with every internal link found on the page as an absolute URL
- client.external_links # Array of strings, with every external link found on the page as an absolute URL
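One possible shape for these planned features is sketched below with Node's built-in `URL` class. The assumptions are ours, not the project's: a link counts as internal when its hostname equals the page's hostname, non-HTTP(S) hrefs such as `mailto:` are dropped, and the names `absolutify` and `splitLinks` are hypothetical.

```javascript
// Hypothetical: resolves an href against the page URL, or returns null
// if it cannot be parsed.
function absolutify(href, pageUrl) {
  try {
    return new URL(href, pageUrl).href;  // relative hrefs resolve against pageUrl
  } catch (e) {
    return null;
  }
}

// Hypothetical: splits hrefs into internal and external absolute URLs.
function splitLinks(hrefs, pageUrl) {
  const pageHost = new URL(pageUrl).hostname;
  const internal = [];
  const external = [];
  for (const href of hrefs) {
    const abs = absolutify(href, pageUrl);
    if (abs === null) continue;
    const parsed = new URL(abs);
    // Skip mailto:, javascript: and other non-web links
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") continue;
    (parsed.hostname === pageHost ? internal : external).push(abs);
  }
  return { internal: internal, external: external };
}

const links = splitLinks(
  ["/about", "contact.html", "https://github.com/", "mailto:someone@example.com"],
  "http://markupvalidator.com/docs/index.html"
);
console.log(links.internal);
console.log(links.external);
```

Here `/about` and `contact.html` become `http://markupvalidator.com/about` and `http://markupvalidator.com/docs/contact.html` (internal), `https://github.com/` is external, and the `mailto:` link is dropped. Whether subdomains should count as internal is a design choice this sketch leaves out.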
You're welcome to fork this project and send pull requests. Just remember to include tests.
Copyright (c) 2009-2012 Gabriel Cebrian, released under the MIT license