ttad - Text Title Author Date Extractor
This Node module extracts a single piece of
- Text - t
- Title - t
- Author - a
- Date - d
from a given string or list of urls.
API
Tries to extract ttad from a string:
ttad.extract_from_str(s)
Extracts ttad from a list of urls via puppeteer:
ttad.extract_from_urls(urls)
Both functions either return a error or a object of the structure:
let resultObj = text: '' title: '' author: '' date: '';
Installation
npm install ttad
How it works
ttad makes use of
If those libraries fail to extract content properly, we will just grab the
whole innerText
property of the <body>
tag.
Example
const extract_from_url = ; async const urls = 'https://www.politico.eu/article/6-elections-to-watch-in-2018/' 'https://www.weeklystandard.com/elliott-abrams/the-real-palestinian-catastrophe' 'https://www.bloomberg.com/view/articles/2018-05-18/venezuela-s-election-pits-dollars-against-bolivars' ; config = evadeDetection: false headless: true // ['stylesheet', 'font', 'image', 'media']; interceptRequests: ; console; ;