raspador
    1.1.0 • Public • Published

    Raspador

    Raspador - Metadata scraping made easy!

    A simple yet powerful library for scraping metadata. It is an easy-to-use scraper with little overhead and no complex logic: just create your selectors and let raspador handle the rest.

    Written in TypeScript, so you don't have to worry about finding and installing types separately.

    Usage

    1. Import the required items from the package
    import raspador, { ld$, Root, Selectors } from 'raspador';
    2. Create the scraper by passing the HTML string to the raspador function
    const scraper = raspador(html);
    3. Set up the rules
    const selectors: Selectors = ($: Root) => ({
      title: [$('meta[property="og:title"]').attr('content'), $('title').text()],
      author: [ld$($, 'creator[0]')],
    });
    4. Get the metadata by passing the selectors to the scraper
    const result = scraper(selectors);

    Selectors

    Raspador uses Cheerio, so any Cheerio-compatible selector works. Here are some examples:

    $('meta[property="og:image:url"]').attr('content');
    $('html').attr('lang');
    $('meta[property="og:logo"]').attr('content');

    Raspador also exposes a helper, ld$, for selecting keys from the Linked Data (JSON-LD) embedded in a page, such as:

    <script type="application/ld+json">
      {
        "@context": "https://schema.org/",
        "@type": "Recipe",
        "name": "Party Coffee Cake",
        "author": {
          "@type": "Person",
          "name": "Maicy Williams"
        },
        "datePublished": "2018-03-10",
        "description": "This coffee cake is awesome and perfect for parties.",
        "prepTime": "PT20M"
      }
    </script>

    For selecting the author name from the Linked/Structured Data:

    ld$($, 'author.name'); // --> 'Maicy Williams'
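
    To make the dotted-path idea concrete, here is a minimal sketch of how a path such as 'author.name' or 'creator[0]' could be resolved against parsed JSON-LD. The getByPath helper is hypothetical and is not raspador's implementation; the "creator" array is also added purely for illustration and is not part of the snippet above.

```typescript
// Hypothetical helper (not part of raspador): resolves a dotted path such as
// 'author.name' or 'creator[0]' against an already-parsed JSON-LD object.
function getByPath(data: unknown, path: string): unknown {
  // Normalize bracket indices, so 'creator[0].name' becomes ['creator', '0', 'name'].
  const keys = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter(Boolean);
  let current: unknown = data;
  for (const key of keys) {
    // Stop early if we can no longer descend into the value.
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Trimmed copy of the Linked Data above, plus an illustrative "creator" array.
const jsonLd = JSON.parse(`{
  "@context": "https://schema.org/",
  "@type": "Recipe",
  "name": "Party Coffee Cake",
  "author": { "@type": "Person", "name": "Maicy Williams" },
  "creator": ["Maicy Williams"]
}`);

const authorName = getByPath(jsonLd, 'author.name');
const firstCreator = getByPath(jsonLd, 'creator[0]');
console.log({ authorName, firstCreator });
```

    A path that does not exist in the data resolves to undefined in this sketch rather than throwing.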

    Selectors are defined as a function that receives $ and returns an object, where each key is an identifier of your choice and each value is an array of selectors.

    const selectors = ($: Root) => ({
      title: [$('meta[property="og:title"]').attr('content'), $('title').text()],
      author: [ld$($, 'creator[0]')],
    });
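
    The README does not spell out how each array is reduced to a single value. One plausible reading is that the entries act as fallbacks, with the first non-empty result winning. The sketch below illustrates that reading only; firstNonEmpty and resolveAll are hypothetical names, the sample values are placeholders, and raspador's actual behavior may differ.

```typescript
// A selector result may be missing, e.g. when .attr('content') finds nothing.
type Candidate = string | undefined | null;

// Hypothetical helper: returns the first candidate that is a non-empty string.
function firstNonEmpty(candidates: Candidate[]): string | undefined {
  for (const candidate of candidates) {
    if (typeof candidate === 'string' && candidate.trim() !== '') {
      return candidate.trim();
    }
  }
  return undefined;
}

// Hypothetical helper: applies the fallback rule to every key of a rules object.
function resolveAll(
  rules: Record<string, Candidate[]>
): Record<string, string | undefined> {
  const result: Record<string, string | undefined> = {};
  for (const [key, candidates] of Object.entries(rules)) {
    result[key] = firstNonEmpty(candidates);
  }
  return result;
}

// Placeholder values standing in for what the selectors might return.
const resolved = resolveAll({
  title: [undefined, '  Fallback title  '],
  author: [undefined, ''],
});
console.log(resolved);
```

    Under this assumption, listing the most specific selector first (such as the og:title meta tag) and the most generic one last (such as the title element) gives a sensible priority order.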

    Full Example

    import fetch from 'node-fetch';
    import raspador, { ld$, Root, Selectors } from 'raspador';
    
    (async () => {
      const html = await fetch(
        'https://blog.sreyaj.dev/implementing-feature-flags-in-angular'
      ).then((res) => res.text());
      // Initialize raspador by passing in the html
      const scraper = raspador(html);
    
      // Set up the selectors
      const selectors: Selectors = ($: Root) => ({
        title: [$('meta[property="og:title"]').attr('content'), $('title').text()],
        author: [ld$($, 'creator[0]')],
      });
      
      // Pass the selectors to get the result
      const result = scraper(selectors);
      console.log({ result });
    })();

    Local Development

    1. Clone or download the repo
    2. Install dependencies
    npm install

    3. Start the dev server
    npm run dev
    

    🤝 Contributing

    Contributions, issues and feature requests are welcome.
    Feel free to check the issues page if you want to contribute.

    Author

    👤 Adithya Sreyaj

    👍🏼 Show your support

    Please ⭐️ this repository if this project helped you!

    Inspiration and Idea

    Show your support for MetaScraper.

    Install

    npm i raspador

    License

    MIT

    Collaborators

    • adi.sreyaj