context-url-extractor

1.0.1 • Public • Published

Methods for extracting URLs from HTML or text strings with surrounding context

When data mining content that contains URLs, it's far easier for a machine to categorise them if they are semantic (or friendly) URLs:

Bad URL

Good URL

This package provides a jumping off point for data mining the surrounding context of each URL found in the supplied content.

Install

npm install --save context-url-extractor

Usage

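// Assumption: the package exports the ContextUrlExtractor class directly.
const ContextUrlExtractor = require('context-url-extractor');
const extractor = new ContextUrlExtractor({ content }); // content: the HTML or text string to scan
const res = extractor.extractUrls();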

Custom Context Lengths

The pre- and post-context lengths each default to 170 characters, but both can be overridden in the constructor.

const extractor = new ContextUrlExtractor({ content, contextCharsBefore: 80, contextCharsAfter: 80 });
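
To illustrate what the two context lengths control, here is a minimal standalone sketch of windowed context extraction. It does not use the package and is not its implementation: the function name `extractWithContext` and the simplified URL pattern are assumptions made for illustration. Only the option names `contextCharsBefore` and `contextCharsAfter` and the result keys come from this README.

```javascript
// Hypothetical helper, not part of context-url-extractor.
// Simplified URL pattern (assumption): http(s) URLs running up to
// whitespace, a quote, or an angle bracket.
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/g;

function extractWithContext(content, { contextCharsBefore = 170, contextCharsAfter = 170 } = {}) {
  const results = [];
  for (const match of content.matchAll(URL_PATTERN)) {
    const start = match.index;
    const end = start + match[0].length;
    results.push({
      url: match[0],
      // Clamp at 0 so a URL near the start of the content does not yield a negative index.
      contextPre: content.slice(Math.max(0, start - contextCharsBefore), start),
      // slice() already clamps at the end of the string.
      contextPost: content.slice(end, end + contextCharsAfter),
    });
  }
  return results;
}

const sample = 'Please <a href="https://example.com/a?b=1">click here</a> to sign in.';
console.log(extractWithContext(sample, { contextCharsBefore: 10, contextCharsAfter: 12 }));
```

Shorter windows keep each result focused on the immediate markup around the link, while longer ones capture more of the surrounding sentence at the cost of more noise.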

Example Response

[
    {
        "url": "https://example.com/profile.aspx?section=99&trId=9877A4CF44987123AED90&rd=722108935",
        "contextPre": "nd. To log in to your profile please <a href=\"",
        "contextPost": "\">click here</a> and sign in with your email "
    }
]
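
The context strings in this response are raw slices of the content, so they can contain partial markup such as the `<a href="` fragment. One possible next step before analysing the text is to strip tag fragments. The helper below, `plainContext`, is hypothetical and not part of the package. It is a rough heuristic intended for HTML content, not a full HTML parser.

```javascript
// Hypothetical post-processing helper, not part of context-url-extractor.
function plainContext({ url, contextPre, contextPost }) {
  const stripTags = (s) => s.replace(/<[^>]*>/g, ' ');
  const tidy = (s) => s.replace(/\s+/g, ' ').trim();
  return {
    url,
    // Drop complete tags, then any unclosed tag left dangling at the cut point.
    before: tidy(stripTags(contextPre).replace(/<[^>]*$/, '')),
    // Drop the tail of the tag the URL sat in (up to the first '>'), then complete tags.
    after: tidy(stripTags(contextPost.replace(/^[^<>]*>/, ''))),
  };
}

const entry = {
  url: 'https://example.com/profile.aspx?section=99&trId=9877A4CF44987123AED90&rd=722108935',
  contextPre: 'nd. To log in to your profile please <a href="',
  contextPost: '">click here</a> and sign in with your email ',
};
console.log(plainContext(entry));
```

For plain-text content the leading-fragment rule in `after` could remove legitimate text that happens to precede a `>` character, so it should only be applied when the content is known to be HTML.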

Maintainers

@njhoran

Contributing

Small note: If editing the README, please conform to the standard-readme specification.

License

MIT © 2019 njhoran
