ipbcrawler

1.0.0 • Public • Published

Invision Power Board Crawler

A package for mining data from forums that run on the Invision Power Board platform.

Goals

The goal is to let you retrieve every topic that has already been posted in a particular forum, so you can, for example, build a seeder for your blog or for your own forum.

What can you scrape with this package?

  • Home
  • List of topics
  • Topic page (top post of the topic)

1. What is scraped from the home page

  • Forum Areas

  • Categories: the icon, title, subcategories and description are returned

  • Ranks: the name of each rank is returned

2. What is scraped from the topic list

  • Pagination
  • Topics: the id, title and url are returned

3. What is scraped from a topic page

  • Topic title
  • Topic content
  • Topic author

How to use

To install

npm i ipbcrawler --save

Process all topics

The package provides an asynchronous function that visits and scrapes every topic in a category (yes, it goes through every page of the topic list):

const { findPosts } = require('ipbcrawler')

You can mine several forums at once, and the posts are returned already tagged with the category id of your own forum.

To do this, define the options and call findPosts:

const options = [{
	// source forums to scrape
	url: [
		'https://example/forum/153-games',
		'https://otherexample/forum/23-games-pc'
	],
	// the category id of my gaming forum
	id: 140
}]

findPosts(options)
	.then(topics => console.log(topics))
	.catch(e => console.log(e))

The returned value looks like this:

[{
    category: 140,
    posts: [{
        author: "Filipe",
        post: "This is a sample post."
    }]
}]
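
As a rough illustration of how the grouped result might feed a seeder, the sketch below flattens it into one row per post. It runs on sample data shaped like the result above; `toSeedRows` and the row field names (`categoryId`, `body`) are hypothetical and not part of ipbcrawler.

```javascript
// Sample data shaped like the findPosts result documented above.
const result = [{
  category: 140,
  posts: [
    { author: 'Filipe', post: 'This is a sample post.' }
  ]
}]

// toSeedRows is a hypothetical helper, not part of ipbcrawler.
// It turns the grouped result into one flat row per post,
// carrying the category id of your own forum on each row.
const toSeedRows = groups =>
  groups.flatMap(({ category, posts }) =>
    posts.map(({ author, post }) => ({ categoryId: category, author, body: post }))
  )

const rows = toSeedRows(result)
console.log(rows)
```

In real use you would call `toSeedRows` on the value that `findPosts(options)` resolves to, then hand the rows to whatever inserts data into your blog or forum.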

Extractions

If you want to use the extraction functions individually, that is simple too.

You can import the following extractions:

  1. homeExtraction
  2. postExtraction
  3. listTopicsExtraction

Each of them receives a Cheerio object. To obtain one, follow this example:

const { domObject, homeExtraction } = require('ipbcrawler')

const extraction = async url => {
	const $ = await domObject(url)
	
	return homeExtraction($)
}

extraction("https://example.com/forum/home")
	.then(home => console.log(home))
	.catch(e => console.log(e))

Object returned by each extraction

homeExtraction

{
  "zones": [
    {
      "title": string,
      "categories": [
        {
          "icon": string,
          "title": string,
          "description": string,
          "subCategories": [ { "title": string } ]
        }
      ]
    }
  ],
  "ranks": [{
      "name": string,
      "withHTML": string
  }]
}
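
To show how this nested shape can be walked, here is a minimal sketch that lists every category and subcategory as a path string. The sample values are invented for illustration, and `listCategoryPaths` is a hypothetical helper, not something the package exports.

```javascript
// Sample object shaped like the homeExtraction result (values are invented).
const home = {
  zones: [{
    title: 'Games',
    categories: [{
      icon: 'icon.png',
      title: 'PC',
      description: 'PC games',
      subCategories: [{ title: 'Mods' }]
    }]
  }],
  ranks: [{ name: 'Admin', withHTML: '<b>Admin</b>' }]
}

// listCategoryPaths is a hypothetical helper, not part of ipbcrawler.
// It emits one path per category, followed by one path per subcategory.
const listCategoryPaths = ({ zones }) =>
  zones.flatMap(zone =>
    zone.categories.flatMap(category => [
      `${zone.title} / ${category.title}`,
      ...category.subCategories.map(
        sub => `${zone.title} / ${category.title} / ${sub.title}`
      )
    ])
  )

const paths = listCategoryPaths(home)
console.log(paths)
```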

listTopicsExtraction

{
  "topics": [
    {
      "id": string,
      "url": string,
      "title": string
    }
  ]
}

postExtraction

{
  "title": string,
  "post": string,
  "author": string
}
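
The individual extractions can presumably be chained: read a topic list, then visit each topic URL and extract its post. The sketch below assumes the three functions behave as described in this README (`domObject` resolves to a Cheerio object, the extractions return the shapes shown above). `collectPosts` is a hypothetical helper, and the functions are passed in as arguments so the sketch does not rely on how the package implements them. It reads only the single list page it is given and does not follow pagination.

```javascript
// collectPosts is a hypothetical helper, not part of ipbcrawler.
// The second argument supplies domObject, listTopicsExtraction and
// postExtraction; with the real package you would pass the functions
// obtained from require('ipbcrawler').
const collectPosts = async (listUrl, { domObject, listTopicsExtraction, postExtraction }) => {
  const $list = await domObject(listUrl)
  // await works whether the extraction is synchronous or returns a promise.
  const { topics } = await listTopicsExtraction($list)

  const posts = []
  // Visit topics one at a time to keep the load on the forum low.
  for (const topic of topics) {
    const $topic = await domObject(topic.url)
    const post = await postExtraction($topic)
    posts.push({ id: topic.id, url: topic.url, ...post })
  }
  return posts
}

// Usage with the real package (not run here):
// const ipb = require('ipbcrawler')
// collectPosts('https://example/forum/153-games', ipb).then(console.log)
```

Passing the functions in also makes the helper easy to exercise with stubs, without touching the network.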

Sorry for my English.

Weekly Downloads

4

License

MIT

Unpacked Size

814 kB

Total Files

23

Collaborators

  • filipemacedo