@vidangel/misthios

1.2.0 • Public • Published

misthios

Greek for "mercenary".

Server for finding works that aren't currently in the system and grabbing the necessary details to record them.

Usage:

import { scrape, STATUS } from '@vidangel/misthios';

The scrape function has the following signature:

function scrape(href: string, pageText?: string): Promise<[number, WorkData[]]>;

It takes a URL to a video on a supported service and, optionally, a string containing the HTML source of the corresponding page. The returned promise resolves to a pair consisting of a status code and an array. If the href is not a valid URL, does not point to a video, or does not belong to a supported service, the status is an error code from the STATUS enum (INVALID, UNRECOGNIZED, NOT_VIDEO, FORBIDDEN, UNAVAILABLE, ERROR) and the array is empty. If there are no errors, the status is STATUS.SUCCESS and the array contains WorkData objects describing the works found, with the following form:

[{
  type: "movie"|"series"|"episode";
  title: string;
  description: string;
  poster: string;
  service: string;
  serviceWorkId: string;
}, ...]
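
A minimal sketch of consuming the pair that scrape resolves to. The STATUS object below is a local stand-in whose numeric values are assumptions (the real ones come from the package), and summarize is a hypothetical helper that is not part of the package.

```typescript
// Local stand-ins so the sketch is self-contained. In real code, import
// STATUS from '@vidangel/misthios'; the numeric values here are assumptions.
const STATUS = {
  SUCCESS: 0,
  INVALID: 1,
  UNRECOGNIZED: 2,
  NOT_VIDEO: 3,
  FORBIDDEN: 4,
  UNAVAILABLE: 5,
  ERROR: 6,
} as const;

type WorkData = {
  type: "movie" | "series" | "episode";
  title: string;
  description: string;
  poster: string;
  service: string;
  serviceWorkId: string;
};

type ScrapeResult = [number, WorkData[]];

// Hypothetical helper: turn a scrape result into a one-line summary.
function summarize([status, works]: ScrapeResult): string {
  if (status !== STATUS.SUCCESS) {
    // Look up the name of the status code for the failure message.
    const name = Object.keys(STATUS).find(
      (key) => STATUS[key as keyof typeof STATUS] === status,
    );
    return `failed: ${name ?? `unknown status ${status}`}`;
  }
  return works
    .map((w) => `${w.service}:${w.serviceWorkId} ${w.title.trim()}`)
    .join("; ");
}
```

With the real package, this would presumably be called as summarize(await scrape(href)).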

Examples:

scrape("https://www.netflix.com/watch/70116061?trackId=14292320&tctx=0%2C0%2C3a9c5462-81a2-496b-95a7-5e658cc233c4-105543979%2Ceb52bf24-223d-4ed7-b5bf-1f382190c748_46881781X19XX1591291554940%2Ceb52bf24-223d-4ed7-b5bf-1f382190c748_ROOT%2C") returns

[STATUS.SUCCESS, [{"title":"The Boy in the Iceberg","description":"Katara and Sokka make a startling discovery while fishing: a boy frozen in an iceberg, perfectly preserved and -- amazingly -- alive.","poster":"https://occ-0-1715-3997.1.nflxso.net/dnm/api/v6/9pS1daC2n6UGc3dUogvWIPMR_OU/AAAABRdLsjRMJUz-3Y9nS-ua4nJ67ukQLZ8Rt83NJ89l8wg1XrHGDLVXMnOXCPXZiH0fxYxBmMAvb92LzRnf55VBY6VlquGoXDYuE5CEyExJ5EIVru18.jpg","service":"netflix","serviceWorkId":"70116061"}]]

scrape("https://www.hulu.com/watch/c0d5b332-f6d1-4edd-b9ef-7ae479b88f4a") returns

[STATUS.FORBIDDEN, []]

Note that Amazon URLs do not uniquely identify a video in a series, so scrape("https://www.amazon.com/gp/video/detail/B00BZHX5OA/ref=atv_hm_hom_3_c_ZcoQlV_brws_7_1") returns

[STATUS.SUCCESS, [
  {"service":"amazon","type":"series","title": "Monk","description":"He's ingenious, he's phobic, he's obsessive-compulsive. Two-time Emmy and Golden Globe-winner Tony Shalhoub is former police detective Adrian Monk. The brilliant Monk is now back fighting crime and his abnormal fears of germs, cars, heights, crowds and virtually everything else known to man.","poster":"https://images-na.ssl-images-amazon.com/images/S/sgp-catalog-images/region_US/nbc-MNK-02-Full-Image_GalleryBackground-en-US-1486479220139._SX1080_.jpg","serviceWorkId":"B00BZHX5OA"},
  {"service":"amazon","type":"episode","title":" 1. Mr. Monk Goes Back to School ","description":"Monk takes a job as a substitute high school teacher to determine whether a teacher's fall from a clock tower was really a suicide, or a lesson in murder.","poster":"https://images-na.ssl-images-amazon.com/images/I/51IARx3beKL._SX268_.jpg","serviceWorkId":"B002UO4ZJG"},
  {"service":"amazon","type":"episode","title":" 2. Mr. Monk Goes to Mexico ","description":"Monk travels to Mexico to investigate the mysterious demise of a skydiver who reportedly drowned to death... in mid-air.","poster":"https://images-na.ssl-images-amazon.com/images/I/51mfCh+V9xL._SX268_.jpg","serviceWorkId":"B002UOGU0S"},
  {"service":"amazon","type":"episode","title":" 3. Mr. Monk Goes to the Ballgame ","description":"When a ruthless CEO and his wife are lured to their deaths, Monk connects their murders to a baseball great and his quest for the homerun record.","poster":"https://images-na.ssl-images-amazon.com/images/I/3189ZMBVBuL._SX268_.jpg","serviceWorkId":"B002UJI6YG"},
  {"service":"amazon","type":"episode","title":" 4. Mr. Monk Goes to the Circus ","description":"When a circus ringmaster is murdered by a high flying daredevil, all leads point to his vengeful ex-wife. But how could the acrobatic ex have committed the crime with a broken foot?","poster":"https://images-na.ssl-images-amazon.com/images/I/51Ty4neuEmL._SX268_.jpg","serviceWorkId":"B002UOFMJ8"},
  {"service":"amazon","type":"episode","title":" 5. Mr. Monk and the Very, Very Old Man ","description":"Captain Stottlemeyer takes a page from the book of Monk in order to determine why anyone would want to murder the world's oldest man.","poster":"https://images-na.ssl-images-amazon.com/images/I/51oduIKgFOL._SX268_.jpg","serviceWorkId":"B002UO305G"},
  {"service":"amazon","type":"episode","title":" 6. Mr. Monk Goes to the Theatre ","description":"Monk reluctantly takes the stage to investigate an actor's murder.","poster":"https://images-na.ssl-images-amazon.com/images/I/41-DTEO+LpL._SX268_.jpg","serviceWorkId":"B002UNDVGK"},
  {"service":"amazon","type":"episode","title":" 7. Mr. Monk and the Sleeping Suspect ","description":"Monk knows who is behind a string of mail bombs that have been exploding around San Francisco. The only problem is his primary suspect... is in a coma.","poster":"https://images-na.ssl-images-amazon.com/images/I/51OveJ+fH2L._SX268_.jpg","serviceWorkId":"B002UOMH0K"},
  {"service":"amazon","type":"episode","title":" 8. Mr. Monk Meets the Playboy ","description":"When a high-powered magazine publisher chokes to death under mysterious circumstances, all clues lead to the infamous swinging party palace, The Sapphire Mansion.","poster":"https://images-na.ssl-images-amazon.com/images/I/51YHEBhpdIL._SX268_.jpg","serviceWorkId":"B002UQ3NVA"},
  {"service":"amazon","type":"episode","title":" 9. Mr. Monk and the 12th Man ","description":"When a rash of murders sweeping the city goes unsolved, Monk is called upon to find a correlation with the victims and stop a killer before he strikes again.","poster":"https://images-na.ssl-images-amazon.com/images/I/51rPQvB7WqL._SX268_.jpg","serviceWorkId":"B002URDK3K"},
  {"service":"amazon","type":"episode","title":" 10. Mr. Monk and the Paperboy ","description":"When his paperboy is murdered, Monk turns to the pages of the newspaper for clues to solve the baffling crime.","poster":"https://images-na.ssl-images-amazon.com/images/I/51kPDa6yRWL._SX268_.jpg","serviceWorkId":"B002UO7KL6"},
  {"service":"amazon","type":"episode","title":" 11. Mr. Monk and the Three Pies ","description":"What could possibly make a cherry pie worth killing for? Monk finds out, with a little help from his long-lost brother Ambrose. John Turturro guest stars.","poster":"https://images-na.ssl-images-amazon.com/images/I/41NH-tWMtaL._SX268_.jpg","serviceWorkId":"B002UQU2BE"},
  {"service":"amazon","type":"episode","title":" 12. Mr. Monk and the TV Star ","description":"Monk suspects the star of a hit TV crime show of killing his ex-wife. His only problem: the actor's rock solid alibi. Sarah Silverman guest stars.","poster":"https://images-na.ssl-images-amazon.com/images/I/41ymhPbXcSL._SX268_.jpg","serviceWorkId":"B002UOSROA"},
  {"service":"amazon","type":"episode","title":" 13. Mr. Monk and the Missing Granny ","description":"A law student promises to get Monk reinstated to the police force in exchange for his help in finding the kidnappers of her beloved grandmother.","poster":"https://images-na.ssl-images-amazon.com/images/I/41Y6UoOXYUL._SX268_.jpg","serviceWorkId":"B002UOOAOG"},
  {"service":"amazon","type":"episode","title":" 14. Mr. Monk and the Captain's Wife ","description":"When the Captain's wife falls victim to a union dispute gone awry, it's up to Monk to find out what really happened and bring Stottlemeyer back from the brink.","poster":"https://images-na.ssl-images-amazon.com/images/I/41pXf5jwnYL._SX268_.jpg","serviceWorkId":"B002URU3DK"},
  {"service":"amazon","type":"episode","title":" 15. Mr. Monk Gets Married ","description":"It's unholy matrimony as Monk and Sharona pretend to be married to get the goods on a con man.","poster":"https://images-na.ssl-images-amazon.com/images/I/41XZnvZBreL._SX268_.jpg","serviceWorkId":"B002WVNMCI"},
  {"service":"amazon","type":"episode","title":" 16. Mr. Monk Goes to Jail ","description":"When a death-row inmate is murdered 45 minutes before his execution, Monk is brought in to find out why.","poster":"https://images-na.ssl-images-amazon.com/images/I/41pOHflSgeL._SX268_.jpg","serviceWorkId":"B002UR82II"}
]]
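
Because an Amazon result mixes the series entry with its episodes, callers may want to separate them. The following is a sketch only; splitSeries is a hypothetical helper, and WorkData is redeclared locally so the block stands alone.

```typescript
type WorkData = {
  type: "movie" | "series" | "episode";
  title: string;
  description: string;
  poster: string;
  service: string;
  serviceWorkId: string;
};

// Hypothetical helper: separate the series entry from its episode entries
// and trim the whitespace padding seen in the episode titles above.
function splitSeries(works: WorkData[]): {
  series: WorkData | undefined;
  episodes: WorkData[];
} {
  const series = works.find((w) => w.type === "series");
  const episodes = works
    .filter((w) => w.type === "episode")
    .map((w) => ({ ...w, title: w.title.trim() }));
  return { series, episodes };
}
```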

Implementing a New Service

Service objects must have the following interface:

export type RefInfo = {
  type: number;
  serviceWorkId: string;
  redirect?: string;
  interpret?: "execute" | "parse";
};

export interface Service {
  test(url: URL): boolean;
  validate(url: URL): RefInfo;
  getData(info: RefInfo, dataText: string, dom?: DOMWindow): Promise<WorkData[]>;
}

The test function determines whether a particular URL is associated with the given service. It is used to select which service object to use for scraping a given URL.
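
The selection step might look roughly like the sketch below. selectService and TestableService are hypothetical names, and how the package itself maps a malformed or unrecognized URL to a STATUS code is not shown here.

```typescript
// Only the part of the Service interface needed for selection.
interface TestableService {
  test(url: URL): boolean;
}

// Hypothetical helper: return the first service whose test() accepts the URL,
// or undefined when the href is malformed or no service recognizes it.
function selectService<S extends TestableService>(
  services: S[],
  href: string,
): S | undefined {
  let url: URL;
  try {
    url = new URL(href); // throws a TypeError on malformed input
  } catch {
    return undefined;
  }
  return services.find((s) => s.test(url));
}
```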

The validate function is used to ensure that a given URL points to a page with usable video data, and to return extra information about the kind of page it is, to assist in the actual data extraction. Returning a type of RefType.NONE indicates that this is not a valid video URL (even though it has already been identified as a URL associated with the given service). Returning a type of RefType.FORBIDDEN indicates that we are (legally) forbidden from filtering the referenced video, so data about it should not be scraped.

If data cannot be easily scraped from the referenced video page itself, but can be from some alternate API, the optional redirect field can be used to provide an alternate URL from which to fetch data.

The optional interpret field can be used to automatically process the retrieved HTML source into a DOM object. If it is set to "parse", the HTML will merely be parsed into a DOM object; if it is set to "execute", scripts referenced by the page will be fetched and executed, possibly transforming the DOM from its original state. This is useful if, e.g., you need to extract data from a rendered web app which is not present in the HTML source.

The getData function is used to actually extract video information from the source obtained by fetching the given URL. To avoid duplicating work, it receives the object returned by validate as its first argument. If the interpret field was set, getData can take an optional third argument, which is the DOMWindow object that resulted from parsing the source in dataText.

The service object should be exported from a module file in the src/services directory. Additionally, service objects must be the only objects exported from any module in the src/services directory. The names used do not matter, but by convention each service object should be implemented in a separate module from which it is the sole export.
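
As an illustration, a service for an imaginary site could be sketched as follows. Everything specific here is an assumption: the site video.example.com and its URL layout, the JSON shape returned by its API, the RefType.VIDEO member, and all numeric RefType values. The types are redeclared locally (with DOMWindow as a placeholder for the jsdom type) so the block stands alone.

```typescript
// Local stand-ins for types the package defines. Only NONE and FORBIDDEN are
// named in this README; VIDEO and all numeric values are assumptions.
const RefType = { NONE: 0, FORBIDDEN: 1, VIDEO: 2 } as const;

type RefInfo = {
  type: number;
  serviceWorkId: string;
  redirect?: string;
  interpret?: "execute" | "parse";
};

type WorkData = {
  type: "movie" | "series" | "episode";
  title: string;
  description: string;
  poster: string;
  service: string;
  serviceWorkId: string;
};

type DOMWindow = unknown; // placeholder for jsdom's DOMWindow

interface Service {
  test(url: URL): boolean;
  validate(url: URL): RefInfo;
  getData(info: RefInfo, dataText: string, dom?: DOMWindow): Promise<WorkData[]>;
}

// Hypothetical service: video pages live at /watch/<id>, and a JSON API at
// /api/works/<id> is used as the redirect target. In src/services, this
// object would be the sole export of its module.
const exampleService: Service = {
  test: (url) => url.hostname === "video.example.com",

  validate(url) {
    const id = /^\/watch\/([A-Za-z0-9]+)$/.exec(url.pathname)?.[1];
    if (id === undefined) {
      // Recognized service, but not a video page.
      return { type: RefType.NONE, serviceWorkId: "" };
    }
    return {
      type: RefType.VIDEO,
      serviceWorkId: id,
      redirect: `https://video.example.com/api/works/${id}`,
    };
  },

  async getData(info, dataText) {
    // dataText holds the body fetched from the redirect URL (assumed JSON).
    const raw = JSON.parse(dataText) as {
      title: string;
      description: string;
      poster: string;
    };
    const work: WorkData = {
      type: "movie",
      title: raw.title,
      description: raw.description,
      poster: raw.poster,
      service: "example",
      serviceWorkId: info.serviceWorkId,
    };
    return [work];
  },
};
```

Since interpret is left unset in this sketch, getData ignores its third argument and works from dataText alone.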

Install

npm i @vidangel/misthios

Version

1.2.0

License

UNLICENSED

Unpacked Size

58.1 kB

Total Files

38

Collaborators

  • vidangel