semo-plugin-read
This is a Semo plugin that provides a CLI tool to fetch a web page and convert it into a range of formats useful for learning.
Usage
npm i -g @semo/cli semo-plugin-read
semo read [URL|local markdown] --format=[FORMAT]

semo read [url]

Parse and read a URL or a Markdown file in your favorite format.

Options:
  --format, -F                  Output format; use --available-formats to see all supported formats. [default: "markdown"]
  --clipboard                   Input from clipboard.
  --proxy, -P                   Proxy images to prevent anti-hotlinking.
  --port                        Web server port. [default: 3000]
  --domain                      Set which domain the source input comes from, without protocol and www.
  --open-browser, --open, -B    Auto open browser.
  --clear-console, --clear, -C  Auto clear console.
  --title                       Prepend title, use no-title to disable.
  --footer                      Append footer, use no-footer to disable. [default: true]
  --toc                         Include TOC.
  --rename, -R                  New name, with extension.
  --output, -O                  Output location.
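The --domain option expects a bare domain, without protocol and www. As a minimal sketch of deriving such a value from a full URL, the snippet below uses a hypothetical helper named toDomainOption; it is not part of semo-plugin-read and only illustrates the expected shape of the value.

```typescript
// Hypothetical helper, not part of semo-plugin-read: derives a value in the
// shape that --domain expects (no protocol, no leading "www.").
function toDomainOption(input: string): string {
  return input
    .trim()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//i, "") // drop a protocol such as https://
    .replace(/[/?#].*$/, "")                 // drop path, query, and fragment
    .replace(/^www\./i, "")                  // drop a leading www.
    .toLowerCase();
}

// Example input taken from the Examples section, with "www." added for illustration.
const domain = toDomainOption("https://www.juejin.im/post/5d82e116e51d453b7779d5f6");
// domain is "juejin.im"
```

The result could then be passed on the command line, for example together with --clipboard when the input itself carries no URL.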
Extend plugin
There are 2 kinds of extensions: one defines formats, the other processes content. Many extensions already exist under the /packages directory.
Define formats
hook_define_format: 'semo-plugin-read' async {}
Arguments:
- format: Semo read option for format
- title: Web page url
- markdown: Parsed Markdown
- converted: Conversion result; converted.content is the HTML of the main part of the page body
- argv: Semo's argv
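To make the argument list more concrete, here is a rough sketch of what a format handler could look like. The FormatArgs interface, the "plaintext" format, and both function names are assumptions made for illustration; the sketch does not show how a handler is registered with Semo, and the real hook contract may differ. It also assumes that no-title arrives as argv.title === false, which is how yargs handles negated boolean flags.

```typescript
// Assumed shape of the arguments listed above; the real types may differ.
interface FormatArgs {
  format: string;
  title: string;
  markdown: string;
  converted: { content: string };
  argv: Record<string, unknown>;
}

// Hypothetical renderer for an illustrative "plaintext" format.
function renderPlainText({ title, markdown, argv }: FormatArgs): string {
  const body = markdown
    .replace(/^#{1,6}[ \t]+/gm, "")             // strip heading markers
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1"); // keep link or image text, drop the target
  // Assumption: no-title is exposed as argv.title === false.
  return argv.title === false ? body : `${title}\n\n${body}`;
}

// Async wrapper, since the hook is declared async in this README.
async function plaintextFormat(args: FormatArgs): Promise<string> {
  return renderPlainText(args);
}
```

Keeping the rendering step synchronous and pure makes it easy to test without running the CLI; the async wrapper is where file or network output would go.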
Domain processing
hook_domain: 'semo-plugin-read' html markdown
- html: original html
- markdown: parsed markdown
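A domain processor can be pictured as a function that receives the original HTML and the parsed Markdown and returns cleaned-up Markdown. The sketch below is only an illustration under that assumption: the DomainProcessor type, the processors registry, the example.com entry, and processForDomain are all hypothetical names, and the actual hook_domain contract may differ.

```typescript
// Assumed signature: receives the original HTML and the parsed Markdown,
// and returns the Markdown to use. The real hook contract may differ.
type DomainProcessor = (html: string, markdown: string) => string;

// Hypothetical registry keyed by domain (no protocol, no www).
const processors: Record<string, DomainProcessor> = {
  "example.com": (_html, markdown) =>
    markdown
      .split("\n")
      .filter((line) => line.trim() !== "Copy code") // drop copy-button residue
      .join("\n"),
};

// Apply the processor registered for a domain, or return the Markdown unchanged.
function processForDomain(domain: string, html: string, markdown: string): string {
  const processor = processors[domain];
  return processor ? processor(html, markdown) : markdown;
}
```

Falling back to the unchanged Markdown means pages from domains without a processor still go through the normal conversion.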
Examples
semo read https://juejin.im/post/5d82e116e51d453b7779d5f6
semo read README.md --format console
semo read --format=wechat # wechat format is defined by plugin
semo run read URL --format=markdown # Semo can run read command in this way
semo read --available-formats # Show all formats
Built-in formats
Many formats are defined by read plugins; only the built-in formats are listed here.
- markdown or md: Convert web page to Markdown
- console: Output Markdown to the console
- debug: Output the parsed main page HTML, for debugging
Known bugs
- The mobi plugin cannot save remote images. As a workaround, first save to epub format, then convert to mobi using the ebook-convert command.
- Ajax content is not supported for now.
Contributions
PRs, issues, and plugins are all welcome.
About Semo
Semo is the core of this plugin. It is a command-line development framework that I developed as a wrapper around the open-source project yargs. If you are interested, you can learn more at https://semo.js.org and https://github.com/semojs
LICENSE
MIT