MHTML Parser
A MHTML(.mht) file parser.
Only tested in .mht converted by Word 2016.
Installion
npm install mhtml-parser --save
Usage
Parse by filename (Asynchronous)
let parser = ;parser;
Parse by filename and read content manually (Only tested in Windows & Linux)
let parser = ;let fs = ;let fileName = 'image001.jpg'; parser;
Parse by string (Synchronous)
let parser = ;let data = parser;
In this mode, option.readMode
will always be constants.READ_MODE_ALL
.
Options
charset: String
Default Value: utf-8
To convert binary data to detected charset. Useless when readMode == READ_MODE_POSITION.
decodeQuotedPrintable: boolean
Default Value: false
To decode quoted-printable data, which is something like <a href=3D\"http://zsxsoft.com\">
. Useless when readMode == READ_MODE_POSITION.
See here: https://github.com/mathiasbynens/quoted-printable
readMode: string
Default Value: constants.READ_MODE_ALL
Avaiable items list here:
- READ_MODE_ALL - Read the whole file to the memory. You can directly get each file's content from
data.fileName.data
- READ_MODE_POSITION - Scan the whole file and only get the position and length of each file but not reading them.
Example Result
"Hey.htm": "name": "Hey.htm" "location": "file:///C:/B133AD19/Hey.htm" "encoding": "quoted-printable" "type": "text/html; charset=\"gb2312\"" "data": "<html xmlns:v=3D\"urn:schemas-microsoft-com:vml\"\nxmlns:o=3D\"urn:schemas-micr>\n<li.....n<p class=3DMsoNormal><span lang=3DEN-US><o:p> </o:p></span></p>\n\n</div>\n\n</body>\n\n</html>" "startPosition": 454 "bufferLength": 61846 "item0001.xml": "name": "item0001.xml" "location": "file:///C:/B133AD19/Hey.files/item0001.xml" "encoding": "quoted-printable" "type": "text/xml" "data": "<?xml version=3D\"1.0\" encoding=3D\"UTF-8\" standalone=3D\"no\"?><b:Sources xmln=\ns:b=3D\"http://schemas.openxmlformats.org/officeDocument/2006/bibliography\" =\nxmlns=3D\"http://schemas.openxmlformats.org/officeDocument/2006/bibliography=\n\" SelectedStyle=3D\"\\APASixthEditionOfficeOnline.xsl\" StyleName=3D\"APA\" Vers=\nion=3D\"6\"></b:Sources>" "startPosition": 62470 "bufferLength": 335 "image001.jpg": "name": "image001.jpg" "location": "file:///C:/B133AD19/Hey.files/image001.jpg" "encoding": "base64" "type": "image/jpeg" "data": "/9j/4AAQSkZJRgABBDAAo...+z6EMn3aKKKoD//Z" "startPosition": 68533 "bufferLength": 12297
TODO
License
The MIT License