parse-english
English language parser for retext producing nlcst nodes.
Install
npm:
npm install parse-english
Use
var inspect = var English = var tree = console
Yields:
RootNode[1] (1:1-1:63, 0-62)└─ ParagraphNode[1] (1:1-1:63, 0-62) └─ SentenceNode[23] (1:1-1:63, 0-62) ├─ WordNode[2] (1:1-1:4, 0-3) │ ├─ TextNode: "Mr" (1:1-1:3, 0-2) │ └─ PunctuationNode: "." (1:3-1:4, 2-3) ├─ WhiteSpaceNode: " " (1:4-1:5, 3-4) ├─ WordNode[1] (1:5-1:10, 4-9) │ └─ TextNode: "Henry" (1:5-1:10, 4-9) ├─ WhiteSpaceNode: " " (1:10-1:11, 9-10) ├─ WordNode[1] (1:11-1:16, 10-15) │ └─ TextNode: "Brown" (1:11-1:16, 10-15) ├─ PunctuationNode: ":" (1:16-1:17, 15-16) ├─ WhiteSpaceNode: " " (1:17-1:18, 16-17) ├─ WordNode[1] (1:18-1:19, 17-18) │ └─ TextNode: "A" (1:18-1:19, 17-18) ├─ WhiteSpaceNode: " " (1:19-1:20, 18-19) ├─ WordNode[1] (1:20-1:27, 19-26) │ └─ TextNode: "hapless" (1:20-1:27, 19-26) ├─ WhiteSpaceNode: " " (1:27-1:28, 26-27) ├─ WordNode[1] (1:28-1:31, 27-30) │ └─ TextNode: "but" (1:28-1:31, 27-30) ├─ WhiteSpaceNode: " " (1:31-1:32, 30-31) ├─ WordNode[1] (1:32-1:40, 31-39) │ └─ TextNode: "friendly" (1:32-1:40, 31-39) ├─ WhiteSpaceNode: " " (1:40-1:41, 39-40) ├─ WordNode[1] (1:41-1:45, 40-44) │ └─ TextNode: "City" (1:41-1:45, 40-44) ├─ WhiteSpaceNode: " " (1:45-1:46, 44-45) ├─ WordNode[1] (1:46-1:48, 45-47) │ └─ TextNode: "of" (1:46-1:48, 45-47) ├─ WhiteSpaceNode: " " (1:48-1:49, 47-48) ├─ WordNode[1] (1:49-1:55, 48-54) │ └─ TextNode: "London" (1:49-1:55, 48-54) ├─ WhiteSpaceNode: " " (1:55-1:56, 54-55) ├─ WordNode[1] (1:56-1:62, 55-61) │ └─ TextNode: "worker" (1:56-1:62, 55-61) └─ PunctuationNode: "." (1:62-1:63, 61-62)
API
parse-english
has the same API as parse-latin
.
Algorithm
All of parse-latin
is included, and the following support for the
English natural language:
- Unit abbreviations (
tsp.
,tbsp.
,oz.
,ft.
, and more) - Time references (
sec.
,min.
,tues.
,thu.
,feb.
, and more) - Business Abbreviations (
Inc.
andLtd.
) - Social titles (
Mr.
,Mmes.
,Sr.
, and more) - Rank and academic titles (
Dr.
,Rep.
,Gen.
,Prof.
,Pres.
, and more) - Geographical abbreviations (
Ave.
,Blvd.
,Ft.
,Hwy.
, and more) - American state abbreviations (
Ala.
,Minn.
,La.
,Tex.
, and more) - Canadian province abbreviations (
Alta.
,Qué.
,Yuk.
, and more) - English county abbreviations (
Beds.
,Leics.
,Shrops.
, and more) - Common elision (omission of letters) (
’n’
,’o
,’em
,’twas
,’80s
, and more)