Pixiv novel Abstract Syntax Tree
pxast is a specification for representing the pixiv novel format as a syntax tree. It implements unist.
This document defines a format for representing the pixiv novel format as an abstract syntax tree. This specification is written in a Web IDL-like grammar.
If you are using TypeScript, you can use the pxast types by installing them with npm:
npm install @rshirohara/pxast
interface Parent <: UnistParent {
children: [PxastContent]
}
Parent (UnistParent) represents an abstract interface in pxast containing other nodes (said to be children).
Its content is limited to only other pxast content.
interface Literal <: UnistLiteral {
value: string
}
Literal (UnistLiteral) represents an abstract interface in pxast containing a value.
Its value field is a string.
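In TypeScript, the two abstract interfaces can be approximated with plain local types. The sketch below is a minimal illustration that does not import @types/unist or @rshirohara/pxast; the names UnistNode, PxParent, PxLiteral, isParent and isLiteral are hypothetical and are not taken from the published package.

```typescript
// Hypothetical local stand-in for unist's Node: every node has a `type`.
interface UnistNode {
  type: string;
}

// Parent: a node whose content lives in a `children` array.
interface PxParent extends UnistNode {
  children: UnistNode[];
}

// Literal: a node whose content lives in a string `value`.
interface PxLiteral extends UnistNode {
  value: string;
}

// Type guard: true when the node carries a `children` array.
function isParent(node: UnistNode): node is PxParent {
  return Array.isArray((node as Partial<PxParent>).children);
}

// Type guard: true when the node carries a string `value`.
function isLiteral(node: UnistNode): node is PxLiteral {
  return typeof (node as Partial<PxLiteral>).value === "string";
}

const text: PxLiteral = { type: "text", value: "たとえば私はこの文章を書く。" };
const paragraph: PxParent = { type: "paragraph", children: [text] };
console.log(isParent(paragraph), isLiteral(text), isLiteral(paragraph)); // true true false
```

Checking for the field that holds the content, rather than for specific type strings, keeps these guards valid for every concrete node defined below.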
interface Root <: Parent {
type: "root"
}
Root (Parent) represents a document.
Root can be used as the root of a tree, never as a child.
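Because Root may only appear at the top of a tree, a validator can walk the tree and reject any nested root. This is a sketch assuming a loose, hypothetical AnyNode shape; isValidRoot is not part of pxast.

```typescript
// Loose, hypothetical node shape: just enough to walk any pxast tree.
interface AnyNode {
  type: string;
  value?: string;
  children?: AnyNode[];
}

// True when `tree` is a "root" node and no descendant is also a "root".
function isValidRoot(tree: AnyNode): boolean {
  if (tree.type !== "root") return false;
  // Depth-first walk with an explicit stack, starting below the root.
  const stack: AnyNode[] = [...(tree.children ?? [])];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.type === "root") return false;
    stack.push(...(node.children ?? []));
  }
  return true;
}

const ok: AnyNode = {
  type: "root",
  children: [{ type: "paragraph", children: [{ type: "text", value: "本文" }] }],
};
const nested: AnyNode = { type: "root", children: [{ type: "root", children: [] }] };
console.log(isValidRoot(ok), isValidRoot(nested)); // true false
```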
interface Paragraph <: Parent {
type: "paragraph"
children: [PhrasingContent]
}
Paragraph (Parent) represents a unit of discourse dealing with a particular point.
Paragraph can be used where flow content is expected. Its content model is phrasing content.
For example, the following text:
たとえば私はこの文章を書く。
Yields:
{
type: "paragraph",
children: [{ type: "text", value: "たとえば私はこの文章を書く。" }]
}
interface Heading <: Parent {
type: "heading"
children: [InlinePhrasingContent]
}
Heading (Parent) represents a heading of a section.
Heading can be used where flow content is expected. Its content model is inline phrasing content.
For example, the following text:
[chapter:まえがき]
Yields:
{
type: "heading",
children: [{ type: "text", value: "まえがき" }]
}
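A minimal sketch of recognizing the [chapter:...] notation, assuming a heading occupies a whole line and its title is plain text. parseChapter, TextNode and HeadingNode are hypothetical names; ruby inside the title, which inline phrasing content would allow, is not handled.

```typescript
interface TextNode {
  type: "text";
  value: string;
}

interface HeadingNode {
  type: "heading";
  children: TextNode[];
}

// Hypothetical helper: turns a line of the form [chapter:TITLE] into a
// Heading. Any markup inside TITLE is kept as plain text in this sketch.
function parseChapter(line: string): HeadingNode | null {
  const match = /^\[chapter:(.+)\]$/.exec(line.trim());
  if (match === null) return null;
  return { type: "heading", children: [{ type: "text", value: match[1] }] };
}

console.log(JSON.stringify(parseChapter("[chapter:まえがき]")));
// {"type":"heading","children":[{"type":"text","value":"まえがき"}]}
console.log(parseChapter("たとえば私はこの文章を書く。")); // null
```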
interface PageHeading <: Node {
type: "pageHeading"
pageNumber: 1 <= number
}
PageHeading (Node) represents a heading of a page.
PageHeading can be used where flow content is expected. It has no content model.
A pageNumber field must be present.
A value of 1 is said to be the minimum value.
For example, the following text:
ここは一ページ目。
[newpage]
ここが二ページ目。
Yields:
{
type: "root",
children: [
{ type: "pageHeading", pageNumber: 1 },
{
type: "paragraph",
children: [{ type: "text", value: "ここは一ページ目。" }]
},
{ type: "pageHeading", pageNumber: 2 },
{
type: "paragraph",
children: [{ type: "text", value: "ここが二ページ目。" }]
}
]
}
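The example above can be reproduced by splitting the source on [newpage]. This sketch assumes every page, including the first, is preceded by a pageHeading (as in the example) and treats each page body as one text-only paragraph; splitPages and the node type names are hypothetical, and a real parser may behave differently.

```typescript
interface TextNode { type: "text"; value: string; }
interface ParagraphNode { type: "paragraph"; children: TextNode[]; }
interface PageHeadingNode { type: "pageHeading"; pageNumber: number; }
type FlowNode = PageHeadingNode | ParagraphNode;
interface RootNode { type: "root"; children: FlowNode[]; }

// Hypothetical sketch: split on [newpage] and emit a pageHeading, numbered
// from 1, before each page. Breaks, ruby, links and other markup inside a
// page are not parsed; each non-empty page body becomes one paragraph.
function splitPages(source: string): RootNode {
  const children: FlowNode[] = [];
  source.split("[newpage]").forEach((page, index) => {
    children.push({ type: "pageHeading", pageNumber: index + 1 });
    const body = page.trim();
    if (body !== "") {
      children.push({ type: "paragraph", children: [{ type: "text", value: body }] });
    }
  });
  return { type: "root", children };
}

const tree = splitPages("ここは一ページ目。\n[newpage]\nここが二ページ目。");
console.log(tree.children.map((node) => node.type).join(", "));
// pageHeading, paragraph, pageHeading, paragraph
```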
interface Text <: Literal {
type: "text"
}
Text (Literal) represents everything that is just text.
Text can be used where phrasing content is expected.
Its content is represented by its value field.
For example, the following text:
たとえば私はこの文章を書く。
Yields:
{ type: "text", value: "たとえば私はこの文章を書く。" }
interface Ruby <: Literal {
type: "ruby"
ruby: string
}
Ruby (Literal) represents a small annotation that is rendered above, below, or next to text.
Ruby can be used where phrasing content is expected.
Its content is represented by its value field (the annotated text) and its ruby field (the annotation).
For example, the following text:
[[rb:私>わたし]]
Yields:
{
type: "ruby",
value: "私",
ruby: "わたし"
}
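A sketch of tokenizing [[rb:BASE>READING]] markers inside a line, assuming the base text contains no > or ] and that markers do not nest. Text around each marker becomes text nodes. parseRuby and the node type names are hypothetical.

```typescript
interface TextNode { type: "text"; value: string; }
interface RubyNode { type: "ruby"; value: string; ruby: string; }
type InlineNode = TextNode | RubyNode;

// Hypothetical tokenizer: scans a line for [[rb:BASE>READING]] markers and
// emits text nodes for the plain stretches between them.
function parseRuby(line: string): InlineNode[] {
  const pattern = /\[\[rb:([^>\]]+)>([^\]]+)\]\]/g;
  const nodes: InlineNode[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    if (match.index > last) {
      nodes.push({ type: "text", value: line.slice(last, match.index) });
    }
    nodes.push({ type: "ruby", value: match[1], ruby: match[2] });
    last = pattern.lastIndex;
  }
  if (last < line.length) {
    nodes.push({ type: "text", value: line.slice(last) });
  }
  return nodes;
}

console.log(JSON.stringify(parseRuby("[[rb:私>わたし]]")));
// [{"type":"ruby","value":"私","ruby":"わたし"}]
```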
interface Break <: Node {
type: "break"
}
Break (Node) represents a line break.
Break can be used where phrasing content is expected. It has no content model.
For example, the following text:
これは一行目。
これが二行目。
Yields:
{
type: "paragraph",
children: [
{ type: "text", value: "これは一行目。" },
{ type: "break" },
{ type: "text", value: "これが二行目。" }
]
}
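Following the example above, consecutive lines of a paragraph can be turned into text nodes separated by break nodes. linesToParagraph is a hypothetical helper; producing no text node for an empty line is an assumption, since this document does not specify that case.

```typescript
interface TextNode { type: "text"; value: string; }
interface BreakNode { type: "break"; }
type LineNode = TextNode | BreakNode;
interface ParagraphNode { type: "paragraph"; children: LineNode[]; }

// Hypothetical sketch: each line becomes a text node, and a break node is
// placed between consecutive lines. Inline markup is not parsed here.
function linesToParagraph(block: string): ParagraphNode {
  const children: LineNode[] = [];
  block.split(/\r?\n/).forEach((line, index) => {
    if (index > 0) children.push({ type: "break" });
    if (line !== "") children.push({ type: "text", value: line });
  });
  return { type: "paragraph", children };
}

const para = linesToParagraph("これは一行目。\nこれが二行目。");
console.log(para.children.map((node) => node.type).join(", ")); // text, break, text
```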
interface Link <: Parent {
type: "link"
url: string
children: [InlinePhrasingContent]
}
Link (Parent) represents a hyperlink.
Link can be used where phrasing content is expected. Its content model is inline phrasing content.
For example, the following text:
[[jumpurl:リンク例>https://example.com]]
Yields:
{
type: "link",
url: "https://example.com",
children: [{ type: "text", value: "リンク例" }]
}
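A sketch of parsing a single [[jumpurl:LABEL>URL]] token, assuming the label contains no > and is plain text (ruby in the label is not handled). parseJumpUrl and the node type names are hypothetical.

```typescript
interface TextNode { type: "text"; value: string; }
interface LinkNode { type: "link"; url: string; children: TextNode[]; }

// Hypothetical helper: the label is everything before the first ">",
// the URL is everything after it up to the closing "]]".
function parseJumpUrl(token: string): LinkNode | null {
  const match = /^\[\[jumpurl:([^>]+)>(.+)\]\]$/.exec(token);
  if (match === null) return null;
  return {
    type: "link",
    url: match[2],
    children: [{ type: "text", value: match[1] }],
  };
}

console.log(JSON.stringify(parseJumpUrl("[[jumpurl:リンク例>https://example.com]]")));
// {"type":"link","url":"https://example.com","children":[{"type":"text","value":"リンク例"}]}
```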
interface Image <: Node {
type: "image"
illustId: string
pageNumber: 1 <= number?
}
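Image (Node) represents a reference to a pixiv image.
Image can be used where phrasing content is expected. It has no content model.
An illustId field must be present. A pageNumber field may be present; when it is, a value of 1 is said to be the minimum value.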
For example, the following text:
[pixivimage:000001-02]
Yields:
{
type: "image",
illustId: "000001",
pageNumber: 2
}
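A sketch of parsing [pixivimage:ID] and [pixivimage:ID-PAGE], assuming illustration IDs consist of digits. Because pageNumber is optional, the field is only set when a page suffix is present, and an explicit page below 1 is rejected to respect the 1 <= number constraint. parseImage and ImageNode are hypothetical names.

```typescript
interface ImageNode {
  type: "image";
  illustId: string;
  pageNumber?: number;
}

// Hypothetical helper: the ID is kept as a string (preserving leading
// zeros, as in the example); the optional "-PAGE" suffix becomes a number.
function parseImage(token: string): ImageNode | null {
  const match = /^\[pixivimage:(\d+)(?:-(\d+))?\]$/.exec(token);
  if (match === null) return null;
  const node: ImageNode = { type: "image", illustId: match[1] };
  if (match[2] !== undefined) {
    const page = Number(match[2]);
    if (page < 1) return null; // pageNumber must be >= 1 when present
    node.pageNumber = page;
  }
  return node;
}

console.log(JSON.stringify(parseImage("[pixivimage:000001-02]")));
// {"type":"image","illustId":"000001","pageNumber":2}
console.log(JSON.stringify(parseImage("[pixivimage:000001]")));
// {"type":"image","illustId":"000001"}
```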
interface PageReference <: Node {
type: "pageReference"
pageNumber: 1 <= number
}
PageReference (Node) represents a reference to a PageHeading.
PageReference can be used where phrasing content is expected. It has no content model.
A pageNumber field must be present.
A value of 1 is said to be the minimum value.
For example, the following text:
[jump:01]
Yields:
{
type: "pageReference",
pageNumber: 1
}
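A sketch of parsing [jump:NN] into a PageReference and checking that a matching pageHeading exists among a root's children. It assumes the number may be zero-padded, as in [jump:01]; parseJump, hasTarget and the node shapes are hypothetical.

```typescript
interface PageReferenceNode { type: "pageReference"; pageNumber: number; }
// Loose shape for top-level nodes; only pageHeading carries pageNumber.
interface FlowLike { type: string; pageNumber?: number; }

// Hypothetical helper for a [jump:NN] token.
function parseJump(token: string): PageReferenceNode | null {
  const match = /^\[jump:(\d+)\]$/.exec(token);
  if (match === null) return null;
  const pageNumber = Number(match[1]);
  if (pageNumber < 1) return null; // the minimum value is 1
  return { type: "pageReference", pageNumber };
}

// True when some pageHeading among `children` has the referenced number.
function hasTarget(ref: PageReferenceNode, children: FlowLike[]): boolean {
  return children.some(
    (node) => node.type === "pageHeading" && node.pageNumber === ref.pageNumber,
  );
}

console.log(JSON.stringify(parseJump("[jump:01]"))); // {"type":"pageReference","pageNumber":1}
```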
type PxastContent = FlowContent | PhrasingContent
Each node in pxast falls into one or more categories of Content that group nodes with similar characteristics together.
type FlowContent = Heading | PageHeading | Paragraph
Flow content represents the sections of a document.
type PhrasingContent = Break | Image | Link | PageReference | InlinePhrasingContent
Phrasing content represents the text in a document, and its markup.
type InlinePhrasingContent = Ruby | Text
Inline phrasing content represents the subset of phrasing content (text and its ruby markup) that may also appear as the children of Heading and Link.
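The content categories map naturally onto TypeScript union types. The following sketch mirrors the definitions above with locally declared node types, suffixed with Node to avoid clashing with DOM globals such as Text and Image; these names and the isFlowContent and isInlinePhrasingContent guards are hypothetical, not exports of @rshirohara/pxast.

```typescript
type TextNode = { type: "text"; value: string };
type RubyNode = { type: "ruby"; value: string; ruby: string };
type BreakNode = { type: "break" };
type ImageNode = { type: "image"; illustId: string; pageNumber?: number };
type PageReferenceNode = { type: "pageReference"; pageNumber: number };
type LinkNode = { type: "link"; url: string; children: InlinePhrasingContent[] };
type HeadingNode = { type: "heading"; children: InlinePhrasingContent[] };
type PageHeadingNode = { type: "pageHeading"; pageNumber: number };
type ParagraphNode = { type: "paragraph"; children: PhrasingContent[] };

// Content categories, following the definitions above.
type InlinePhrasingContent = RubyNode | TextNode;
type PhrasingContent = BreakNode | ImageNode | LinkNode | PageReferenceNode | InlinePhrasingContent;
type FlowContent = HeadingNode | PageHeadingNode | ParagraphNode;
type PxastContent = FlowContent | PhrasingContent;

// Discriminate on `type`, so the compiler checks every case name.
function isFlowContent(node: PxastContent): node is FlowContent {
  switch (node.type) {
    case "heading":
    case "pageHeading":
    case "paragraph":
      return true;
    default:
      return false;
  }
}

function isInlinePhrasingContent(node: PxastContent): node is InlinePhrasingContent {
  return node.type === "text" || node.type === "ruby";
}

const nodes: PxastContent[] = [
  { type: "heading", children: [{ type: "text", value: "まえがき" }] },
  { type: "ruby", value: "私", ruby: "わたし" },
  { type: "break" },
];
console.log(nodes.map((node) => isFlowContent(node)).join(", ")); // true, false, false
```

Because every node type carries a distinct type literal, the unions are discriminated: after a guard returns true, the compiler knows which fields are available.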