pretext

Pretext, a simple Markdown-inspired markup language

The current implementation works, roughly, but doesn't quite follow all of the design guidelines. Expect the language to change.

Pretext is a replacement for Markdown (the language). It doesn't try to offer backward compatibility. It doesn't try to cater to every possible need under the sun (except by being extensible). It just tries to be simple and easy, and to do its job well.

Paragraphs are separated by empty lines, like in Markdown. When a paragraph starts with a <, it's taken to be a literal HTML element and is passed through as-is.
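The paragraph rule can be sketched in a few lines. This is only an illustration, not Pretext's actual code: the name renderParagraphs is made up, and the sketch ignores the finer question of HTML elements that themselves start a paragraph.

```javascript
// Hypothetical helper, not part of Pretext's API.
function renderParagraphs(source) {
    return source
        // One or more blank (or whitespace-only) lines separate paragraphs.
        .split(/\n(?:[ \t]*\n)+/)
        .map(function (block) { return block.trim(); })
        .filter(function (block) { return block.length > 0; })
        .map(function (block) {
            // A block starting with "<" is treated as literal HTML and kept as-is.
            return block.charAt(0) === '<' ? block : '<p>' + block + '</p>';
        })
        .join('\n\n');
}
```

For instance, renderParagraphs('Hello\n\n<hr>') would wrap only the first block in <p> and leave <hr> untouched.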

Pretext is small (around 1.7 kB minified and compressed, 8 kB without) and the code is comprehensible and well documented. (TODO at least that's the goal.)

Pretext is "fast enough" (TODO test against marked)

...

  • a single way to do everything
  • link syntax that isn't confusing (perhaps <link http://google.com/ This is google> or something like that)
  • /italics/ and *bold* instead of *italics* and **bold**
  • comes with a good plugin system
  • and a reasonable set of plugins for typography etc.
  • and for those who want things like <del>, <abbr>, <u> etc.
  • take all the good stuff from Markdown

The boundaries of Pretext should be as obvious as possible. If you know the basics of Pretext, you should be immediately able to answer the question: "Can you do this in Pretext?"

Either you can, or you can't. Usually you can't. In that case, you can always fall back to HTML. If Pretext fails your expectations often, it's easy to write a plugin to fix the deficiency, or to find an existing one.

Examples:

"I wonder if there is some syntax for data definition lists" – there isn't. Use HTML.

"I wonder if I can add a target attribute to a link written in Pretext" – you can't. Use HTML.

"I wonder if tables–" No. Use HTML.

If you find yourself falling back to plain HTML a lot of the time, you can find a plugin or write your own.

If _foo_ (?bar=1&zot=2) is escaped, why isn't <a href='?bar=1&zot=2'>foo</a>?

Consider the difference between these Markdown excerpts:

`hello.c`:
    int main() {
        printf("Hello world!\n");
    }

end

`hello.c`:
int main() {
    printf(&quot;Hello world!\n&quot;);
}</code></pre>

It should be obvious that the latter one is what the user wanted. We arrive at a maxim: inconspicuous changes in the source shouldn't cause conspicuous changes in the output.

  • If the user starts a paragraph with an HTML element that looks like it starts a paragraph, then it should start a paragraph, otherwise it should be a standalone HTML element. (See the list of elements that allow you to skip a </p>: http://developers.whatwg.org/syntax.html#optional-tags)
  • If \ is used to quote things, it is to be used consistently.
      • The general meaning: if a character has a special meaning in Pretext, and you want to use that character literally, put a \ in front of it.
      • Inside `, should it prevent things from being quoted? To highlight parts of the code, for example?
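One way to get consistent \ quoting is to let a single pass handle both the escape and the markup it guards. The sketch below does this for bold only. The function name and the set of escapable characters are assumptions for illustration, not Pretext's real behaviour.

```javascript
// Hypothetical sketch: bold spans plus backslash escapes in one pass.
function renderBold(text) {
    // Alternation order matters: a backslash and the character after it are
    // consumed first, so an escaped "*" can never open or close a bold span.
    var pattern = /\\([\\*\/_`])|\*([^*\\]+)\*/g;
    return text.replace(pattern, function (match, escaped, bold) {
        return escaped !== undefined ? escaped : '<b>' + bold + '</b>';
    });
}
```

With this approach the rule stays the same everywhere: a \ in front of a special character yields that character literally, and nothing else.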

If you write invalid Pretext, you'll get invalid markup, as with *one _two three* four_. But Pretext tries to make it obvious when you do.

It's not Pretext's job to save you from XSS attacks or other acts of malevolence. You should run the output through a filter if you use Pretext for untrusted input.

This is the rationale for /italic/, *bold*, _links_, and the list syntax.

Of course you can

The format should be as uniform as possible. (When it comes to the ordered list syntax, this is at odds with

, without taking into account

#

The current HTML standard mandates that you should use <i> for "an alternate voice or mood, [or] a technical term, [or lots of things that are typically italicized]". <em> should be used for stress emphasis, which is typically italicized.

Likewise – this is my interpretation – <b> should be used for text that isn't particularly important but that you still want to stand out, and <strong> should be used for things that are important and therefore need to stand out. Both are typically set in boldface.

Pretext doesn't really care which one you mean. It generates <i> for /this/ and <b> for *this* by default. And it talks about making things 'bold' and 'italic' instead of 'warranting attention without being especially important' and 'offset from normal prose because it's in an alternate voice or mood or because of some other aspect in which it's different', because it's easier.

To use <em> or <strong>, just use them as you would in HTML.

Code blocks are also simply wrapped inside <pre> instead of <pre><code>, which is what the standard suggests. This is mainly done to make the resulting HTML look better: <pre> allows us to put a newline before the first line without it meddling with the content's formatting. Then again, even though Pretext calls them 'code blocks' instead of 'preformatted text', they are basically preformatted text and you might want to use them for purposes other than computer code. It wouldn't be particularly semantic to wrap a printout or a poem inside <code>, would it?
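The newline trick relies on HTML parsers dropping a single newline that immediately follows the <pre> start tag. A minimal sketch of such a renderer follows. The function names are made up, and escaping the block's content is an assumption rather than documented Pretext behaviour.

```javascript
// Hypothetical helpers, not part of Pretext's API.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

function renderCodeBlock(lines) {
    // The newline right after <pre> is ignored by HTML parsers, so the
    // content starts on its own line in the source without gaining a
    // blank first line when rendered.
    return '<pre>\n' + escapeHtml(lines.join('\n')) + '</pre>';
}
```

The same wrapper works equally well for a printout or a poem, which is the point of not adding <code>.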

There should ideally be an intuitive solution to a problem.

Example: I want to quote

Problems should be solvable by intuition first, then by googling, then by trying a couple of different solutions. There should be only one obvious solution to a problem.

Examples of problems:

  • add a class attribute to a link
  • use a different kind of link

Whenever there's a recurring problem, there should be a plugin to do it; if there isn't, it should be relatively straightforward to write one.

There are roughly two kinds of problems:

  • the user wants to do something Pretext supports, but Pretext fails to deliver.
  • the user wants to do something Pretext doesn't support.

  1. Extensible
  2. Simple (in this order: simple to learn, simple to use, simple to extend)
  3. Small
  4. Fast

(1), (2) and (3) go hand in hand: in order to be small, it needs to be simple; in order to be simple, it needs to be extensible.

Each time a concept or a building block is introduced to a system, it creates a set of expectations for the user. To reduce the user's mental burden, Pretext introduces as few concepts as possible, and tries very hard not to break those expectations.

For instance, the user expects \ to quote things where they need to be quoted. That means quoting special characters outside

Pretext processing is split into phases. Phases are functions that take some input and produce some output.
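Until the real phases are documented, the idea can be illustrated with a toy pipeline. The phase names and the run function below are invented for this sketch and do not correspond to Pretext's actual phases.

```javascript
// Hypothetical phases: each one is a function from input text to output text.
var phases = [
    { name: 'trim',   fn: function (input) { return input.trim(); } },
    { name: 'escape', fn: function (input) { return input.replace(/&/g, '&amp;'); } },
    { name: 'wrap',   fn: function (input) { return '<p>' + input + '</p>'; } }
];

// Running the pipeline feeds each phase's output into the next phase.
function run(phases, input) {
    return phases.reduce(function (text, phase) {
        return phase.fn(text);
    }, input);
}

var output = run(phases, '  Tom & Jerry  ');
```

Because every phase has the same shape, adding, removing or swapping one does not affect the others.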

TODO document phases

TODO move this to later; this is just a shorthand for defining plugins on the fly. Possibly demonstrate it like this:

pretext.before('all', function(input) { /* do something to input */ return input; });

and then congratulate the reader on having written their first Pretext plugin.

If you need to do your processing before some other phase:

pretext.before(<phase>, function(input) { return output; });

If you need to do your processing after some other phase:

pretext.after(<phase>, function(input) { return output; });

To replace a phase:

pretext.replace(<phase>, function(input) { return output; });

To remove a phase:

pretext.remove(<phase>);

Inside phase functions, you can invoke other phases with:

output = this.<phase>(content)

Here, this is the pretext object.

By default, Pretext attempts to inject your plugin immediately after or before a phase. In the case of multiple plugins, the last to come will win.

Plugins will themselves become phases once they are installed.

(Eventually there may be multiple constraints between different plugins and some dependency analysis to determine a suitable order.)
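To make the placement rules concrete, here is a small stand-alone model of a phase list with before, after, replace and remove. It is a sketch under assumptions: the Pipeline type is invented, the special 'all' phase is not modelled, and "the last to come will win" is read as "the most recently added plugin sits closest to the target phase".

```javascript
// Hypothetical model of a phase list; not Pretext's implementation.
function Pipeline(phases) {
    this.phases = phases.slice(); // array of { name, fn }
}

Pipeline.prototype.indexOf = function (name) {
    for (var i = 0; i < this.phases.length; i++) {
        if (this.phases[i].name === name) { return i; }
    }
    throw new Error('Unknown phase: ' + name);
};

// Plugins become phases themselves, named after the function.
Pipeline.prototype.before = function (name, fn) {
    this.phases.splice(this.indexOf(name), 0, { name: fn.name, fn: fn });
};

Pipeline.prototype.after = function (name, fn) {
    this.phases.splice(this.indexOf(name) + 1, 0, { name: fn.name, fn: fn });
};

// The replacement keeps the old phase name so other plugins can still refer to it.
Pipeline.prototype.replace = function (name, fn) {
    this.phases[this.indexOf(name)] = { name: name, fn: fn };
};

Pipeline.prototype.remove = function (name) {
    this.phases.splice(this.indexOf(name), 1);
};

Pipeline.prototype.run = function (input) {
    var self = this;
    return this.phases.reduce(function (text, phase) {
        return phase.fn.call(self, text);
    }, input);
};

var pipeline = new Pipeline([
    { name: 'filter', fn: function (text) { return text.trim(); } }
]);
pipeline.after('filter', function shout(text) { return text.toUpperCase(); });
pipeline.after('filter', function exclaim(text) { return text + '!'; });
// Phase order is now: filter, exclaim, shout.
var result = pipeline.run('  hi ');
```

In this model, exclaim was installed last and therefore runs immediately after filter, ahead of shout.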

You can use pretext directly. In that case, it comes with the default settings:

var pretext = require('pretext');
console.log(pretext('# Hello'));

You can use install to install plugins. Using a newly created instance is recommended (but not enforced):

var Pretext = require('pretext').Pretext;
var pretext = new Pretext();
pretext.install('uglify');
pretext.install('sanitize');
pretext.install(require('pretext-newlines'));

Or, more succinctly:

var Pretext = require('pretext').Pretext;
var pretext = new Pretext('uglify', 'sanitize', require('pretext-newlines'));

A plugin is either a string (in which case it's looked up in the internal plugins, of which there will be a few), or a function with an optional member (one of after, before or replace) that names the phase relative to which it should be plugged in.
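How install might interpret its argument can be sketched as two small helpers. Both function names and the builtins table are hypothetical, and what happens when a function carries no placement member is left to the caller here.

```javascript
// Hypothetical: turn a plugin argument into a function.
function resolvePlugin(plugin, builtins) {
    if (typeof plugin === 'string') {
        if (!Object.prototype.hasOwnProperty.call(builtins, plugin)) {
            throw new Error('Unknown built-in plugin: ' + plugin);
        }
        return builtins[plugin];
    }
    if (typeof plugin !== 'function') {
        throw new TypeError('A plugin must be a string or a function');
    }
    return plugin;
}

// Hypothetical: read the optional after/before/replace member.
function placementOf(plugin) {
    var kinds = ['before', 'after', 'replace'];
    for (var i = 0; i < kinds.length; i++) {
        if (typeof plugin[kinds[i]] === 'string') {
            return { kind: kinds[i], phase: plugin[kinds[i]] };
        }
    }
    return null; // no member: the caller picks a default position
}
```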

The pretext-newlines plugin, for example, is defined as:

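function newlines(text) {
    // A global regular expression is needed here: a string pattern
    // would replace only the first newline.
    return text.replace(/\n/g, '<br>');
}

// Or whatever
newlines.after = 'filter';

module.exports = newlines;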
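One detail worth keeping in mind for a plugin like this: String.prototype.replace with a string pattern replaces only the first occurrence, so converting every newline needs a regular expression with the g flag. A quick comparison:

```javascript
var text = 'one\ntwo\nthree';

// String pattern: only the first newline is replaced.
var firstOnly = text.replace('\n', '<br>');

// Global regular expression: every newline is replaced.
var all = text.replace(/\n/g, '<br>');
```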

TODO some plugins may want to install multiple phases in different places. What then?

TODO some plugins may want to access a data object that collects data during the processing. Or is that actually necessary? We could also do things like a plugin that collects a list of figures from the text:

var figures = [];
pretext.after('filter', function collect(text) { /* collect figures in `figures` */ return text; });
pretext.after('all', function summarize(text) { /* append list of figures at the end */ return text; });

Ideally, the data would live as a variable inside the second phase, but it needs to be readable by the first phase. Perhaps allow this:

pretext.replace('all', function summarize(text) {
    var figures = [];
    function collect(text) {
        // ... access `figures` ...
        return text;
    }
    pretext.after('filter', collect);
    var result = inner.all(text); // `inner` stands for the replaced pipeline; not defined yet
    // append stuff to `result` and return
    return result;
});

Where pretext.after('filter', collect) will replace the current collect if one was previously installed. Another question arises: should pretext.replace('all') save the name of the old phase as an alias for the new phase? Obviously others might refer to all and expect it to work.
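Independently of how these questions are settled, the shared-data part can be handled with a plain closure: a factory returns both phase functions, and both close over the same array. The sketch below is only an illustration. The factory name is made up, and the <figure id="..."> markup it looks for is an assumption.

```javascript
// Hypothetical factory: two phase functions sharing one private array.
function makeFigurePlugin() {
    var figures = [];

    // Meant to run after 'filter': remember every figure id, pass text through.
    function collect(text) {
        var pattern = /<figure id="([^"]+)">/g;
        var match;
        while ((match = pattern.exec(text)) !== null) {
            figures.push(match[1]);
        }
        return text;
    }

    // Meant to run after 'all': append the list, then reset for the next document.
    function summarize(text) {
        if (figures.length === 0) { return text; }
        var items = figures.map(function (id) {
            return '<li>' + id + '</li>';
        }).join('\n');
        figures.length = 0;
        return text + '\n<ol>\n' + items + '\n</ol>';
    }

    return { collect: collect, summarize: summarize };
}
```

The two functions could then be installed separately, for example with pretext.after('filter', plugin.collect) and pretext.after('all', plugin.summarize), without exposing figures to anything else.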

Example: GitHub Markdown-style fenced code blocks

Example: Roman numerals

Example: typography with options

Example: custom HTML elements (<sc> to generate better fake small caps; to show a code block followed by its output)

Example: beautify / uglify HTML

  • Markright http://blog.elliottcable.name/posts/markright.xhtml
  • Textile http://en.wikipedia.org/wiki/Textile_(markup_language)
  • txt2tags http://txt2tags.org/online.php
  • GitHub Flavored Markdown http://github.github.com/github-flavored-markdown/
  • The future of Markdown http://www.codinghorror.com/blog/2012/10/the-future-of-markdown.html
  • MultiMarkdown http://fletcherpenney.net/multimarkdown/
  • PHP Markdown Extra http://michelf.ca/projects/php-markdown/extra/
  • reStructuredText (though it's quite 90's) http://en.wikipedia.org/wiki/ReStructuredText