Pretext

Work in progress

The current implementation works, roughly, but doesn't quite follow all of the
design guidelines. Expect the language to change.

Better than Markdown

Pretext is a replacement for Markdown (the language). It doesn't try to offer
backward compatibility. It doesn't try to cater to every possible need under
the sky (except by being extensible). It just tries to be simple, easy, and do
its job well.

Features

Paragraphs are separated by empty lines, like in Markdown. When starting a
paragraph with a <, it's taken to be a literal HTML element and is passed as-is.

Weight

Pretext is small (around 1.7 kB minified + compressed, 8kB without) and the code
is comprehensibly and well-documented. (TODO at least that's the goal.)

Speed?

Pretext is "fast enough" (TODO test against marked)

Implemented in JavaScript, Node.js and the browser

Small footprint

How fast is it?

...

a single way to do everything
link syntax that isn't confusing (perhaps <link http://google.com/ This is google> or something like that)
/italics/ and bold instead of italics and bold
comes with a good plugin system
and a reasonable set of plugins for typography etc.
and for those who want things like <del>, <abbr>, <u> etc.
take all the good stuff from Markdown

Philosophy

Don't surprise the user.

The boundaries of Pretext should be as obvious as possible. If you know the
basics of Pretext, you should be immediately able to answer the question: "Can
you do this in Pretext?"

Either you can, or you can't. Usually you can't. In that case, you can always
fall back to the HTML. If Pretext fails your expectations often, it's easy to
write a plugin to fix the deficiency. Or find an existing one.

Example:

"I wonder if there is some syntax for data definition lists" – there isn't.
Use HTML.

"I wonder if I can add a target attribute to a link written in Pretext" – you
can't. Use HTML.

"I wonder if tables–" No. Use HTML.

If you keep falling back to plain HTML a lot of time, you can find a plugin or
write your own.

Examples of things that might surprise you

If _foo_ (?bar=1&zot=2) is escaped, why isn't <a href='?bar=1&zot=2'>foo</a>?

Consider the difference between these Markdown excerpts:

hello.c:
    int main() {
        printf("Hello world!\n");
    }

end

hello.c:    int main() {
        printf("Hello world!\n");
    }

It should be obvious that the latter one is what the user wanted. We arrive at
a maxim: inconspicuous changes in the source shouldn't cause conspicuous changes
in the output.

If the user starts a paragraph with an HTML element that looks like it starts
a paragraph, then it should start a paragraph, otherwise it should be a
standalone HTML element. (See the list of elements that allow you to skip a
</p>:
http://developers.whatwg.org/syntax.html#optional-tags)

If \ is used to quote things, it is to be used consistently.
* The general meaning: if a character has a special meaning in Pretext, and you
want to use that character literally, put a \ in front of it
* Inside , should it prevent things from being quoted? To highlight parts of the code, for example?


Trust the user.

If you write invalid Pretext, you'll get invalid markup. one _two three
four_._  But Pretext tries to make it obvious if you do.
It's not Pretext's job to save you from XSS attacks or other acts of
malevolence.  You should run the output through a filter if use Pretext for
untrusted inputs.
What you write should look like what you typically get.

This is the rationale for /italic/, bold, _links_, and the list syntax.
Of course you can 
Pretext should encourage you to write maintainable text.

The format should be as uniform as possible.  (When it comes to the ordered list
syntax, this is at odds with 
Pretext should be forgiving of user errors.

, without taking into account 
#
Don't take HTML semantics too seriously.

The current HTML standard
mandates that you should use <i> for "an alternate voice or mood, [or] a
technical term, [or lots of things that are typically italicized]".  <em>
should be used stress emphasis, which is typically italicized.  
Likewise – this is my interpretation – <b> should be used for text that
isn't particularly important but you still want it to needs out, and <strong>
should be used for things that are important and therefore needs to stand out.
Both are typically set in boldface.
Pretext doesn't really care which one you mean.  It generates <i> for /this/
and <b> for this by default.  And it talks about making things 'bold' and
'italic' instead of 'warranting attention without being especially important'
and 'offset from normal prose because it's in an alternate voice or mood or
because of some other aspect in which it's different', because it's easier.
To use <em> or <strong>, just use them as you would in HTML.
Code blocks are also simply wrapped inside <pre> instead of <pre><code>,
like the standard suggests.  This is mainly done to make the resulting HTML look
better: <pre> allows us to put a newline before the first line, so it doesn't
meddle with the content's formatting.  Then again, even though Pretext calls
them 'code blocks' instead of 'preformatted text', they are basically
preformatted text and you might want to use them for different purposes than
computer code.  It wouldn't be particularly semantic to wrap a printout or a
poem inside <code>, would it?
The user will encounter problems.  They should be easy to solve.

There should ideally be an intuitive solution to a problem.
Example: I want to quote 
Problems should be solvably first intuitive, googleable, or trying a couple of
different solution.  There should be only one obvious solution to a problem.
Examples of problems:

add a class attribute to a link 
use a different kind of link

Whenever there's a recurring problem, there should be a plugin to do it; if
there isn't, it should be relatively straightforward to write one.
There are roughly two kinds of problems:

the user wants to do something Pretext supports, but Pretext fails to deliver.
the user wants to do something Pretext doesn't support, 

Order of priorities


Extensible
Simple (in this order: simple to learn, simple to use, simple to extend)
Small
Fast

(1), (2) and (3) go hand in hand: in order to be small, it needs to be simple;
in order to be simple, it needs to be extensible.
Each time a concept or a building block is introduced to a system, you create a
set of expectations for the user.  To reduce user's mental burden, Pretext
introduces as few concepts as possible, and tries very hard not to break your
expectations.
For instance, the user expects \ to quote things where they need to be
quoted.  That means quoting special characters outside 
The user will encounter problems. They should be easy to solve.

Implementing plugins

Pretext processing is split into phases.  Phases are functions that take some
input and produce some output.
TODO document phases
TODO move this to later; this is just a shorthand for defining plugins on a fly.
Possibly demonstrate it like this:
pretext.before('all', function(input) { do something to input });

and then congratulate the reader of having written his first Pretext plugin.
If you need to do your processing before some other phase:
pretext.before(<phase>, function(input) { return output; });

If you need to do your processing after some other phase:
pretext.after(<phase>, function(input) { return output; });

To replace a phase:
pretext.replace(<phase>, function(input) { return output; });

To remove a phase:
pretext.remove(<phase>);

Inside phase functions, you can invoke other phases with:
output = this.<phase>(content)

this is the pretext object.
By default, Pretext attempts to inject your plugin immediately after or before a
phase.  In the case of multiple plugins, the last to come will win.
Plugins will themselves become phases once they are installed.
(Eventually there may be multiple constraints between different plugins and some
dependency analysis to determine a suitable order.)
You can use pretext directly.  In that case, it comes with the default settings:
var pretext = require('pretext');
console.log(pretext('# Hello'));

You can use install to install plugins.  Using a newly created instance is
recommended (but not enforced):
var Pretext = require('pretext').Pretext;
var pretext = new Pretext();
pretext.install('uglify');
pretext.install('sanitize');
pretext.install(require('pretext-newlines'));

Or, more succinctly:
var Pretext = require('pretext').Pretext;
var pretext = new Pretext('uglify', 'sanitize', require('pretext-newlines'));

A plugin is either a string (in which case it's looked up in the internal
plugins, of which there will be a few), or a function with an optional member
(one of after, before or replace) that tells the phase where it should be
plugged in.
The pretext-newlines plugin, for example, is defined as:
function newlines(text) {
    return text.replace('\n', '<br>');
}
// Or whatever 
newlines.after = 'filter';module.exports = newlines;

TODO some plugins may want to install multiple phases in different places.  What
then?
TODO some plugins may want to access a data object that collects data during the
processing.  Or is that actually necessary?  We could also do things like a
plugin that collects a list of figures from the text:
var figures = [];
pretext.after('filter', function collect(text) { // collect figures in figures });
pretext.after('all', function summarize(text) { // append list of figures at the end }); 

Ideally, the data would live as a variable inside the second phase, but it needs
to be readable by the first phase.  Perhaps allow this:

pretext.replace('all', function summarize(text) {
    var figures = [];
    function collect(text) {
        ... access figures ...
    }
    pretext.after('filter', collect);
    var result = inner.all()
    // append stuff to result` and return
});

Where pretext.after('filter', collect) will replace the current collect if
one was previously installed. Another question arises: should
pretext.replace('all') save the name of the old phase as an alias for the new
phase? Obviously others might refer to all and expect it to work.

Example: github markdown style fenced code blocks

Example: roman literals

Example: typography with options

Example: custom HTML elements (<sc> to generate better fake small caps;
to show a code block followed by its output)

Example: beautify / uglify HTML

To study

Markright http://blog.elliottcable.name/posts/markright.xhtml
Textile http://en.wikipedia.org/wiki/Textile_(markup_language)
txt2tags http://txt2tags.org/online.php
Github flavored markdown http://github.github.com/github-flavored-markdown/
The future of Markdown http://www.codinghorror.com/blog/2012/10/the-future-of-markdown.html
MultiMarkdown http://fletcherpenney.net/multimarkdown/
PHP Markdown Extra http://michelf.ca/projects/php-markdown/extra/
reStructuredText (though it's quite 90's) http://en.wikipedia.org/wiki/ReStructuredText