Readme.gdtxt

«front-matter
|author: Gautam Dey <gautam.dey77@gmail.com>
|date: 10 Oct 2015
|title: My simple document format.
;
»

§ Introduction
§§ What is gdtxt

gdTxt ([_ good enough text format _]) is a plain text format for writing structured documents, based on my
experiences using email, and posting on various sites on the internet using
markdown. This format is my attempt to simplify and remove many of the redundancies in markdown. I plan to provide a full specification and a set of reference documents.  The documents will be encoded in [[UTF8][http://utf-8.com]], allowing us to remove the restriction of limiting ourselves to only ASCII characters.

The core idea is that there should be only one representation per concept. Reducing the number of things one has to remember; making the format (in my opinion) easier to use.

§§ Why a spec

The purpose of the specification is to document the format and provide an authoritative answer to how the format behaves unambiguously. If there are any ambiguous parts, those should be considered a [[bug][report-bug]].

§ Preliminaries

§§ Characters and lines

Any sequence of characters is a valid gdTxt document.

A [#[character]] is a Unicode point. Some code points (i.e.,
combining accents) do not correspond to characters in an intuitive sense, all
code points count as characters for this spec.

All gdTxt documents are encoded using [[ utf8 ][http://utf-8.com]].

A [#[ line ]] is a sequence of zero or more characters other than [@ newline ] or
[@ carriage-return ], followed by a [[ line ending ][line-ending]] or by the end
of a file.

«note;

It is preferred that all documents include an extra [[blank line][blank-line]] at
the end of the document

»

A [#[ line ending ][line-ending]] is a [@ newline ].

A line containing no characters, or a line containing only [@ spaces ] or
[@ tabs ], is called a [#[ blank line ][blank-line]]

The following definitions of characters classes will be used in this spec:

A [[ whitespace character ][whitespace-character]] is a [@ space ], [@ tab ],
[@ newline ], [@ line-tabulation ], [@ form-feed ], or [@ carriage-return ].

[#[ Whitespace ]] is a sequence of one or more
[[ whitespace characters ][whitespace-characters]]

A [#[Unicode whitespace character][unicode-whitespace-character]] is any code
point in the Unicode [* Zs *] class, or [@ tab ], [@ carriage-return ],
[@ newline ], or [@ form feed ].

A [#[ space ]] is [@ space-unicode ]

A [#[ non-whitespace character ][non-whitesapce-character]] is any character that
is not a [[ whitespace-character ]].

An [#[ ASCII punctuation character ]] is:
[* ! *], [* " *], [* # *], [* $ *], [* % *], [* ' *], [* ( *], [* ) *], [* * *],
[* + *], [* , *], [* - *], [* . *], [* / *], [* : *], [* ; *], [* < *], [* = *],
[* > *], [* ? *], [* @ *], [* [ *], [* ~ *], [* ] *], [* ^ *], [* _ *], [* ` *],
[* { *], [* | *], [* } *], or [* \\ *]

a [#[ punctuation character ]] is an [[ ASCII punctuation character ]] or anything in
the Unicode classes [* Pc *], [* Pd *], [* Pe *], [* Pf *], [* Po *], or
[* Ps *].

§§ Tabs

Tabs in lines will not be expanded into spaces.

§§ Insecure characters

For security reasons, the Unicode characters [* U+0000 *] [* MUST *] be replaced with the replacement character ([* U+FFFD *]).

§ Inline, Line, and Blocks

§§ Inline

Inline statements are statements that can be placed in line. There should not be any
newline characters inside of an inline statement.

Current inline-elements are:

• emphasize  : \[* *\]
• strong    : \[** **\]
• strike-through : \[– –\]
• underline : \[_ _\]
• inline call-out: \[` `\]
• inline quote: \[" "\]
• inline raw string [\ [\\\\] \]
     no need to escape anything other then \.
     Also, note that leading and trailing spaces are not trimmed
• unicode character: \[u0020\]
    the value is in hex
• inline code: \[(go:var) corpus \]
    the first element in the prens is the language
    if there is a colon following the language it will be the language
    element type. This is optional. Default is left to the implementation
    but should be something that makes sense for that language.
• reference : \[\[ text \]\[ reference \]\]
    [[see references for greater detail][references]]
• anchor    : \[#\[ text \]\[ reference \]\]
• insert    : \[@ reference \]
    Insert will insert the value of the reference into the space. 
    
§§§ [#[ References ][ references ]]

A reference is an identifier that can is made up of Letters followed by letters, numbers, [* _ *], [* - *], or [* . *]


§§ Line

Line statements start at the start of a line with an identifier. 


# Section line:

    A section line starts with ([* U+00A7 *]) and is followed by the title of the section.
    The number of sections markers identified the section level, up to six.

# ordered list:

    An ordered list starts with # [* (U+0023) *] and is followed the list item. 
    The list item can have one or more paragraphs or block elements associated with it. 
    Each paragraph or block element is separated by a blank line or 
    another list element. 

    Sub ordered lists are delimited by add additional [* # *] symbols.

    Two blank lines terminate the list, or encountering a section header,
    or a different list type.

    «Note;
        One can, also, use 1. and 1.1. etc.. Need to write this part out.
    »

# unordered list:

    An ordered list is a line that starts with • and is followed by the title of the list item. The list item can have one or more paragraphs or other block elements associated with it. Additional levels are the number of • characters in the front of the list.

    Besides if a “•” is followed by [] or [x] it should be treated as a check
    list.

# horizontal line:

    Three or more “—” at the start of a line will create a horizontal line. Text after the line-markers are ignored.


§§§ Blocks

Every block starts with ‘«’ followed by the block type. Each block has three
parts.

The first part is the block type. This part ends with the first header separator
| or the header block end ‘;’.

The second part is called the header; this is a simple key-value structure. Each
key is separated from the value via a ‘:’; each key-value pair is separated with a
'|'. The header part of a block is terminated with a ';'.

The final part of a block is the body. It is everything till the closing [* » *].

• Quote :
    A Quote block allows you to quote some piece of text.
«code
| lang: gdtxt
;
    «quote
    ;
    This is a quote
    »
»

• Code :
    Allow you insert code, provides syntax highlighting and numbering.
    Code blocks everything inside of the body part.
«code
| lang: gdtxt
;
    «code
    | lang : go
    ;
    package main
    import ( "fmt" )
    func main(){
        fmt.Println("Hello world");
    }
    »

»

• Front-matter :
    This element is metadata for the document. There should be only one of these.
    One should merge all of them together, with each merge overwriting other elements. The processor should emit a warning each time an overwrite
    occurs.

    Known attributes:
    «block;
    • Author: The author is the name of the author in email format i.e.
        “display name <email@address>”
    • Date: The date the article was written.
    • Title: The title of the document. Only the first 160 characters will be taken.
    »

«code
| lang: gdtxt
;
    «front-matter
    | author : Guatam Dey <gautam.dey77@gmail.com>
    | date   : 10 Oct 2015
    | title  : My simple document format.
    ;
    »

»

• block :
    A block block is a block that allows you to enter some gdTxt text as a
    paragraph. 

• include :
    An include block allows you to add another file or URL, that will be
    treated as a gdtxt file and then included in a set of rendered paragraphs
    into the section.

• comment :
    The render ignores a comment block. It is there to add commentary to the document.

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

• foo bar
•• bar
    This is a paragraph

    This is another paragraph
    «quote
    | attribute: Someone
    ;
    The greatest things in life cost something.
    »

•• baz


# Foo
## Baz
#• Bar

1. Foo
1.1. baz
1.• bar


§ Section and headers


Sections are delimited using ‘$’ or ‘§’ with no white spaces at the beginning of the
line.  There are 6 six section headers.


§ Paragraphs


Paragraphs are consecutive lines of text separated by a [[blank line][blank-line]].


§ Blocks
§§ Quote


«code
|gdTxt
;
    «quote
    |no-strip
        ;
            Some text here.
            Blocks can be nested. for quote block.s
        «quote;
        This would be the double quote.
        »
        »
»

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
«refs
| newline         : newline ([*U+000A*])
| carriage-return : carriage return ([*U+000D*])
| space-unicode   : ([*U+0020*])
| space           : space [@space-unicode]
| spaces          : spaces [@space-unicode]
;»
«refs
| report-bug: https://github.com/gdey/gdtxt/issues
;»