Proposal: explicit syntax for custom tags #240

Open
matklad opened this issue Aug 20, 2023 · 28 comments · May be fixed by jgm/djot.js#89

@matklad
Contributor

matklad commented Aug 20, 2023

This proposal is a synthesis of #239 and #146. It is organized into TL;DR, What? and Why? sections, of which Why? is the most important.

TL;DR

Change djot such that the following input:

Shortcut: :kbd[Ctrl+C]

::: details
Copies text
:::

produces the following HTML:

<p>
    Shortcut: <kbd>Ctrl+C</kbd>
</p>

<details>
    <p>Copies text</p>
</details>

What?

Specifically:

  1. Change the Ast for div and span to be
 interface Div extends HasAttributes {
   tag: "div";
+  tag_name: string;
   children: Block[];
 }

 interface Span extends HasAttributes {
   tag: "span";
+  tag_name: string;
   children: Inline[];
 }
  2. Change the parsing rule for ::: spam to use "spam" for tag_name, rather than a class.

  3. Change the parsing rules for bare ::: and [] to set tag_name to "div" and "span",
    respectively.

  4. Add new concrete syntax :tag-name[], that is, :(\S+)\[ where $1, an arbitrary sequence of non-whitespace symbols, is a tag_name, and the rest is the usual span syntax. This concrete syntax produces a Span AST with the corresponding tag_name set.

  5. Change default HTML renderer to use tag_name when rendering span and div elements.

The most invasive change here is 4, as it adds a bit of new syntax to djot and directly enlarges the surface area.
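
To make the AST change and the renderer change concrete, here is a minimal sketch of how a renderer might use tag_name. The node types are simplified stand-ins invented for illustration (they are not the actual djot.js types), and the attribute handling is an assumption about how attributes would carry over.

```typescript
// Simplified, hypothetical AST. The real djot AST has many more node types.
interface Str { tag: "str"; text: string; }
interface Span { tag: "span"; tag_name: string; attributes: Record<string, string>; children: Inline[]; }
type Inline = Str | Span;
interface Para { tag: "para"; children: Inline[]; }
interface Div { tag: "div"; tag_name: string; attributes: Record<string, string>; children: Block[]; }
type Block = Para | Div;

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function renderAttrs(attrs: Record<string, string>): string {
  return Object.keys(attrs).map((k) => ` ${k}="${escapeHtml(attrs[k])}"`).join("");
}

// Spans and divs emit whatever tag_name says instead of a fixed <span>/<div>.
// A production renderer would also need to validate tag_name before emitting it.
function renderInline(node: Inline): string {
  if (node.tag === "str") return escapeHtml(node.text);
  const inner = node.children.map(renderInline).join("");
  return `<${node.tag_name}${renderAttrs(node.attributes)}>${inner}</${node.tag_name}>`;
}

function renderBlock(node: Block): string {
  if (node.tag === "para") return `<p>${node.children.map(renderInline).join("")}</p>`;
  const inner = node.children.map(renderBlock).join("");
  return `<${node.tag_name}${renderAttrs(node.attributes)}>${inner}</${node.tag_name}>`;
}

// The TL;DR document, written out as the AST a parser would produce under this proposal.
const shortcut: Para = {
  tag: "para",
  children: [
    { tag: "str", text: "Shortcut: " },
    { tag: "span", tag_name: "kbd", attributes: {}, children: [{ tag: "str", text: "Ctrl+C" }] },
  ],
};
const details: Div = {
  tag: "div",
  tag_name: "details",
  attributes: {},
  children: [{ tag: "para", children: [{ tag: "str", text: "Copies text" }] }],
};

console.log(renderBlock(shortcut)); // <p>Shortcut: <kbd>Ctrl+C</kbd></p>
console.log(renderBlock(details));  // <details><p>Copies text</p></details>
```

A bare ::: or [] would simply carry tag_name "div" or "span", so existing documents would render as before.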

Why?

This single solution fixes several "problems" in the current version of djot, some big and some small. I list them roughly in order of priority:

Problem: users need a lightweight approach for producing custom HTML interspersed with normal djot.

Today, djot provides a ``` =HTML syntax to embed raw HTML (or any other format). The problem here is that it's all-or-nothing: everything inside =HTML needs to be HTML. You can't use that to wrap a part of a djot document into a custom tag:

This is Djot!

``` =HTML
<details>
    This *isn't* Djot :sob:
</details>
```

This is solvable by using a custom filter/renderer, but that's a significant step up in complexity, and might not be available to the user (e.g., forum software using Djot for comments could allow raw HTML (with sanitization), but won't allow custom filters). In a more ad-hoc way, it's possible to split the raw block in two:

This is Djot!

``` =HTML
<details>
```

Ok,  this *is* Djot :weary:

``` =HTML
</details>
```

but that's not quite as pretty as some might want!

With the proposed solution, the above can be written simply as

::: details
This *is still* Djot :smile:
:::

Naturally, =HTML doesn't go away: that's still the right tool for raw HTML, but we now gain a way to add HTML-Djot sandwiches.

Note that while I say HTML, this feature applies to any roughly XML-shaped output format. For example, a docbook renderer could use that to emit arbitrary docbook elements, and a LaTeX renderer could emit a

\begin{environment}

\end{environment}

pair.

Problem: extensibility properties of Djot are not obvious and need better explanation.

The core feature of Djot is that its syntax is fixed, but it is still extensible because the syntax is flexible enough to encode arbitrary attributed trees which could be interpreted specially by the renderers. This is a somewhat subtle and non-obvious point, and may not be immediately clear to the new users.

With this proposal, Djot gains an explicit first-class syntax for custom elements. We can clearly document that ::: plugin and :plugin[] is how one extends Djot. In terms of expressive power, this is exactly equivalent to []{.plugin} of course, but is easier to explain and search for.

Overloading .class syntax to mean custom tags/elements is harder to teach.

Problem: it's impossible to express arbitrary HTML in a Djot filter.

Djot has two programmatic extensibility mechanisms:

  • filters transform Djot AST to another Djot AST
  • renderers transform Djot AST to the target format, such as HTML

Filters are generally nicer, they are target-format-independent and composable (you can chain several filters together, because input and output have the same type). However, you can't use a filter to emit an HTML node not already used by a renderer, unless you resort to raw half-nodes, which is ugly, and output-format specific.

With this proposal, filters gain the full power of HTML, while keeping a nice, well-typed tree structure. Fewer things need to be custom renderers, and more things can be filters.
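
As a rough illustration of the point about filters, here is a sketch of two AST-to-AST filters chained together. The Block type, the Filter signature, and the "spoiler" and "aside-note" names are all hypothetical, chosen only for this example. This is not the djot.js filter API.

```typescript
// Hypothetical, minimal AST with the proposed tag_name field.
interface Para { tag: "para"; text: string; }
interface Div { tag: "div"; tag_name: string; children: Block[]; }
type Block = Para | Div;

// A filter maps an AST to another AST of the same type, so filters compose.
type Filter = (node: Block) => Block;

// Build a filter that renames one tag_name to another, recursing into children.
function renameTag(from: string, to: string): Filter {
  const go: Filter = (node) => {
    if (node.tag !== "div") return node;
    return {
      ...node,
      tag_name: node.tag_name === from ? to : node.tag_name,
      children: node.children.map(go),
    };
  };
  return go;
}

// Chain filters left to right.
function compose(...filters: Filter[]): Filter {
  return (node) => filters.reduce((acc, f) => f(acc), node);
}

const doc: Block = {
  tag: "div",
  tag_name: "spoiler",
  children: [
    { tag: "para", text: "Hidden text" },
    { tag: "div", tag_name: "aside-note", children: [] },
  ],
};

const pipeline = compose(renameTag("spoiler", "details"), renameTag("aside-note", "aside"));
console.log(JSON.stringify(pipeline(doc)));
```

Neither filter emits raw HTML: the output is still a plain tree that any renderer can interpret or reject.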

Problem: the ::: spam syntax is not orthogonal

In today's Djot, the following two are equivalent:

::: spam
:::

{.spam}
:::
:::

In the following example, both classes are on equal footing semantically, although syntactically one feels like it should be the primary:

{.spam}
::: eggs
:::

The proposal makes the syntaxes orthogonal by adding a new dimension. ::: spam is no longer a class, it is a tag name.

Problem: when reading custom elements existing "introducer last" syntax requires the reader to backtrack.

Consider a custom element in today's djot: [Ctrl+C]{.kbd}. Here, the + would be interpreted specially by the renderer as a notation for shortcuts. However, if you read this left-to-right, you need to look ahead to {.kbd} to get the context for interpreting the +.

In the proposal, this looks like :kbd[Ctrl+C] --- introducer keyword, kbd, is leading, so a one-pass left-to-right visual scan tells you everything.

Problem: smarter editors and IDEs need to know context to provide helpful suggestions.

Let's say you added a custom citation element to Djot, which looks like [foo, p. 15]{.cite}. A smart editor should be able to auto-complete foo from your references library, but, if you are typing this left-to-right, by the time you get to [foo] the IDE doesn't yet know that it's going to be a cite.

With the proposal, as soon as you've typed :c, the IDE can suggest auto-completing that to :cite[] and then show completion list for actual citations.

@Omikhleia

Omikhleia commented Aug 20, 2023

Djot was beyond Markdown, keeping its legacy:

{.myblock}
:::
This is _Djot_{.underline}

 - Apples. 
 - Oranges.

> Quote
:::

This proposal opens Pandora's box:

{.myblock}
:::: div
::: p
Are we saying this would be the new em:[u:[Djot{.dubious}]]?
:::
::: ul
li:[Apples.]
li:[Oranges.]
:::
::: blockquote
Quote
:::
::::

And I wonder what a LaTeX renderer (say, or SILE) would then do. Have to support div, p, ul etc. environments and commands, or magically recognize a subset of the HTML (and why not, DocBook or any random schema) tag set to map them to appropriate commands?

I am afraid the problems supposedly solved might be worse. Or did I miss something?

@matklad
Contributor Author

matklad commented Aug 20, 2023

And I wonder what a LaTeX renderer (say, or SILE) would then do.

There isn't any special handling of tags. For example, the SILE renderer would do exactly what SILE XML Flavor would do, namely, interpret the document as

\begin[class=myblock]{div}
\begin{p}
Are we saying this would be the new \em{\u{\span[class=dubious]{Djot}}}?
\end{p}
\end{div}

This might, or might not produce a valid SILE document, depending on which custom SILE commands the user has defined.

Stated positively, the user gains access to all their pre-existing custom SILE commands without having to define custom Djot renderers or filters. So, if the user has

\define[command=red]{\color[color=red]{\process}}

defined, they can use

Making things red is a red:[silly] way to emphasise text.

in their djot.

@Omikhleia

Omikhleia commented Aug 20, 2023

Well I am afraid I have to disagree on everything then...

  • This makes Djot just a replacement for random XML with a different syntax... One can invent a Djot/Markdown-inspired syntax for arbitrary XML, sure, but it's no longer the same thing.
  • This is fragmenting the portability of input files (so to keep this "div" example, one has to provide an implementation of it for LaTeX, SILE, and any other target renderer he might consider using at some point?) and the possibility for conversion tools to operate gracefully.

(I don't think it's the place to discuss the SILE examples, but the SIL language should be completely avoidable, and the user shouldn't need custom commands to do this kind of things. Styles are a better paradigm with a nicer separation of concerns)

@Omikhleia

Omikhleia commented Aug 20, 2023

Still, an additional comment though:

The user gains access to all their pre-existing custom SILE commands without having to define custom Djot renderers or filters.

If the user really wants to do this, then with markdown.sile they indeed don't need to define custom Djot renderers or filters. The following works:

``` =sile
% Or defined in Lua with =sile-lua, or implemented elsewhere in a class, package, wrapper document, your call.
\define[command=red]{\color[color=red]{\process}}
```
Making things red is a [silly]{custom-style="red"} way to emphasize text.

And all other things equal, it does work identically whether the input is a Markdown file, a Djot file, or a Pandoc JSON AST.[1]
(Not that I really recommend this, but it's already available,[2] though again I would recommend using styles rather than direct commands.)

Footnotes

  1. For the rare cases (now) where some syntax extension not covered by the native implementation would be needed. Tables in some other format than the "pipe tables" in (extended-)Markdown, for instance.

  2. EDIT: And, before someone asks, I favored the custom-style key trick over a class attribute there indeed: influenced, for what is worth, by what Pandoc does with Word docx conversion -- so that defining a "red" character style in a Word reference document and converting to docx with Pandoc should indeed then also work as intended.)

@jgm
Owner

jgm commented Aug 20, 2023

This is an interesting and well thought-out proposal. It does go in a somewhat different direction than I'd originally had in mind, but I see its good points.

The original conception was that if you wanted to do something like <details>, you'd simply write

::: details
## This is the summary.

And here's the rest.
:::

and then make use of a filter that replaces this with AST nodes including the raw HTML <details>, <summary>, etc. It's true that the filter needs to be format-specific -- though in pandoc at least, filters can conditionalize on the output format (I forget whether we built that into djot.js).

This proposal would allow you to do

:::: details
::: summary
This is the summary
:::

And here's the rest
::::

which is a bit more verbose and relies more on English keywords, but it would work out of the box without filters.

The proposed change would be breaking for existing djot documents that used

::: classname
...
:::

but maybe that is okay as the language is still in an experimental phase.

The proposed change would make the djot AST less compatible with the pandoc AST (which doesn't have a notion of "tag name"), and this would make pandoc interoperability less smooth.

In general I don't like to rely on English language keywords. Perhaps one could work around that, though, by introducing the concept of a "tag dictionary" that allows you to define your own aliases for tag names?

If we did implement the prefix :defn[] style notation, it would be good to impose some restrictions on the characters allowed in tag names and also a length restriction, to keep parsing fast.
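
A sketch of what such a restriction could look like for the :tag[ prefix. The allowed characters (ASCII letters, digits, hyphens) and the 32-character limit are assumptions picked purely for illustration, and nothing in this thread settles them. A Unicode-aware character class could be substituted to admit non-English names.

```typescript
// Assumed limit, chosen only for illustration.
const MAX_TAG_LENGTH = 32;

// A letter followed by up to MAX_TAG_LENGTH - 1 letters, digits, or hyphens, then "[".
// The bounded repetition keeps the scan for a tag name short.
const TAG_PREFIX = new RegExp(`^:([A-Za-z][A-Za-z0-9-]{0,${MAX_TAG_LENGTH - 1}})\\[`);

// Returns the tag name and how many characters the prefix consumed, or null if no match.
function matchTagPrefix(input: string): { tagName: string; length: number } | null {
  const m = TAG_PREFIX.exec(input);
  return m ? { tagName: m[1], length: m[0].length } : null;
}

console.log(matchTagPrefix(":kbd[Ctrl+C]")); // { tagName: "kbd", length: 5 }
console.log(matchTagPrefix(":smile: [x]"));  // null, since a symbol is not followed by "["
```

Under these assumptions a symbol such as :smile: is never mistaken for a tag prefix, because a tag name must be immediately followed by the opening bracket.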

You are right that allowing a special name for spans restores symmetry with what we now have for divs. However, there's also a question of symmetry with verbatim containers (code spans and code blocks). For example, in LaTeX you might want

``` tikz
arrow(whatever) -> node(thing)
```

to produce a tikz environment instead of verbatim. But doing this automatically conflicts with the role we've given to this position for specifying the "language." There's also a question whether code spans should have something similar? :kbd`*3b*`

As for syntax, I fear that the tag name in :tag[...] looks a bit too much symbol syntax (just missing the final :). Of course, we could remove that problem by just using a symbol for this purpose: :tag:[...], but this might not be ideal. Another option could be !tag[...] which is reminiscent of the image syntax.

@matklad
Contributor Author

matklad commented Aug 20, 2023

@Omikhleia

This is fragmenting the portability of input files

Yeah, that's the big thing here! One can view Djot as either:

  1. a relatively self-contained markup language for documents with a closed set of syntactic constructs
  2. or an open-ended constructor for domain-specific formats

This proposal pushes us more towards the second interpretation (but note that they are not mutually exclusive --- some people may use djot as 1, and some might use it as 2)

As you've rightfully noticed, everything expressible with this proposal is already possible with custom attributes and classes; the "custom tags" thing basically just formalizes this pattern.

And that nicely segues into @jgm's first point! Even under this proposal I would expect people to write

::: details
## This is the summary.

And here's the rest.
:::

and handle this as a filter by default. The "raw html" mode I think is needed solely as an escape hatch.

However, under the new proposal it's syntactically apparent that ::: details is some custom element. In the status quo with using "magical" classes, it's less clear whether that's indeed a custom element, or just a pure-style .class.

That's probably what I like aesthetically most here --- that we clearly separate the "semantics" attribute from the style ones (including adding invariant that there's at most one custom tag, but many classes).

relies more on English keywords

I was under the impression that we already don't restrict class names and such to be English, but apparently that's not the case. It feels a bit strange that the following is parsed differently

x{.foo} x{.бар}

I would say if we are fine with class names being English, we should be fine with tag-names being English also (but it might be a good idea to include some quoted syntax then just in case, eg ::: "бар" to be analogous to {class="бар"}).

but maybe that is okay as the language is still in an experimental phase.

FWIW, this is something that worries me quite a bit. The page https://djot.net doesn't say that Djot is in an experimental phase, and makes it look like it's quite finished. Ideally, we'd communicate our stability promise more clearly.

As for syntax, I fear that the tag name in :tag[...] looks a bit too much symbol syntax (just missing the final :)

Yeah, I think syntactically the salient bits are that:

  • there's a dedicated place for a single name, which is different from potentially repeated class names.
  • the name goes before the element

As for particular syntax, !tag[ definitely works!

@jgm
Owner

jgm commented Aug 20, 2023

I don't think there was any intention to exclude non-English class names! If we do it seems like a bug. The attribute grammar in attributes.ts does say that keywords need to be ascii, but not classes or identifiers.

@bpj

bpj commented Aug 21, 2023

See also #197 and #192 where I proposed another use for ::: tag, namely to provide "hints" for the parser.

I'm thus all for storing these "tags" specially in the AST. What worries me is that this proposal seems very HTML-centric for such a "central" syntax feature. I think it is important that djot is output-format agnostic, not favoring any one output format. While I do not yet use djot for real (the lack of a metadata — and other data in the spirit of #192 — syntax which is interoperable with Pandoc is the main show stopper for me) I really like most of the syntax features where djot differs/adds to Markdown, but my typical target format is PDF via LaTeX. If this means "tags" are stored separately in the ast and can be used for anything by parsers, filters and renderers I'm all for. If this means that "tags" become unusable unless you target HTML/XML, or even djot gets tied to those formats I'm actually worried!

@matklad
Contributor Author

matklad commented Nov 15, 2023

As a data point, someone laments the inability to create HTML/djot sandwiches without writing custom filters:

https://lobste.rs/s/wrksua/data_oriented_blogging#c_pzqjot

@iacore

iacore commented May 28, 2024

I find this very useful as well. Most notably, the <details><summary> combo.

@iacore iacore linked a pull request May 28, 2024 that will close this issue
@irskep

irskep commented Aug 16, 2024

I would like to mention an additional use case for this feature and strongly support it.

There is a constellation of technical documentation tools, the most common of which is Sphinx. One of the reasons it's been so successful is that reStructuredText is an arbitrarily extensible markup language. The community finally found a way to bolt Markdown onto it, but the whole set of tools is very tied up in the Python + docutils ecosystem and feels idiosyncratic.

I am working on an alternative and I'd like to use Djot for it, but the lack of a reStructuredText-like extensibility strategy is forcing me to make weird choices. I don't think cracking the door open to "arbitrary XML" will cause people to do bad things unnecessarily, and it could open up a lot of interesting opportunities for improving these types of sophisticated documentation systems.

For example, one thing you can do in Sphinx is define a heading in one document like this:

.. _glossary:

===
Glossary
===

And then refer to it elsewhere without needing to include the path to the document, because the heading itself is captured as a global reference:

:ref:`glossary`

Full docs here.

In Djot, you could define heading refs the same way using attribute syntax:

# Glossary{ref=glossary}

And then link to it with :ref[glossary]. (I know there's room here to argue about the value of this style of linking, but please just consider it one possible example rather than the meat of what I'm talking about.)

What would be needed to get this over the line? I think it would be very valuable for the community to have access to an ergonomic, cross-platform, well-defined, cleanly-implemented, arbitrarily-extensible markup language, and this closes the only gap I can think of.

@jgm
Owner

jgm commented Aug 16, 2024

The equivalent in djot would be:

{#glossary}

# Glossary

Is that worse than your Sphinx example?

@jgm
Owner

jgm commented Aug 16, 2024

Indeed, it's easier than that. See
https://htmlpreview.github.io/?https://github.com/jgm/djot/blob/master/doc/syntax.html#links-to-headings

See the [Glossary][].

# Glossary

this is the glossary

@Omikhleia

Indeed, it's easier than that.

Not sure this is what @irskep is asking for, reading the linked Sphinx specification -- I read the request as the need for a cross-reference via an identifier while the heading title might be changed independently:

{#glossary}
# My Awesome Glossary 

And then one would want some way to use the glossary identifier and link to "My Awesome Glossary" (by title or any other appropriate scheme, e.g. by section number, figure number etc.), with the ability to work across document boundaries. If so, this is more or less what's discussed in #30.

@irskep

irskep commented Aug 17, 2024

I think I didn't make my point well enough and over-explained one possible use case. I'm fully aware of the ability to define IDs within a document and link to them. I made a mistake by going into so much detail about reference syntax as an example.

The goal is not a cross-referencing system, which I think is too specific for a markup language. I'm talking about a syntax that supports sophisticated uses such as Sphinx-style references. Djot shouldn't need to implement all possible use cases for all document systems for all time, but it would be great to allow more complex systems to use it as a component without requiring any new syntax.

reStructuredText has a near-monopoly on infinite extensibility in a markup language without resorting to regex-based token transforms, so I'm excited by the prospect of it having some cross-platform competition.

@irskep

irskep commented Aug 17, 2024

Another example of how flexible syntax helps is how Sphinx lets you specify HTML metadata.

One way to accomplish this in Djot today would be to write a filter that finds .meta blocks and reads their attributes. But you'd need to use CSS class syntax on an empty span, which feels like a semantic mismatch.

[]{.meta}{description="The Djot markup language"}

But I think it makes more sense to write it the way this proposal suggests:

:meta[]{description="The Djot markup language"}

And if you wanted to go a step further toward enabling this type of use case, you could allow omitting the square brackets:

:meta{description="The Djot markup language"}

Does this make it clearer what I'm talking about? Again, this is just one example of what you can do with flexible syntax. My point is not that Djot should have HTML metadata support.

@Omikhleia

The goal is not a cross-referencing system, which I think is too specific for a markup language.

I don't think it is. It's something one needs, eventually ;)
(Other points here not granted - I'm unsure why HTML metadata make any sense in a general framework).

@irskep

irskep commented Aug 17, 2024

I'm unsure why HTML metadata make any sense in a general framework.

Again, it is an example of a possible use of general extensibility. It is a kind of thing you could do as a user of Djot if this proposal were implemented. There are more kinds of things you could do that are not references or HTML metadata. I am not trying to argue that Djot should support HTML metadata. I'm trying to make the argument that extensibility is valuable. People use markup languages for all sorts of things. I assume that's why the filters feature exists. :-)

@bpj

bpj commented Aug 17, 2024

I do rather complicated customizations with classes and other attributes and pandoc filters with both Pandoc's Markdown and djot as input formats and both LaTeX and HTML as target formats. The extensibility is there already even though the elements used are called spans and divs, or code and codeblock in some cases. The important thing is that you can attach attributes to them which filters can "pick up." For example I have a filter which implements list tables, converting lists of lists inside divs with a certain class into tables.

@irskep

irskep commented Aug 17, 2024

I agree that extensibility already exists in the form of classes, IDs, and attributes, and it's one of the reasons I find this language so neat. My HTML metadata example above shows how you can already accomplish the use cases I have in mind.

My support here is mostly about semantics and ergonomics. You can use classes and attributes as hacks when what you are really trying to do is make custom elements. It works, but it feels better and it's easier to explain if the custom-tag-ness is at the forefront rather than saying "add this thing using a CSS class even though it's not really a CSS class."

Maybe a simpler version of this proposal would be to allow tag names to be specified just like CSS classes without the .. This would remove the collision with code block languages.

[Ctrl+C]{kbd}

{marquee}
```python
print("Hello, world!")
```

This would introduce a new error case where the user specifies multiple tags ([Ctrl+C]{kbd}{span}), but it would be easy to report and explain.
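
For what it's worth, detecting that error case could be as small as the sketch below. It assumes the attribute blocks have already been split into tokens, and that a bare word (no leading . or #, no =) is what counts as a tag. Both the function name and that token classification are hypothetical.

```typescript
// Hypothetical rule: a bare word made of letters, digits, and hyphens is a tag name.
const BARE_WORD = /^[A-Za-z][A-Za-z0-9-]*$/;

// Given every attribute token attached to one element, return its tag (or null),
// and report the "multiple tags" error case.
function extractTag(attrTokens: string[]): string | null {
  const tags = attrTokens.filter((t) => BARE_WORD.test(t));
  if (tags.length > 1) {
    throw new Error(`element has multiple tags: ${tags.join(", ")}`);
  }
  return tags.length === 1 ? tags[0] : null;
}

console.log(extractTag(["kbd"]));               // "kbd"
console.log(extractTag([".warning", "#intro"])); // null

try {
  extractTag(["kbd", "span"]); // tokens from [Ctrl+C]{kbd}{span}
} catch (e) {
  console.log((e as Error).message); // element has multiple tags: kbd, span
}
```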

I still think it's better to put the custom tag first, because it "feels" better and avoids this error case being possible, but maybe a less invasive alternative is easier for the community to accept.

I realize I'm creating a lot of noise today, so I'll try to back off for a while. Overall I don't have an opinion on the fine details of the syntax, just that Djot learns the distinction between a list of CSS classes vs tag identity. I would love to support this effort with implementation and/or documentation help if you decide to accept some form of this proposal.

@jgm
Owner

jgm commented Aug 17, 2024

Maybe a simpler version of this proposal would be to allow tag names to be specified just like CSS classes without the ..

I agree, it's worth thinking about this variant of the original proposal as well (though it would preclude #257, so we'd have to be sure we don't want bare words to mean "flag attributes"). This variant has the advantage of being more uniform (it's just a tweak to what attributes can look like).

@bpj

bpj commented Aug 18, 2024

Why not [text]{:tag} with :tag inside the same braces as regular attributes? Attributes coming after the thing they apply to is a good thing in my opinion. One of my irritations with TeX is that command names drown out the content. The attribute coming right after the command name is especially vulnerable: look \textit{in} the book. With look [in]{:textit} the book this is much less the case.

@bpj

bpj commented Aug 18, 2024

As for the concern how djot tags (possibly non-English) should map to HTML tags or LaTeX command names I think some kind of tabular mapping would be required. I already have pandoc filters which do this for classes, with the mapping in the metadata.
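
A minimal sketch of such a tabular mapping, assuming one lookup table per document keyed by the djot tag name. The entries, the Target type, and resolveTag are invented for illustration. In practice the table would presumably be loaded from metadata or configuration rather than hard-coded.

```typescript
type Target = "html" | "latex";

// For each djot tag, the name to emit per output format (all entries are examples).
interface TagMapping { html?: string; latex?: string; }

const tagDictionary = new Map<string, TagMapping>([
  ["kbd", { html: "kbd", latex: "texttt" }],
  ["бар", { html: "aside" }], // a non-English alias with no LaTeX mapping
]);

// Look up the output name for a tag; null means "no mapping for this target",
// leaving it to the renderer to fall back to a plain span or div.
function resolveTag(name: string, target: Target): string | null {
  const entry = tagDictionary.get(name);
  if (entry === undefined) return null;
  return entry[target] ?? null;
}

console.log(resolveTag("kbd", "latex")); // "texttt"
console.log(resolveTag("бар", "html"));  // "aside"
console.log(resolveTag("бар", "latex")); // null
```

Returning null for unmapped tags keeps the document usable across targets that only define part of the table.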

@irskep

irskep commented Aug 18, 2024

After pondering this overnight, I think I've gone from "prefer tag first" to "no preference." I can see the value of tag-after-content for spans.

For blocks, the attributes are already before the block, so you could just choose to put the tag first.

{:custom-tag}
{a=b c=d}
:::
block content
:::

@matklad
Contributor Author

matklad commented Aug 18, 2024

One of my irritations with TeX is that command names drown out the content. The attribute coming right after the command name is especially vulnerable: look \textit{in} the book. With look [in]{:textit} the book this is much less the case.

I’d say in this particular case the equivalent djot would use a class, rather than a tag name.

This is precisely the intended usage difference:

  • if it doesn’t affect the interpretation of attributed syntax, it should be a class
  • if it affects the meaning to the extent that you actually need to know the tag to mentally “parse” the content, then it’s the tag.

For TeX, if we pretend we don’t have first-class syntax for these features already, you’d say

Look [in]{.textit} the book

but

Triangle inequality says :math[AB + BC > AC]

The difference between the examples is that in the latter case you need to know that it is an equation to correctly infer the meaning of the attributed snippet, while in the former case you can mentally parse it as simple text, and then note that it is in italics.

@irskep

irskep commented Aug 18, 2024

To take @matklad's last comment a step further, I assume this would work with inline verbatim as well, and could potentially reduce the need for special-case micro-language syntax such as for math ($). Or at least allow people to define their own micro-languages using filters.

If you had never implemented math, then you could use this proposal to do something like this:

:$`e=mc^2` (recognizing that you don't want to use the English word 'math')

instead of 

$`e=mc^2`

I guess yet another option for this idea is to tweak the math syntax and use the dollar signs as markers instead of a colon...

$`math`
$$`display math`
$custom$`custom tag`

Personally I'd like to use per-span language-specific syntax highlighting this way, but I recognize people probably already think CSS classes are sufficient for that.

:python`print('hello!')` and :js`console.log('goodbye!')`

vs what you can already do:

`print('hello!')`{.python} and `console.log('goodbye!')`{.js}

@bpj

bpj commented Aug 20, 2024

@matklad I must say that I don’t follow your reasoning why [in]{.textit} should be a class but :math[AB + BC > AC] should not. To a human AB + BC > AC is obviously math so the human doesn’t need a prominently placed hint, it’s the computer which needs the hint. More importantly by your own reasoning HTML <i> is a tag, and I can’t see how LaTeX \textit is semantically different from it, so if the latter does not need the prominent placement before the content I can’t see why the former does, other than “HTML places the tag before the content”. Well so does LaTeX and it isn’t a valid argument in either case, because djot ain’t HTML or LaTeX!

You see for me, and I’m hardly alone, the main reason for using an LML rather than HTML and LaTeX directly is this problem that the many textual tags/commands break the flow of reading so that it becomes basically impossible (for me at least!) to just scan the text and get an idea of what it’s about. Ideally an LML shouldn’t use any textual markup at all, though I readily admit that rigidly following that principle has proven impractical.[1] A possible route to alleviate this is by allowing non-ASCII punctuation and symbol characters, and/or combinations of characters, as markup (djot markup like {+ins+} is a step in the right direction!), and I really cannot see why not in the 2020s, since any good editor should nowadays offer a handy way to enter arbitrary characters, although it appears that Vim is actually better than most in this regard.[2]

I firmly believe that any textual markup in an LML should be as inobtrusive as possible. Placing the attributes after the element has proven to be thus inobtrusive. It becomes like a parenthetical remark, although unfortunately attributes are often more frequent than parenthetical remarks should be! I don’t remember having ever seen it stated by a developer that this was the intention when the attributes were put after the code run in Pandoc Markdown (maybe @jgm remembers) but that has become the IMO welcome effect. Attributes after the opening fence in Pandoc Markdown is likewise nicely inobtrusive. I’m actually somewhat troubled by the superposed block attributes of djot, but I realize that short of requiring a “dummy” fence around paragraphs with attributes this is the best way to distinguish block attributes from inline attributes and my Perl script described in the first note also uses superposed attributes for blocks.

I also try to use a single (short) class or attribute whenever possible, offloading the actual customization to filters and the configuration to metadata, although I agree with @jgm that metadata should not really be used for filter configuration: in fact I think filter data and filter configuration should be separate namespaces from both metadata and template variables, as well as from each other (#192).

That said I don’t really think a single tag always suffices. My above-mentioned Perl script has a concept of “private attributes” with a + before their name which are available to templates but are not included when attributes are rendered as Pandoc Md/djot/HTML attributes in templates.[3] The djot or Pandoc Md equivalent might be {+.private-class +private=attr} which are available to filters but ignored when rendering to HTML and other non-djot/non-md formats where attributes are rendered. That’s not to say that {:tag} which may more or less “automatically” render to an HTML/XML tag or a LaTeX attribute is a bad idea: I just think that it should be treated as a special attribute with its own prefix (and the colon is OK for that.)

Footnotes

  1. I have written a kind of “preprocessor” which replaces delimiters like ⟦…⟧ or [|...|] (or ⟦:…:⟧), or inline or block fences like /.../, ===, +-+-+, ‣‣‣ (similar to backticks in djot/Markdown), with djot/Pandoc Markdown spans/divs, HTML tags, LaTeX commands or really whatever you like. It does this not by mindlessly replacing substrings but by actually matching nested constructs using Perl 5.010+ recursive regex patterns, and it uses “templates” (actually a kind of sprintf: a subclass of String::Formatter where each format/code can take multiple arguments) to generate the replacement. Below is an example of a YAML configuration file which should give you an idea of what it’s about (deliberately replacing non-ASCII characters like ⟦ ⟧ to ensure OK rendering on GitHub). It targets Pandoc Markdown, but “translating” it to djot would be trivial.

    balanced:
      # UNDERLINE
      '{|':
        close: '|}'
        attrs: false
        subst: '[%{text}s]{.underline}'
      # CLASS
      '{++':
        close: '++}'
        subst: '[%{text}s]{.my-class %{attrs}s}'
      # GLOSSARY
      '[?':
        close: '?]'
        subst: '[%{text}s^?^](glossary.html#%{text}- "Glossary: %{(%{(text)(L)}C)(([\"\\]))(\${1}s)}S")%{attr}s'
      # LANG ATTRIBUTE
      '{:':
        close: ':}'
        inner: true
        regex: |
          (?<open> \{\: )
            (?<tag> \w+ (?: \- \w+ )* )
            \:
            %{balanced}s
          \:\}
          %{attrs}s
        subst: '[%{text}s]{lang="%{tag}s" %{attrs}s}'
    fenced:
      # PERL CODE
      '///':
        block: false
        extend: '/'
        subst: '`%{extend}`%{text}s%{extend}``{.perl %{attrs}s}'
      '/-/-/':
        block: true
        extend: '-/'
        subst: |
          ``````%{extend}` {.perl .numberLines %{attrs}s}
          %{text}s
          ``````%{extend}`
      # CENTER
      '==><==':
        block: true
        extend: '='
        subst:
          latex: |2
            `\begin{center}`{=latex}
    
            %{text}s
    
            `\end{center}`{=latex}
          DEFAULT: |2
            ::: {.center %{attrs}s}
            %{text}s
            :::
    

    where %{(%{(text)(L)}C)(([\"\\]))(\${1}s)}S basically says “first convert the text between the delimiters to lowercase, then replace " and \ with \" and \\” or, in Perl:

    my $subj = lc $input->{text};
    $subj =~ s/([\"\\])/\\$1/g;
    

    but copying everything. The %S “substitute” format has been fudged to take a fourth “max number of substitutions” argument, like Lua’s gsub; it is not used here, so the substitution defaults to global. You may also note that both the subject and the replacement are format strings, with the case change done when creating the subject of the substitution.

  2. Likewise a good programming language today should allow UTF-8 strings by default, if not even Unicode identifiers. (Rakudo and recent versions of Perl allow any Unicode letters in identifiers, which is a good start. They also allow any string as a hash key, which usually is close enough!)

  3. The attr and attrs “variables” seen in the other note are just the raw string w/o braces and with “private” attributes removed by regular expression but there is also an htmlattrs variable with the attributes formatted for inclusion in HTML tags (class="foo bar" instead of .foo .bar).
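
For readers who don’t speak Perl, the nested matching from footnote 1 can be approximated in Python roughly as follows. This is only a sketch under stated assumptions: Python’s `re` module has no recursive patterns, so a small recursive scanner stands in for the recursive regex, and `str.format` stands in for the String::Formatter templates. The names `replace_balanced` and `escape_title` are hypothetical; the delimiters and the `[…]{.underline}` output come from the UNDERLINE entry in the YAML above, and `escape_title` mirrors the two-line Perl snippet.

```python
import re


def replace_balanced(text, opener, closer, template):
    """Replace every balanced opener...closer pair, innermost first."""

    def parse(pos, depth):
        # Returns (rendered_text, next_position). Inside a pair (depth > 0)
        # the position is None if the text ended before a closer was found.
        out = []
        while pos < len(text):
            if text.startswith(opener, pos):
                inner, end = parse(pos + len(opener), depth + 1)
                if end is None:
                    # No matching closer: keep the opener literally.
                    out.append(opener + inner)
                    pos = len(text)
                else:
                    out.append(template.format(text=inner))
                    pos = end
            elif depth > 0 and text.startswith(closer, pos):
                return "".join(out), pos + len(closer)
            else:
                out.append(text[pos])
                pos += 1
        return "".join(out), (None if depth > 0 else pos)

    rendered, _ = parse(0, 0)
    return rendered


def escape_title(text):
    """Lowercase the text, then backslash-escape double quotes and backslashes."""
    return re.sub(r'(["\\])', r'\\\1', text.lower())


# Template for the UNDERLINE rule; doubled braces are literal in str.format.
UNDERLINE = "[{text}]{{.underline}}"
```

With these definitions, `replace_balanced("a {|b {|c|} d|} e", "{|", "|}", UNDERLINE)` rewrites the inner pair first and then the outer one, and an opener without a matching closer is left in the text unchanged.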

alterae commented Sep 19, 2024

I am firmly in favor of this proposal in some form or another.

I regularly make use of semantic tags like <abbr>, <aside>, and even <q> in any prose I write, so the ability to write those without shelling out to HTML and without much friction is a very important feature in any markup language for me.
