Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Add 'alwaysMatchEndPattern' option to end patterns #90

Closed
wants to merge 5 commits into from

Conversation

winstliu
Copy link
Contributor

@winstliu winstliu commented Apr 7, 2017

🚨 WIP 🚨

This option, when set to true, will force the end pattern to match, even when it is not the current rule. This is incredibly useful for grammars that rely on includes, yet do not necessarily need to wait for the include to finish matching before ending the include. Some examples of such grammars include: language-gfm's code blocks, language-html and language-xml including language-javascript and language-coffee-script, and language-python allowing SQL queries in strings.

TODO:

  • Specs
  • Caching for getEndPatternScanner
  • A good thorough review 👀

Fixes #83
and unblocks a whole lot of issues:
atom/language-php#187
atom/language-shellscript#60
atom/language-gfm#171
atom/language-gfm#21
atom/language-yaml#79
atom/language-html#90
atom/language-python#110
atom/language-python#55
atom/language-python#39
atom/language-python#143
and more...

This option, when set to true, will force the end pattern to match, even
when it is not the current rule.
@dead-claudia
Copy link

@50Wliu Status update?

@winstliu
Copy link
Contributor Author

winstliu commented Feb 7, 2018

Probably not going to merge. This was always experimental but then I realized merging this would also break compatibility with Linguist and VSCode which expect TextMate-compatible grammars.

@dead-claudia
Copy link

So how does that relate to #83?

@dead-claudia

This comment has been minimized.

@Aerijo
Copy link
Contributor

Aerijo commented Sep 18, 2018

@50Wliu Is it possible to make the change on Atom's end instead? Or does Atom just use the exact code from here? VS Code seems to be able to do it, but I haven't looked at their implementation. AFAIK, they parse with their own system anyway.

screen shot 2018-09-18 at 2 23 50 pm

Compatibility is all well and good, but it would really help to have something that unblocks all those issues. There will be many language packages that don't adopt Tree-sitter, so this is still a valid concern.

Also, what compatibility would be broken? Existing languages will not have the introduced property, and the behaviour should not change if it is not present. So wouldn't it be backwards compatible?

@Ingramz
Copy link
Contributor

Ingramz commented Sep 18, 2018

@Aerijo Atom's grammars are also used outside Atom, examples being VS Code and GitHub, so by introducing new constructs, we are breaking compatibility with these two, although admittedly both can be adapted to work with it.

Ideally we'd want to stay compatible with the classical TextMate implementation so that anyone who follows that, could benefit from the grammars that the community produces.

It is interesting that you have gotten it to work in VS Code, I'd be interested in how did you lay out your grammar to get it working there?

@Aerijo
Copy link
Contributor

Aerijo commented Sep 18, 2018

@Ingramz I didn't do anything, that's just what happens in VS Code to a grammar that would break in Atom. They appear to use a different engine(?) to apply the grammar, as there have been differences in behaviour before (to much confusion). Like I said, I haven't actually looked at what they do internally, I've just seen what it does in the editor.

As for the compatability, I know other things use this. That's why I pointed out the change is backwards compatible. My preference would really be to emulate VS Code, because that seems like the desired default behaviour, but I was worried this might break the C / C++ / C# stuff.

@Ingramz
Copy link
Contributor

Ingramz commented Sep 18, 2018

@Aerijo they do indeed use a different engine. Historically VS Code's implementation of the TextMate grammars has been more accurate than Atom's first-mate, but there might be some differences in
VS Code's implementation regardless.

It looks like the VSCode grammar for markdown is different. If that grammar works the intended way also in TextMate but not in Atom, then first-mate needs correcting again.

@Aerijo
Copy link
Contributor

Aerijo commented Sep 18, 2018

@Ingramz Unfortunately, it seems TextMate leaks too
screen shot 2018-09-18 at 7 37 04 pm

I suppose the VS Code engine authors saw this as a bug, not a feature of TextMate grammars.

@Ingramz
Copy link
Contributor

Ingramz commented Sep 18, 2018

@Aerijo if that is the case, then we need to find how it has been worked around. Because if they haven't documented how their implementation differs from TextMate, then I would consider it a bug in VS Code's implementation.

@caleb531
Copy link

caleb531 commented Mar 9, 2019

I am struggling with this exact same grammar limitation. I am trying to write a grammar that matches JavaScript embedded in HTML via <% %> tags:

<% if (encoding === 'utf8') { %>
    <meta charset="utf-8">
<% } %>

I have tried the following rule (and variations thereof), but because of the unclosed {, the HTML is tokenized like JS until the end } is reached.

{
    'begin': '<%'
    'end': '%>'
    'name': 'meta.embedded.foo'
    'contentName': 'source.js.embedded.foo'
    'patterns': [
        'include': 'source.js'
    ]
}

Screen Shot 2019-03-09 at 11 04 46 AM

Is there any way to work around this limitation with existing First Mate grammar semantics (at least in Atom)? Do Tree Sitter grammars provide a solution to this problem?

@dead-claudia
Copy link

Is there a status update on this?

@Arcanemagus
Copy link

The status update was given to you when you first asked: #90 (comment)

Details as to the why behind that reason have been given in further comments. The correct solution here is to continue work on moving remaining TextMate grammars to a Tree-sitter implementation that doesn't suffer from this limitation.

@winstliu
Copy link
Contributor Author

I stopped working on this a while back as it became clear that tree-sitter would not have this issue and that it would eventually become the preferred way to write grammars for Atom. Additionally, this would break compatibility with other TextMate-like engines (for example, Linguist) which would be less-than-desirable.

For those reasons, I'm going to be closing this pull request as I don't anticipate completing it.

@winstliu winstliu closed this Apr 23, 2019
@winstliu winstliu deleted the wl-always-match-end-pattern branch April 23, 2019 16:39
@dead-claudia
Copy link

@50Wliu Thanks for the heads up.

@Arcanemagus I asked in case things changed in the 14 months since (that's not a short amount of time in the world of software development), since the PR was still open. 14 months ago when that comment was made, tree-sitter didn't even exist, at least publicly. Please don't assume I'm just trying to non-constructively spam the thread with that question - I was literally just wanting to know whether this route was still to be taken, since it would guide my approach to suggestions on issues related to it.

matter123 referenced this pull request in jeff-hykin/vscode-textmate Jul 5, 2019
@jeff-hykin
Copy link

@Aerijo if that is the case, then we need to find how it has been worked around. Because if they haven't documented how their implementation differs from TextMate, then I would consider it a bug in VS Code's implementation.

I know this is probably irrelevant at this point. But for anyone who is curious, the VS Code markdown tmLanguage uses a while rather than an end. The while feature is largely undocumented, but it seems to behave the same in VS Code and TextMate, and it effectively has the alwaysMatchEndPattern behavior. It must match every line, and then if it doesn't match a line, the scope (e.g. the triple back-ticks) will end no matter what, even if there is an unfinished string.

The while loop still has it's limitations, which VS Code is running up against right now and I'm actually looking into creating this kind of a PR for a 'alwaysMatchEndPattern' in VS Code.

The Tree Sitter is certainly a better general solution.

RedCMD referenced this pull request in RedCMD/TmLanguage-Syntax-Highlighter Oct 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No way to scope a grammar capture
7 participants