Add 'alwaysMatchEndPattern' option to end patterns #90

winstliu · 2017-04-07T03:11:12Z

🚨 WIP 🚨

This option, when set to true, will force the end pattern to match, even when it is not the current rule. This is incredibly useful for grammars that rely on includes, yet do not necessarily need to wait for the include to finish matching before ending the include. Some examples of such grammars include: language-gfm's code blocks, language-html and language-xml including language-javascript and language-coffee-script, and language-python allowing SQL queries in strings.
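As a rough illustration of the description above, a grammar rule might opt in like this. It is a sketch only: placing the key beside 'end' on the same rule is inferred from the PR title, and the regexes and scope name are hypothetical.

```cson
# Hypothetical fenced-code rule; the regexes and scope name are illustrative only.
{
  'begin': '^\\s*`{3}\\s*js\\s*$'
  'end': '^\\s*`{3}\\s*$'
  # Assumed placement: a boolean key on the same rule as the end pattern.
  # With it set, the end pattern is tested even while an included rule
  # (for example an unterminated JS string) is still the current rule.
  'alwaysMatchEndPattern': true
  'name': 'markup.code.js.example'
  'patterns': [
    {
      'include': 'source.js'
    }
  ]
}
```

Grammars that omit the key would presumably keep the existing behaviour, where the end pattern is only tested once the included rules have finished matching.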

TODO:

Specs
Caching for getEndPatternScanner
A good thorough review 👀

Fixes #83
and unblocks a whole lot of issues:
atom/language-php#187
atom/language-shellscript#60
atom/language-gfm#171
atom/language-gfm#21
atom/language-yaml#79
atom/language-html#90
atom/language-python#110
atom/language-python#55
atom/language-python#39
atom/language-python#143
and more...

dead-claudia · 2018-02-07T16:35:07Z

@50Wliu Status update?

winstliu · 2018-02-07T19:16:52Z

Probably not going to merge. This was always experimental but then I realized merging this would also break compatibility with Linguist and VSCode which expect TextMate-compatible grammars.

dead-claudia · 2018-02-07T22:23:46Z

So how does that relate to #83?

Aerijo · 2018-09-18T04:26:13Z

@50Wliu Is it possible to make the change on Atom's end instead? Or does Atom just use the exact code from here? VS Code seems to be able to do it, but I haven't looked at their implementation. AFAIK, they parse with their own system anyway.

Compatibility is all well and good, but it would really help to have something that unblocks all those issues. There will be many language packages that don't adopt Tree-sitter, so this is still a valid concern.

Also, what compatibility would be broken? Existing languages will not have the introduced property, and the behaviour should not change if it is not present. So wouldn't it be backwards compatible?

Ingramz · 2018-09-18T07:59:35Z

@Aerijo Atom's grammars are also used outside Atom, examples being VS Code and GitHub, so by introducing new constructs, we are breaking compatibility with these two, although admittedly both can be adapted to work with it.

Ideally we'd want to stay compatible with the classical TextMate implementation so that anyone who follows it can benefit from the grammars that the community produces.

It is interesting that you have gotten it to work in VS Code. How did you lay out your grammar to get it working there?

Aerijo · 2018-09-18T08:12:22Z

@Ingramz I didn't do anything, that's just what happens in VS Code to a grammar that would break in Atom. They appear to use a different engine(?) to apply the grammar, as there have been differences in behaviour before (to much confusion). Like I said, I haven't actually looked at what they do internally, I've just seen what it does in the editor.

As for the compatibility, I know other things use this. That's why I pointed out the change is backwards compatible. My preference would really be to emulate VS Code, because that seems like the desired default behaviour, but I was worried this might break the C / C++ / C# stuff.

Ingramz · 2018-09-18T09:29:51Z

@Aerijo they do indeed use a different engine. Historically VS Code's implementation of TextMate grammars has been more accurate than Atom's first-mate, but there might be some differences in VS Code's implementation regardless.

It looks like the VS Code grammar for markdown is different. If that grammar also works as intended in TextMate but not in Atom, then first-mate needs correcting again.

Aerijo · 2018-09-18T09:38:40Z

@Ingramz Unfortunately, it seems TextMate leaks too.

I suppose the VS Code engine authors saw this as a bug, not a feature of TextMate grammars.

Ingramz · 2018-09-18T11:16:41Z

@Aerijo if that is the case, then we need to find how it has been worked around. Because if they haven't documented how their implementation differs from TextMate, then I would consider it a bug in VS Code's implementation.

caleb531 · 2019-03-09T19:10:53Z

I am struggling with this exact same grammar limitation. I am trying to write a grammar that matches JavaScript embedded in HTML via <% %> tags:

<% if (encoding === 'utf8') { %>
    <meta charset="utf-8">
<% } %>

I have tried the following rule (and variations thereof), but because of the unclosed {, the HTML is tokenized like JS until the end } is reached.

{
    'begin': '<%'
    'end': '%>'
    'name': 'meta.embedded.foo'
    'contentName': 'source.js.embedded.foo'
    'patterns': [
        'include': 'source.js'
    ]
}

Is there any way to work around this limitation with existing first-mate grammar semantics (at least in Atom)? Do Tree-sitter grammars provide a solution to this problem?

dead-claudia · 2019-04-23T10:54:14Z

Is there a status update on this?

Arcanemagus · 2019-04-23T16:00:15Z

The status update was given to you when you first asked: #90 (comment)

The reasons behind that decision have been given in further comments. The correct solution here is to continue work on moving the remaining TextMate grammars to a Tree-sitter implementation that doesn't suffer from this limitation.

winstliu · 2019-04-23T16:39:09Z

I stopped working on this a while back as it became clear that tree-sitter would not have this issue and that it would eventually become the preferred way to write grammars for Atom. Additionally, this would break compatibility with other TextMate-like engines (for example, Linguist) which would be less-than-desirable.

For those reasons, I'm going to be closing this pull request as I don't anticipate completing it.

dead-claudia · 2019-04-24T17:51:27Z

@50Wliu Thanks for the heads up.

@Arcanemagus I asked in case things changed in the 14 months since (that's not a short amount of time in the world of software development), since the PR was still open. 14 months ago when that comment was made, tree-sitter didn't even exist, at least publicly. Please don't assume I'm just trying to non-constructively spam the thread with that question - I was literally just wanting to know whether this route was still to be taken, since it would guide my approach to suggestions on issues related to it.

jeff-hykin · 2019-07-05T05:05:34Z

> @Aerijo if that is the case, then we need to find how it has been worked around. Because if they haven't documented how their implementation differs from TextMate, then I would consider it a bug in VS Code's implementation.

I know this is probably irrelevant at this point, but for anyone who is curious, the VS Code markdown tmLanguage uses a 'while' pattern rather than an 'end' pattern. The 'while' feature is largely undocumented, but it seems to behave the same in VS Code and TextMate, and it effectively has the 'alwaysMatchEndPattern' behavior. The 'while' pattern must match on every line; as soon as a line fails to match, the scope (e.g. the triple back-ticks) ends no matter what, even if there is an unfinished string.
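For readers unfamiliar with begin/while rules, the idea can be sketched as follows. This is a simplified illustration written in the CSON style used earlier in this thread, not the actual VS Code grammar; the regexes and scope names are assumptions, and it only applies to engines that support the 'while' key.

```cson
# Simplified sketch of a fenced JS block built on a begin/while rule.
{
  # Outer rule: the opening and closing fences.
  'begin': '(^|\\G)\\s*`{3}\\s*js\\s*$'
  'end': '(^|\\G)\\s*`{3}\\s*$'
  'patterns': [
    {
      # Inner rule: starts on the first content line.
      'begin': '(^|\\G)(\\s*)(.*)'
      # Tested at the start of each following line; the rule (and anything
      # source.js left open) ends as soon as a line is a closing fence.
      'while': '(^|\\G)(?!\\s*`{3}\\s*$)'
      'contentName': 'meta.embedded.block.javascript'
      'patterns': [
        { 'include': 'source.js' }
      ]
    }
  ]
}
```

Because the 'while' check runs per line rather than waiting for the included grammar to finish, an unterminated string inside the block cannot swallow the closing fence.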

The 'while' approach still has its limitations, which VS Code is running up against right now, and I'm actually looking into creating this kind of PR for an 'alwaysMatchEndPattern' in VS Code.

Tree-sitter is certainly a better general solution.

Add 'alwaysMatchEndPattern' option to end patterns

de6c9c6

This option, when set to true, will force the end pattern to match, even when it is not the current rule.

winstliu added the work-in-progress label Apr 7, 2017

winstliu mentioned this pull request Apr 7, 2017

Improve highlighting of embedded @example source atom/language-javascript#497

Closed

winstliu added 4 commits April 7, 2017 22:54

Fix scope names not being popped correctly

f0297b0

Specs

90e610f

Add caching todo

df7ef5f

🎨

8ac6085

Ingramz mentioned this pull request Sep 18, 2017

@something(//comment) breaks highlighting jawee/language-blade#17

Open

winstliu mentioned this pull request Feb 19, 2018

Markdown breaks syntax highlighting and code folding atom/atom#16782

Open

burodepeper mentioned this pull request Jan 7, 2019

Syntax errors in code blocks break subsequent markdown highlighting burodepeper/language-markdown#241

Closed

Aerijo mentioned this pull request Jan 23, 2019

Support Sublime Text syntax definitions atom/atom#18727

Closed

This was referenced Apr 23, 2019

Markdown syntax colour issue while using /* .. */ atom/language-c#146

Open

Create means of including grammar in child context #112

Closed

winstliu closed this Apr 23, 2019

winstliu deleted the wl-always-match-end-pattern branch April 23, 2019 16:39

matter123 referenced this pull request in jeff-hykin/vscode-textmate Jul 5, 2019

_

c7794cd

RedCMD mentioned this pull request Oct 5, 2024

Option to match end before any patterns microsoft/vscode-textmate#139

Open

RedCMD referenced this pull request in RedCMD/TmLanguage-Syntax-Highlighter Oct 11, 2024

Improve error handling

07ccecf
