Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protect included code snippets from "recursive" highlighting #1441

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wioux
Copy link

@wioux wioux commented Apr 19, 2022

Description

This PR aims to fix an issue with how yard handles syntax highlighted snippets from {include:file:...} source.

When an included file contains code blocks, they'll be fed through #parse_codeblocks more than once, as the file and then the parent are processed depth first. After the first highlighting pass, the code block contains additional <span>s and escapes (&lt; etc), and later passes are unable to tell that these weren't part of the original source.

yard's built-in ruby highlighter does work in these cases -- its tokenizer tends to raise an exception when the source looks like HTML instead of Ruby, and by inspecting the source it can tell that the block was already highlighted:

# lib/yard/templates/helpers/html_syntax_highlight_helper.rb#L41-L42
def html_syntax_highlight_ruby_ripper(source)
  resolver = Parser::Ruby::TokenResolver.new(source, object)
  ...
rescue Parser::ParserSyntaxError
  source =~ /^<span\s+class=/ ? source : h(source)
end

Plugins, on the other hand, are unaware that the code snippets they're given in some cases are already marked up. This leads to the final document showing a mess of escaped html within the <pre>.[1]

In this branch, data-highlighted=true is added to code snippets when they're processed through {include:...}, allowing them to be skipped during subsequent passes through #parse_codeblocks. The attribute is then removed by the outermost #htmlify call, so it's only an internal annotation and shouldn't make it into the final output (unless a theme calls #insert_include directly).

[1] This can be seen in the output of

$ cat setup.rb
module YARD::Templates::Helpers::HtmlHelper
  def html_syntax_highlight_html(source)
    %(<strong>#{CGI.escapeHTML(source)}</strong>)
  end
end

$ cat example.rdoc
  !!!html
  <h1>

$ cat README.rdoc
{include:file:example.rdoc}

$ yard doc -e setup.rb

Completed Tasks

  • I have read the Contributing Guide.
  • The pull request is complete (implemented / written).
  • Git commits have been cleaned up (squash WIP / revert commits).
  • I wrote tests and ran bundle exec rake locally (if code is attached to PR).

Copy link
Owner

@lsegal lsegal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing! I added a few inline comments; in short, I'm all for this fix but there are a few blockers in the current approach in the form of a breaking change and regression.

Those would need to be resolved before getting this through.

@@ -637,20 +637,25 @@ def parse_lang_for_codeblock(source)
# @param [String] html the html to search for code in
# @return [String] highlighted html
# @see #html_syntax_highlight
def parse_codeblocks(html)
def parse_codeblocks(html, internal=false)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick here but this should be styled as internal = false with spaces.

@@ -54,7 +54,7 @@ def urlencode(text)
# @param [Symbol] markup examples are +:markdown+, +:textile+, +:rdoc+.
# To add a custom markup type, see {MarkupHelper}
# @return [String] the HTML
def htmlify(text, markup = options.markup)
def htmlify(text, markup = options.markup, internal: false)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately using keyword args limits us to Ruby 3.x and would be a breaking change that could not be merged.

The good news is this could easily be changed to a positional arg.

language ||= object.source_type

if options.highlight
pre_attrs = Array %(class="code #{language}")
Copy link
Owner

@lsegal lsegal Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an odd syntax. Did you simply mean pre_attrs = %(class="code #{language}")?

@@ -39,7 +39,7 @@ def html_syntax_highlight_ruby_ripper(source)
end
output
rescue Parser::ParserSyntaxError
source =~ /^<span\s+class=/ ? source : h(source)
h(source)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a subtle but problematic change that I think will cause a regression. This change was explicitly added to permit pre-formatted HTML extra-files to be added via: yard - OTHER.html where OTHER.html might include:

<pre class="code ruby"><code class="ruby"><span class='kw'>return</span></code></pre>

image

The output of this section would generate double encoded output if you ran it through this block:

image

If removing this HTML detection causes a failure in your implementation, you may have to rethink part of your approach.

@wioux wioux force-pushed the parse_codeblocks_double_pass_guard branch from 8eab01f to 9273d42 Compare July 7, 2022 00:16
@wioux wioux force-pushed the parse_codeblocks_double_pass_guard branch from 9273d42 to 5d3e04f Compare July 7, 2022 01:30
@wioux wioux force-pushed the parse_codeblocks_double_pass_guard branch from 5d3e04f to a9babdb Compare July 7, 2022 01:40
@wioux
Copy link
Author

wioux commented Jul 7, 2022

Thanks for taking a look! I fixed the syntax and style issues you mentioned, and restored the bypass behavior for pre-formatted blocks. I'd misinterpreted what that check was for and thought it was to account for the same recursive/nested case that this branch considers.

I brought the check back, but lifted it out of the rescue block and into and into HtmlHelper#html_syntax_highlight as a deliberate special case, and added this to the spec:

if type.to_s == "ruby" && source =~ /<span\s+class=/
return source
end

it "wraps but doesn't alter ruby snippets that already contain highlighting tags" do
expect(subject.htmlify('<pre lang="ruby"><span class="kw">for</span></pre>', :html)).to eq(
'<pre class="code ruby"><code class="ruby"><span class="kw">for</span></code></pre>'
)
end

Does this look alright? It seems slightly better to not feed non-ruby input to the parser and catch the exception, but the rest of my changes should be compatible either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants