You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix: Don't preemptively register identifiers as anchors
Registering every identifier of a rendered object is wrong,
since only one of these identifiers actually appear on the page.
What it tried to achieve instead was to register every identifier as pointing to the rendered anchor. But autorefs doesn't have
mechanisms to do that unambiguously, yet.
Turns out autorefs now has such a mechanism: its plugin's register_anchor method now accepts a third argument, anchor, in addition to identifier. This change in autorefs was brought to support Markdown anchors, a solution to create unique heading aliases and avoid undeterministic resolution of cross-references when their identifiers appear on several pages.
Thanks to this change, we can now register all the anchors of an object (its aliases) so that they point to the actually rendered anchor on the page. This change will be done in mkdocstrings' AutoDocProcessor._process_headings method:
With this change, the fallback mechanism becomes useless. Indeed, when we fallback, we simply ask the handler to provide the known anchors, and we try again with those. Since we will now register these anchors ahead of time, if an identifier isn't found, the fallback mechanism will not find it either.
All in all, if handlers provide all relevant anchors of a given object (through their get_anchors method), then we can be sure that we won't ever need to fallback. So we could actually remove this mechanism entirely. It was wrong anyway, since it would query all handlers for anchors, which could yield false-positives (identifier matching in another handler). This could be fixed since autoref HTML elements now carry a domain value, but there's no point in fixing it if we can remove it.
Describe the solution you'd like
Entirely remove the fallback mechanism. There will be a nice ripple effect in handlers too: they won't have to maintain a fallback_config attribute, and the associated logic in their collect method. The base handler can also be cleaned up, as well as the Handlers class.
Handlers previously handled "fallback collection" differently, because the fallback logic was also triggered by optional cross-references to external projects for which we don't have an inventory loaded. In these cases, we didn't want to try and load additional data from the disk, just to get anchors, since we knew for sure the corresponding objects would not be rendered in the docs anyway (and so we wouldn't be able to cross-ref them).
Without the fallback logic firing, we pretty much know that each object we try to collect when getting anchors for each heading rendered on a page is already collected (and possibly cached) or at least is not external, so we don't need any special value or parameter to do collect data conditionally.
Is there something fundamental I'm missing about languages other than Python, which would require that we keep the fallback mechanism? Based solely on the logic, I would say no: the fallback just means fetching aliases to try with them. If we already registered these aliases, no point in falling back again.
Describe alternatives you've considered
Fix the fallback mechanism to only query handlers of the relevant domain. I tried locally, it works, with just one deprecation message (signature of callables accepted by autorefs.get_fallback_anchor now accepting a second argument, the domain).
Additional context
We will first have to implement the changes mentioned above in mkdocstrings.
The text was updated successfully, but these errors were encountered:
This will be useful in mkdocstrings, where we want to register URLs for all aliases of a rendered object's identifier early in the process, so that we can drop the fallback mechanism in autorefs.
Primary URLs will take precedence when resolving cross-references, to avoid logging warnings about multiple URLs found.
For example:
- Object `a.b.c.d` has aliases `a.b.d` and `a.d`
- Object `a.b.c.d` is rendered.
- We register `a.b.c.d` -> page#a.b.c.d as primary
- We register `a.b.d` -> page#a.b.c.d as secondary
- We register `a.d` -> page#a.b.c.d as secondary
- Later, if `a.b.d` or `a.d` are rendered, we will register primary and secondary URLs the same way
- This way we are sure that each of `a.b.c.d`, `a.b.d` or `a.d` will link to their primary URL, if any, or their secondary URL, accordingly
Related-to-issue-61: #61
Is your change request related to a problem? Please describe.
Two or three years ago, I committed this change into mkdocstrings: mkdocstrings/mkdocstrings@c7ac043.
Turns out autorefs now has such a mechanism: its plugin's
register_anchor
method now accepts a third argument,anchor
, in addition toidentifier
. This change in autorefs was brought to support Markdown anchors, a solution to create unique heading aliases and avoid undeterministic resolution of cross-references when their identifiers appear on several pages.Thanks to this change, we can now register all the anchors of an object (its aliases) so that they point to the actually rendered anchor on the page. This change will be done in mkdocstrings'
AutoDocProcessor._process_headings
method:With this change, the fallback mechanism becomes useless. Indeed, when we fallback, we simply ask the handler to provide the known anchors, and we try again with those. Since we will now register these anchors ahead of time, if an identifier isn't found, the fallback mechanism will not find it either.
All in all, if handlers provide all relevant anchors of a given object (through their
get_anchors
method), then we can be sure that we won't ever need to fallback. So we could actually remove this mechanism entirely. It was wrong anyway, since it would query all handlers for anchors, which could yield false-positives (identifier matching in another handler). This could be fixed sinceautoref
HTML elements now carry adomain
value, but there's no point in fixing it if we can remove it.Describe the solution you'd like
Entirely remove the fallback mechanism. There will be a nice ripple effect in handlers too: they won't have to maintain a
fallback_config
attribute, and the associated logic in theircollect
method. The base handler can also be cleaned up, as well as theHandlers
class.Handlers previously handled "fallback collection" differently, because the fallback logic was also triggered by optional cross-references to external projects for which we don't have an inventory loaded. In these cases, we didn't want to try and load additional data from the disk, just to get anchors, since we knew for sure the corresponding objects would not be rendered in the docs anyway (and so we wouldn't be able to cross-ref them).
Without the fallback logic firing, we pretty much know that each object we try to collect when getting anchors for each heading rendered on a page is already collected (and possibly cached) or at least is not external, so we don't need any special value or parameter to do collect data conditionally.
Is there something fundamental I'm missing about languages other than Python, which would require that we keep the fallback mechanism? Based solely on the logic, I would say no: the fallback just means fetching aliases to try with them. If we already registered these aliases, no point in falling back again.
Describe alternatives you've considered
Fix the fallback mechanism to only query handlers of the relevant
domain
. I tried locally, it works, with just one deprecation message (signature of callables accepted byautorefs.get_fallback_anchor
now accepting a second argument, thedomain
).Additional context
We will first have to implement the changes mentioned above in mkdocstrings.
The text was updated successfully, but these errors were encountered: