Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change: Stop caching parsed docstrings #340

Open
pawamoy opened this issue Dec 5, 2024 · 3 comments
Open

change: Stop caching parsed docstrings #340

pawamoy opened this issue Dec 5, 2024 · 3 comments
Assignees
Labels
griffe: docstrings Related to docstring parsing refactor Change suggestion, not a bug nor a feature.

Comments

@pawamoy
Copy link
Member

pawamoy commented Dec 5, 2024

Is your change request related to a problem? Please describe.

Cached parsed docstrings are causing more issues than they solve. See mkdocstrings/python#183. Cache invalidation of course! The second of the two hardest problems in computer science, after naming things and before off-by-1 errors.

Templates can very easily iterate on the sections once. Other programmatic uses can store the sections in a variable. No need to cache it. I think I initially decided to cache it because it would allow extensions to easily modify sections, which are data structures, instead of having to mess with the raw docstring. But we can solve this too, see next section.

Describe the solution you'd like

Instead of caching the parsed docstrings, provide utilities to make it as easy to modify raw docstrings. Typically:

  • appending a section: docstring.append(DocstringSectionStuff(...)). Here, we don't care what style is used: append will internally parse the docstring using the attached style (or auto-infer it when the feature is available), append the section, and re-dump everything.
  • that means we must implement the opposite of parsers: writers?
  • that will incur a small performance cost too, since we'll more often parse docstrings, but I don't expect them to be parsed a lot anyway (not a lot of extensions modify docstrings, and these extensions do not modify a lot of docstrings).
  • append is just a suggestion, we should design a good API. For example, and related to perfs, we should allow appending several sections at once, etc. Context manager (transactions)?

Describe alternatives you've considered

Not considering. Needs to be done.

Additional context

This will need a deprecation period of course, with warnings and all. We should also reach out to extension authors using cached parsed sections. We use this ourselves in griffe-warnings-deprecated I believe.

@pawamoy pawamoy added griffe: docstrings Related to docstring parsing refactor Change suggestion, not a bug nor a feature. labels Dec 5, 2024
@pawamoy pawamoy self-assigned this Dec 5, 2024
@pawamoy
Copy link
Member Author

pawamoy commented Dec 25, 2024

Since we'll want to render sections from objects to strings, section items (DocstringElement) should get a new hardcoded attribute that tells whether their annotation was hardcoded into the docstring or obtained via other means (typically from a signature's parameters), so that we know whether to output them again or not when rendering.

Additionally, to reduce the cost of docstring modifications, which will then involve parsing then redumping, as well as to prevent spurious warnings too early in the process, I suggest we rework our parsers to use two phases:

  1. split blocks corresponding to sections
  2. parse each block as a section

Blocks should be yielded alongside their section kind, as well as their ending (starting?) line numbers. This will allow these use-cases:

  • Quickly identify where to insert contents, without actually parsing sections. We'll be able to insert contents in docstring.value without parsing and redumping anything.
  • Retrieve (and parse) only specific sections. For example PydanticAI is only interested in Parameters sections and does not care about text sections or Returns sections (IIUC). See Cannot process return type annotations for pydantic types pydantic/pydantic-ai#491.

@pawamoy
Copy link
Member Author

pawamoy commented Dec 26, 2024

Hmmm, I'm in a rabbit hole.

Initially I wanted to stop caching docstrings, because extensions would modify the parsed sections themselves, and that would prevent changing (fixing?) the style later, for example with a local mkdocstrings option (see mkdocstrings/python#183).

But. If instead we tell the extension to parse and re-dump, that doesn't change anything. It will dump something with the wrong style. Even if we just insert contents without parsing, it will split and insert with the wrong style.

Ultimately, what this means, is that not caching doesn't solve the linked issue at all. What would solve the issue is to properly assign the right style to each docstring before they are ever parsed.

  • This can be done by applying the local mkdocstrings style recursively (as mentioned in the linked issue). No because extensions will still parse docstrings before the styles are re-assigned.
  • This can be done with the auto-detection feature of Griffe Insiders: https://mkdocstrings.github.io/griffe/reference/docstrings/#auto-style.
  • This can be done with a custom extension that searches for particular strings in a docstring to apply a style:
import re
import griffe


class ApplyDocstringStyle(griffe.Extension):
    def __init__(self, regex: str = "<!-- style: (google|numpy|sphinx) -->") -> None:
         self.regex = re.compile(regex)
    def on_instance(self, *, obj: griffe.Object, **kwargs) -> None:
        if obj.docstring:
            if match := self.regex.search(obj.docstring.value):
                obj.docstring.parser = match.group(1)
  • This can be done with a custom extension that searches for a comment at the end of a docstring:
import re
import griffe


class ApplyDocstringStyle(griffe.Extension):
    def __init__(self, regex: str = ".*# style: (google|numpy|sphinx)$") -> None:
         self.regex = re.compile(regex)
    def on_instance(self, *, obj: griffe.Object, **kwargs) -> None:
        if obj.docstring:
            if match := self.regex.search(obj.docstring.source):
                obj.docstring.parser = match.group(1)
  • This can be done with a custom extension that accepts paths or patterns, and styles to apply:
import griffe
from fnmatch import fnmatch

# path.to.obj1: google
# path.to.obj2: numpy
# path.to.obj3.*: sphinx
# path.to.obj4*: google

class ApplyDocstringStyle(griffe.Extension):
    def __init__(self, config: dict[str, str]):
        self.instances = {}
        self.globs = {}
        for key, value in config.items():
            if "*" in key:
                self.globs[key] = value
            else:
                self.instances[key] = value

    def on_instance(self, *, obj: griffe.Object, **kwargs) -> None:
        if obj.path in self.instances:
            if obj.docstring:
                obj.docsring.parser = self.instances[obj.path]
         else:
             for pattern, style in self.globs:
                 if fnmatch(obj.path, pattern):
                     if obj.docstring:
                         obj.docstring.parser = style

So in the end, we'll continue caching parsed sections. We can still rework our parsers as described above:

  1. split blocks corresponding to sections
  2. parse each block as a section

Blocks should be yielded alongside their section kind, as well as their ending (starting?) line numbers.

@pawamoy
Copy link
Member Author

pawamoy commented Dec 26, 2024

I'm really in a rabbit hole. With this recent change: 88fb6b6, parsing docstrings now computes inherited members. We stopped caching those to prevent issues, and updated both Griffe and mkdocstrings-python to avoid perf hits. But parsed docstrings are still cached, and could be cached too early (extension parsing and modifying cached sections). This would only be an issue in the scenario of mkdocstrings/python#175, but this is still annoying. So we might want to provide an API that lets extensions modify docstrings without parsing them, or at the very least without caching them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
griffe: docstrings Related to docstring parsing refactor Change suggestion, not a bug nor a feature.
Projects
None yet
Development

No branches or pull requests

1 participant