change: Stop caching parsed docstrings #340

pawamoy · 2024-12-05T12:59:39Z

Is your change request related to a problem? Please describe.

Cached parsed docstrings are causing more issues than they solve. See mkdocstrings/python#183. Cache invalidation of course! The second of the two hardest problems in computer science, after naming things and before off-by-1 errors.

Templates can very easily iterate on the sections once. Other programmatic uses can store the sections in a variable. No need to cache it. I think I initially decided to cache it because it would allow extensions to easily modify sections, which are data structures, instead of having to mess with the raw docstring. But we can solve this too, see next section.

Describe the solution you'd like

Instead of caching the parsed docstrings, provide utilities to make it as easy to modify raw docstrings. Typically:

appending a section: docstring.append(DocstringSectionStuff(...)). Here, we don't care what style is used: append will internally parse the docstring using the attached style (or auto-infer it when the feature is available), append the section, and re-dump everything.
that means we must implement the opposite of parsers: writers?
that will incur a small performance cost too, since we'll more often parse docstrings, but I don't expect them to be parsed a lot anyway (not a lot of extensions modify docstrings, and these extensions do not modify a lot of docstrings).
append is just a suggestion, we should design a good API. For example, and related to perfs, we should allow appending several sections at once, etc. Context manager (transactions)?

Describe alternatives you've considered

Not considering. Needs to be done.

Additional context

This will need a deprecation period of course, with warnings and all. We should also reach out to extension authors using cached parsed sections. We use this ourselves in griffe-warnings-deprecated I believe.

The text was updated successfully, but these errors were encountered:

pawamoy · 2024-12-25T23:09:04Z

Since we'll want to render sections from objects to strings, section items (DocstringElement) should get a new hardcoded attribute that tells whether their annotation was hardcoded into the docstring or obtained via other means (typically from a signature's parameters), so that we know whether to output them again or not when rendering.

Additionally, to reduce the cost of docstring modifications, which will then involve parsing then redumping, as well as to prevent spurious warnings too early in the process, I suggest we rework our parsers to use two phases:

split blocks corresponding to sections
parse each block as a section

Blocks should be yielded alongside their section kind, as well as their ending (starting?) line numbers. This will allow these use-cases:

Quickly identify where to insert contents, without actually parsing sections. We'll be able to insert contents in docstring.value without parsing and redumping anything.
Retrieve (and parse) only specific sections. For example PydanticAI is only interested in Parameters sections and does not care about text sections or Returns sections (IIUC). See Cannot process return type annotations for pydantic types pydantic/pydantic-ai#491.

pawamoy · 2024-12-26T00:25:37Z

Hmmm, I'm in a rabbit hole.

Initially I wanted to stop caching docstrings, because extensions would modify the parsed sections themselves, and that would prevent changing (fixing?) the style later, for example with a local mkdocstrings option (see mkdocstrings/python#183).

But. If instead we tell the extension to parse and re-dump, that doesn't change anything. It will dump something with the wrong style. Even if we just insert contents without parsing, it will split and insert with the wrong style.

Ultimately, what this means, is that not caching doesn't solve the linked issue at all. What would solve the issue is to properly assign the right style to each docstring before they are ever parsed.

~~This can be done by applying the local mkdocstrings style recursively (as mentioned in the linked issue).~~ No because extensions will still parse docstrings before the styles are re-assigned.
This can be done with the auto-detection feature of Griffe Insiders: https://mkdocstrings.github.io/griffe/reference/docstrings/#auto-style.
This can be done with a custom extension that searches for particular strings in a docstring to apply a style:

import re
import griffe


class ApplyDocstringStyle(griffe.Extension):
    def __init__(self, regex: str = "<!-- style: (google|numpy|sphinx) -->") -> None:
         self.regex = re.compile(regex)
    def on_instance(self, *, obj: griffe.Object, **kwargs) -> None:
        if obj.docstring:
            if match := self.regex.search(obj.docstring.value):
                obj.docstring.parser = match.group(1)

This can be done with a custom extension that searches for a comment at the end of a docstring:

import re
import griffe


class ApplyDocstringStyle(griffe.Extension):
    def __init__(self, regex: str = ".*# style: (google|numpy|sphinx)$") -> None:
         self.regex = re.compile(regex)
    def on_instance(self, *, obj: griffe.Object, **kwargs) -> None:
        if obj.docstring:
            if match := self.regex.search(obj.docstring.source):
                obj.docstring.parser = match.group(1)

This can be done with a custom extension that accepts paths or patterns, and styles to apply:

import griffe
from fnmatch import fnmatch

# path.to.obj1: google
# path.to.obj2: numpy
# path.to.obj3.*: sphinx
# path.to.obj4*: google

class ApplyDocstringStyle(griffe.Extension):
    def __init__(self, config: dict[str, str]):
        self.instances = {}
        self.globs = {}
        for key, value in config.items():
            if "*" in key:
                self.globs[key] = value
            else:
                self.instances[key] = value

    def on_instance(self, *, obj: griffe.Object, **kwargs) -> None:
        if obj.path in self.instances:
            if obj.docstring:
                obj.docsring.parser = self.instances[obj.path]
         else:
             for pattern, style in self.globs:
                 if fnmatch(obj.path, pattern):
                     if obj.docstring:
                         obj.docstring.parser = style

So in the end, we'll continue caching parsed sections. We can still rework our parsers as described above:

split blocks corresponding to sections

parse each block as a section

Blocks should be yielded alongside their section kind, as well as their ending (starting?) line numbers.

pawamoy · 2024-12-26T17:28:43Z

I'm really in a rabbit hole. With this recent change: 88fb6b6, parsing docstrings now computes inherited members. We stopped caching those to prevent issues, and updated both Griffe and mkdocstrings-python to avoid perf hits. But parsed docstrings are still cached, and could be cached too early (extension parsing and modifying cached sections). This would only be an issue in the scenario of mkdocstrings/python#175, but this is still annoying. So we might want to provide an API that lets extensions modify docstrings without parsing them, or at the very least without caching them.

pawamoy added griffe: docstrings Related to docstring parsing refactor Change suggestion, not a bug nor a feature. labels Dec 5, 2024

pawamoy self-assigned this Dec 5, 2024

pawamoy mentioned this issue Dec 5, 2024

bug: Docstring Style Local Override Does Not Work mkdocstrings/python#183

Open

pawamoy mentioned this issue Dec 26, 2024

change: Trigger instance events after package is loaded? #346

Open

pawamoy mentioned this issue Dec 26, 2024

change: Follow imports when loading #347

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

change: Stop caching parsed docstrings #340

change: Stop caching parsed docstrings #340

pawamoy commented Dec 5, 2024 •

edited

Loading

pawamoy commented Dec 25, 2024 •

edited

Loading

pawamoy commented Dec 26, 2024 •

edited

Loading

pawamoy commented Dec 26, 2024

change: Stop caching parsed docstrings #340

change: Stop caching parsed docstrings #340

Comments

pawamoy commented Dec 5, 2024 • edited Loading

Is your change request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

pawamoy commented Dec 25, 2024 • edited Loading

pawamoy commented Dec 26, 2024 • edited Loading

pawamoy commented Dec 26, 2024

pawamoy commented Dec 5, 2024 •

edited

Loading

pawamoy commented Dec 25, 2024 •

edited

Loading

pawamoy commented Dec 26, 2024 •

edited

Loading