-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Transmit Cell Metadata #70
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Based on #68, what I was thinking was to use the existing message.metadata
field and add a metadata.cell_metadata
(or metadata.cell
) field there, rather than duplicating metadata into the header.
to transmit the cell metadata for the executed cell. | ||
|
||
In cases where Jupyter extensions generate their own metadata, that keys for | ||
the metadata should be namespaced using an extension-specific prefix. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be a good idea to recommend a sub-dictionary as well for this. There's no requirement (or even recommendation) that metadata be a one-level dict. If an extension has more than one or two settings, it probably makes sense to nest a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea; do you think that existing implementations might make the incorrect assumption that there aren't nested dicts, i.e., this change would break existing implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the python layer all of the library interactions within jupyter / nteract don't expect single layer dicts. I'd be surprised if the UIs did either since there's already multi-layer dicts present in nbformat fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be nothing that prevents transmission (and a space-limited journal; storage) of nested dicts i.e. with schema: JSONschema-validateable [nested] JSON and/or W3C SHACL-validateable JSON-LD with URIs for migrateable Linked Data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any risk associated with it. Dicts are generally nested throughout Jupyter and the only spec for metadata is that it's a JSONable dict, which can have aribtrary depth, just like is already true for header, content, etc. which can also have nested fields.
Hi folks, This PEP has been open for 3 weeks. We'll be collecting feedback for another week. On Monday July 12, we will request a vote. Thanks! Carol |
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/jep-70-transmit-cell-metadata/9877/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance, I thought this proposal was adding a new field to the message protocol, but a closer reading indicates this understanding was wrong - it is using the existing metadata field all messages have. Is that correct?
From my understanding, thinking as a person that will need to implement this JEP and exactly what would need to be done, what this proposal is proposing is the following. All Jupyter clients that are executing code from the notebook format (i.e., there is a notion of "cell" and "cell metadata" in the client), when they are working with a "cell", must send all cell metadata from the notebook format to the kernel in any execute_request
, inspect_request
, or complete_request
messages. This cell metadata is merged into the top-level metadata dictionary of the above message. Is this a correct understanding? Is there anything else that would need to be implemented if this JEP is passed?
In other words, since the metadata field already exists in these messages, a frontend can already transmit this information if it wishes, so presumably a frontend and kernel can coordinate on this metadata already. Is this JEP really about mandating that the metadata is transmitted from the frontend?
"header" : { | ||
"msg_id": "...", | ||
"msg_type": "...", | ||
"metadata": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Reference-level explanation below, the metadata is not in the header. I think it should not be in the header, since the header will be transmitted back and forth many times (since parent_header is a copy of the original message's header). Can this example be changed to conform to the reference below, to move the metadata to the top level of the message rather than in the header?
Another few questions trying to understand the scope of this proposal:
|
I agree with Min's thoughts here, that we should separate cell metadata from other types of metadata that is transmitted. |
Min's thoughts here indicate we should think about this as a generic way of associating cell metadata with any message, so the answers to the above would generally be "yes", i.e., if the message is closely associated with a cell, the relevant metadata should be transmitted. |
I think it would be very clarifying to see exactly what changes are going to be made to the message spec document as part of this JEP, e.g., are there new prescribed fields in the message metadata dict, what their purpose is, etc. |
@jasongrout thanks for the review! Yes, I think we should specify that cell metadata SHOULD be transmitted as-is in |
Co-authored-by: Jason Grout <jasongrout@users.noreply.github.com>
Thanks for the thoughtful comments, @jasongrout!
I think you've done a great job at identifying a key tension in this JEP: cell metadata vs. client metadata. Looking at Min's comments from above, what do folks think about having two different fields in the message metadata, i.e.,
One of the primary scenarios for transmitting cell metadata is identifying the programming language in an
Creating an allow-list in this manner, allowing the notebook to specify it is a different take than what we originally rejected with the kernel being the source of truth. It does have the downside of additional complexity in implementation. Thinking out loud: if there is an explosion in the size of cell metadata down the road, we could mandate that requirement in a future JEP as an optimization that clients would be eager to implement. This would give us a simpler implementation for this JEP and give us time to gather evidence for the more complex implementation. What do you think? |
Here's a thought: instead of transmitting all cell metadata, or even a language name, how about we take a cue from our existing mechanism for identifying rich output formats, and instead transmit a mimetype field that identifies the format of code being transmitted? The mimetype of a typical python execute request could be I realize it's basically the same thing as a language field - a string that identifies to the kernel how to handle the execution/completion request. I usually think of a kernel as having a single "language" (this is implicit in the kernel spec, where the kernel has a "language" identified), but it seems okay to say that a given kernel can receive and act on multiple types of inputs. |
Yes- using a MIME type to identify a language would make perfect sense! In the prior art section of the JEP, I do mention a few other "meta-kernels" that exist as well that would benefit from this mechanism. I do like the examples that you give, and it seems like some of those would be metadata that is not likely to be persisted in the .ipynb itself (e.g., sending chunk of binary data) and some that are likely to be persisted, (e.g., setting values in the language runtime - of course depends on what those values represent ...). I wonder if there's a more general concept "runtime metadata?" where some of the metadata comes from cell metadata and others are synthesized by the front end/extensions at runtime? |
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: |
This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/notebook-cell-type-generaliastion/10703/1 |
Transmit Cell Metadata
Hello!
We (@willingc and @MSeal) would like to propose a new JEP that describes transmitting cell metadata to kernels. This PR contains the JEP proposal that we originally discussed in the Pre-Proposal: Transmit Cell Metadata Issue.
Based on the discussion in that issue, I've updated the JEP to reflect the desire to send the cell metadata as part of the message header vs. the original proposal which argued for sending it in the content.
@minrk has graciously volunteered to shepherd this through the JEP process. Looking forward to continuing the discussion here!
Thx!
-John, Carol, Matt
Resolve #68