Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle internal notifications during session cleanup #85

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

donghao1393
Copy link

@donghao1393 donghao1393 commented Dec 2, 2024

fix: handle internal notifications during session cleanup

Addresses an issue where internal notifications (e.g. 'cancelled') during session cleanup would trigger validation errors and crash the server.

Changes

  • Add session state tracking via _closed flag
  • Skip validation for internal notifications during cleanup phase
  • Improve progress context cleanup with proper async context manager
  • Add final_progress notification to ensure completion state is reported

Testing

Successfully tested with multiple concurrent requests with progress tracking:

  • Cleanup proceeds without validation errors
  • Progress notifications complete properly
  • No server crashes during shutdown

1. Add proper session cleanup handling
   - Track session state with _closed flag
   - Handle cancellation and cleanup errors gracefully
   - Skip notification validation during cleanup

2. Improve progress context
   - Add final_progress method
   - Send completion notification in finally block
   - Handle progress cleanup properly

This fix addresses issues with session cleanup causing validation
errors and improves progress notification reliability.
@donghao1393 donghao1393 changed the title fix: improve session cleanup and progress handling fix: handle internal notifications during session cleanup Dec 11, 2024
@rusiaaman rusiaaman mentioned this pull request Dec 15, 2024
9 tasks
Copy link
Member

@dsp-ant dsp-ant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this. I think it's the right direction but needs a bit more refinement. If you don't mind taking a look at my requested changes. I think we are close to getting this in.

src/mcp/shared/session.py Outdated Show resolved Hide resolved
src/mcp/shared/session.py Outdated Show resolved Hide resolved
src/mcp/shared/session.py Outdated Show resolved Hide resolved
…ancelled

According to MCP spec, the cancellation notification method should
be 'notifications/cancelled' instead of 'cancelled'.
The try-except block was unnecessary since getattr already handles
the case where the attribute doesn't exist by returning None.
- Always validate all notifications to ensure proper model structure
- Handle cancellation notifications separately (no forwarding to stream)
- Simplify notification type checking and error handling
- Improve code structure and documentation
@donghao1393
Copy link
Author

Hi @dsp-ant , thank you for your advice. I have made commit. Would you help to check again?

@dsp-ant dsp-ant self-requested a review December 19, 2024 10:12
- Add CancelledNotification class for handling cancelled requests
- Add CancelledNotification to ClientNotification union type
- Improve type safety for cancelled notifications
- Add _closed flag to track session state
- Skip notification validation during cleanup phase
- Handle internal notifications gracefully during shutdown

This fixes an issue where internal notifications (e.g. 'cancelled')
would trigger validation errors during session cleanup.
@donghao1393
Copy link
Author

Technical Details

  • The issue occurred when internal notifications were received during cleanup
  • Added _closed flag to track session state accurately
  • Modified notification handling to be cleanup-aware
  • Ensures graceful shutdown even with pending notifications

Impact

  • Prevents validation errors during shutdown
  • Maintains session stability during cleanup
  • No interference with normal operation
  • Compatible with existing notification patterns

This approach keeps the fix focused on the core issue while maintaining
backward compatibility.

@txbm
Copy link

txbm commented Jan 6, 2025

@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - txbm@1918562

@donghao1393
Copy link
Author

donghao1393 commented Jan 6, 2025

@dsp-ant @donghao1393 - if I may suggest a more straight-forward fix for this inline with codebase patterns? - txbm@1918562

I have tested this commit on my local and it's working. I think this is a better solution. It fixed type definition for cancelled notifications to match protocol implementation. This allows proper handling of cancellation messages without requiring changes to the notification processing logic. What do you think @dsp-ant ?

@orliesaurus
Copy link

Please can you merge this fix in, it's breaking my server on claude desktop app - thank you!

Copy link

@jerome3o-anthropic jerome3o-anthropic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @donghao1393 - It seems to me that this can be simplified a lot now that there is the canceled notification types.

I'm going to be doing a loop on notifications in this sdk soon, and for my understanding (and to write some tests) I'd love it if you could also provide an example that reproduces this issue? would it just be a server with a long running task and the client sending of a cancel notification?

@@ -204,6 +215,12 @@ async def _send_response(
)
await self._write_stream.send(JSONRPCMessage(jsonrpc_response))

def _is_cancellation_notification(self, message_root: JSONRPCNotification) -> bool:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should no longer be needed now that there are the correct types defined

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions. I share your instinct about simplifying the code based on types, but I have some concerns:

  1. The cancellation notification we receive is actually using "cancelled" as the method name, not "notifications/cancelled". This is why we need the explicit check in _is_cancellation_notification.

  2. Even with the type definitions, the runtime behavior might still need this handling because:

    • The notification comes from the transport layer before type validation
    • We need to handle both formats for backwards compatibility
    • The validation would fail before we can process the notification properly

Would it help if I add some comments explaining this in the code? Or would you prefer to first verify this with the test case I mentioned above?

await self._received_notification(notification)
await self._incoming_message_stream_writer.send(notification)
# check if it is a cancellation notification
if self._is_cancellation_notification(message.root):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be significantly simplified now that we have the types defined?

@donghao1393
Copy link
Author

donghao1393 commented Jan 18, 2025

Hey @donghao1393 - It seems to me that this can be simplified a lot now that there is the canceled notification types.

I'm going to be doing a loop on notifications in this sdk soon, and for my understanding (and to write some tests) I'd love it if you could also provide an example that reproduces this issue? would it just be a server with a long running task and the client sending of a cancel notification?

Yes, you're right. A reproduction case would be a server with a long-running task (like image generation) that typically takes more than a few seconds, and the client sending a cancel notification during the process. I can provide a minimal example:

  1. add this of my mcp service
  2. A client that initiates the task and sends a cancel notification, like:
    • 1st prompt: "create an image using dall-e model to depict a dog standing on the moon watching the earth"
    • 2nd prompt (in the same conversation): "create another image that the dog was running"
    • expected: the conversation should return both 2 images
    • problem: if the cancel notification made the service down in 1st prompt, the 2nd prompt would be failed to send request
  3. inspect the log file mcp-server-openai.log so that we can observe the cancellation behavior

This would help validate both the current behavior and any proposed changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants