Improve monitoring to catch memory growth problems earlier #678

Closed
makortel opened this issue Sep 20, 2023 · 1 comment

@makortel
Collaborator

We would like to catch situations such as linear memory growth or memory spikes at the end of the job early, well before they start to cause problems (i.e. job failures) in production.

This could look something like:

  • SimpleMemoryCheck monitors RSS and reports some growth metrics in the framework job report
  • WM propagates these metrics to monitoring
  • monitoring raises alerts if certain patterns start to occur
  • the operator responding to the alert opens a CMSSW GitHub issue
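As a rough illustration of the kind of "growth metrics" and alert rules the steps above might involve, here is a minimal Python sketch working on periodic RSS samples. Everything in it is an assumption for illustration: the function names `growth_metrics` and `alert_reasons`, the two metrics (a least-squares slope for linear growth, and the ratio of the peak RSS in the final part of the job to the median RSS before it for end-of-job spikes), and the thresholds are hypothetical. They are not the actual fields SimpleMemoryCheck writes to the framework job report.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class GrowthMetrics:
    slope_mb_per_s: float   # least-squares slope of RSS vs. time (hypothetical metric)
    end_spike_ratio: float  # max RSS in the tail / median RSS of the body (hypothetical metric)


def growth_metrics(times_s: Sequence[float], rss_mb: Sequence[float],
                   tail_fraction: float = 0.1) -> GrowthMetrics:
    """Compute hypothetical growth metrics from paired (time, RSS) samples."""
    n = len(times_s)
    if n != len(rss_mb) or n < 3:
        raise ValueError("need at least 3 paired samples")
    if not 0.0 < tail_fraction < 1.0:
        raise ValueError("tail_fraction must be in (0, 1)")

    # Ordinary least-squares slope: cov(t, rss) / var(t).
    mean_t = sum(times_s) / n
    mean_r = sum(rss_mb) / n
    var_t = sum((t - mean_t) ** 2 for t in times_s)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(times_s, rss_mb))
    slope = cov / var_t

    # Split the samples into a "body" and an end-of-job "tail"; keep the body non-empty.
    n_tail = min(max(1, int(n * tail_fraction)), n - 1)
    body = list(rss_mb[: n - n_tail])
    tail = list(rss_mb[n - n_tail:])
    body_median = median(body)
    if body_median <= 0:
        raise ValueError("RSS samples must be positive")

    return GrowthMetrics(slope_mb_per_s=slope,
                         end_spike_ratio=max(tail) / body_median)


def alert_reasons(m: GrowthMetrics, max_slope_mb_per_s: float = 0.5,
                  max_end_spike_ratio: float = 1.5) -> List[str]:
    """Return the (hypothetical) patterns that would trigger an alert."""
    reasons = []
    if m.slope_mb_per_s > max_slope_mb_per_s:
        reasons.append("linear growth")
    if m.end_spike_ratio > max_end_spike_ratio:
        reasons.append("end-of-job spike")
    return reasons
```

For example, `alert_reasons(growth_metrics(list(range(10)), [1000 + 10 * t for t in range(10)]))` flags only linear growth, while a flat RSS series produces no alerts. In a real setup the thresholds would presumably need tuning per workflow, since a large end-of-job spike also raises the fitted slope.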
@makortel
Collaborator Author

makortel commented Nov 9, 2023

This campaign has effectively been superseded by #670

@makortel closed this as not planned Nov 9, 2023
@github-project-automation bot moved this from 📋 Backlog to ✅ Done in Activity view (from Q3 2023 to Q2 2024) Nov 9, 2023
@makortel removed the Activity label Nov 9, 2023