Enhanced Monitoring and Stability #48

Open: wants to merge 16 commits into main


alimosaed
Contributor

🆕 What's New?

  • Added a monitoring system for tracking workflow status
  • Implemented log rotation and archiving
  • Added MongoDB insert and search components, with support for typed data insertion and JSON handling
  • Introduced a text splitter component that uses LangChain to chunk documents
  • Added support for custom user property keys (for reply and metadata) in request-reply messaging
  • Added error queue configuration with a maximum queue depth
  • Made the placement of the reply topic in agent requests configurable (via response_topic_insertion_expression)
  • Added an API for monitoring connection status
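The connection status API is not shown in this description. The following is a minimal sketch, assuming a thread-safe holder that a broker's reconnect loop updates and a monitoring endpoint reads. `ConnectionStatus`, `ConnectionMonitor`, and the dictionary keys are hypothetical names, not the connector's actual API. The status is returned as a plain dictionary, in line with the commit that converts monitoring output to a dictionary.

```python
import threading
import time
from enum import Enum


class ConnectionStatus(Enum):
    # Hypothetical status values; the commits mention connected and reconnecting states.
    CONNECTED = "connected"
    RECONNECTING = "reconnecting"
    DISCONNECTED = "disconnected"


class ConnectionMonitor:
    """Thread-safe holder for a broker connection's status (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status = ConnectionStatus.DISCONNECTED
        self._reconnect_attempts = 0
        self._last_change = time.time()

    def set_status(self, status: ConnectionStatus) -> None:
        with self._lock:
            if status is ConnectionStatus.RECONNECTING:
                self._reconnect_attempts += 1  # count each reconnect attempt
            elif status is ConnectionStatus.CONNECTED:
                self._reconnect_attempts = 0  # reset once the connection is back
            self._status = status
            self._last_change = time.time()

    def get_status(self) -> dict:
        # A plain dictionary is easy to serialize (for example to JSON) for a status API.
        with self._lock:
            return {
                "status": self._status.value,
                "reconnect_attempts": self._reconnect_attempts,
                "last_change": self._last_change,
            }
```

A reconnect loop would call `set_status()` on each state change, while an HTTP or metrics handler would only call `get_status()`.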
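Log rotation and archiving can be done with the Python standard library alone. This sketch assumes size-based rotation with gzip-compressed archives, using `logging.handlers.RotatingFileHandler` together with its `namer` and `rotator` hooks. The logger name `connector` and the size and backup values are illustrative assumptions, not the connector's configuration keys.

```python
import gzip
import logging
import os
import shutil
from logging.handlers import RotatingFileHandler


def gzip_namer(default_name: str) -> str:
    # Rotated files become e.g. "connector.log.1.gz", "connector.log.2.gz", ...
    return default_name + ".gz"


def gzip_rotator(source: str, dest: str) -> None:
    # Compress the full log file into the archive, then remove the original.
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def build_rotating_logger(path: str, max_bytes: int, backup_count: int) -> logging.Logger:
    """Hypothetical helper: size-limited log file with at most backup_count gzip archives."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.namer = gzip_namer
    handler.rotator = gzip_rotator
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("connector")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

When the active file would exceed `max_bytes`, the handler shifts the existing archives up by one, compresses the current file into `.1.gz`, and starts a fresh log file; the oldest archive beyond `backup_count` is discarded.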
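For the error queue with a maximum depth, the behavior described under Bug Fixes (dropping messages instead of blocking when the queue is full) can be sketched with `queue.Queue` and its non-blocking `put_nowait()`. `BoundedErrorQueue` and `max_depth` are hypothetical names chosen for illustration.

```python
import logging
import queue
from typing import Optional

log = logging.getLogger("error_queue")  # hypothetical logger name


class BoundedErrorQueue:
    """Error queue with a maximum depth that drops new messages instead of blocking."""

    def __init__(self, max_depth: int) -> None:
        # maxsize > 0 bounds the queue (maxsize <= 0 would make it unbounded).
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_depth)
        self.dropped = 0

    def put(self, error_message: dict) -> bool:
        """Enqueue without blocking; return False if the message was dropped."""
        try:
            # put_nowait() raises queue.Full immediately instead of waiting for space.
            self._queue.put_nowait(error_message)
            return True
        except queue.Full:
            self.dropped += 1
            log.warning("Error queue is full (max depth %d); dropping message", self._queue.maxsize)
            return False

    def get(self, timeout: Optional[float] = None) -> dict:
        return self._queue.get(timeout=timeout)

    def qsize(self) -> int:
        return self._queue.qsize()
```

Dropping on overflow keeps the component that reports errors from stalling when nothing is draining the error queue, at the cost of losing the excess error messages.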

🔧 Improvements

  • Improved security by removing confidential information, such as message payloads, from logs
  • Improved logging:
    • Added logging of connection and reconnection attempts
    • Added log file size limits and archiving
    • Added exponential backoff to connection-retry logging
  • Improved LLM request handling with retry and cool-down policies:
    • Retries now depend on the type of failure
    • Requests cool down when the error rate is high
    • Added timeout handling
    • Added NACK (negative acknowledgement) support for failed requests
  • Improved broker reconnection with a "forever" retry policy
  • Added WhiteSource configuration file support to the CI workflow
  • Improved error reporting during startup
  • Reduced dependencies:
    • Removed the LangChain and LiteLLM dependencies from the core
  • Improved shutdown handling:
    • Added handling for the SIGINT and SIGTERM signals
    • Improved termination of threads that are sleeping
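The "forever" reconnection policy and the exponential backoff in connection-retry logging fit together: retry without limit, grow the wait between attempts up to a cap, and log only some of the attempts. A minimal sketch, assuming a `connect()` callable that raises `ConnectionError` on failure. `backoff_delays`, `connect_forever`, and the 1 s initial delay and 60 s cap are hypothetical, not the connector's defaults. `sleep` is injectable so the loop can be tested without waiting.

```python
import logging
import time
from typing import Callable, Iterator, Optional

log = logging.getLogger("broker")  # hypothetical logger name


def backoff_delays(initial: float = 1.0, factor: float = 2.0, max_delay: float = 60.0) -> Iterator[float]:
    """Yield exponentially growing delays, capped at max_delay, without end."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, max_delay)


def connect_forever(
    connect: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Call connect() until it succeeds and return the number of attempts.

    With max_attempts=None this retries "forever"; the last ConnectionError
    is re-raised only when a finite max_attempts is reached.
    """
    delays = backoff_delays()
    attempt = 0
    while True:
        attempt += 1
        try:
            connect()
            log.info("Connected after %d attempt(s)", attempt)
            return attempt
        except ConnectionError as exc:
            # Log only attempts 1, 2, 4, 8, ... so a long outage does not flood the log.
            if attempt & (attempt - 1) == 0:
                log.warning("Connection attempt %d failed: %s", attempt, exc)
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(next(delays))
```

Logging on power-of-two attempts is one simple way to make log volume grow logarithmically with outage length; a time-based log interval would work as well.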
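The retry, cool-down, and NACK behavior for LLM requests can be illustrated with a small wrapper. This is a sketch under stated assumptions, not the connector's implementation: timeouts and connection errors are treated as retryable, any other exception is rejected immediately, and all calls pause for a fixed period once the failure rate in a sliding time window crosses a threshold. `LLMCaller`, `NackOutcome`, and every parameter name and default are hypothetical; the enum only mirrors the commit that replaces string NACK statuses with enumerations.

```python
import time
from collections import deque
from enum import Enum
from typing import Callable, Deque, Optional, Tuple, Type


class NackOutcome(Enum):
    # Hypothetical outcomes for a negative acknowledgement.
    FAILED = "failed"      # temporary failure: the message may be redelivered later
    REJECTED = "rejected"  # permanent failure: the message should not be redelivered


# Hypothetical classification: these failure types are worth retrying.
RETRYABLE: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError)


class LLMCaller:
    """Retry retryable LLM failures and cool down when the recent error rate is high."""

    def __init__(
        self,
        max_retries: int = 3,
        window_seconds: float = 60.0,
        error_rate_threshold: float = 0.5,
        cooldown_seconds: float = 30.0,
        min_samples: int = 4,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_retries = max_retries
        self.window_seconds = window_seconds
        self.error_rate_threshold = error_rate_threshold
        self.cooldown_seconds = cooldown_seconds
        self.min_samples = min_samples
        self._clock = clock
        self._results: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)
        self._cooldown_until = 0.0

    def _cooling_down(self) -> bool:
        return self._clock() < self._cooldown_until

    def _record(self, succeeded: bool) -> None:
        now = self._clock()
        self._results.append((now, succeeded))
        # Drop outcomes that have left the sliding window (the newest one always stays).
        while now - self._results[0][0] > self.window_seconds:
            self._results.popleft()
        failures = sum(1 for _, ok in self._results if not ok)
        if (
            len(self._results) >= self.min_samples
            and failures / len(self._results) >= self.error_rate_threshold
        ):
            self._cooldown_until = now + self.cooldown_seconds

    def call(self, request: Callable[[], str]) -> Tuple[Optional[str], Optional[NackOutcome]]:
        """Return (result, None) on success or (None, outcome) when the message should be NACKed."""
        if self._cooling_down():
            return None, NackOutcome.FAILED  # skip the call; let the message be retried later
        for _ in range(self.max_retries + 1):
            try:
                result = request()
            except RETRYABLE:
                self._record(False)
                if self._cooling_down():
                    break  # error rate is now too high: stop retrying
                continue
            except Exception:
                self._record(False)
                return None, NackOutcome.REJECTED  # not a retryable failure type
            self._record(True)
            return result, None
        return None, NackOutcome.FAILED
```

For brevity this sketch retries immediately; a fuller version would also wait between retries, for example with the kind of capped exponential backoff used for reconnection.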
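Graceful shutdown on SIGINT and SIGTERM usually depends on worker threads waking up promptly instead of finishing a long `time.sleep()`. A minimal sketch, assuming a single shared `threading.Event` that the signal handler sets; `stop_event`, `handle_shutdown`, `poll_worker`, and `start_worker` are hypothetical names.

```python
import signal
import threading

# Shared shutdown flag (hypothetical name); set once, observed by every worker.
stop_event = threading.Event()


def handle_shutdown(signum, frame) -> None:
    """Signal handler for SIGINT (Ctrl+C) and SIGTERM (e.g. sent by a container runtime)."""
    stop_event.set()


def install_signal_handlers() -> None:
    # signal.signal() raises ValueError if called outside the main thread.
    signal.signal(signal.SIGINT, handle_shutdown)
    signal.signal(signal.SIGTERM, handle_shutdown)


def poll_worker(interval: float, ticks: list) -> None:
    """Do periodic work until shutdown is requested."""
    while not stop_event.is_set():
        ticks.append(1)  # placeholder for the real periodic work
        # Unlike time.sleep(interval), wait() returns as soon as stop_event is set.
        stop_event.wait(interval)


def start_worker(interval: float, ticks: list) -> threading.Thread:
    # daemon=True: a worker that never exits cannot keep the process alive at shutdown.
    worker = threading.Thread(target=poll_worker, args=(interval, ticks), daemon=True)
    worker.start()
    return worker
```

In an entry point, `install_signal_handlers()` would be called from the main thread, which could then block on `stop_event.wait()` and join the workers before exiting.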

🐛 Bug Fixes

  • Fixed an infinite error-logging loop during broker disconnection
  • Resolved graceful shutdown issues
  • Fixed startup error handling so that errors report where they occurred
  • Resolved broker reconnection timeout issues
  • Fixed error queue blocking by dropping messages when the queue is full

Note: This release includes significant improvements to system stability, monitoring, and security, along with various bug fixes and performance enhancements.
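The startup fix is described only as showing where the error occurred; one commit mentions including a stack dump when startup fails. One way to get that, sketched here with hypothetical names, is to log `traceback.format_exc()`, which includes the file name, line number, and function of each frame, instead of only the exception message.

```python
import logging
import traceback
from typing import Callable, Optional

log = logging.getLogger("startup")  # hypothetical logger name


def start_with_diagnostics(start: Callable[[], None]) -> Optional[str]:
    """Run start(); on failure, log and return the full stack trace.

    Hypothetical helper: returns None when startup succeeds.
    """
    try:
        start()
    except Exception:
        # format_exc() renders the active exception with every frame's
        # file name, line number, and function name.
        trace = traceback.format_exc()
        log.critical("Error during startup:\n%s", trace)
        return trace
    return None
```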
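The description does not show how the infinite error-logging loop during broker disconnection was fixed. A common approach, sketched here as an assumption rather than the actual fix, is to log a repeated error at most once per interval and report how many duplicates were suppressed. `ThrottledLogger` and its parameters are hypothetical; the clock is injectable for testing.

```python
import logging
import time
from typing import Callable, Dict


class ThrottledLogger:
    """Log each message key at most once per interval and count suppressed repeats."""

    def __init__(
        self,
        logger: logging.Logger,
        interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._logger = logger
        self._interval = interval
        self._clock = clock
        self._last: Dict[str, float] = {}       # key -> time it was last logged
        self._suppressed: Dict[str, int] = {}   # key -> repeats skipped since then

    def error(self, key: str, msg: str) -> bool:
        """Log msg unless the same key was logged less than `interval` seconds ago."""
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < self._interval:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False
        skipped = self._suppressed.pop(key, 0)
        suffix = f" ({skipped} similar messages suppressed)" if skipped else ""
        self._logger.error("%s%s", msg, suffix)
        self._last[key] = now
        return True
```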

cyrus2281 and others added 16 commits December 2, 2024 17:11
* Allowing stream overwrite at event level for LLM Chat

* Added overwrite flag
* Changes for request/response for streaming LLM access

* Updated with main

* update

---------

Co-authored-by: Edward Funnekotter <efunneko@gmail.com>
* fix: add dependencies to the toml file

* fix: handled miss configurations

* fix: resolve conflicts

* FEATURE: Enable stream overwrite for LLM Chat at the event level (#66)

* Allowing stream overwrite at event level for LLM Chat

* Added overwrite flag

* AI-95: Enhance request/response handling for streaming LLM access (#69)

* Changes for request/response for streaming LLM access

* Updated with main

* update

---------

Co-authored-by: Edward Funnekotter <efunneko@gmail.com>

* fix: add exception handler

* fix: add exception handler

---------

Co-authored-by: Art Morozov <artyom.morozov315@gmail.com>
Co-authored-by: Cyrus Mobini <68962752+cyrus2281@users.noreply.github.com>
Co-authored-by: Edward Funnekotter <efunneko@gmail.com>
* feat: drop error messages when the queue is full

* feat: add a text splitter component

* feat: updated docs

* fix: return the original example
… reply topic in the message (#74)

* If requested, insert the response topic according to the response_topic_insertion_expression

* More fixes after testing
* FEATURE: Enable stream overwrite for LLM Chat at the event level (#66)

* Allowing stream overwrite at event level for LLM Chat

* Added overwrite flag

* AI-95: Enhance request/response handling for streaming LLM access (#69)

* Changes for request/response for streaming LLM access

* Updated with main

* update

---------

Co-authored-by: Edward Funnekotter <efunneko@gmail.com>

* Include stack dump if there is an error on startup

---------

Co-authored-by: Art Morozov <artyom.morozov315@gmail.com>
Co-authored-by: Cyrus Mobini <68962752+cyrus2281@users.noreply.github.com>
* Added mongodb insert component

* type

* added search component

* applied comments

* updated docs
…st response user properties (#79)

* Added the option to use custom keys for reply and metadata in request-response user properties

* fixed issue
…WhiteSource scan results (#80)

Investigate Solace AI Connector (and other Solace AI libraries) WhiteSource scan results (#80)
---------

Co-authored-by: John Corpuz <john.corpuz@solace.com>
* feat: add the forever retry

* feat: keep connecting

* feat: replace the reconnection

* ref: moved settings to a new yaml file

* feat: update documents

* ref: move common settings to base broker

* feat: generate documents

* fix: retrieve litellm config
* Added mongodb insert component

* type

* added search component

* applied comments

* updated docs

* Added the option to use custom keys for reply and metadata in request-response user properties

* fixed issue

* Updated insert with type

* added docs

* added config value validation

* added value check for mongo insert
* feat: add monitoring component

* fix: resolve a bug

* fix: add sleep time

* fix: add sleep time

* feat: add readiness and handle excessive logs

* fix: handle sleep error

* fix: handle sleep error

* feat: gracefully exit

* feat: set the log back

* fix: rename log fields

* fix: disabled monitoring

* fix: resolve log naming

* fix: resolved logging issues

* fix: resolve log

* fix: resolve log

* feat: remove dependency on LangChain

* feat: update monitoring

* feat: drop error messages when the queue is full

* feat: add a text splitter component

* feat: updated docs

* fix: resolve graceful termination issues

* fix: remove payloads from logs

* feat: add the forever retry

* feat: keep connecting

* Feat: add monitoring

* feat: replace the reconnection

* feat: refactor monitoring

* feat: add connection metric

* convert connection to async

* get metrics enum

* add types of metrics

* use metrics rather than metric values

* fix bug

* update type

* convert monitoring output to dictionary

* fix bug

* feat: add connection status

* feat: add reconnecting status

* feat: add reconnecting log and handled signals

* fix: update status

* fix: update log

* fix: fix bug

* fix: fix bug

* fix: resolve connection logs

* fix: handle threads

* fix: update connection state machine

* feat: add prefix to the broker logs

* fix: synchronize logs with connection attempts

* fix: remove datadog dependency

* fix: cover an exception

* ref: upgrade to latest pubsub and replace a metric

* ref: encapsulate some variables

* ref: enable daemon for threads to close them safely

* ref: remove useless variable

* feat: add retry and timeout to litellm

* feat: add nack

* fix: replace exception with exception type

* fix: remove useless exceptions

* Create pull_request_template.md

* fix: update the default nack

* ref: replace nack string status with enumerations

* ref: generate docs

* ref: remove default value

* ref: move common imports to a module

* ref: update imports

* ref: update import
6 participants