Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cloudevents-server): do not stop the consumer when error happened in loop #204

Merged
merged 3 commits into from
Nov 16, 2024

Conversation

wuhuizuo
Copy link
Contributor

  • Do not stop it when read message failed or parse message failed.
  • Make the consumer group stable.

Signed-off-by: wuhuizuo wuhuizuo@126.com

…in loop

- Do not stop it when read message failed or parse message failed.
- Make the consumer group stable.

Signed-off-by: wuhuizuo <wuhuizuo@126.com>
Copy link

ti-chi-bot bot commented Nov 16, 2024

I have already done a preliminary review for you, and I hope to help you do a better job.

The pull request seems to be addressing an issue related to the handling of errors in a consumer loop within a cloudevents server. The key changes are:

  1. The consumer loop is refactored to not stop when a read or parse message error occurs. Instead, it logs the error and continues to the next message.
  2. The project structure is reorganized, and some files are moved to the cmd/server directory.
  3. The function columnChecker is renamed to checkColumn in cloudevents-server/ent/ent.go.
  4. The entgo.io/ent dependency is updated to a newer version, and the corresponding changes in the generated code are reflected.
  5. The main.go file is significantly refactored, with changes to the way the server and consumer group are started and stopped.
  6. The way context operation names are set in problemcaserun_query.go is changed to use constants from the ent package.
  7. The runtime.go file is updated to reflect the new version of the ent codegen.

Potential problems:

  1. Error handling: Although the consumer loop now continues processing after encountering an error, it's not clear if there is an upper limit to the number of consecutive errors that can occur before the consumer should stop. This could lead to an infinite loop of errors in certain scenarios.
  2. Code organization: The restructuring of the project files might affect other parts of the system if they rely on the previous structure or file paths.
  3. Dependency update: Updating the entgo.io/ent dependency could have wider impacts on the project, depending on what changes were introduced in the newer version.

Fixing suggestions:

  1. Implement a maximum error count or a backoff strategy to prevent potential infinite error loops.
  2. Ensure that the project restructuring is reflected in all parts of the system that might be affected.
  3. Thoroughly test the impacts of the entgo.io/ent dependency update to make sure it doesn't introduce unexpected issues.

@ti-chi-bot ti-chi-bot bot added the size/L label Nov 16, 2024
Signed-off-by: wuhuizuo <wuhuizuo@126.com>
@wuhuizuo
Copy link
Contributor Author

/approve

Copy link

ti-chi-bot bot commented Nov 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wuhuizuo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Nov 16, 2024
Copy link

ti-chi-bot bot commented Nov 16, 2024

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary of Changes

  1. Dockerfile: The build script was changed to include a new path for the server.
  2. File movements: Several *.go files were moved to a new directory named cmd/server.
  3. main.go: The main function was refactored to improve error handling and server shutdown procedure. Specifically, it now uses a channel to wait for an interrupt signal and then gracefully shuts down the services (HTTP server and consumer group).
  4. ent.go: A minor change was made to rename a function comment.
  5. problemcaserun_query.go: The query operations were refactored to use constants from the ent package instead of hardcoded strings.
  6. runtime.go: The version of the ent codegen was updated.
  7. go.mod and go.sum: Many dependencies were updated, including a major upgrade of the ent package.
  8. kafka.go: The consumer's Start method was modified to improve error handling. In the new version, if an error occurs when reading or parsing a message, it will log the error and continue, instead of returning the error and potentially stopping the consumer.

Potential Problems

  1. Breaking changes in dependencies: The pull request includes many changes to dependencies, including a major version upgrade of the ent package. This could potentially introduce breaking changes if the new versions are not fully compatible with the existing code.

  2. Error handling: The new error handling in kafka.go discards erroneous messages and continues processing. This could potentially lead to data loss if the errors are not addressed.

  3. Testing: The pull request does not include any new tests. Given the significant changes to the codebase, additional tests should be added to ensure that the new functionality works as expected.

Fixing Suggestions

  1. Check compatibility of new dependencies: Review the changelogs and documentation for the upgraded dependencies to ensure that they are compatible with the existing code. Adjust the code as necessary to work with the new versions.

  2. Improve error handling: Consider adding a retry mechanism for handling errors when reading or parsing messages in kafka.go. This would give the application a chance to recover from temporary errors.

  3. Add tests: Add tests to cover the new functionality in main.go and kafka.go. This will help ensure that the new code works as expected and prevent regressions in the future. For example, you could add tests to check that the server shuts down correctly when it receives an interrupt signal, and that the consumer correctly handles errors when reading or parsing messages.

@ti-chi-bot ti-chi-bot bot merged commit 58aa19e into main Nov 16, 2024
3 checks passed
@ti-chi-bot ti-chi-bot bot deleted the fix/consumer-group-not-stable2 branch November 16, 2024 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant