Add rollout restart agent e2e test #2799
base: main
Conversation
📝 Walkthrough
This pull request introduces enhancements to the end-to-end (E2E) testing infrastructure for the Vald project. The changes focus on adding configurability to K3D storage setup, creating a new rollout agent configuration, and implementing a new E2E test scenario for agent rollout restart. The modifications span multiple files, including GitHub Actions workflows, Makefiles, Kubernetes client implementations, and test scripts, with the primary goal of improving testing capabilities and system resilience.

Changes
Possibly related PRs
Suggested Labels
Suggested Reviewers
✨ Finishing Touches
4ae3d97 to 9985751
96c2ce3 to 00434e0
573283f to bd747ff
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/e2e/kubernetes/kubectl/kubectl.go (1)
Lines 45-86: LGTM! Well-structured implementation of resource rollout management.
The implementation is robust with:
- Proper error handling and resource cleanup with deferred Close() calls
- Concurrent output handling using goroutines and bufio.Scanner
- Clear command construction with timeout parameter
However, consider adding debug logging using t.Logf for better test observability.
 func RolloutResourceName(
 	ctx context.Context, t *testing.T, resource string, name string, timeout string,
 ) error {
 	t.Helper()
+	t.Logf("Rolling out resource %s/%s with timeout %s", resource, name, timeout)
 	cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name)
 	if err := runCmd(t, cmd); err != nil {
+		t.Logf("Failed to restart resource: %v", err)
 		return err
 	}
 	r := strings.Join([]string{resource, name}, "/")
 	to := strings.Join([]string{"--timeout", timeout}, "=")
 	cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to)

tests/e2e/kubernetes/client/client.go (1)
Lines 210-233: LGTM! Robust implementation of StatefulSet readiness check.
The implementation includes:
- Proper context handling with timeout
- Clear readiness condition check
- Efficient polling with ticker
Consider adding exponential backoff to reduce API server load.
+import "k8s.io/client-go/util/retry" func (cli *client) WaitForStatefulSetReady( ctx context.Context, namespace, name string, timeout time.Duration, ) (ok bool, err error) { ctx, cancel := context.WithTimeout(ctx, timeout) defer cancel() - tick := time.NewTicker(time.Second) - defer tick.Stop() - - for { + return true, retry.OnError(retry.DefaultBackoff, func() error { ss, err := cli.clientset.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{}) if err != nil { - return false, err + return err } if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas { - return true, nil + return nil } - select { - case <-ctx.Done(): - return false, ctx.Err() - case <-tick.C: - } - } + return fmt.Errorf("statefulset %s/%s not ready", namespace, name) + }) }Makefile.d/e2e.mk (1)
Lines 92-96: LGTM! Well-structured test target for agent rollout restart.
The target follows the established pattern for e2e test targets.
Note: There's a typo in the target name:
e2e/rollaout/restart/agent should be e2e/rollout/restart/agent.

-.PHONY: e2e/rollaout/restart/agent
+.PHONY: e2e/rollout/restart/agent
 ## run rollout-restart agent e2e
-e2e/rollout/restart/agent:
+e2e/rollout/restart/agent:
	$(call run-e2e-crud-test,-run TestE2EAgentRolloutRestart)

tests/e2e/crud/crud_test.go (1)
Lines 999-1103: Consider improving error handling and test reliability.
The test implementation has a few areas that could be enhanced:
- The sleep duration between search attempts is hardcoded to 10 seconds.
- The error aggregation could potentially consume a lot of memory if many errors occur.
- The test might not catch all DEADLINE_EXCEEDED errors due to the sleep interval.
Consider these improvements:
- time.Sleep(10 * time.Second)
+ time.Sleep(time.Second) // More frequent checks to catch errors
+ if serr != nil && errors.Is(serr, context.DeadlineExceeded) {
+ 	// Early exit on first deadline exceeded error
+ 	return
+ }

Also, consider adding a timeout context for the entire test to prevent it from running indefinitely if something goes wrong:
- ctx, cancel := context.WithCancel(context.Background())
+ ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.github/actions/setup-e2e/action.yaml (2 hunks)
.github/actions/setup-k3d/action.yaml (2 hunks)
.github/helm/values/values-rollout-agent.yaml (1 hunks)
.github/workflows/e2e.yaml (2 hunks)
Makefile (1 hunks)
Makefile.d/e2e.mk (1 hunks)
Makefile.d/functions.mk (1 hunks)
tests/e2e/crud/crud_test.go (5 hunks)
tests/e2e/kubernetes/client/client.go (2 hunks)
tests/e2e/kubernetes/kubectl/kubectl.go (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Run formatter
.github/helm/values/values-rollout-agent.yaml
[error] 2-2: File needs formatting. Copyright year needs to be updated from 2024 to 2025. Please execute make format
locally to fix this issue.
🪛 actionlint (1.7.4)
.github/workflows/e2e.yaml
362-362: shellcheck reported issue in this script: SC2086:info:1:42: Double quote to prevent globbing and word splitting
(shellcheck)
377-377: shellcheck reported issue in this script: SC2086:info:12:23: Double quote to prevent globbing and word splitting
(shellcheck)
🔇 Additional comments (12)
tests/e2e/kubernetes/client/client.go (1)
Lines 71-75: LGTM! Clear interface definition for StatefulSet readiness check.
The interface method signature is well-defined with appropriate parameters.
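For illustration, a minimal sketch of what the described interface addition might look like. The method signature is taken from the WaitForStatefulSetReady implementation quoted later in this review; the surrounding interface name (Client) and package layout are assumptions, not the actual Vald code.

package client

import (
	"context"
	"time"
)

// Client is a hypothetical name for the E2E Kubernetes client interface.
type Client interface {
	// WaitForStatefulSetReady polls the named StatefulSet until all replicas
	// are updated and ready, or until the timeout elapses.
	WaitForStatefulSetReady(ctx context.Context, namespace, name string, timeout time.Duration) (ok bool, err error)
}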
Makefile.d/functions.mk (1)
Line 166: LGTM! Good addition of resource readiness wait parameter.
The parameter follows the established pattern and integrates well with the existing function.
tests/e2e/crud/crud_test.go (1)
Lines 66-67: LGTM! New duration variable for resource readiness.
The addition of waitResourceReadyDuration aligns with the PR's objective to handle agent rolling updates effectively.

.github/helm/values/values-rollout-agent.yaml (1)
Lines 34-60: Review agent configuration for production readiness.
The agent configuration has several points to consider:
- The terminationGracePeriodSeconds of 600 seconds might be too long for CI/CD environments.
- The storage size of 500Mi might be insufficient for production workloads.
- The auto index duration limit of 2m might be too short for large datasets.
Please verify these settings in your environment:
.github/actions/setup-e2e/action.yaml (1)
Lines 35-38: LGTM! Storage configuration for k3d setup.
The addition of the require_k3d_storage parameter and its usage in the k3d setup step is well-implemented and properly documented.
Also applies to: 92-92
.github/actions/setup-k3d/action.yaml (1)
Lines 39-42: LGTM! K3d storage deployment configuration.
The implementation of storage deployment is clean and follows the action's existing patterns.
Also applies to: 131-135
.github/workflows/e2e.yaml (5)
Lines 351-359: LGTM! Job configuration is well-structured.
The job configuration follows GitHub Actions best practices with appropriate timeout and container setup.
Lines 364-368: Good use of persistent storage for test stability.
The require_k3d_storage: true configuration aligns with the PR objective of using persistent volumes and disabling in-memory mode.
Lines 376-393: Test configuration looks comprehensive.
The E2E test configuration includes:
- Appropriate dataset setup
- Balanced operation counts (1000 for search operations, 100 for others)
- Reasonable wait duration for index creation (3m)
- Proper environment variable passing
🧰 Tools
🪛 actionlint (1.7.4)
377-377: shellcheck reported issue in this script: SC2086:info:12:23: Double quote to prevent globbing and word splitting
(shellcheck)
Line 463: Good job including the new test in slack notifications.
The new test job is correctly added to the slack-notification job dependencies.
Lines 369-375: Verify the Helm values file for rollout agent configuration.
The deployment uses a specific values file for rollout agent configuration.
Run this script to inspect the Helm values:
✅ Verification successful
✓ Helm values file for rollout agent is properly configured
The values file contains appropriate configurations including:
- Persistent volume settings with 500Mi storage and ReadWriteOncePod access mode
- Resource allocations for all components
- Agent replica configuration with parallel pod management
- NGT indexing parameters
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the Helm values file for rollout agent configuration

# Check if the values file exists and inspect its content for PV and memory settings
if [ -f ".github/helm/values/values-rollout-agent.yaml" ]; then
  echo "Checking values-rollout-agent.yaml content:"
  cat .github/helm/values/values-rollout-agent.yaml
else
  echo "Warning: values-rollout-agent.yaml not found"
fi

Length of output: 2351
Makefile (1)
Line 370: LGTM! Good addition of resource wait timeout.
The new E2E_WAIT_FOR_RESOURCE_READY variable with a default value of 3m is a good practice for ensuring test stability.
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/e2e/operation/operation.go (1)
Line 204: Consider making the timeout configurable.
Since different E2E test scenarios might require different timeout values (e.g., regular operations vs rolling updates), consider making this timeout configurable through an environment variable.
Example implementation:
+ const defaultGRPCTimeout = 60 * time.Second
+
+ func getGRPCTimeout() time.Duration {
+ 	if timeout := os.Getenv("E2E_GRPC_TIMEOUT"); timeout != "" {
+ 		if d, err := time.ParseDuration(timeout); err == nil {
+ 			return d
+ 		}
+ 	}
+ 	return defaultGRPCTimeout
+ }

 func (c *client) getGRPCConn() (*grpc.ClientConn, error) {
 	return grpc.NewClient(
 		c.host+":"+strconv.Itoa(c.port),
 		grpc.WithInsecure(),
 		grpc.WithKeepaliveParams(
 			keepalive.ClientParameters{
 				Time:                time.Second,
-				Timeout:             60 * time.Second,
+				Timeout:             getGRPCTimeout(),
 				PermitWithoutStream: true,
 			},
 		),
 	)
 }

tests/e2e/operation/stream.go (2)
Lines 875-885: Consider using defer for mutex unlock.
While the error handling is correct, consider using defer mu.Unlock() right after the lock to prevent potential lock leaks in case of panics.

- mu.Lock()
- rerr = ierr
- mu.Unlock()
+ mu.Lock()
+ defer mu.Unlock()
+ rerr = ierr
Lines 924-927: Consider consolidating error handling patterns.
The error handling pattern here differs from the goroutine's error handling. Consider using the same pattern with a local error variable for consistency.
- mu.Lock()
- rerr = errors.Join(rerr, err)
- mu.Unlock()
- return
+ mu.Lock()
+ defer mu.Unlock()
+ rerr = errors.Join(rerr, err)
+ return

tests/e2e/crud/crud_test.go (3)
Lines 999-1103: Comprehensive test implementation with proper concurrency handling.
The test implementation is well-structured with:
- Proper context handling with cancellation
- Concurrent search operations during rollout
- Thread-safe error aggregation
- Clean resource management with WaitGroup and channels
However, consider adding a timeout context for the entire test to prevent indefinite hanging.
- ctx, cancel := context.WithCancel(context.Background())
+ ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)

Also, consider adding a constant for the sleep duration:

+ const searchRetryInterval = 10 * time.Second
 }
- time.Sleep(10 * time.Second)
+ time.Sleep(searchRetryInterval)
Lines 1028-1034: Consider adding retry mechanism with backoff.
The search function could benefit from a retry mechanism with exponential backoff to handle transient failures more gracefully.
+ searchWithRetry := func() error {
+ 	backoff := time.Second
+ 	maxBackoff := 30 * time.Second
+ 	for i := 0; i < 3; i++ {
+ 		err := searchFunc()
+ 		if err == nil {
+ 			return nil
+ 		}
+ 		if st, ok := status.FromError(err); ok && st.Code() != codes.DeadlineExceeded {
+ 			return err
+ 		}
+ 		time.Sleep(backoff)
+ 		backoff *= 2
+ 		if backoff > maxBackoff {
+ 			backoff = maxBackoff
+ 		}
+ 	}
+ 	return searchFunc()
+ }
Lines 1047-1057: Consider adding metrics or logging for error tracking.
The error handling in the search loop could benefit from metrics or logging to track the frequency and patterns of deadline exceeded errors.
 err = searchFunc()
 if err != nil {
 	st, ok := status.FromError(err)
 	if ok && st.Code() == codes.DeadlineExceeded {
+		t.Logf("Search operation timed out: %v", err)
 		_, _, rerr := status.ParseError(err, codes.DeadlineExceeded, "an error occurred")
 		mu.Lock()
 		serr = errors.Join(serr, rerr)
 		mu.Unlock()
 	}
+ } else {
+	t.Log("Search operation completed successfully")
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tests/e2e/crud/crud_test.go (5 hunks)
tests/e2e/operation/operation.go (1 hunks)
tests/e2e/operation/stream.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: runner / go build (×10)
🔇 Additional comments (5)
tests/e2e/operation/operation.go (1)
Line 204: LGTM! Increasing the timeout to handle rolling updates.
The increase from 5s to 60s timeout is a reasonable change to handle potential delays during agent rolling updates and should help prevent DEADLINE_EXCEEDED errors in the E2E tests.
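As a rough sketch of where this timeout lives, mirroring the getGRPCConn diff quoted earlier in this thread (the helper name keepaliveParams below is illustrative and not part of the PR):

package operation

import (
	"time"

	"google.golang.org/grpc/keepalive"
)

// keepaliveParams returns client keepalive settings with the longer timeout,
// so in-flight RPCs can survive brief agent unavailability during a rollout.
func keepaliveParams() keepalive.ClientParameters {
	return keepalive.ClientParameters{
		Time:                time.Second,      // ping interval
		Timeout:             60 * time.Second, // raised from 5s for rolling updates
		PermitWithoutStream: true,
	}
}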
tests/e2e/operation/stream.go (2)
Lines 858-869: Improve error handling synchronization.
The addition of mutex synchronization and local error variable improves thread safety when handling errors in the streaming operation.
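The pattern being praised reduces to roughly the following self-contained sketch of mutex-guarded error aggregation across goroutines (illustrative only; the variable names differ from the actual stream.go code):

package main

import (
	"errors"
	"fmt"
	"sync"
)

func main() {
	var (
		mu   sync.Mutex
		rerr error
		wg   sync.WaitGroup
	)
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each goroutine records its failure under the mutex so the
			// aggregated error is updated without data races.
			err := fmt.Errorf("worker %d failed", id)
			mu.Lock()
			rerr = errors.Join(rerr, err)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	fmt.Println(rerr)
}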
Line 898: LGTM: Thread-safe error aggregation.
The error joining operation is correctly using the local error variable before synchronizing with the shared error.
tests/e2e/crud/crud_test.go (2)
Lines 66-67: LGTM: Clear variable naming.
The new duration variables have clear, descriptive names that indicate their purpose.
Lines 137-141: LGTM: Proper error handling in initialization.
The initialization of waitResourceReadyDuration includes appropriate error handling.
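A minimal sketch of the kind of initialization being described, assuming a string flag parsed with time.ParseDuration; the flag name and default below are illustrative, not necessarily the ones used in crud_test.go:

package main

import (
	"flag"
	"fmt"
	"time"
)

var waitResourceReadyDuration time.Duration

func main() {
	wait := flag.String("wait-resource-ready", "3m", "timeout for resource readiness checks")
	flag.Parse()

	d, err := time.ParseDuration(*wait)
	if err != nil {
		// Fail fast on an invalid duration instead of silently using zero.
		panic(err)
	}
	waitResourceReadyDuration = d
	fmt.Println("waiting up to", waitResourceReadyDuration, "for resources to become ready")
}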
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/e2e/crud/crud_test.go (1)
Lines 1035-1062: Well-implemented concurrent search with proper error handling.
The implementation correctly uses sync primitives (WaitGroup, Mutex) and channels for coordination. The error handling specifically targets DEADLINE_EXCEEDED errors as required.
Consider making the sleep duration (10s) configurable via a flag parameter for better test flexibility:
+var searchRetryInterval time.Duration
+
+ func init() {
+ 	searchRetry := flag.String("search-retry-interval", "10s", "interval between search retries during rollout")
+ 	searchRetryInterval, err = time.ParseDuration(*searchRetry)
+ 	if err != nil {
+ 		panic(err)
+ 	}
 }

Then update line 1059:
- time.Sleep(10 * time.Second)
+ time.Sleep(searchRetryInterval)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tests/e2e/crud/crud_test.go (5 hunks)
tests/e2e/operation/operation.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/e2e/operation/operation.go
⏰ Context from checks skipped due to timeout of 90000ms (70)
- GitHub Check: CodeQL (×10)
- GitHub Check: runner / go build (×10)
- GitHub Check: check-format-diff (×10)
- GitHub Check: build / build (×40)
🔇 Additional comments (3)
tests/e2e/crud/crud_test.go (3)
Lines 66-67: LGTM! Well-structured variable initialization.
The addition of waitResourceReadyDuration with proper error handling and a reasonable default timeout value of 3 minutes is appropriate for managing resource readiness checks.
Also applies to: 101-141
Lines 1071-1099: Comprehensive post-rollout verification.
The test properly verifies the system state after rollout by:
- Checking index information
- Verifying object existence
- Testing object retrieval
- Confirming remove operations
Lines 1064-1069: Verify rollout restart command execution.
The implementation uses kubectl for rollout restart. Let's verify the command execution and logging.
✅ Verification successful
Implementation of rollout restart command is correct and complete
The kubectl rollout restart implementation includes proper command execution, error handling, status monitoring with timeout, and appropriate logging. The code follows best practices for both the command execution and test implementation.
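For context, a sketch of how a test could invoke this helper during the rollout scenario, based on the RolloutResourceName signature shown in this review; the import path, StatefulSet name, and timeout below are assumptions for illustration:

package crud_test

import (
	"context"
	"testing"

	// Import path assumed from the repository layout (tests/e2e/kubernetes/kubectl).
	"github.com/vdaas/vald/tests/e2e/kubernetes/kubectl"
)

func restartAgent(t *testing.T, ctx context.Context) {
	t.Helper()
	// Restart the agent StatefulSet and block until "kubectl rollout status" reports ready.
	if err := kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", "10m"); err != nil {
		t.Fatalf("failed to rollout restart agent: %v", err)
	}
}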
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify kubectl rollout restart command usage

# Test: Search for kubectl rollout restart command usage
rg -A 5 'kubectl.*rollout.*restart'

Length of output: 916
Script:
#!/bin/bash
# Get the complete implementation of kubectl package functions
ast-grep --pattern 'func RolloutResourceName($$$) { $$$ }'
echo "---"
ast-grep --pattern 'func runCmd($$$) { $$$ }'

Length of output: 127
Script:
#!/bin/bash
# Search for RolloutResourceName function implementation
rg -A 15 "^func RolloutResourceName" tests/e2e/kubernetes/kubectl/
echo "---"
# Search for runCmd function implementation
rg -A 10 "^func runCmd" tests/e2e/kubernetes/kubectl/

Length of output: 2022
@@ -141,4 +141,4 @@ RUN --mount=type=bind,target=.,rw \
    && make faiss/install \
    && rm -rf ${GOPATH}/src/github.com/${ORG}/${REPO}/*
 # skipcq: DOK-DL3002
 USER root:root
 USER root:root
da3dccf to 0289138
Actionable comments posted: 0
🧹 Nitpick comments (4)
.github/workflows/dockers-index-creation-image.yaml (1)
Line 246: LGTM! Consider updating documentation for multi-platform support.
The change consistently implements multi-platform support. Consider updating the project documentation to reflect the new multi-architecture support.
Consider adding a note about multi-platform support in the project's documentation, particularly in the deployment guide.
.github/workflows/dockers-index-correction-image.yaml (1)
Line 252: LGTM! Consider implementing cross-platform E2E tests.
The multi-platform support is consistently implemented. To ensure reliability across architectures:
- Consider running E2E tests on both AMD64 and ARM64 platforms
- Verify that the agent rollout restart functionality works correctly on both architectures
Consider implementing a matrix strategy in the E2E test workflow to validate the rollout restart functionality across both architectures:
strategy:
  matrix:
    platform: [linux/amd64, linux/arm64]

.github/workflows/dockers-gateway-mirror-image.yaml (1)
Line range hint 266-266: Consistent multi-platform support across all Docker image workflows.
The addition of platforms: linux/amd64,linux/arm64 is consistently applied across all Docker image workflow files:
- dockers-manager-index-image.yaml:266
- dockers-agent-faiss-image.yaml:268
- dockers-agent-ngt-image.yaml:272
- dockers-agent-sidecar-image.yaml:298
This systematic update ensures that all components support both AMD64 and ARM64 architectures, which is particularly important for:
- Running e2e tests across different architectures
- Supporting diverse deployment environments
- Ensuring consistent behavior during agent rollout tests
Consider documenting the multi-architecture support in the project's deployment guide, including any specific considerations for running e2e tests on different architectures.
Also applies to: 268-268, 272-272, 298-298
.github/workflows/dockers-agent-sidecar-image.yaml (1)
Line 298: LGTM! Platform configuration is consistent.
The multi-arch support configuration matches the pattern used across other workflow files.
Consider implementing a CI test to verify that the built images work correctly on both architectures, especially for components with architecture-specific optimizations like NGT and FAISS.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (46)
.github/workflows/dockers-agent-faiss-image.yaml (1 hunks)
.github/workflows/dockers-agent-image.yaml (1 hunks)
.github/workflows/dockers-agent-ngt-image.yaml (1 hunks)
.github/workflows/dockers-agent-sidecar-image.yaml (1 hunks)
.github/workflows/dockers-benchmark-job-image.yaml (1 hunks)
.github/workflows/dockers-benchmark-operator-image.yaml (1 hunks)
.github/workflows/dockers-dev-container-image.yaml (1 hunks)
.github/workflows/dockers-discoverer-k8s-image.yaml (1 hunks)
.github/workflows/dockers-example-client-image.yaml (1 hunks)
.github/workflows/dockers-gateway-filter-image.yaml (1 hunks)
.github/workflows/dockers-gateway-lb-image.yaml (1 hunks)
.github/workflows/dockers-gateway-mirror-image.yaml (1 hunks)
.github/workflows/dockers-helm-operator-image.yaml (1 hunks)
.github/workflows/dockers-index-correction-image.yaml (1 hunks)
.github/workflows/dockers-index-creation-image.yaml (1 hunks)
.github/workflows/dockers-index-deletion-image.yaml (1 hunks)
.github/workflows/dockers-index-operator-image.yaml (1 hunks)
.github/workflows/dockers-index-save-image.yaml (1 hunks)
.github/workflows/dockers-manager-index-image.yaml (1 hunks)
.github/workflows/dockers-readreplica-rotate-image.yaml (1 hunks)
dockers/agent/core/agent/Dockerfile (1 hunks)
dockers/agent/core/faiss/Dockerfile (1 hunks)
dockers/agent/core/ngt/Dockerfile (1 hunks)
dockers/agent/sidecar/Dockerfile (1 hunks)
dockers/binfmt/Dockerfile (1 hunks)
dockers/buildbase/Dockerfile (1 hunks)
dockers/buildkit/Dockerfile (1 hunks)
dockers/buildkit/syft/scanner/Dockerfile (1 hunks)
dockers/ci/base/Dockerfile (1 hunks)
dockers/dev/Dockerfile (1 hunks)
dockers/discoverer/k8s/Dockerfile (1 hunks)
dockers/example/client/Dockerfile (1 hunks)
dockers/gateway/filter/Dockerfile (1 hunks)
dockers/gateway/lb/Dockerfile (1 hunks)
dockers/gateway/mirror/Dockerfile (1 hunks)
dockers/index/job/correction/Dockerfile (1 hunks)
dockers/index/job/creation/Dockerfile (1 hunks)
dockers/index/job/deletion/Dockerfile (1 hunks)
dockers/index/job/readreplica/rotate/Dockerfile (1 hunks)
dockers/index/job/save/Dockerfile (1 hunks)
dockers/index/operator/Dockerfile (1 hunks)
dockers/manager/index/Dockerfile (1 hunks)
dockers/operator/helm/Dockerfile (1 hunks)
dockers/tools/benchmark/job/Dockerfile (1 hunks)
dockers/tools/benchmark/operator/Dockerfile (1 hunks)
dockers/tools/cli/loadtest/Dockerfile (1 hunks)
✅ Files skipped from review due to trivial changes (26)
- dockers/binfmt/Dockerfile
- dockers/tools/benchmark/operator/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/operator/helm/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/dev/Dockerfile
- dockers/agent/core/agent/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/example/client/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/index/job/save/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/index/operator/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/agent/sidecar/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
⏰ Context from checks skipped due to timeout of 90000ms (110)
- GitHub Check: build / build (×110)
🔇 Additional comments (17)
.github/workflows/dockers-dev-container-image.yaml (1)
Line 58: LGTM! Consider monitoring build times.
The addition of multi-architecture support is well-structured. However, building for multiple architectures may increase build duration.
Run this script to check current build times:
✅ Verification successful
Multi-arch builds show expected duration patterns
The workflow is well-optimized with event-based build configurations:
- Full builds (27 minutes) run only for pull request events
- Quick validation builds (0-1 minutes) run for pull request target events
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check workflow run times before and after the platform addition
gh run list --workflow=dockers-dev-container-image.yaml --limit 10 --json durationInMinutes,status,conclusion \
  | jq -r '.[] | select(.conclusion=="success") | "Duration: \(.durationInMinutes) minutes"'

Length of output: 445
Script:
#!/bin/bash
# Description: Check workflow run times using timestamp differences
gh run list --workflow=dockers-dev-container-image.yaml --limit 10 --json startedAt,updatedAt,conclusion \
  | jq -r '.[] | select(.conclusion=="success") | . as $run | ((($run.updatedAt | fromdateiso8601) - ($run.startedAt | fromdateiso8601)) / 60) | "Duration: \(. | floor) minutes"'

Length of output: 415
Script:
#!/bin/bash
# Description: Get detailed workflow run information
gh run list --workflow=dockers-dev-container-image.yaml --limit 10 \
  --json startedAt,conclusion,event,headBranch \
  | jq -r '.[] | select(.conclusion=="success") | "\(.startedAt): Branch=\(.headBranch) Event=\(.event)"'

Length of output: 753
.github/workflows/dockers-example-client-image.yaml (1)
Line 68: LGTM! Multi-architecture support added correctly.
The platform specification for both amd64 and arm64 architectures is properly configured.
.github/workflows/dockers-agent-image.yaml (1)
Line 74: LGTM! Verify Rust compilation for both architectures.
The multi-architecture support is correctly configured. Since this is a Rust-based core component, ensure that all Rust dependencies and build processes support both architectures.
Run this script to check Rust-related files for architecture-specific code:
✅ Verification successful
Multi-architecture support is properly configured ✅
The Rust codebase is architecture-agnostic and uses standard system libraries (BLAS, LAPACK) that are well-supported on both amd64 and arm64. The build configuration uses platform-independent flags and common C++ features (C++20, OpenMP) that work reliably across architectures.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check Rust files for architecture-specific code or dependencies
rg -g 'rust/**/*.rs' -g 'rust/**/Cargo.toml' 'target_arch|cfg\(target_arch|platform-specific|target_os'

Length of output: 105
Script:
#!/bin/bash
# First, find all Rust-related files
echo "=== Searching for Rust files ==="
fd -e rs -e toml

# Then, specifically look at Cargo.toml contents
echo -e "\n=== Checking Cargo.toml contents ==="
fd -g 'Cargo.toml' --exec cat {}

Length of output: 9998
Script:
#!/bin/bash
# Check build scripts for architecture-specific configurations
echo "=== Checking build.rs files ==="
fd build.rs --exec cat {}

Length of output: 2674
.github/workflows/dockers-helm-operator-image.yaml (1)
Line 68: LGTM! Multi-architecture support added consistently.
The platform configuration aligns with the project-wide standardization of multi-architecture support.
.github/workflows/dockers-index-operator-image.yaml (1)
Line 244: LGTM! Verify Go package compatibility.
The multi-architecture support is correctly configured. Given the extensive list of Go package dependencies, ensure all packages support cross-compilation for both architectures.
Run this script to check Go dependencies for potential architecture-specific code:
✅ Verification successful
✅ Dependencies are cross-platform compatible
The codebase uses standard Go syscalls and has no mandatory CGO dependencies or architecture-specific code that would prevent cross-compilation. All platform-specific operations are properly abstracted through Go's standard library.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check Go files for architecture-specific code or CGO usage
rg -g '*.go' 'runtime\.GOARCH|build constraints|CGO|syscall\.' ./internal/ ./pkg/

Length of output: 5915
.github/workflows/dockers-readreplica-rotate-image.yaml (1)
Line 242: LGTM! Addition of multi-platform support enhances deployment flexibility.
The addition of AMD64 and ARM64 platform support is a good practice that increases deployment flexibility across different architectures.
Please verify that the CI/CD infrastructure has the necessary QEMU emulation support for multi-platform builds:
✅ Verification successful
Multi-platform build infrastructure is properly configured
The reusable workflow has the necessary setup for multi-platform builds:
- QEMU emulation support is configured via docker/setup-qemu-action
- Docker Buildx is properly set up via docker/setup-buildx-action
- Both are configured to support the specified platforms (linux/amd64,linux/arm64)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify QEMU setup for multi-platform builds

# Check if QEMU is registered for multi-platform builds
docker buildx ls

Length of output: 115
Script:
#!/bin/bash
# Check for QEMU and buildx setup in the workflow
cat .github/workflows/dockers-readreplica-rotate-image.yaml | grep -A 5 -B 5 "qemu\|buildx"

Length of output: 93
Script:
#!/bin/bash
# View the complete workflow file
cat .github/workflows/dockers-readreplica-rotate-image.yaml

Length of output: 8708
Script:
#!/bin/bash
# Check the reusable workflow for QEMU and buildx setup
cat .github/workflows/_docker-image.yaml

Length of output: 4523
.github/workflows/dockers-index-save-image.yaml (1)
Line 246: LGTM! Consistent multi-platform support across workflows.
The change maintains consistency with other workflow files by adding the same platform support.
.github/workflows/dockers-index-deletion-image.yaml (1)
Line 246: LGTM! Maintains consistent platform support.
The addition aligns with the project-wide effort to support multiple architectures.
.github/workflows/dockers-gateway-lb-image.yaml (1)
Line 260: Excellent addition of multi-architecture support!
The addition of platforms: linux/amd64,linux/arm64 across all Docker image build workflows is a strategic enhancement that:
- Enables cross-platform compatibility by building images for both AMD64 and ARM64 architectures
- Improves accessibility for users running on different hardware platforms
- Aligns with modern containerization practices for broader platform support
This change is consistently applied across all Docker image workflows, ensuring uniform multi-architecture support throughout the project.
Let's verify the reusable workflow supports this parameter:
✅ Verification successful
Reusable workflow fully supports multi-architecture builds
The reusable workflow properly defines and handles the platforms parameter, correctly configuring buildx for multi-architecture support and passing the parameter to the build step.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the reusable workflow supports the platforms parameter

# Check if the reusable workflow file exists and contains the platforms parameter
rg -A 5 'platforms:' .github/workflows/_docker-image.yaml

Length of output: 942
.github/workflows/dockers-discoverer-k8s-image.yaml (1)
262-262
: Same change as in gateway-lb workflow.

.github/workflows/dockers-gateway-filter-image.yaml (1)
262-262
: Same change as in gateway-lb workflow.

.github/workflows/dockers-benchmark-operator-image.yaml (1)
258-258
: Same change as in gateway-lb workflow.

.github/workflows/dockers-benchmark-job-image.yaml (1)
264-264
: Same change as in gateway-lb workflow.

.github/workflows/dockers-gateway-mirror-image.yaml (1)
264-264
: LGTM! Multi-platform support added correctly.
The addition of platforms: linux/amd64,linux/arm64 enables building Docker images for both AMD64 and ARM64 architectures, improving cross-platform compatibility.
Since this is a generated file, let's verify the generator script:
✅ Verification successful
✅ Platform configuration properly implemented in generator
The multi-platform support is correctly implemented in the generator script (
hack/docker/gen/main.go
) with proper constants and fallback logic, ensuring consistent platform configuration (linux/amd64,linux/arm64
) across all 24 Docker image workflow files.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the platform parameter is consistently added across all generated workflow files # and that it originates from the generator script # Check the generator script for the platform parameter echo "Checking generator script for platform parameter..." rg -A 5 "platforms.*linux/amd64,linux/arm64" hack/docker/gen/main.go # Verify consistency across all generated workflow files echo "Verifying consistency across workflow files..." rg -l "platforms.*linux/amd64,linux/arm64" .github/workflows/dockers-*-image.yaml | wc -lLength of output: 1706
Script:
#!/bin/bash # Search for any platform-related configuration in the generator echo "Searching for platform-related code in generator..." rg -i "platform" hack/docker/gen/main.go # Check for template files that might contain the configuration echo "Checking for template files..." fd -t f "template" hack/docker/gen/ # If template files exist, search them for platform configuration echo "Searching template files for platform configuration..." rg -i "platform" hack/docker/gen/Length of output: 1535
.github/workflows/dockers-manager-index-image.yaml (1)
266-266
: LGTM! Verify compatibility with the reusable workflow.The platform specification for multi-arch support is correctly formatted and properly placed.
Run this script to verify that the reusable workflow supports the platforms parameter:
✅ Verification successful
Platforms parameter is properly supported in the reusable workflow
The reusable workflow fully supports the platforms parameter through:
- Input parameter definition with proper type and optionality
- Buildx setup with multi-arch support
- Correct usage in the build step
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if the reusable workflow supports the platforms parameter # Test: Look for platform-related configuration in the reusable workflow rg -A 5 'platforms:' .github/workflows/_docker-image.yamlLength of output: 942
.github/workflows/dockers-agent-faiss-image.yaml (1)
268-268
: LGTM! Platform configuration is consistent.The multi-arch support configuration matches the pattern used across other workflow files.
.github/workflows/dockers-agent-ngt-image.yaml (1)
272-272
: LGTM! Platform configuration is consistent.The multi-arch support configuration matches the pattern used across other workflow files.
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/e2e/operation/stream.go (2)
Lines 272-279: Consider making the timeout duration configurable.
The hardcoded 3-second timeout might be too restrictive for larger datasets or high-latency environments. Consider making this timeout configurable through a parameter.
- to := time.Second * 3
+ to := c.config.SearchTimeout
Line range hint 652-721: LGTM! Consider extracting common error handling pattern.
The mutex-protected error handling improvements are well-implemented and thread-safe. However, the same pattern is repeated across multiple methods.
Consider extracting this common error handling pattern into a helper function to reduce code duplication:
type streamErrorHandler struct {
	mu   sync.Mutex
	rerr error
}

func (h *streamErrorHandler) handleError(err error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.rerr = errors.Join(h.rerr, err)
}

func (h *streamErrorHandler) getError() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.rerr
}

Also applies to: 766-836, 882-951, 994-1062, 1155-1230
tests/e2e/crud/crud_test.go (4)
Lines 1031-1036: Add test progress logging to searchFunc.
Consider adding more detailed logging to help diagnose test failures.
 searchFunc := func(ctx context.Context) error {
+	t.Logf("Executing search operation with %d vectors", len(ds.Test[searchFrom:searchFrom+searchNum]))
 	return op.Search(t, ctx, operation.Dataset{
 		Test:      ds.Test[searchFrom : searchFrom+searchNum],
 		Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum],
 	})
 }
Line 1071: Make the sleep duration configurable.
The hardcoded 10-second sleep duration should be configurable to accommodate different test environments.
- time.Sleep(10 * time.Second)
+ time.Sleep(searchRetryInterval)
Lines 1055-1063: Improve error handling specificity.
The error handling could be more specific about the nature of deadline exceeded errors to help with debugging.
 if ierr != nil {
 	st, ok := status.FromError(ierr)
 	if ok && st.Code() == codes.DeadlineExceeded {
+		t.Logf("Deadline exceeded during search: %v", ierr)
 		_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
 		mu.Lock()
 		e = errors.Join(e, rerr)
 		mu.Unlock()
 	}
+ } else if ok {
+	t.Logf("Non-deadline error occurred: code=%v, message=%v", st.Code(), st.Message())
 }
Lines 1111-1112: Document the commented out Flush operation.
Add a comment explaining why the Flush operation is commented out and why RemoveByTimestamp is used instead.
- // err = op.Flush(t, ctx)
+ // TODO: Flush operation is temporarily disabled due to potential issues during agent rollout.
+ // Using RemoveByTimestamp as an alternative cleanup method.
 err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano())
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/helm/values/values-rollout-agent.yaml (1 hunks)
pkg/agent/core/ngt/service/ngt.go (1 hunks)
tests/e2e/crud/crud_test.go (6 hunks)
tests/e2e/operation/stream.go (17 hunks)
✅ Files skipped from review due to trivial changes (1)
- pkg/agent/core/ngt/service/ngt.go
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/helm/values/values-rollout-agent.yaml
⏰ Context from checks skipped due to timeout of 90000ms (180)
- GitHub Check: build / build (×180)
🔇 Additional comments (2)
tests/e2e/crud/crud_test.go (2)
Lines 59-69: LGTM! Well-structured variable initialization.
The new configuration variables are well-named and appropriately scoped for their intended use.
Line 88: LGTM! Proper flag initialization and error handling.
The flags are well-configured with sensible defaults and proper error handling for duration parsing.
Also applies to: 104-144
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #2799 +/- ##
==========================================
- Coverage 23.92% 23.92% -0.01%
==========================================
Files 546 546
Lines 54555 54558 +3
==========================================
- Hits 13054 13052 -2
- Misses 40717 40718 +1
- Partials 784 788 +4

☔ View full report in Codecov by Sentry.
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
Signed-off-by: vankichi <kyukawa315@gmail.com>
807732e to 40a3ccc
Actionable comments posted: 3
🧹 Nitpick comments (7)
tests/e2e/operation/multi.go (1)
Lines 33-38: LGTM! Consider using a constant for the timeout value.
The search configuration changes look good. The increased number of results and timeout settings align well with testing during agent rolling updates.
Consider extracting the timeout duration to a constant at the package level for better maintainability:
+const defaultSearchTimeout = 3 * time.Second

 func (c *client) MultiSearch(t *testing.T, ctx context.Context, ds Dataset) error {
-	to := time.Second * 3
+	to := defaultSearchTimeout

tests/e2e/kubernetes/kubectl/kubectl.go (1)
45-86
: Consider improving goroutine management and error handling.The implementation looks good overall, but there are a few areas for improvement:
- The goroutines for output streaming could leak if
cmd.Start()
fails.- Error messages from stderr could be more structured for better debugging.
Consider this improved implementation:
func RolloutResourceName(ctx context.Context, t *testing.T, resource string, name string, timeout string) error { t.Helper() cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name) if err := runCmd(t, cmd); err != nil { return err } r := strings.Join([]string{resource, name}, "/") to := strings.Join([]string{"--timeout", timeout}, "=") cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to) stdout, err := cmd.StdoutPipe() if err != nil { return err } defer stdout.Close() stderr, err := cmd.StderrPipe() if err != nil { return err } defer stderr.Close() + errCh := make(chan error, 2) + done := make(chan struct{}) + defer close(done) + if err := cmd.Start(); err != nil { return err } - go func() { + go func() { scanner := bufio.NewScanner(stdout) for scanner.Scan() { fmt.Println(scanner.Text()) } + if err := scanner.Err(); err != nil { + select { + case errCh <- fmt.Errorf("stdout scanner error: %w", err): + case <-done: + } + } }() - go func() { + go func() { scanner := bufio.NewScanner(stderr) + var errMsgs []string for scanner.Scan() { - fmt.Println("Error:", scanner.Text()) + errMsg := scanner.Text() + errMsgs = append(errMsgs, errMsg) + fmt.Println("Error:", errMsg) } + if err := scanner.Err(); err != nil { + select { + case errCh <- fmt.Errorf("stderr scanner error: %w", err): + case <-done: + } + } + if len(errMsgs) > 0 { + select { + case errCh <- fmt.Errorf("kubectl errors:\n%s", strings.Join(errMsgs, "\n")): + case <-done: + } + } }() - return cmd.Wait() + err = cmd.Wait() + if err != nil { + return err + } + + select { + case err := <-errCh: + return err + default: + return nil + } }This improved version:
- Properly manages goroutine lifecycles
- Collects and structures error messages
- Handles scanner errors
- Returns a more detailed error message when kubectl fails
tests/e2e/kubernetes/client/client.go (1)
71-75
: Align error handling with existing patterns.The implementation looks good, but consider aligning the error handling with the existing
WaitForPodReady
method for consistency.Consider this more consistent implementation:
func (cli *client) WaitForStatefulSetReady( ctx context.Context, namespace, name string, timeout time.Duration, ) (ok bool, err error) { ctx, cancel := context.WithTimeout(ctx, timeout) defer cancel() tick := time.NewTicker(time.Second) defer tick.Stop() for { ss, err := cli.clientset.AppsV1().StatefulSets(namespace).Get(ctx, name, metav1.GetOptions{}) - if err != nil { - return false, err - } + if err == nil { + if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas { + return true, nil + } + } - if ss.Status.UpdatedReplicas == ss.Status.Replicas && ss.Status.ReadyReplicas == ss.Status.Replicas { - return true, nil - } select { case <-ctx.Done(): return false, ctx.Err() case <-tick.C: + if err != nil { + // Log the error but continue polling + continue + } } } }This version:
- Aligns with the error handling pattern in
WaitForPodReady
- Continues polling on transient errors
- Maintains better readability with error handling at the end of the loop
Also applies to: 210-233
.github/workflows/e2e.yaml (1)
351-393
: LGTM! Consider adding documentation for the new test scenario.The new E2E test job is well-structured and follows the established patterns. The configuration properly handles the requirements for testing agent rollout restart scenarios.
Consider adding a comment block above the job to document:
- The purpose of this specific test scenario
- Why K3D storage is required
- What the values-rollout-agent.yaml configures
- Expected behavior during the test
Example:
```yaml
  # Test agent rollout restart behavior:
  # - Requires K3D storage for persistent volumes
  # - Uses values-rollout-agent.yaml to configure agent StatefulSet
  # - Performs rollout restart after index creation
  # - Verifies search operations during agent pod updates
  e2e-stream-crud-with-rollout-restart-agent:
```
tests/e2e/operation/stream.go (2)
`71-79`: Improved timeout configuration using `time.Duration`. The change from hardcoded nanoseconds to a more readable duration improves code maintainability.
Consider making the timeout duration configurable through a parameter instead of hardcoding 3 seconds:
```diff
-to := time.Second * 3
+func (c *client) Search(t *testing.T, ctx context.Context, ds Dataset, timeout time.Duration) error {
+	return c.SearchWithParameters(
+		t,
+		ctx,
+		ds,
+		100,
+		-1.0,
+		0.1,
+		timeout.Nanoseconds(),
+		DefaultStatusValidator,
+		ParseAndLogError,
+	)
+}
```
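Another option, sketched below under the assumption that callers are free to wrap operations themselves, is to leave `Search` untouched and bound any call with a context deadline; the helper name `runWithDeadline` is illustrative and does not exist in the package:

```go
package e2ehelpers

import (
	"context"
	"testing"
	"time"
)

// runWithDeadline bounds any operation with a context deadline so the
// operation itself does not need a timeout parameter. This is a sketch,
// not an API provided by the e2e operation package.
func runWithDeadline(
	t *testing.T, ctx context.Context, timeout time.Duration,
	fn func(context.Context) error,
) error {
	t.Helper()
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return fn(ctx)
}
```

A call site would then read `runWithDeadline(t, ctx, 3*time.Second, func(ctx context.Context) error { return op.Search(t, ctx, ds) })`.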
`653-681`: Thread-safe error handling with proper error collection. The error handling is now thread-safe, with proper mutex protection and error collection.
Consider extracting the common error handling pattern into a reusable helper function to reduce code duplication across stream operations:
```go
type streamErrorHandler struct {
	mu   sync.Mutex
	rerr error
}

func (h *streamErrorHandler) handleError(err error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.rerr = errors.Join(h.rerr, err)
}

func (h *streamErrorHandler) getError() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.rerr
}
```
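A possible wiring of that helper, assuming a simplified fan-in of errors from several receiver goroutines (the real stream callbacks have a different shape, so this is only a sketch that builds on the type above and needs the `sync` import):

```go
// collectStreamErrors fans errors from several receiver goroutines into the
// handler defined above; the channel-based shape is a simplification of the
// real stream API.
func collectStreamErrors(results <-chan error) error {
	h := &streamErrorHandler{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for err := range results {
				if err != nil {
					h.handleError(err)
				}
			}
		}()
	}
	wg.Wait()
	return h.getError()
}
```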
87-88
: Configuration improvements for test parametersGood changes to make search parameters configurable, but the default values could use documentation.
Add comments explaining the reasoning behind the default values:
```diff
+// searchByIDNum is set to 3 to minimize test duration while still providing adequate coverage
 flag.IntVar(&searchByIDNum, "search-by-id-num", 3, "number of id-vector pairs used for search-by-id")
+// searchConcurrency controls the number of concurrent search operations to stress test the system
 flag.IntVar(&searchConcurrency, "search-conn", 100, "number of search concurrency")
```
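As a side note on the `search-conn` value, the sketch below shows how such a flag can cap in-flight searches with the upstream `golang.org/x/sync/errgroup` package (`SetLimit` needs a recent release); the e2e code uses Vald's internal errgroup wrapper, whose API may differ, so names and structure here are illustrative:

```go
package e2eexample

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// runSearches caps the number of in-flight searches at `concurrency`, the
// role the search-conn flag plays in the tests.
func runSearches(ctx context.Context, concurrency, total int, search func(context.Context, int) error) error {
	eg, ctx := errgroup.WithContext(ctx)
	eg.SetLimit(concurrency)
	for i := 0; i < total; i++ {
		i := i
		eg.Go(func() error { return search(ctx, i) })
	}
	return eg.Wait()
}
```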
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `hack/docker/gen/main.go` is excluded by `!**/gen/**`
📒 Files selected for processing (60)
- `.github/actions/setup-e2e/action.yaml` (2 hunks)
- `.github/actions/setup-k3d/action.yaml` (2 hunks)
- `.github/helm/values/values-rollout-agent.yaml` (1 hunks)
- `.github/workflows/dockers-agent-faiss-image.yaml` (1 hunks)
- `.github/workflows/dockers-agent-image.yaml` (1 hunks)
- `.github/workflows/dockers-agent-ngt-image.yaml` (1 hunks)
- `.github/workflows/dockers-agent-sidecar-image.yaml` (1 hunks)
- `.github/workflows/dockers-benchmark-job-image.yaml` (1 hunks)
- `.github/workflows/dockers-benchmark-operator-image.yaml` (1 hunks)
- `.github/workflows/dockers-dev-container-image.yaml` (1 hunks)
- `.github/workflows/dockers-discoverer-k8s-image.yaml` (1 hunks)
- `.github/workflows/dockers-example-client-image.yaml` (1 hunks)
- `.github/workflows/dockers-gateway-filter-image.yaml` (1 hunks)
- `.github/workflows/dockers-gateway-lb-image.yaml` (1 hunks)
- `.github/workflows/dockers-gateway-mirror-image.yaml` (1 hunks)
- `.github/workflows/dockers-helm-operator-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-correction-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-creation-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-deletion-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-operator-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-save-image.yaml` (1 hunks)
- `.github/workflows/dockers-manager-index-image.yaml` (1 hunks)
- `.github/workflows/dockers-readreplica-rotate-image.yaml` (1 hunks)
- `.github/workflows/e2e.yaml` (2 hunks)
- `Makefile` (2 hunks)
- `Makefile.d/e2e.mk` (1 hunks)
- `Makefile.d/functions.mk` (2 hunks)
- `dockers/agent/core/agent/Dockerfile` (1 hunks)
- `dockers/agent/core/faiss/Dockerfile` (1 hunks)
- `dockers/agent/core/ngt/Dockerfile` (1 hunks)
- `dockers/agent/sidecar/Dockerfile` (1 hunks)
- `dockers/binfmt/Dockerfile` (1 hunks)
- `dockers/buildbase/Dockerfile` (1 hunks)
- `dockers/buildkit/Dockerfile` (1 hunks)
- `dockers/buildkit/syft/scanner/Dockerfile` (1 hunks)
- `dockers/ci/base/Dockerfile` (1 hunks)
- `dockers/dev/Dockerfile` (1 hunks)
- `dockers/discoverer/k8s/Dockerfile` (1 hunks)
- `dockers/example/client/Dockerfile` (1 hunks)
- `dockers/gateway/filter/Dockerfile` (1 hunks)
- `dockers/gateway/lb/Dockerfile` (1 hunks)
- `dockers/gateway/mirror/Dockerfile` (1 hunks)
- `dockers/index/job/correction/Dockerfile` (1 hunks)
- `dockers/index/job/creation/Dockerfile` (1 hunks)
- `dockers/index/job/deletion/Dockerfile` (1 hunks)
- `dockers/index/job/readreplica/rotate/Dockerfile` (1 hunks)
- `dockers/index/job/save/Dockerfile` (1 hunks)
- `dockers/index/operator/Dockerfile` (1 hunks)
- `dockers/manager/index/Dockerfile` (1 hunks)
- `dockers/operator/helm/Dockerfile` (1 hunks)
- `dockers/tools/benchmark/job/Dockerfile` (1 hunks)
- `dockers/tools/benchmark/operator/Dockerfile` (1 hunks)
- `dockers/tools/cli/loadtest/Dockerfile` (1 hunks)
- `pkg/agent/core/ngt/service/ngt.go` (1 hunks)
- `tests/e2e/crud/crud_test.go` (6 hunks)
- `tests/e2e/kubernetes/client/client.go` (2 hunks)
- `tests/e2e/kubernetes/kubectl/kubectl.go` (2 hunks)
- `tests/e2e/operation/multi.go` (3 hunks)
- `tests/e2e/operation/operation.go` (1 hunks)
- `tests/e2e/operation/stream.go` (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (53)
- dockers/index/job/correction/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/dev/Dockerfile
- dockers/gateway/filter/Dockerfile
- dockers/buildkit/Dockerfile
- dockers/index/operator/Dockerfile
- .github/workflows/dockers-dev-container-image.yaml
- dockers/tools/benchmark/operator/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- dockers/gateway/lb/Dockerfile
- dockers/gateway/mirror/Dockerfile
- dockers/agent/core/agent/Dockerfile
- dockers/manager/index/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- dockers/index/job/deletion/Dockerfile
- dockers/binfmt/Dockerfile
- .github/workflows/dockers-index-creation-image.yaml
- .github/workflows/dockers-agent-ngt-image.yaml
- dockers/index/job/creation/Dockerfile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- .github/workflows/dockers-agent-faiss-image.yaml
- dockers/tools/benchmark/job/Dockerfile
- .github/workflows/dockers-agent-image.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- .github/workflows/dockers-index-deletion-image.yaml
- .github/workflows/dockers-manager-index-image.yaml
- .github/workflows/dockers-gateway-lb-image.yaml
- .github/workflows/dockers-index-operator-image.yaml
- dockers/example/client/Dockerfile
- Makefile
- dockers/tools/cli/loadtest/Dockerfile
- .github/workflows/dockers-benchmark-job-image.yaml
- pkg/agent/core/ngt/service/ngt.go
- .github/workflows/dockers-example-client-image.yaml
- tests/e2e/operation/operation.go
- .github/workflows/dockers-agent-sidecar-image.yaml
- .github/workflows/dockers-gateway-mirror-image.yaml
- .github/workflows/dockers-benchmark-operator-image.yaml
- .github/actions/setup-e2e/action.yaml
- .github/workflows/dockers-index-save-image.yaml
- .github/workflows/dockers-helm-operator-image.yaml
- .github/actions/setup-k3d/action.yaml
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/ci/base/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- .github/helm/values/values-rollout-agent.yaml
- dockers/index/job/save/Dockerfile
- dockers/agent/sidecar/Dockerfile
- Makefile.d/functions.mk
- Makefile.d/e2e.mk
- dockers/operator/helm/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
🔇 Additional comments (3)
.github/workflows/dockers-gateway-filter-image.yaml (1)
`262-262`: LGTM! Good addition of multi-architecture support. Adding support for both AMD64 and ARM64 architectures is a positive change that improves cross-platform compatibility of the gateway-filter component.
Let's verify that the shared workflow template supports this parameter:
✅ Verification successful
Multi-architecture build support is properly configured and supported
The shared workflow template fully supports the platforms parameter and has the necessary infrastructure setup for multi-architecture builds using Docker Buildx.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify that the shared workflow template supports the platforms parameter

# Check if the _docker-image.yaml workflow accepts the platforms parameter
rg -A 5 "platforms:" .github/workflows/_docker-image.yaml

# Verify that the GitHub Actions runners support multi-arch builds
rg -A 5 "runs-on:" .github/workflows/_docker-image.yaml
```
Length of output: 1623
tests/e2e/operation/multi.go (1)
`61-61`: LGTM! Useful logging addition. The log statement helps track response counts during the E2E test, which is valuable for debugging.
.github/workflows/e2e.yaml (1)
`463-463`: LGTM! Slack notification dependency added correctly. The new job is properly added to the slack-notification job dependencies.
```go
mu.Lock()
rerr = errors.Join(err, err)
mu.Unlock()
return
```
Fix redundant error joining
The error is being joined with itself, which is redundant and potentially confusing.
Apply this fix to all similar occurrences:
-rerr = errors.Join(err, err)
+rerr = err
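A tiny, self-contained illustration of the difference — joining the accumulator with the new error keeps both, while joining an error with itself only repeats it — assuming Go 1.20+ for `errors.Join`:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	errA := errors.New("deadline exceeded on request 1")
	errB := errors.New("deadline exceeded on request 2")

	var rerr error
	rerr = errors.Join(rerr, errA) // rerr now wraps errA
	rerr = errors.Join(rerr, errB) // rerr now wraps errA and errB

	fmt.Println(errors.Is(rerr, errA), errors.Is(rerr, errB)) // true true
	fmt.Println(errors.Join(errA, errA))                      // same message printed twice
}
```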
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
mu.Lock()
rerr = err
mu.Unlock()
return
```
|
||
_ = op.Upsert(t, ctx, operation.Dataset{ | ||
Train: ds.Train[insertFrom : insertFrom+insertNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
sleep(t, waitAfterInsertDuration) | ||
|
||
searchFunc := func(ctx context.Context) error { | ||
return op.MultiSearch(t, ctx, operation.Dataset{ | ||
Test: ds.Test[searchFrom : searchFrom+searchNum], | ||
Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum], | ||
}) | ||
} | ||
|
||
wg := sync.WaitGroup{} | ||
mu := sync.Mutex{} | ||
var serr error | ||
wg.Add(1) | ||
done := make(chan struct{}) | ||
go func() { | ||
defer wg.Done() | ||
for { | ||
select { | ||
case <-done: | ||
return | ||
default: | ||
eg, egctx := errgroup.New(ctx) | ||
for i := 0; i < searchConcurrency; i++ { | ||
eg.Go(func() (e error) { | ||
ierr := searchFunc(egctx) | ||
if ierr != nil { | ||
st, ok := status.FromError(ierr) | ||
if ok && st.Code() == codes.DeadlineExceeded { | ||
_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred") | ||
mu.Lock() | ||
e = errors.Join(e, rerr) | ||
mu.Unlock() | ||
} | ||
} | ||
return | ||
}) | ||
} | ||
egerr := eg.Wait() | ||
mu.Lock() | ||
serr = errors.Join(serr, egerr) | ||
mu.Unlock() | ||
time.Sleep(5 * time.Second) | ||
} | ||
} | ||
}() | ||
|
||
// Wait for StatefulSet to be ready | ||
t.Log("rollout restart agent and waiting for agent pods ready...") | ||
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String()) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
cnt, err := op.IndexInfo(t, ctx) | ||
if err != nil { | ||
if cnt == nil { | ||
t.Fatalf("an error occurred: err = %s", err) | ||
} | ||
t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err) | ||
} | ||
|
||
err = op.Exists(t, ctx, "0") | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
err = op.GetObject(t, ctx, operation.Dataset{ | ||
Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
err = op.Remove(t, ctx, operation.Dataset{ | ||
Train: ds.Train[removeFrom : removeFrom+removeNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
// Remove all vector data after the current - 1 hour. | ||
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano()) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
close(done) | ||
wg.Wait() | ||
if serr != nil { | ||
t.Fatalf("an error occurred: %s", serr.Error()) | ||
} | ||
} |
🛠️ Refactor suggestion
Reduce code duplication in test implementations
The test shares significant code with TestE2EAgentRolloutRestart.
Extract common test patterns into helper functions:
type searchTestConfig struct {
searchFunc func(context.Context) error
concurrency int
waitDuration time.Duration
cleanupSteps []func() error
}
func runConcurrentSearchTest(t *testing.T, cfg searchTestConfig) error {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
defer cancel()
var (
wg sync.WaitGroup
mu sync.Mutex
serr error
)
done := make(chan struct{})
// Run concurrent searches
wg.Add(1)
go func() {
defer wg.Done()
// ... existing search loop ...
}()
// Run cleanup steps
for _, cleanup := range cfg.cleanupSteps {
if err := cleanup(); err != nil {
return err
}
}
close(done)
wg.Wait()
return serr
}
Then use it in both tests:
func TestE2EHighConcurrencyMultiSearch(t *testing.T) {
// ... setup ...
err := runConcurrentSearchTest(t, searchTestConfig{
searchFunc: searchFunc,
concurrency: searchConcurrency,
waitDuration: 5 * time.Second,
cleanupSteps: []func() error{
// ... cleanup steps ...
},
})
if err != nil {
t.Fatal(err)
}
}
_ = op.Upsert(t, ctx, operation.Dataset{ | ||
Train: ds.Train[insertFrom : insertFrom+insertNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
sleep(t, waitAfterInsertDuration) | ||
|
||
searchFunc := func(ctx context.Context) error { | ||
return op.Search(t, ctx, operation.Dataset{ | ||
Test: ds.Test[searchFrom : searchFrom+searchNum], | ||
Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum], | ||
}) | ||
} | ||
|
||
wg := sync.WaitGroup{} | ||
mu := sync.Mutex{} | ||
done := make(chan struct{}) | ||
var serr error | ||
|
||
wg.Add(1) | ||
go func() { | ||
defer wg.Done() | ||
for { | ||
select { | ||
case <-done: | ||
return | ||
default: | ||
eg, egctx := errgroup.New(ctx) | ||
for i := 0; i < searchConcurrency; i++ { | ||
eg.Go(func() (e error) { | ||
ierr := searchFunc(egctx) | ||
if ierr != nil { | ||
st, ok := status.FromError(ierr) | ||
if ok && st.Code() == codes.DeadlineExceeded { | ||
_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred") | ||
mu.Lock() | ||
e = errors.Join(e, rerr) | ||
mu.Unlock() | ||
} | ||
} | ||
return | ||
}) | ||
} | ||
egerr := eg.Wait() | ||
mu.Lock() | ||
serr = errors.Join(serr, egerr) | ||
mu.Unlock() | ||
time.Sleep(10 * time.Second) | ||
} | ||
} | ||
}() | ||
|
||
// Wait for StatefulSet to be ready | ||
t.Log("rollout restart agent and waiting for agent pods ready...") | ||
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String()) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
cnt, err := op.IndexInfo(t, ctx) | ||
if err != nil { | ||
if cnt == nil { | ||
t.Fatalf("an error occurred: err = %s", err) | ||
} | ||
t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err) | ||
} | ||
|
||
err = op.Exists(t, ctx, "0") | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
err = op.GetObject(t, ctx, operation.Dataset{ | ||
Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
err = op.Remove(t, ctx, operation.Dataset{ | ||
Train: ds.Train[removeFrom : removeFrom+removeNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
// Remove all vector data after the current - 1 hour. | ||
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano()) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
close(done) | ||
wg.Wait() | ||
if serr != nil { | ||
t.Fatalf("an error occurred: %s", serr) | ||
} | ||
} |
🛠️ Refactor suggestion
Improve test robustness and cleanup
The test implementation is good but could be improved in several areas.
Consider these improvements:
- Add timeout context to prevent test from hanging:
-ctx, cancel := context.WithCancel(context.Background())
+ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
- Ensure proper cleanup of resources:
t.Cleanup(teardown)
+t.Cleanup(func() {
+ cancel()
+ close(done)
+ wg.Wait()
+})
- Add test failure conditions for specific scenarios:
if serr != nil {
+ if strings.Contains(serr.Error(), "DeadlineExceeded") {
+ t.Fatal("Search operations timed out during agent rollout")
+ }
t.Fatalf("an error occurred: %s", serr)
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// TestE2EAgentRolloutRestart tests that search requests succeed with rollout restart vald-agent. | |
func TestE2EAgentRolloutRestart(t *testing.T) { | |
t.Cleanup(teardown) | |
if kubeClient == nil { | |
var err error | |
kubeClient, err = client.New(kubeConfig) | |
if err != nil { | |
t.Skipf("TestE2EReadReplica needs kubernetes client but failed to create one: %s", err) | |
} | |
} | |
ctx, cancel := context.WithCancel(context.Background()) | |
defer cancel() | |
op, err := operation.New(host, port) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
_ = op.Upsert(t, ctx, operation.Dataset{ | |
Train: ds.Train[insertFrom : insertFrom+insertNum], | |
}) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
sleep(t, waitAfterInsertDuration) | |
searchFunc := func(ctx context.Context) error { | |
return op.Search(t, ctx, operation.Dataset{ | |
Test: ds.Test[searchFrom : searchFrom+searchNum], | |
Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum], | |
}) | |
} | |
wg := sync.WaitGroup{} | |
mu := sync.Mutex{} | |
done := make(chan struct{}) | |
var serr error | |
wg.Add(1) | |
go func() { | |
defer wg.Done() | |
for { | |
select { | |
case <-done: | |
return | |
default: | |
eg, egctx := errgroup.New(ctx) | |
for i := 0; i < searchConcurrency; i++ { | |
eg.Go(func() (e error) { | |
ierr := searchFunc(egctx) | |
if ierr != nil { | |
st, ok := status.FromError(ierr) | |
if ok && st.Code() == codes.DeadlineExceeded { | |
_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred") | |
mu.Lock() | |
e = errors.Join(e, rerr) | |
mu.Unlock() | |
} | |
} | |
return | |
}) | |
} | |
egerr := eg.Wait() | |
mu.Lock() | |
serr = errors.Join(serr, egerr) | |
mu.Unlock() | |
time.Sleep(10 * time.Second) | |
} | |
} | |
}() | |
// Wait for StatefulSet to be ready | |
t.Log("rollout restart agent and waiting for agent pods ready...") | |
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String()) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
cnt, err := op.IndexInfo(t, ctx) | |
if err != nil { | |
if cnt == nil { | |
t.Fatalf("an error occurred: err = %s", err) | |
} | |
t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err) | |
} | |
err = op.Exists(t, ctx, "0") | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
err = op.GetObject(t, ctx, operation.Dataset{ | |
Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum], | |
}) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
err = op.Remove(t, ctx, operation.Dataset{ | |
Train: ds.Train[removeFrom : removeFrom+removeNum], | |
}) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
// Remove all vector data after the current - 1 hour. | |
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano()) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
close(done) | |
wg.Wait() | |
if serr != nil { | |
t.Fatalf("an error occurred: %s", serr) | |
} | |
} | |
// TestE2EAgentRolloutRestart tests that search requests succeed with rollout restart vald-agent. | |
func TestE2EAgentRolloutRestart(t *testing.T) { | |
t.Cleanup(teardown) | |
if kubeClient == nil { | |
var err error | |
kubeClient, err = client.New(kubeConfig) | |
if err != nil { | |
t.Skipf("TestE2EReadReplica needs kubernetes client but failed to create one: %s", err) | |
} | |
} | |
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute) | |
defer cancel() | |
t.Cleanup(func() { | |
cancel() | |
close(done) | |
wg.Wait() | |
}) | |
op, err := operation.New(host, port) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
_ = op.Upsert(t, ctx, operation.Dataset{ | |
Train: ds.Train[insertFrom : insertFrom+insertNum], | |
}) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
sleep(t, waitAfterInsertDuration) | |
searchFunc := func(ctx context.Context) error { | |
return op.Search(t, ctx, operation.Dataset{ | |
Test: ds.Test[searchFrom : searchFrom+searchNum], | |
Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum], | |
}) | |
} | |
wg := sync.WaitGroup{} | |
mu := sync.Mutex{} | |
done := make(chan struct{}) | |
var serr error | |
wg.Add(1) | |
go func() { | |
defer wg.Done() | |
for { | |
select { | |
case <-done: | |
return | |
default: | |
eg, egctx := errgroup.New(ctx) | |
for i := 0; i < searchConcurrency; i++ { | |
eg.Go(func() (e error) { | |
ierr := searchFunc(egctx) | |
if ierr != nil { | |
st, ok := status.FromError(ierr) | |
if ok && st.Code() == codes.DeadlineExceeded { | |
_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred") | |
mu.Lock() | |
e = errors.Join(e, rerr) | |
mu.Unlock() | |
} | |
} | |
return | |
}) | |
} | |
egerr := eg.Wait() | |
mu.Lock() | |
serr = errors.Join(serr, egerr) | |
mu.Unlock() | |
time.Sleep(10 * time.Second) | |
} | |
} | |
}() | |
// Wait for StatefulSet to be ready | |
t.Log("rollout restart agent and waiting for agent pods ready...") | |
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String()) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
cnt, err := op.IndexInfo(t, ctx) | |
if err != nil { | |
if cnt == nil { | |
t.Fatalf("an error occurred: err = %s", err) | |
} | |
t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err) | |
} | |
err = op.Exists(t, ctx, "0") | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
err = op.GetObject(t, ctx, operation.Dataset{ | |
Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum], | |
}) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
err = op.Remove(t, ctx, operation.Dataset{ | |
Train: ds.Train[removeFrom : removeFrom+removeNum], | |
}) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
// Remove all vector data after the current - 1 hour. | |
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano()) | |
if err != nil { | |
t.Fatalf("an error occurred: %s", err) | |
} | |
close(done) | |
wg.Wait() | |
if serr != nil { | |
if strings.Contains(serr.Error(), "DeadlineExceeded") { | |
t.Fatal("Search operations timed out during agent rollout") | |
} | |
t.Fatalf("an error occurred: %s", serr) | |
} | |
} |
Actionable comments posted: 2
🧹 Nitpick comments (4)
tests/e2e/kubernetes/client/client.go (1)
`210-233`: Consider making the polling interval configurable. The implementation looks good but could benefit from a configurable polling interval instead of the hardcoded 1-second interval.
Apply this diff to make the polling interval configurable:
```diff
 func (cli *client) WaitForStatefulSetReady(
-	ctx context.Context, namespace, name string, timeout time.Duration,
+	ctx context.Context, namespace, name string, timeout time.Duration, pollInterval time.Duration,
 ) (ok bool, err error) {
 	ctx, cancel := context.WithTimeout(ctx, timeout)
 	defer cancel()

-	tick := time.NewTicker(time.Second)
+	if pollInterval <= 0 {
+		pollInterval = time.Second
+	}
+	tick := time.NewTicker(pollInterval)
 	defer tick.Stop()
```
.github/workflows/e2e.yaml (1)
`351-393`: Consider aligning test parameters with other e2e jobs. The implementation looks good, but for consistency:
- Consider increasing the insert and search counts to 10000 to match other e2e jobs
- Consider adding comments to explain the requirement for k3d storage
Apply this diff to align the test parameters:
```diff
   make E2E_BIND_PORT=8081 \
     E2E_DATASET_NAME=${{ env.DATASET }} \
-    E2E_INSERT_COUNT=1000 \
-    E2E_SEARCH_COUNT=1000 \
-    E2E_SEARCH_BY_ID_COUNT=1000 \
+    E2E_INSERT_COUNT=10000 \
+    E2E_SEARCH_COUNT=10000 \
+    E2E_SEARCH_BY_ID_COUNT=10000 \
```
Also, add a comment above the k3d storage requirement:
```diff
   uses: ./.github/actions/setup-e2e
   with:
+    # Required for persistent volumes during agent rollout
     require_k3d_storage: true
```
tests/e2e/operation/stream.go (1)
`71-71`: Consider making the timeout duration configurable. The hardcoded 3-second timeout might be too restrictive for large-scale search operations. Consider making it configurable through parameters or environment variables to support different deployment scenarios.
```diff
 type client struct {
+	searchTimeout time.Duration
 }

 func (c *client) Search(t *testing.T, ctx context.Context, ds Dataset) error {
-	to := time.Second * 3
+	to := c.searchTimeout
 	return c.SearchWithParameters(
```
Also applies to: 79-79, 273-273, 280-280
tests/e2e/crud/crud_test.go (1)
`1002-1121`: Comprehensive test for agent rollout restart. The test effectively validates system behavior during agent rollout restart. However, consider improving error type checking.
Consider extracting the error type checking into a helper function for better reusability:
```diff
+func isDeadlineExceeded(err error) bool {
+	if st, ok := status.FromError(err); ok {
+		return st.Code() == codes.DeadlineExceeded
+	}
+	return false
+}

 func TestE2EAgentRolloutRestart(t *testing.T) {
 	// ...
 	eg.Go(func() (e error) {
 		ierr := searchFunc(egctx)
 		if ierr != nil {
-			st, ok := status.FromError(ierr)
-			if ok && st.Code() == codes.DeadlineExceeded {
+			if isDeadlineExceeded(ierr) {
 				_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred")
 				mu.Lock()
 				e = errors.Join(e, rerr)
 				mu.Unlock()
 			}
 		}
 		return
 	})
 	// ...
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `hack/docker/gen/main.go` is excluded by `!**/gen/**`
📒 Files selected for processing (60)
- `.github/actions/setup-e2e/action.yaml` (2 hunks)
- `.github/actions/setup-k3d/action.yaml` (2 hunks)
- `.github/helm/values/values-rollout-agent.yaml` (1 hunks)
- `.github/workflows/dockers-agent-faiss-image.yaml` (1 hunks)
- `.github/workflows/dockers-agent-image.yaml` (1 hunks)
- `.github/workflows/dockers-agent-ngt-image.yaml` (1 hunks)
- `.github/workflows/dockers-agent-sidecar-image.yaml` (1 hunks)
- `.github/workflows/dockers-benchmark-job-image.yaml` (1 hunks)
- `.github/workflows/dockers-benchmark-operator-image.yaml` (1 hunks)
- `.github/workflows/dockers-dev-container-image.yaml` (1 hunks)
- `.github/workflows/dockers-discoverer-k8s-image.yaml` (1 hunks)
- `.github/workflows/dockers-example-client-image.yaml` (1 hunks)
- `.github/workflows/dockers-gateway-filter-image.yaml` (1 hunks)
- `.github/workflows/dockers-gateway-lb-image.yaml` (1 hunks)
- `.github/workflows/dockers-gateway-mirror-image.yaml` (1 hunks)
- `.github/workflows/dockers-helm-operator-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-correction-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-creation-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-deletion-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-operator-image.yaml` (1 hunks)
- `.github/workflows/dockers-index-save-image.yaml` (1 hunks)
- `.github/workflows/dockers-manager-index-image.yaml` (1 hunks)
- `.github/workflows/dockers-readreplica-rotate-image.yaml` (1 hunks)
- `.github/workflows/e2e.yaml` (2 hunks)
- `Makefile` (2 hunks)
- `Makefile.d/e2e.mk` (1 hunks)
- `Makefile.d/functions.mk` (2 hunks)
- `dockers/agent/core/agent/Dockerfile` (1 hunks)
- `dockers/agent/core/faiss/Dockerfile` (1 hunks)
- `dockers/agent/core/ngt/Dockerfile` (1 hunks)
- `dockers/agent/sidecar/Dockerfile` (1 hunks)
- `dockers/binfmt/Dockerfile` (1 hunks)
- `dockers/buildbase/Dockerfile` (1 hunks)
- `dockers/buildkit/Dockerfile` (1 hunks)
- `dockers/buildkit/syft/scanner/Dockerfile` (1 hunks)
- `dockers/ci/base/Dockerfile` (1 hunks)
- `dockers/dev/Dockerfile` (1 hunks)
- `dockers/discoverer/k8s/Dockerfile` (1 hunks)
- `dockers/example/client/Dockerfile` (1 hunks)
- `dockers/gateway/filter/Dockerfile` (1 hunks)
- `dockers/gateway/lb/Dockerfile` (1 hunks)
- `dockers/gateway/mirror/Dockerfile` (1 hunks)
- `dockers/index/job/correction/Dockerfile` (1 hunks)
- `dockers/index/job/creation/Dockerfile` (1 hunks)
- `dockers/index/job/deletion/Dockerfile` (1 hunks)
- `dockers/index/job/readreplica/rotate/Dockerfile` (1 hunks)
- `dockers/index/job/save/Dockerfile` (1 hunks)
- `dockers/index/operator/Dockerfile` (1 hunks)
- `dockers/manager/index/Dockerfile` (1 hunks)
- `dockers/operator/helm/Dockerfile` (1 hunks)
- `dockers/tools/benchmark/job/Dockerfile` (1 hunks)
- `dockers/tools/benchmark/operator/Dockerfile` (1 hunks)
- `dockers/tools/cli/loadtest/Dockerfile` (1 hunks)
- `pkg/agent/core/ngt/service/ngt.go` (1 hunks)
- `tests/e2e/crud/crud_test.go` (6 hunks)
- `tests/e2e/kubernetes/client/client.go` (2 hunks)
- `tests/e2e/kubernetes/kubectl/kubectl.go` (2 hunks)
- `tests/e2e/operation/multi.go` (3 hunks)
- `tests/e2e/operation/operation.go` (1 hunks)
- `tests/e2e/operation/stream.go` (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (52)
- dockers/buildkit/Dockerfile
- dockers/index/job/correction/Dockerfile
- dockers/buildbase/Dockerfile
- dockers/binfmt/Dockerfile
- dockers/agent/core/agent/Dockerfile
- .github/workflows/dockers-index-correction-image.yaml
- .github/workflows/dockers-helm-operator-image.yaml
- .github/workflows/dockers-dev-container-image.yaml
- .github/workflows/dockers-index-operator-image.yaml
- .github/workflows/dockers-index-creation-image.yaml
- dockers/operator/helm/Dockerfile
- .github/workflows/dockers-manager-index-image.yaml
- .github/workflows/dockers-agent-image.yaml
- dockers/gateway/lb/Dockerfile
- .github/workflows/dockers-index-save-image.yaml
- dockers/tools/benchmark/operator/Dockerfile
- dockers/dev/Dockerfile
- dockers/agent/core/ngt/Dockerfile
- dockers/tools/benchmark/job/Dockerfile
- .github/workflows/dockers-benchmark-operator-image.yaml
- dockers/gateway/mirror/Dockerfile
- dockers/agent/core/faiss/Dockerfile
- .github/workflows/dockers-gateway-lb-image.yaml
- dockers/index/job/save/Dockerfile
- Makefile.d/functions.mk
- dockers/manager/index/Dockerfile
- dockers/ci/base/Dockerfile
- pkg/agent/core/ngt/service/ngt.go
- .github/actions/setup-e2e/action.yaml
- .github/workflows/dockers-benchmark-job-image.yaml
- .github/workflows/dockers-agent-faiss-image.yaml
- .github/workflows/dockers-index-deletion-image.yaml
- .github/workflows/dockers-example-client-image.yaml
- .github/workflows/dockers-agent-ngt-image.yaml
- .github/workflows/dockers-readreplica-rotate-image.yaml
- .github/workflows/dockers-gateway-mirror-image.yaml
- .github/workflows/dockers-agent-sidecar-image.yaml
- tests/e2e/operation/operation.go
- Makefile.d/e2e.mk
- dockers/index/job/deletion/Dockerfile
- dockers/tools/cli/loadtest/Dockerfile
- .github/workflows/dockers-discoverer-k8s-image.yaml
- dockers/index/operator/Dockerfile
- dockers/index/job/readreplica/rotate/Dockerfile
- dockers/example/client/Dockerfile
- dockers/discoverer/k8s/Dockerfile
- .github/helm/values/values-rollout-agent.yaml
- dockers/agent/sidecar/Dockerfile
- dockers/index/job/creation/Dockerfile
- dockers/buildkit/syft/scanner/Dockerfile
- dockers/gateway/filter/Dockerfile
- Makefile
⏰ Context from checks skipped due to timeout of 90000ms (150)
- GitHub Check: build / build (×150)
🔇 Additional comments (8)
.github/workflows/dockers-gateway-filter-image.yaml (1)
`262-262`: LGTM! Good addition of multi-platform support. The addition of AMD64 and ARM64 platform support enhances deployment flexibility and follows container best practices.
To ensure the multi-platform build works as expected, please verify:
- The build completes successfully for both architectures
- The resulting images are properly tagged and pushed
- The images can be pulled and run on their respective architectures
Run this script to verify the image build and platform support:
✅ Verification successful
Multi-platform support properly configured across all Docker workflows ✅
The configuration is consistently implemented across all 24 Docker workflow files, including base images and build infrastructure, indicating a well-coordinated implementation of multi-architecture support.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify multi-platform image build configuration

# Check if the workflow file references a valid reusable workflow
echo "Checking reusable workflow reference..."
test -f .github/workflows/_docker-image.yaml || echo "Error: Referenced workflow file not found"

# Verify platform configuration in other similar workflow files
echo "Checking platform configuration consistency across workflow files..."
rg -l 'platforms:\s*linux/amd64,linux/arm64' .github/workflows/dockers-*-image.yaml
```
Length of output: 2909
.github/actions/setup-k3d/action.yaml (2)
`39-42`: LGTM! Well-defined input parameter. The new `storage` input parameter is well-documented and follows the existing parameter style. The default value of `"false"` ensures backward compatibility.
`131-135`: Verify the make target and add storage setup validation. The step looks good but requires verification of the following:
- Ensure the `k3d/storage` make target exists and properly sets up local-path-storage
- Consider adding verification of successful storage setup
Let's verify the make target and its implementation:
Consider adding storage validation after the setup:
```diff
 - name: Set local path storage
   if: ${{ inputs.storage == 'true' }}
   shell: bash
   run: |
     make k3d/storage
+    # Verify storage setup
+    kubectl wait --for=condition=ready pod -l app=local-path-provisioner -n local-path-storage --timeout=60s
```
✅ Verification successful
The k3d/storage target exists and is properly implemented, but could benefit from additional validation
The target is correctly defined in Makefile.d/k3d.mk and:
- Deploys the official Rancher local-path-provisioner
- Verifies StorageClass creation
- Sets local-path as the default StorageClass
Consider enhancing the validation as suggested:
```diff
 .PHONY: k3d/storage
 k3d/storage:
 	kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
 	kubectl get storageclass
 	kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
+	kubectl wait --for=condition=ready pod -l app=local-path-provisioner -n local-path-storage --timeout=60s
```
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the k3d/storage make target and its implementation # Check if the make target exists echo "Checking for k3d/storage make target..." make -n k3d/storage 2>/dev/null || echo "k3d/storage target not found" # Look for local-path-storage related files echo -e "\nSearching for local-path-storage configuration..." rg -l "local-path-storage" --type yamlLength of output: 331
Script:
#!/bin/bash # Check Makefile content and search for storage-related configurations echo "=== Checking Makefile content ===" fd Makefile --type f --exec cat {} echo -e "\n=== Searching for storage-related patterns ===" rg -l "storage" --type yaml rg -l "provisioner" --type yaml echo -e "\n=== Checking k3d-related targets in Makefile ===" rg "k3d[:/]" MakefileLength of output: 33547
Script:
#!/bin/bash # Check k3d-specific Makefile and configurations echo "=== Checking k3d Makefile includes ===" cat Makefile.d/k3d.mk echo -e "\n=== Searching for k3d storage configurations ===" rg -l "k3d.*storage" --type yaml rg -l "local-path.*storage" --type yamlLength of output: 2568
tests/e2e/operation/multi.go (2)
`33-38`: Enhance search configuration with timeout and increased result count. The changes improve the robustness of the MultiSearch operation by:
- Adding a 3-second timeout to catch potential DEADLINE_EXCEEDED errors during agent rollout
- Increasing the number of search results from 3 to 10 for better coverage
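A rough sketch of that per-request budget, assuming the standard `google.golang.org/grpc` status/codes packages and with `doSearch` standing in for the real MultiSearch call:

```go
package e2eexample

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// searchWithBudget bounds one search call with a 3-second deadline and
// reports whether that budget was the reason it failed.
func searchWithBudget(ctx context.Context, doSearch func(context.Context) error) (timedOut bool, err error) {
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()
	err = doSearch(ctx)
	if st, ok := status.FromError(err); ok && st.Code() == codes.DeadlineExceeded {
		return true, err
	}
	return false, err
}
```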
`61-61`: Add logging for better observability. The added logging statement helps track the number of responses received during the test execution.
.github/workflows/e2e.yaml (1)
`463-463`: LGTM! The job is correctly added to the slack-notification dependencies.
tests/e2e/operation/stream.go (1)
`653-681`: Well-implemented thread-safe error handling. The error handling improvements demonstrate good practices:
- Proper mutex synchronization for thread safety
- Consistent error aggregation pattern across streaming operations
- Clear separation of error collection and propagation
Also applies to: 767-795, 883-910, 995-1023, 1156-1183
tests/e2e/crud/crud_test.go (1)
`59-69`: Well-structured configuration management. The new configuration options enhance test flexibility:
- Search concurrency control
- Resource wait duration configuration
- Clear default values
Also applies to: 87-88, 104-104, 141-144
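For illustration, a wait-duration knob can be exposed the same way as the integer flags; the flag name and default below are assumptions, not the values defined in this PR:

```go
package crudtest

import (
	"flag"
	"time"
)

// The flag name and default here are illustrative; the PR defines its own
// name and default for the resource-ready wait.
var waitResourceReadyDuration time.Duration

func init() {
	flag.DurationVar(&waitResourceReadyDuration, "wait-resource-ready", 2*time.Minute,
		"timeout for waiting until restarted agent pods become ready")
}
```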
func RolloutResourceName( | ||
ctx context.Context, t *testing.T, resource string, name string, timeout string, | ||
) error { | ||
t.Helper() | ||
cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name) | ||
if err := runCmd(t, cmd); err != nil { | ||
return err | ||
} | ||
|
||
r := strings.Join([]string{resource, name}, "/") | ||
to := strings.Join([]string{"--timeout", timeout}, "=") | ||
cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to) | ||
stdout, err := cmd.StdoutPipe() | ||
if err != nil { | ||
return err | ||
} | ||
defer stdout.Close() | ||
|
||
stderr, err := cmd.StderrPipe() | ||
if err != nil { | ||
return err | ||
} | ||
defer stderr.Close() | ||
|
||
if err := cmd.Start(); err != nil { | ||
return err | ||
} | ||
go func() { | ||
scanner := bufio.NewScanner(stdout) | ||
for scanner.Scan() { | ||
fmt.Println(scanner.Text()) | ||
} | ||
}() | ||
go func() { | ||
scanner := bufio.NewScanner(stderr) | ||
for scanner.Scan() { | ||
fmt.Println("Error:", scanner.Text()) | ||
} | ||
}() | ||
|
||
return cmd.Wait() | ||
} |
🛠️ Refactor suggestion
Add error handling for scanner errors.
The implementation looks good but should include error handling for scanner errors in the goroutines.
Apply this diff to add error handling:
go func() {
scanner := bufio.NewScanner(stdout)
for scanner.Scan() {
fmt.Println(scanner.Text())
}
+ if err := scanner.Err(); err != nil {
+ fmt.Printf("Error reading stdout: %v\n", err)
+ }
}()
go func() {
scanner := bufio.NewScanner(stderr)
for scanner.Scan() {
fmt.Println("Error:", scanner.Text())
}
+ if err := scanner.Err(); err != nil {
+ fmt.Printf("Error reading stderr: %v\n", err)
+ }
}()
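A minimal standalone illustration of the `Scanner.Err()` point — `Scan` returns false on both EOF and read errors, so only `Err` distinguishes them:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	scanner := bufio.NewScanner(strings.NewReader("line one\nline two\n"))
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil { // nil on clean EOF
		fmt.Fprintln(os.Stderr, "scan error:", err)
	}
}
```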
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func RolloutResourceName( | |
ctx context.Context, t *testing.T, resource string, name string, timeout string, | |
) error { | |
t.Helper() | |
cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name) | |
if err := runCmd(t, cmd); err != nil { | |
return err | |
} | |
r := strings.Join([]string{resource, name}, "/") | |
to := strings.Join([]string{"--timeout", timeout}, "=") | |
cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to) | |
stdout, err := cmd.StdoutPipe() | |
if err != nil { | |
return err | |
} | |
defer stdout.Close() | |
stderr, err := cmd.StderrPipe() | |
if err != nil { | |
return err | |
} | |
defer stderr.Close() | |
if err := cmd.Start(); err != nil { | |
return err | |
} | |
go func() { | |
scanner := bufio.NewScanner(stdout) | |
for scanner.Scan() { | |
fmt.Println(scanner.Text()) | |
} | |
}() | |
go func() { | |
scanner := bufio.NewScanner(stderr) | |
for scanner.Scan() { | |
fmt.Println("Error:", scanner.Text()) | |
} | |
}() | |
return cmd.Wait() | |
} | |
func RolloutResourceName( | |
ctx context.Context, t *testing.T, resource string, name string, timeout string, | |
) error { | |
t.Helper() | |
cmd := exec.CommandContext(ctx, "kubectl", "rollout", "restart", resource, name) | |
if err := runCmd(t, cmd); err != nil { | |
return err | |
} | |
r := strings.Join([]string{resource, name}, "/") | |
to := strings.Join([]string{"--timeout", timeout}, "=") | |
cmd = exec.CommandContext(ctx, "kubectl", "rollout", "status", r, "--watch", to) | |
stdout, err := cmd.StdoutPipe() | |
if err != nil { | |
return err | |
} | |
defer stdout.Close() | |
stderr, err := cmd.StderrPipe() | |
if err != nil { | |
return err | |
} | |
defer stderr.Close() | |
if err := cmd.Start(); err != nil { | |
return err | |
} | |
go func() { | |
scanner := bufio.NewScanner(stdout) | |
for scanner.Scan() { | |
fmt.Println(scanner.Text()) | |
} | |
if err := scanner.Err(); err != nil { | |
fmt.Printf("Error reading stdout: %v\n", err) | |
} | |
}() | |
go func() { | |
scanner := bufio.NewScanner(stderr) | |
for scanner.Scan() { | |
fmt.Println("Error:", scanner.Text()) | |
} | |
if err := scanner.Err(); err != nil { | |
fmt.Printf("Error reading stderr: %v\n", err) | |
} | |
}() | |
return cmd.Wait() | |
} |
|
||
_ = op.Upsert(t, ctx, operation.Dataset{ | ||
Train: ds.Train[insertFrom : insertFrom+insertNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
sleep(t, waitAfterInsertDuration) | ||
|
||
searchFunc := func(ctx context.Context) error { | ||
return op.MultiSearch(t, ctx, operation.Dataset{ | ||
Test: ds.Test[searchFrom : searchFrom+searchNum], | ||
Neighbors: ds.Neighbors[searchFrom : searchFrom+searchNum], | ||
}) | ||
} | ||
|
||
wg := sync.WaitGroup{} | ||
mu := sync.Mutex{} | ||
var serr error | ||
wg.Add(1) | ||
done := make(chan struct{}) | ||
go func() { | ||
defer wg.Done() | ||
for { | ||
select { | ||
case <-done: | ||
return | ||
default: | ||
eg, egctx := errgroup.New(ctx) | ||
for i := 0; i < searchConcurrency; i++ { | ||
eg.Go(func() (e error) { | ||
ierr := searchFunc(egctx) | ||
if ierr != nil { | ||
st, ok := status.FromError(ierr) | ||
if ok && st.Code() == codes.DeadlineExceeded { | ||
_, _, rerr := status.ParseError(ierr, codes.DeadlineExceeded, "an error occurred") | ||
mu.Lock() | ||
e = errors.Join(e, rerr) | ||
mu.Unlock() | ||
} | ||
} | ||
return | ||
}) | ||
} | ||
egerr := eg.Wait() | ||
mu.Lock() | ||
serr = errors.Join(serr, egerr) | ||
mu.Unlock() | ||
time.Sleep(5 * time.Second) | ||
} | ||
} | ||
}() | ||
|
||
// Wait for StatefulSet to be ready | ||
t.Log("rollout restart agent and waiting for agent pods ready...") | ||
err = kubectl.RolloutResourceName(ctx, t, "statefulset", "vald-agent", waitResourceReadyDuration.String()) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
cnt, err := op.IndexInfo(t, ctx) | ||
if err != nil { | ||
if cnt == nil { | ||
t.Fatalf("an error occurred: err = %s", err) | ||
} | ||
t.Fatalf("an error occurred: count = %d, err = %s", cnt.Stored, err) | ||
} | ||
|
||
err = op.Exists(t, ctx, "0") | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
err = op.GetObject(t, ctx, operation.Dataset{ | ||
Train: ds.Train[getObjectFrom : getObjectFrom+getObjectNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
err = op.Remove(t, ctx, operation.Dataset{ | ||
Train: ds.Train[removeFrom : removeFrom+removeNum], | ||
}) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
// Remove all vector data after the current - 1 hour. | ||
err = op.RemoveByTimestamp(t, ctx, time.Now().Add(-time.Hour).UnixNano()) | ||
if err != nil { | ||
t.Fatalf("an error occurred: %s", err) | ||
} | ||
|
||
close(done) | ||
wg.Wait() | ||
if serr != nil { | ||
t.Fatalf("an error occurred: %s", serr.Error()) | ||
} | ||
} |
🛠️ Refactor suggestion
Consider reducing code duplication in test implementations.
The high concurrency test is well-structured but shares significant code with TestE2EAgentRolloutRestart. Consider extracting common functionality.
Extract shared test logic into reusable functions:
+type searchTestConfig struct {
+ t *testing.T
+ ctx context.Context
+ op *operation.Operation
+ searchFunc func(context.Context) error
+ concurrency int
+ sleepDuration time.Duration
+}
+
+func runConcurrentSearchTest(cfg searchTestConfig) error {
+ var (
+ wg sync.WaitGroup
+ mu sync.Mutex
+ serr error
+ )
+ done := make(chan struct{})
+
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ eg, egctx := errgroup.New(cfg.ctx)
+ for i := 0; i < cfg.concurrency; i++ {
+ eg.Go(func() (e error) {
+ if ierr := cfg.searchFunc(egctx); ierr != nil {
+ if isDeadlineExceeded(ierr) {
+ _, _, rerr := status.ParseError(
+ ierr,
+ codes.DeadlineExceeded,
+ "an error occurred",
+ )
+ mu.Lock()
+ e = errors.Join(e, rerr)
+ mu.Unlock()
+ }
+ }
+ return
+ })
+ }
+ egerr := eg.Wait()
+ mu.Lock()
+ serr = errors.Join(serr, egerr)
+ mu.Unlock()
+ time.Sleep(cfg.sleepDuration)
+ }
+ }
+ }()
+
+ // Run test-specific operations here
+
+ close(done)
+ wg.Wait()
+ return serr
+}
func TestE2EHighConcurrencyMultiSearch(t *testing.T) {
// ... setup code ...
- wg := sync.WaitGroup{}
- mu := sync.Mutex{}
- var serr error
- wg.Add(1)
- done := make(chan struct{})
- go func() {
- // ... duplicate concurrent search logic ...
- }()
+ err := runConcurrentSearchTest(searchTestConfig{
+ t: t,
+ ctx: ctx,
+ op: op,
+ searchFunc: searchFunc,
+ concurrency: searchConcurrency,
+ sleepDuration: 5 * time.Second,
+ })
// ... remaining test code ...
}
Committable suggestion skipped: line range outside the PR's diff.
Description
Fatal
Related Issue
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
Release Notes
New Features
Testing Improvements
`MultiSearch` method to handle more results and introduced timeout configuration.
Configuration Updates