-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport PR #2702 to release/v1.7 for feat: Implement delete expired index job #2722
Backport PR #2702 to release/v1.7 for feat: Implement delete expired index job #2722
Conversation
* feat: Implement delete expired index job * fix: Replace agent Remove RPC to Vald Remove RPC * fix: service name * fix: typo * fix: log method * fix: variable name * fix: use internal package * fix: Change struct field name
Deploying vald with Cloudflare Pages
|
📝 WalkthroughWalkthroughThe pull request introduces several changes focused on enhancing the index deletion functionality within the Vald system. Key modifications include the addition of new GitHub Actions workflows for managing Docker images related to index deletion, the implementation of a Kubernetes CronJob for periodic index deletion, and the creation of configuration files and service structures to support the deletion process. Additionally, updates to spell-check configurations and Makefile entries are included to accommodate these new features. Changes
Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 24
🧹 Outside diff range and nitpick comments (17)
pkg/index/job/deletion/service/options.go (1)
1-48
: Consider implementing validation helper package.Given the similar validation patterns needed across options, consider creating a shared validation package that can be reused across the codebase. This would ensure consistent validation rules and error messages.
Additionally, consider adding:
- Constants for default values
- Package-level documentation explaining the overall purpose and usage
- Examples in documentation showing common usage patterns
cmd/index/job/deletion/main.go (1)
34-59
: Well-structured implementation aligned with PR objectives.The implementation provides a solid foundation for the index deletion job with proper separation of concerns:
- Configuration management via
config.NewConfig
- Business logic isolation in
usecase.New
- Error handling and recovery mechanisms
Consider documenting the following aspects in the PR description or README:
- Expected configuration format and options
- Kubernetes Job/CronJob scheduling recommendations
- Monitoring and alerting considerations for failed deletions
pkg/index/job/deletion/config/config.go (2)
25-37
: Consider enhancing struct documentation.The Data struct could benefit from more detailed documentation:
- Add examples of when/how this configuration is used
- Document the relationship between these configuration components
- Add validation rules or constraints for each field
Example enhancement:
// Data represents the configuration for the index deletion job. // It combines global application settings, server configurations, // observability settings, and deletion-specific configurations // required to run the index deletion service.
24-37
: Consider adding scheduling configuration for periodic deletion.Based on the PR objectives mentioning periodic execution, consider adding configuration fields for:
- Schedule parameters (cron expression)
- One-time vs periodic execution mode
- Retry policies and timeout settings
This would align better with the requirement to support both periodic and one-time deletion tasks.
internal/config/index_deleter.go (2)
39-40
: Fix incorrect field comment.The comment for
DeletionPoolSize
incorrectly mentions "indexing" instead of "deletion".- // DeletionPoolSize represents batch pool size for indexing. + // DeletionPoolSize represents batch pool size for deletion operations.
17-47
: Consider adding expiration-related configuration fields.To better support the PR's objective of managing expired indexes, consider adding fields for:
- Expiration criteria (e.g., age-based, usage-based)
- Schedule configuration for periodic deletion
- Dry-run mode for safety
Example fields:
type IndexDeleter struct { // ... existing fields ... // ExpirationAge defines how old (in hours) an index must be to be considered expired ExpirationAge int64 `json:"expiration_age" yaml:"expiration_age"` // LastAccessThreshold defines the threshold (in hours) of last access time for considering an index expired LastAccessThreshold int64 `json:"last_access_threshold" yaml:"last_access_threshold"` // DryRun if true, only logs what would be deleted without actually deleting DryRun bool `json:"dry_run" yaml:"dry_run"` }.github/workflows/dockers-index-deletion.yaml (1)
76-80
: Consider pinning the reusable workflow version.While the current configuration is correct, consider pinning the reusable workflow to a specific SHA for better stability and predictability.
build: - uses: ./.github/workflows/_docker-image.yaml + uses: ./.github/workflows/_docker-image.yaml@v1.7.0 with: target: index-deletion secrets: inheritcmd/index/job/deletion/sample.yaml (1)
16-17
: Consider making version and timezone configurable
- The version
v0.0.0
appears to be a placeholder. Consider using a templated value that matches the actual release version.- Hardcoding timezone to
JST
might cause issues for deployments in different regions. Consider making it configurable through environment variables.-version: v0.0.0 -time_zone: JST +version: ${VALD_VERSION} +time_zone: ${TIMEZONE:-UTC}Makefile.d/docker.mk (1)
352-355
: Consider adding documentation for the new image.While the implementation is correct, consider adding documentation about the index-deletion image's purpose and configuration in the project's documentation, similar to other index-related jobs.
k8s/index/job/deletion/configmap.yaml (2)
32-35
: Consider using INFO level logging for production.Debug level logging can impact performance and generate excessive logs in production environments.
logging: format: raw - level: debug + level: info logger: glg
202-207
: Remove TLS paths when TLS is disabled.Since TLS is disabled (
enabled: false
), having paths configured could be misleading for maintenance.tls: - ca: /path/to/ca - cert: /path/to/cert enabled: false insecure_skip_verify: false - key: /path/to/keydockers/index/job/deletion/Dockerfile (1)
19-19
: Remove unused build argumentUPX_OPTIONS
The build argument
UPX_OPTIONS
is declared but not utilized in the Dockerfile. Removing unused arguments helps keep the Dockerfile clean and maintainable.Apply this diff to remove the unused argument:
-ARG UPX_OPTIONS=-9
pkg/index/job/deletion/usecase/deletion.go (3)
73-79
: Add a comment explaining the reversal ofaddrs
inOnDiscoverFunc
.The purpose of reversing the
addrs
slice in theOnDiscoverFunc
is not immediately clear. Adding a comment to explain why the addresses are reversed would improve code readability and maintainability.
139-189
: Handle potential nil channels in the select statement.In the
Start
method, ifr.observability
isnil
, theoech
channel remainsnil
. Including anil
channel in aselect
statement can potentially lead to blocking behavior. While Go'sselect
ignores cases withnil
channels, explicitly handling this can improve code clarity.Consider modifying the select statement:
for { select { case <-ctx.Done(): return ctx.Err() - case err = <-oech: + case err = <-oech: + if oech != nil { + err = <-oech + } case err = <-sech: case err = <-cech: }Alternatively, initialize
oech
with a closed channel whenr.observability
isnil
to avoid thenil
channel issue.
193-214
: Remove unnecessary methodsPreStop
andPostStop
if not used.The methods
PreStop
andPostStop
currently do nothing and may not be necessary. If they are not required by therunner.Runner
interface, consider removing them to simplify the code.If these methods are required by the interface but intentionally left empty, consider adding comments to explain this.
// PreStop is called before Stop. Currently, no pre-stop actions are required. func (*run) PreStop(_ context.Context) error { return nil } // PostStop is called after Stop. Currently, no post-stop actions are required. func (*run) PostStop(_ context.Context) error { return nil }pkg/index/job/deletion/service/deleter.go (2)
63-69
: Improve variable naming for clarity indelDuplicateAddrs
.The variable
exist
used to track addresses can be renamed toseen
for better readability and to align with Go naming conventions. This makes the purpose of the variable clearer to other developers.Consider renaming
exist
toseen
:func delDuplicateAddrs(targetAddrs []string) []string { addrs := make([]string, 0, len(targetAddrs)) - exist := make(map[string]bool) + seen := make(map[string]bool) for _, addr := range targetAddrs { - if !exist[addr] { + if !seen[addr] { addrs = append(addrs, addr) - exist[addr] = true + seen[addr] = true } } return addrs }
79-85
: Simplify the deferred span ending inStart
method.Since
span
is initialized just before thedefer
statement and is guaranteed to be non-nil, the nil check is unnecessary. Removing it simplifies the code.Simplify the deferred function:
func (idx *index) Start(ctx context.Context) error { ctx, span := trace.StartSpan(ctx, apiName+"/service/index.Delete") defer func() { - if span != nil { - span.End() - } + span.End() }() // ... }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
hack/actions/gen/main.go
is excluded by!**/gen/**
hack/docker/gen/main.go
is excluded by!**/gen/**
📒 Files selected for processing (21)
- .cspell.json (1 hunks)
- .gitfiles (1 hunks)
- .github/actions/detect-docker-image-tags/action.yaml (1 hunks)
- .github/workflows/dockers-image-scan.yaml (1 hunks)
- .github/workflows/dockers-index-deletion-image.yaml (1 hunks)
- .github/workflows/dockers-index-deletion.yaml (1 hunks)
- .github/workflows/dockers-release-branch-image.yaml (1 hunks)
- Makefile (1 hunks)
- Makefile.d/build.mk (4 hunks)
- Makefile.d/docker.mk (3 hunks)
- cmd/index/job/deletion/main.go (1 hunks)
- cmd/index/job/deletion/sample.yaml (1 hunks)
- dockers/index/job/deletion/Dockerfile (1 hunks)
- hack/cspell/main.go (1 hunks)
- internal/config/index_deleter.go (1 hunks)
- k8s/index/job/deletion/configmap.yaml (1 hunks)
- k8s/index/job/deletion/cronjob.yaml (1 hunks)
- pkg/index/job/deletion/config/config.go (1 hunks)
- pkg/index/job/deletion/service/deleter.go (1 hunks)
- pkg/index/job/deletion/service/options.go (1 hunks)
- pkg/index/job/deletion/usecase/deletion.go (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- .cspell.json
- hack/cspell/main.go
🔇 Additional comments (29)
pkg/index/job/deletion/service/options.go (3)
1-7
: LGTM! Package structure and imports are well-organized.
10-19
: LGTM! Robust error handling for discoverer client.The implementation properly validates the client and uses appropriate error types.
8-9
: Verify the index type definition.The
index
type used in the Option function signature is not visible in this file. Let's verify its definition.✅ Verification successful
Since the file under review is
pkg/index/job/deletion/service/options.go
, and we found the correspondingindex
type definition inpkg/index/job/deletion/service/deleter.go
within the same package, I can now generate the final response:The index type is properly defined within the same package
The
index
type referenced in the Option function is defined indeleter.go
within the same package. It's a struct containing client configuration, target addresses, and concurrency settings, which is appropriate for the deletion service context.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for index type definition in the package ast-grep --pattern 'type index struct { $$$ }'Length of output: 3094
cmd/index/job/deletion/main.go (3)
14-26
: LGTM! Well-organized imports.The package structure and imports are clean and include all necessary dependencies for the index deletion job implementation.
40-58
: LGTM! Robust error handling implementation.The implementation includes:
- Proper error wrapping with context
- Type assertion checks
- Appropriate fatal logging for unrecoverable errors
28-32
: Verify version compatibility with the backport.The
maxVersion
is set to "v0.0.10" which seems unusually low for a backport to release/v1.7. Please verify if this version range is appropriate.✅ Verification successful
Based on the gathered information, I can now provide a final response about the version compatibility concern:
The version range is appropriate for this component.
The
maxVersion = "v0.0.10"
in the index deletion job is correct because:
From internal/info/info.go, we can see that this is a standalone component with its own versioning scheme, separate from the main Vald version.
The version check is handled by internal/version/version.go which uses semantic versioning to ensure version compatibility between components.
The version range (v0.0.0 to v0.0.10) is intentionally kept at 0.0.x level to indicate this is an internal component that follows its own versioning scheme independent of the main Vald release version.
The version constraints are properly validated using hashicorp/go-version library to ensure semantic version compatibility.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check version compatibility with the target branch # Test: Get the version from the target branch. Expect: Version should be compatible with maxVersion # Get the current branch version git branch --show-current # Check version tags in the repository git tag | grep -E "v1\.7\."Length of output: 73
Script:
#!/bin/bash # Let's try a different approach to understand the version context # Check the repository's version information in common locations echo "=== Checking version in go.mod ===" cat go.mod echo -e "\n=== Checking README for version info ===" cat README.md echo -e "\n=== Looking for version-related constants in the codebase ===" rg "const.*version" -i --type go -A 3 echo -e "\n=== Checking other files in the same directory for version context ===" fd . "cmd/index/job/deletion" --type f --exec cat {}Length of output: 70806
Script:
#!/bin/bash # Let's check the version information in the package and configuration # Check version information in pkg/index/job/deletion echo "=== Checking version info in pkg/index/job/deletion ===" fd . "pkg/index/job/deletion" --type f --exec rg "version" -i {} # Check version information in internal/info echo -e "\n=== Checking version info in internal/info ===" fd . "internal/info" --type f --exec cat {} # Check version information in internal/version echo -e "\n=== Checking version info in internal/version ===" fd . "internal/version" --type f --exec cat {}Length of output: 53959
pkg/index/job/deletion/config/config.go (2)
1-20
: LGTM! Clean package structure and imports.The package declaration and imports are well-organized, following Go best practices with appropriate copyright headers.
39-71
: 🛠️ Refactor suggestionImprove error handling and add configuration validation.
Several improvements could enhance the robustness of the configuration loading:
- The Bind() errors are currently discarded. Consider logging or handling these errors.
- Add validation for actual configuration values before returning.
- Consider sanitizing the file path input for security.
Let's verify if there are any existing validation patterns in the codebase:
Consider applying this enhancement:
func NewConfig(path string) (cfg *Data, err error) { + if err := validatePath(path); err != nil { + return nil, err + } cfg = new(Data) if err = config.Read(path, &cfg); err != nil { return nil, err } if cfg != nil { - _ = cfg.GlobalConfig.Bind() + if err := cfg.GlobalConfig.Bind(); err != nil { + return nil, fmt.Errorf("failed to bind global config: %w", err) + } } else { return nil, errors.ErrInvalidConfig } + + if err := cfg.validate(); err != nil { + return nil, err + } return cfg, nil } + +func (d *Data) validate() error { + // Add validation logic here + return nil +}.github/workflows/dockers-index-deletion-image.yaml (3)
76-80
: LGTM: Proper reusable workflow implementationThe job configuration correctly:
- Uses a reusable workflow template
- Specifies the target as 'index-deletion'
- Inherits secrets from the calling context
32-74
: Verify path patterns completenessThe path patterns appear comprehensive but should be verified against the actual index deletion feature requirements.
Let's verify the path patterns:
#!/bin/bash # Description: Verify if all relevant paths for the index deletion feature are included # Expected: No additional critical paths found # Check for potential missing paths related to index deletion echo "Searching for files related to index deletion..." fd -t f . | rg -i "index.*deletion" | rg -v -f <(rg '^ - ' .github/workflows/dockers-index-deletion-image.yaml | sed 's/ - //') # Check for potential configuration files echo "Checking for config files..." fd -t f "config.*yaml" | rg -i "index.*deletion"
31-74
: Security consideration: Validate PR trigger configurationThe workflow is triggered on both
pull_request
andpull_request_target
events with identical path filters. Thepull_request_target
event runs in the context of the base repository and has access to secrets, which could pose a security risk if not properly configured.Let's verify the security implications:
Consider:
- Using only
pull_request
event if secrets access isn't required- If
pull_request_target
is necessary, implement additional security checks in the shared workflow✅ Verification successful
I need to examine the shared workflow and action files to understand how they handle secrets and PR triggers before making a final determination about the security implications.
PR trigger configuration is secure with proper controls
The workflow implementation is secure based on the following evidence:
The shared workflow (
_docker-image.yaml
) has explicit security controls:
- For forked repositories,
pull_request_target
only runs if the PR has theci/approved
label- For non-forked repositories, it uses regular
pull_request
event- The build job has a strict conditional check combining these rules:
(github.event.pull_request.head.repo.fork == true && github.event_name == 'pull_request_target' && contains(github.event.pull_request.labels.*.name, 'ci/approved'))Additional security measures:
- Docker builds from forks are limited to
linux/amd64
platform only- Extra tags are not added for PR builds
- Secrets are only used in the build job which has the security conditionals
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if other Docker image workflows follow the same pattern # Description: Looking for potential security precedents or inconsistencies # Expected: Similar dual trigger patterns in other Docker workflows echo "Checking other Docker workflow files for PR trigger patterns..." rg -l 'pull_request_target:' .github/workflows/dockers-*-image.yaml echo "Checking for secret usage in the shared workflow..." rg 'secrets:' .github/workflows/_docker-image.yamlLength of output: 461
Script:
#!/bin/bash # Let's check the shared workflow file to understand how it handles secrets and PR triggers echo "Content of shared workflow file:" cat .github/workflows/_docker-image.yaml # Also check the docker-build action for additional context echo -e "\nContent of docker-build action:" cat .github/actions/docker-build/action.yaml # Check if there are any explicit security checks for PR_target echo -e "\nChecking for PR security measures:" rg -A 5 'if: github.event_name == .pull_request_target.' .github/workflows/Length of output: 10205
.github/workflows/dockers-image-scan.yaml (1)
73-76
: LGTM! Good security practice including the new image in scanning workflow.The addition of the
index-deletion
job follows the established pattern and ensures the new Docker image will undergo the same security scanning as other images. This is crucial for maintaining the security posture of the system..github/workflows/dockers-release-branch-image.yaml (1)
91-96
: LGTM! Well-integrated job definition.The new
index-deletion
job follows the established pattern and is properly integrated with the existing workflow structure. It correctly depends on thedump-contexts-to-log
job and uses the shared workflow file, which maintains consistency with other jobs in the pipeline.k8s/index/job/deletion/cronjob.yaml (2)
54-77
:⚠️ Potential issueImprove init containers reliability and configurability
The init containers have several areas for improvement:
- Hardcoded namespace
default.svc.cluster.local
limits deployment flexibility- Infinite loop without timeout could cause stuck pods
- Consider using
kubernetes.io/startup-probe
instead of custom HTTP checksSuggested improvements:
- name: wait-for-agent image: busybox:stable imagePullPolicy: Always command: - /bin/sh - -e - -c - | + timeout=300 + start_time=$(date +%s) until [ "$(wget --server-response --spider --quiet \ - http://vald-agent.default.svc.cluster.local:3001/readiness \ + http://vald-agent.${MY_POD_NAMESPACE}.svc.cluster.local:3001/readiness \ 2>&1 | awk 'NR==1{print $2}')" == "200" ]; do + current_time=$(date +%s) + elapsed=$((current_time - start_time)) + if [ $elapsed -gt $timeout ]; then + echo "Timeout waiting for agent" + exit 1 + fi echo "waiting for agent to be ready..." sleep 2; doneApply similar changes to the discoverer init container.
#!/bin/bash # Check if the namespace is referenced elsewhere in the codebase rg -l "default[.]svc[.]cluster[.]local"
142-146
: Add ConfigMap validationThe job depends on the
vald-index-deletion-config
ConfigMap but there's no guarantee it exists when the job runs.Consider:
- Adding documentation about the required ConfigMap
- Using Helm dependencies to ensure the ConfigMap is created first
- Adding validation in the init container to check for ConfigMap existence:
initContainers: + - name: validate-config + image: busybox:stable + command: + - /bin/sh + - -c + - | + if [ ! -f /etc/server/config.yaml ]; then + echo "ConfigMap not mounted properly" + exit 1 + fi + volumeMounts: + - name: vald-index-deletion-config + mountPath: /etc/server/cmd/index/job/deletion/sample.yaml (2)
200-230
: Enable and configure observability for production readiness
- Observability is disabled by default (
enabled: false
), which might hinder monitoring in production.- Pod and node attributes contain placeholder values that need to be replaced.
Let's verify the observability configuration in other components:
#!/bin/bash # Check observability settings across configurations rg -g '*.{yaml,yml}' 'observability:\s*\n\s*enabled:'Recommendations:
- Enable observability by default for production environments
- Document the required environment variables for the placeholder values
- Consider adding alerts for critical metrics
observability: - enabled: false + enabled: true otlp: attribute: - namespace: "_MY_POD_NAMESPACE_" - pod_name: "_MY_POD_NAME_" - node_name: "_MY_NODE_NAME_" + namespace: ${POD_NAMESPACE} + pod_name: ${POD_NAME} + node_name: ${NODE_NAME}
71-79
: Security and configuration concerns in deleter section
- Using the "default" namespace is not recommended for production deployments.
- The
index_id
is a critical field but lacks validation or documentation about its format.- The
concurrency
setting might need documentation about its impact on performance and resource usage.Let's verify the namespace usage in other configuration files:
Consider:
- Using a dedicated namespace for the deletion job
- Adding validation for the index_id format
- Documenting the impact of concurrency settings on resource usage
Makefile.d/build.mk (2)
27-27
: LGTM! Build configuration follows best practices.The build configuration for the index deletion binary is correctly implemented:
- Uses static linking (CGO_ENABLED=0)
- Follows consistent pattern with other index jobs
- Properly integrated into the build system
Also applies to: 84-87
136-136
: LGTM! Zip artifact configuration is properly implemented.The zip artifact configuration for the index deletion binary:
- Follows the established pattern for artifact creation
- Maintains consistency with other index job artifacts
- Correctly handles directory creation and zip file generation
Also applies to: 196-199
Makefile.d/docker.mk (2)
39-39
: LGTM! Consistent integration with existing build targets.The index-deletion target is properly integrated into both the main build list and xpanes build list, maintaining alphabetical ordering.
Also applies to: 67-67
346-355
: LGTM! Verify Dockerfile existence.The new targets follow the established pattern and are properly defined. However, let's verify the existence of the referenced Dockerfile.
✅ Verification successful
Dockerfile exists at the expected location - good to go!
The new Makefile targets are properly defined and the referenced Dockerfile exists at the expected path
dockers/index/job/deletion/Dockerfile
, confirming that the implementation is complete and correct.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify the existence of the Dockerfile if [ -f "dockers/index/job/deletion/Dockerfile" ]; then echo "✓ Dockerfile exists at dockers/index/job/deletion/Dockerfile" else echo "✗ Missing Dockerfile at dockers/index/job/deletion/Dockerfile" exit 1 fiLength of output: 181
k8s/index/job/deletion/configmap.yaml (3)
16-26
: LGTM! Metadata follows Kubernetes best practices.The ConfigMap metadata includes appropriate labels for Kubernetes and Helm management.
208-238
: Consider enabling observability for monitoring deletion operations.Since this job performs critical data deletion operations, having observability enabled would help in monitoring and debugging the deletion process.
observability: - enabled: false + enabled: true otlp: collector_endpoint: "" trace: - enabled: false + enabled: trueRun this script to verify if OpenTelemetry collector is deployed in the cluster:
✅ Verification successful
Let me check the collector configuration and documentation to provide a more informed response.
Based on the documentation and configuration files, I can now provide the final response:
OpenTelemetry collector is available, enabling observability is recommended
The OpenTelemetry collector is properly configured in the cluster with support for both metrics and traces. The collector is set up as a DaemonSet to collect telemetry data from all nodes, and exports to:
- Jaeger for distributed tracing
- Prometheus for metrics visualization
To enable observability, update the configuration as suggested and set the collector endpoint to:
otlp: collector_endpoint: "opentelemetry-collector-collector.default.svc.cluster.local:4317"🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if OpenTelemetry collector exists in the cluster # Expected: Find references to otel-collector deployment or service rg -l 'otel.*collector|opentelemetry.*collector'Length of output: 681
Script:
#!/bin/bash # Check collector configuration and documentation cat k8s/metrics/otel/collector.yaml cat k8s/metrics/otel/README.md cat docs/user-guides/observability-configuration.mdLength of output: 11653
239-248
:⚠️ Potential issueUpdate sample index_id and consider increasing concurrency.
- The
index_id
is set to "sample" which needs to be updated with a real value or made configurable.- Consider increasing concurrency for better performance, especially when dealing with multiple indexes.
deleter: - index_id: "sample" + index_id: ${INDEX_ID} - concurrency: 1 + concurrency: 5Let's verify if this index_id is referenced elsewhere:
Makefile (1)
44-44
: LGTM: Variable definition follows conventions.The addition of
INDEX_DELETION_IMAGE
follows the established naming pattern and is correctly placed with other image-related variables.Let's verify the variable usage in the codebase:
✅ Verification successful
Variable is properly defined and utilized in build system
The
INDEX_DELETION_IMAGE
variable is correctly used indocker.mk
for:
- Image name generation via echo command
- Docker build target configuration
The implementation follows the established pattern used for other Docker images in the build system.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for references to the new index deletion image variable # Test: Search for INDEX_DELETION_IMAGE usage in Makefile.d/docker.mk and other build files rg "INDEX_DELETION_IMAGE" Makefile.d/Length of output: 153
Script:
#!/bin/bash # Let's check the context of how INDEX_DELETION_IMAGE is used in docker.mk rg -A 3 "INDEX_DELETION_IMAGE" Makefile.d/docker.mkLength of output: 252
.gitfiles (1)
99-99
: LGTM: Addition of index deletion workflow file.The addition of
.github/workflows/dockers-index-deletion-image.yaml
aligns with the PR objectives to implement index deletion functionality and follows the repository's established pattern for Docker image workflow files.pkg/index/job/deletion/service/deleter.go (3)
197-202
: Ensure consistency in error handling when skipping certain errors.When the gRPC status code is
FailedPrecondition
orNotFound
, the function logs a warning and returnsnil
, effectively skipping the error. Confirm that this behavior is intentional and that these errors should not be propagated.If skipping these errors is intended, consider documenting this behavior to make it clear to future maintainers. If not, adjust the error handling to propagate these errors appropriately.
215-218
: Review error aggregation logic for completeness.The
doDeleteIndex
method returnserrors.Join(err, errs)
to aggregate errors from different sources. Ensure that this aggregation correctly represents all encountered errors without omitting any critical information.Confirm that
errors.Join(err, errs)
effectively combines the error fromOrderedRangeConcurrent
and the accumulatederrs
. If necessary, consider using an error type that preserves individual errors for detailed reporting.
154-214
:⚠️ Potential issuePotential data race when accumulating errors concurrently.
In the
doDeleteIndex
method, theerrs
variable is accessed and modified by multiple goroutines. Although a mutexemu
is used, ensure that all accesses toerrs
are properly synchronized to prevent data races.Verify that there are no unsynchronized accesses to
errs
. Alternatively, consider using anerrgroup.Group
to manage concurrency and error collection more safely.To confirm, you can check for data races using Go's race detector during testing.
// WithIndexingConcurrency returns Option that sets indexing concurrency. | ||
func WithIndexingConcurrency(num int) Option { | ||
return func(idx *index) error { | ||
if num <= 0 { | ||
return errors.NewErrInvalidOption("indexingConcurrency", num) | ||
} | ||
idx.concurrency = num | ||
return nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding an upper bound check for concurrency.
While the function correctly validates for non-positive values, consider adding an upper bound check to prevent potential resource exhaustion.
func WithIndexingConcurrency(num int) Option {
return func(idx *index) error {
if num <= 0 {
return errors.NewErrInvalidOption("indexingConcurrency", num)
}
+ // Consider system resources and practical limits
+ const maxConcurrency = 1000 // adjust based on your requirements
+ if num > maxConcurrency {
+ return errors.NewErrInvalidOption("indexingConcurrency exceeds maximum", num)
+ }
idx.concurrency = num
return nil
}
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// WithIndexingConcurrency returns Option that sets indexing concurrency. | |
func WithIndexingConcurrency(num int) Option { | |
return func(idx *index) error { | |
if num <= 0 { | |
return errors.NewErrInvalidOption("indexingConcurrency", num) | |
} | |
idx.concurrency = num | |
return nil | |
} | |
} | |
// WithIndexingConcurrency returns Option that sets indexing concurrency. | |
func WithIndexingConcurrency(num int) Option { | |
return func(idx *index) error { | |
if num <= 0 { | |
return errors.NewErrInvalidOption("indexingConcurrency", num) | |
} | |
// Consider system resources and practical limits | |
const maxConcurrency = 1000 // adjust based on your requirements | |
if num > maxConcurrency { | |
return errors.NewErrInvalidOption("indexingConcurrency exceeds maximum", num) | |
} | |
idx.concurrency = num | |
return nil | |
} | |
} |
// WithTargetAddrs returns Option that sets indexing target addresses. | ||
func WithTargetAddrs(addrs ...string) Option { | ||
return func(idx *index) error { | ||
if len(addrs) != 0 { | ||
idx.targetAddrs = append(idx.targetAddrs, addrs...) | ||
} | ||
return nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add address validation and deduplication.
The current implementation could benefit from:
- Validating address format
- Preventing duplicate addresses
- Checking for empty strings
func WithTargetAddrs(addrs ...string) Option {
return func(idx *index) error {
if len(addrs) != 0 {
+ seen := make(map[string]struct{}, len(idx.targetAddrs))
+ for _, addr := range idx.targetAddrs {
+ seen[addr] = struct{}{}
+ }
+ for _, addr := range addrs {
+ if addr == "" {
+ return errors.NewErrInvalidOption("empty address", addr)
+ }
+ // Add basic address format validation
+ if !isValidAddr(addr) {
+ return errors.NewErrInvalidOption("invalid address format", addr)
+ }
+ if _, exists := seen[addr]; !exists {
+ idx.targetAddrs = append(idx.targetAddrs, addr)
+ seen[addr] = struct{}{}
+ }
+ }
- idx.targetAddrs = append(idx.targetAddrs, addrs...)
}
return nil
}
}
Committable suggestion was skipped due to low confidence.
// WithTargetIndexID returns Option that sets target deleting index ID. | ||
func WithTargetIndexID(indexID string) Option { | ||
return func(idx *index) error { | ||
idx.targetIndexID = indexID | ||
return nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add index ID validation and enhance documentation.
The function should validate the index ID format and provide clear documentation about the expected format.
-// WithTargetIndexID returns Option that sets target deleting index ID.
+// WithTargetIndexID returns Option that sets target deleting index ID.
+// The index ID must be a non-empty string matching the format: ^[a-zA-Z0-9_-]+$
func WithTargetIndexID(indexID string) Option {
return func(idx *index) error {
+ if indexID == "" {
+ return errors.NewErrInvalidOption("empty index ID", indexID)
+ }
+ // Add validation for index ID format
+ if !isValidIndexID(indexID) {
+ return errors.NewErrInvalidOption("invalid index ID format", indexID)
+ }
idx.targetIndexID = indexID
return nil
}
}
Committable suggestion was skipped due to low confidence.
context.Background(), | ||
runner.WithName(name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider adding a context timeout.
Using context.Background()
without a timeout could potentially allow the deletion job to run indefinitely. Consider adding a reasonable timeout to ensure the job completes within expected boundaries.
- return runner.Do(
- context.Background(),
+ ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
+ defer cancel()
+ return runner.Do(
+ ctx,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
context.Background(), | |
runner.WithName(name), | |
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute) | |
defer cancel() | |
return runner.Do( | |
ctx, | |
runner.WithName(name), |
// Deletion represents auto indexing service configurations. | ||
Deletion *config.IndexDeleter `json:"deleter" yaml:"deleter"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect documentation comment.
The comment for the Deletion field incorrectly refers to "auto indexing service" instead of "deletion service".
Apply this fix:
- // Deletion represents auto indexing service configurations.
+ // Deletion represents index deletion service configurations.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Deletion represents auto indexing service configurations. | |
Deletion *config.IndexDeleter `json:"deleter" yaml:"deleter"` | |
// Deletion represents index deletion service configurations. | |
Deletion *config.IndexDeleter `json:"deleter" yaml:"deleter"` |
discoverer.WithOnDiscoverFunc(func(ctx context.Context, c discoverer.Client, addrs []string) error { | ||
last := len(addrs) - 1 | ||
for i := 0; i < len(addrs)/2; i++ { | ||
addrs[i], addrs[last-i] = addrs[last-i], addrs[i] | ||
} | ||
return nil | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Address unused parameters in OnDiscoverFunc
.
The parameters ctx
and c
are not used within the OnDiscoverFunc
. To avoid potential confusion or linting warnings, consider renaming them to _
to indicate they are intentionally unused.
Apply this diff to address the unused parameters:
discoverer.WithOnDiscoverFunc(func(ctx context.Context, c discoverer.Client, addrs []string) error {
+ // ctx and c are unused in this function
+ // Reverse the order of addrs to prioritize certain addresses
+
- last := len(addrs) - 1
+ last := len(addrs) - 1
for i := 0; i < len(addrs)/2; i++ {
addrs[i], addrs[last-i] = addrs[last-i], addrs[i]
}
return nil
}),
Committable suggestion was skipped due to low confidence.
if r.observability != nil { | ||
if err := r.observability.Stop(ctx); err != nil { | ||
errs = errors.Join(errs, err) | ||
} | ||
} | ||
if r.server != nil { | ||
if err := r.server.Shutdown(ctx); err != nil { | ||
errs = errors.Join(errs, err) | ||
} | ||
} | ||
return errs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure all resources are properly closed in Stop
method.
Currently, the Stop
method checks and stops r.observability
and r.server
but does not explicitly stop r.indexer
. If r.indexer
holds resources or goroutines, they should be properly closed or cancelled to prevent resource leaks.
Apply this diff to stop the indexer
:
func (r *run) Stop(ctx context.Context) (errs error) {
+ if err := r.indexer.Stop(ctx); err != nil {
+ errs = errors.Join(errs, err)
+ }
if r.observability != nil {
if err := r.observability.Stop(ctx); err != nil {
errs = errors.Join(errs, err)
}
}
if r.server != nil {
if err := r.server.Shutdown(ctx); err != nil {
errs = errors.Join(errs, err)
}
}
return errs
}
Ensure that the service.Deleter
interface includes a Stop
method to properly handle shutdown.
Committable suggestion was skipped due to low confidence.
r.eg.Go(safety.RecoverFunc(func() (err error) { | ||
defer func() { | ||
p, err := os.FindProcess(os.Getpid()) | ||
if err != nil { | ||
// using Fatal to avoid this process to be zombie | ||
// skipcq: RVV-A0003 | ||
log.Fatalf("failed to find my pid to kill %v", err) | ||
return | ||
} | ||
log.Info("sending SIGTERM to myself to stop this job") | ||
if err := p.Signal(syscall.SIGTERM); err != nil { | ||
log.Error(err) | ||
} | ||
}() | ||
return r.indexer.Start(ctx) | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider more graceful shutdown instead of sending SIGTERM
to self.
Sending a SIGTERM
to the process within the deferred function might not be the most graceful way to signal shutdown. This approach can potentially lead to abrupt termination without proper cleanup. It would be better to implement a cancellation mechanism using contexts or error handling to control the lifecycle more gracefully.
Consider refactoring the code to use context cancellation:
r.eg.Go(safety.RecoverFunc(func() (err error) {
- defer func() {
- p, err := os.FindProcess(os.Getpid())
- if err != nil {
- // using Fatal to avoid this process to be zombie
- log.Fatalf("failed to find my pid to kill %v", err)
- return
- }
- log.Info("sending SIGTERM to myself to stop this job")
- if err := p.Signal(syscall.SIGTERM); err != nil {
- log.Error(err)
- }
- }()
return r.indexer.Start(ctx)
}))
+// In the main function or where context is managed, consider cancelling the context when r.indexer.Start completes.
This change avoids sending signals to the process and allows for a controlled shutdown using context cancellation.
Committable suggestion was skipped due to low confidence.
if err != nil { | ||
var attrs trace.Attributes | ||
switch { | ||
case errors.Is(err, errors.ErrGRPCClientConnNotFound("*")): | ||
err = status.WrapWithInternal( | ||
vald.RemoveRPCName+" API connection not found", err, | ||
) | ||
attrs = trace.StatusCodeInternal(err.Error()) | ||
case errors.Is(err, errors.ErrGRPCTargetAddrNotFound): | ||
err = status.WrapWithInternal( | ||
vald.RemoveRPCName+" API connection target address \""+strings.Join(idx.targetAddrs, ",")+"\" not found", err, | ||
) | ||
attrs = trace.StatusCodeInternal(err.Error()) | ||
default: | ||
var ( | ||
st *status.Status | ||
msg string | ||
) | ||
st, msg, err = status.ParseError(err, codes.Internal, | ||
"failed to parse "+vald.RemoveRPCName+" gRPC error response", | ||
) | ||
attrs = trace.FromGRPCStatus(st.Code(), msg) | ||
} | ||
log.Warn(err) | ||
if span != nil { | ||
span.RecordError(err) | ||
span.SetAttributes(attrs...) | ||
span.SetStatus(trace.StatusError, err.Error()) | ||
} | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor error handling for maintainability in Start
method.
The error handling logic in the Start
method is complex and repetitive. Refactoring this logic into a separate helper function would improve readability and maintainability.
Create a helper function to handle error wrapping and attribute assignment:
func (idx *index) handleStartError(ctx context.Context, span trace.Span, err error) error {
var attrs trace.Attributes
switch {
case errors.Is(err, errors.ErrGRPCClientConnNotFound("*")):
err = status.WrapWithInternal(
vald.RemoveRPCName+" API connection not found", err,
)
attrs = trace.StatusCodeInternal(err.Error())
case errors.Is(err, errors.ErrGRPCTargetAddrNotFound):
err = status.WrapWithInternal(
vald.RemoveRPCName+" API connection target address \""+strings.Join(idx.targetAddrs, ",")+"\" not found", err,
)
attrs = trace.StatusCodeInternal(err.Error())
default:
var (
st *status.Status
msg string
)
st, msg, err = status.ParseError(err, codes.Internal,
"failed to parse "+vald.RemoveRPCName+" gRPC error response",
)
attrs = trace.FromGRPCStatus(st.Code(), msg)
}
log.Warn(err)
if span != nil {
span.RecordError(err)
span.SetAttributes(attrs...)
span.SetStatus(trace.StatusError, err.Error())
}
return err
}
Then update the Start
method:
err := idx.doDeleteIndex(ctx, /*...*/)
if err != nil {
- // Existing error handling logic
+ return idx.handleStartError(ctx, span, err)
}
return nil
e := &errors.ErrCriticalOption{} | ||
if errors.As(oerr, &e) { | ||
log.Error(err) | ||
return nil, oerr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible misuse of error variables in error handling.
In the New
function, the error variable err
is used in log.Error(err)
, but it might be more appropriate to use oerr
, which contains the wrapped error with additional context. This ensures that the logged error message includes all relevant information.
Consider modifying the error logging to use oerr
:
if errors.As(oerr, &e) {
- log.Error(err)
+ log.Error(oerr)
return nil, oerr
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
e := &errors.ErrCriticalOption{} | |
if errors.As(oerr, &e) { | |
log.Error(err) | |
return nil, oerr | |
e := &errors.ErrCriticalOption{} | |
if errors.As(oerr, &e) { | |
log.Error(oerr) | |
return nil, oerr |
[WARNING:INTCFG] Changes in |
FROM ghcr.io/vdaas/vald/vald-buildbase:nightly AS builder | ||
LABEL maintainer="vdaas.org vald team <vald@vdaas.org>" | ||
# skipcq: DOK-DL3002 | ||
USER root:root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WORKDIR ${GOPATH}/src/github.com/${ORG}/${REPO} | ||
SHELL ["/bin/bash", "-o", "pipefail", "-c"] | ||
#skipcq: DOK-W1001, DOK-SC2046, DOK-SC2086, DOK-DL3008 | ||
RUN --mount=type=bind,target=.,rw \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[CHATOPS:HELP] ChatOps commands.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## release/v1.7 #2722 +/- ##
===============================================
Coverage ? 23.95%
===============================================
Files ? 545
Lines ? 54324
Branches ? 0
===============================================
Hits ? 13015
Misses ? 40535
Partials ? 774 ☔ View full report in Codecov by Sentry. |
Description
Delete expired index from Agent using k8s Job.
The user sets the
index_id
in config and deletes the index.This feature is currently implementing.
Related Issue
close: #2647
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes