[WIP] implement vald benchmark #1

Status: Open, 5 commits to merge into base: master (showing changes from 1 commit)

4 changes: 4 additions & 0 deletions engine/clients/client_factory.py
@@ -20,6 +20,7 @@
)
from engine.clients.qdrant import QdrantConfigurator, QdrantSearcher, QdrantUploader
from engine.clients.redis import RedisConfigurator, RedisSearcher, RedisUploader
from engine.clients.vald import ValdConfigurator, ValdSearcher, ValdUploader
from engine.clients.weaviate import (
    WeaviateConfigurator,
    WeaviateSearcher,
@@ -33,6 +34,7 @@
    "elastic": ElasticConfigurator,
    "opensearch": OpenSearchConfigurator,
    "redis": RedisConfigurator,
    "vald": ValdConfigurator,
}

ENGINE_UPLOADERS = {
@@ -42,6 +44,7 @@
    "elastic": ElasticUploader,
    "opensearch": OpenSearchUploader,
    "redis": RedisUploader,
    "vald": ValdUploader,
}

ENGINE_SEARCHERS = {
@@ -51,6 +54,7 @@
    "elastic": ElasticSearcher,
    "opensearch": OpenSearchSearcher,
    "redis": RedisSearcher,
    "vald": ValdSearcher,
}
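
The three dictionaries above act as name-keyed registries: an engine name selects which configurator, uploader, and searcher classes get instantiated. A minimal sketch of that lookup follows. The classes are stand-ins and `build_configurator` is a hypothetical helper; the real factory code is not part of this diff and may differ.

```python
from typing import Dict, Type


class BaseConfigurator:
    """Stand-in for engine.base_client.configure.BaseConfigurator."""

    def __init__(self, host: str, collection_params: dict, connection_params: dict):
        self.host = host
        self.collection_params = collection_params
        self.connection_params = connection_params


class ValdConfigurator(BaseConfigurator):
    """Stand-in for the class added in engine/clients/vald/configure.py."""


# Same shape as ENGINE_CONFIGURATORS in client_factory.py, reduced to one entry.
ENGINE_CONFIGURATORS: Dict[str, Type[BaseConfigurator]] = {
    "vald": ValdConfigurator,
}


def build_configurator(experiment: dict, host: str) -> BaseConfigurator:
    """Hypothetical helper: pick the class by engine name and instantiate it."""
    engine = experiment["engine"]
    try:
        configurator_class = ENGINE_CONFIGURATORS[engine]
    except KeyError:
        raise ValueError(f"Unknown engine: {engine!r}") from None
    return configurator_class(
        host,
        collection_params=experiment.get("collection_params", {}),
        connection_params=experiment.get("connection_params", {}),
    )
```

Registering a new engine then only requires one new entry per dictionary, which is what this hunk does for `"vald"`.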


3 changes: 3 additions & 0 deletions engine/clients/vald/__init__.py
@@ -0,0 +1,3 @@
from engine.clients.vald.configure import ValdConfigurator

9% of developers fix this issue

F401: 'engine.clients.vald.configure.ValdConfigurator' imported but unused

❗❗ 3 similar findings have been found in this PR

File Path                          Line Number
engine/clients/vald/__init__.py    2
engine/clients/vald/__init__.py    3
engine/clients/vald/parser.py      1

Visit the Lift Web Console to find more details in your report.


@sonatype-lift commands

You can reply with the following commands. For example, reply with @sonatype-lift ignoreall to leave out all findings.

Command                                          Usage
@sonatype-lift ignore                            Leave out the above finding from this PR
@sonatype-lift ignoreall                         Leave out all the existing findings from this PR
@sonatype-lift exclude <file|issue|path|tool>    Exclude specified file|issue|path|tool from Lift findings by updating your config.toml file

from engine.clients.vald.search import ValdSearcher
from engine.clients.vald.upload import ValdUploader

19% of developers fix this issue

W292: no newline at end of file



237 changes: 237 additions & 0 deletions engine/clients/vald/config.py
@@ -0,0 +1,237 @@
from kubernetes import client

8% of developers fix this issue

E0401: Unable to import 'kubernetes'

❗❗ 8 similar findings have been found in this PR

File Path                           Line Number
engine/clients/vald/configure.py    1
engine/clients/vald/search.py       2
engine/clients/vald/search.py       4
engine/clients/vald/search.py       5
engine/clients/vald/upload.py       1
engine/clients/vald/upload.py       5
engine/clients/vald/upload.py       6
engine/clients/vald/upload.py       7



9% of developers fix this issue

reportMissingImports: Import "kubernetes" could not be resolved

❗❗ 8 similar findings have been found in this PR

File Path                           Line Number
engine/clients/vald/configure.py    1
engine/clients/vald/search.py       2
engine/clients/vald/search.py       4
engine/clients/vald/search.py       5
engine/clients/vald/upload.py       1
engine/clients/vald/upload.py       5
engine/clients/vald/upload.py       6
engine/clients/vald/upload.py       7




def _metadata(name="vald-agent-ngt"):
    return client.V1ObjectMeta(
        name=name,
        labels={
            "app": "vald-agent-ngt",
            "app.kubernetes.io/name": "vald",
            "helm.sh/chart": "vald-v1.7.6",
            "app.kubernetes.io/managed-by": "Helm",
            "app.kubernetes.io/instance": "vald",
            "app.kubernetes.io/version": "v1.7.6",
            "app.kubernetes.io/component": "agent",
        },
    )


_label_selector = client.V1LabelSelector(match_labels={"app": "vald-agent-ngt"})

6% of developers fix this issue

E501: line too long (80 > 79 characters)

❗❗ 15 similar findings have been found in this PR

File Path                           Line Number
engine/clients/vald/config.py       24
engine/clients/vald/config.py       38
engine/clients/vald/config.py       82
engine/clients/vald/config.py       114
engine/clients/vald/config.py       124
engine/clients/vald/config.py       133
engine/clients/vald/config.py       200
engine/clients/vald/configure.py    17
engine/clients/vald/parser.py       7
engine/clients/vald/parser.py       10

Showing 10 of 15 findings. Visit the Lift Web Console to see all.




POD_DISRUPTION_BUDGET = client.V1PodDisruptionBudget(
    api_version="policy/v1",
    metadata=_metadata(),
    spec=client.V1PodDisruptionBudgetSpec(max_unavailable=1, selector=_label_selector),
)

CONFIG_MAP = client.V1ConfigMap(
    api_version="v1",
    metadata=_metadata(name="vald-agent-ngt-config"),
    data={"config.yaml": ""},
)

SERVICE_PORT = client.V1ServicePort(
    name="grpc", port=8081, target_port=8081, protocol="TCP", node_port=30081
)

READINESS_PORT = client.V1ServicePort(
    name="readiness", port=3001, target_port=3001, protocol="TCP", node_port=30001
)

SERVICE = client.V1Service(
    api_version="v1",
    metadata=_metadata(),
    spec=client.V1ServiceSpec(
        ports=[SERVICE_PORT, READINESS_PORT],
        selector={
            "app.kubernetes.io/name": "vald",
            "app.kubernetes.io/component": "agent",
        },
        type="NodePort",
    ),
)

STATEFUL_SET = client.V1StatefulSet(
    api_version="apps/v1",
    metadata=_metadata(),
    spec=client.V1StatefulSetSpec(
        service_name="vald-agent-ngt",
        pod_management_policy="Parallel",
        replicas=4,
        revision_history_limit=2,
        selector=_label_selector,
        update_strategy=client.V1StatefulSetUpdateStrategy(
            type="RollingUpdate", rolling_update={"partition": 0}
        ),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(
                creation_timestamp=None,
                labels={
                    "app": "vald-agent-ngt",
                    "app.kubernetes.io/name": "vald",
                    "app.kubernetes.io/instance": "vald",
                    "app.kubernetes.io/component": "agent",
                },
            ),
            spec=client.V1PodSpec(
                affinity=client.V1Affinity(
                    node_affinity=client.V1NodeAffinity(
                        preferred_during_scheduling_ignored_during_execution=[]
                    ),
                    pod_affinity=client.V1PodAffinity(
                        preferred_during_scheduling_ignored_during_execution=[],
                        required_during_scheduling_ignored_during_execution=[],
                    ),
                    pod_anti_affinity=client.V1PodAntiAffinity(
                        preferred_during_scheduling_ignored_during_execution=[
                            client.V1WeightedPodAffinityTerm(
                                pod_affinity_term=client.V1PodAffinityTerm(
                                    label_selector=client.V1LabelSelector(
                                        match_expressions=[
                                            client.V1LabelSelectorRequirement(
                                                key="app",
                                                operator="In",
                                                values=["vald-agent-ngt"],
                                            )
                                        ]
                                    ),
                                    topology_key="kubernetes.io/hostname",
                                ),
                                weight=100,
                            )
                        ],
                        required_during_scheduling_ignored_during_execution=[],
                    ),
                ),
                containers=[
                    client.V1Container(
                        name="vald-agent-ngt",
                        image="vdaas/vald-agent-ngt:v1.7.6",
                        image_pull_policy="Always",
                        liveness_probe=client.V1Probe(
                            failure_threshold=2,
                            http_get=client.V1HTTPGetAction(
                                path="/liveness", port="liveness", scheme="HTTP"
                            ),
                            initial_delay_seconds=5,
                            period_seconds=3,
                            success_threshold=1,
                            timeout_seconds=2,
                        ),
                        readiness_probe=client.V1Probe(
                            failure_threshold=2,
                            http_get=client.V1HTTPGetAction(
                                path="/readiness", port="readiness", scheme="HTTP"
                            ),
                            initial_delay_seconds=10,
                            period_seconds=3,
                            success_threshold=1,
                            timeout_seconds=2,
                        ),
                        startup_probe=client.V1Probe(
                            http_get=client.V1HTTPGetAction(
                                path="/liveness", port="liveness", scheme="HTTP"
                            ),
                            initial_delay_seconds=5,
                            timeout_seconds=2,
                            success_threshold=1,
                            failure_threshold=200,
                            period_seconds=5,
                        ),
                        ports=[
                            client.V1ContainerPort(
                                name="liveness",
                                protocol="TCP",
                                container_port=3000,
                            ),
                            client.V1ContainerPort(
                                name="readiness",
                                protocol="TCP",
                                container_port=3001,
                            ),
                            client.V1ContainerPort(
                                name="grpc",
                                protocol="TCP",
                                container_port=8081,
                            ),
                        ],
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "100m", "memory": "100Mi"}
                        ),
                        termination_message_path="/dev/termination-log",
                        termination_message_policy="File",
                        security_context=client.V1SecurityContext(
                            allow_privilege_escalation=False,
                            capabilities=client.V1Capabilities(drop=["ALL"]),
                            privileged=False,
                            read_only_root_filesystem=False,
                            run_as_group=65532,
                            run_as_non_root=True,
                            run_as_user=65532,
                        ),
                        env=[
                            client.V1EnvVar(
                                name="MY_NODE_NAME",
                                value_from=client.V1EnvVarSource(
                                    field_ref=client.V1ObjectFieldSelector(
                                        field_path="spec.nodeName"
                                    )
                                ),
                            ),
                            client.V1EnvVar(
                                name="MY_POD_NAME",
                                value_from=client.V1EnvVarSource(
                                    field_ref=client.V1ObjectFieldSelector(
                                        field_path="metadata.name"
                                    )
                                ),
                            ),
                            client.V1EnvVar(
                                name="MY_POD_NAMESPACE",
                                value_from=client.V1EnvVarSource(
                                    field_ref=client.V1ObjectFieldSelector(
                                        field_path="metadata.namespace"
                                    )
                                ),
                            ),
                        ],
                        volume_mounts=[
                            client.V1VolumeMount(
                                name="vald-agent-ngt-config", mount_path="/etc/server"
                            )
                        ],
                    )
                ],
                dns_policy="ClusterFirst",
                restart_policy="Always",
                scheduler_name="default-scheduler",
                security_context=client.V1PodSecurityContext(
                    fs_group=65532,
                    fs_group_change_policy="OnRootMismatch",
                    run_as_group=65532,
                    run_as_non_root=True,
                    run_as_user=65532,
                ),
                termination_grace_period_seconds=120,
                volumes=[
                    client.V1Volume(
                        name="vald-agent-ngt-config",
                        config_map=client.V1ConfigMapVolumeSource(
                            default_mode=420, name="vald-agent-ngt-config"
                        ),
                    )
                ],
                priority_class_name="default-vald-agent-ngt-priority",
            ),
        ),
    ),
)

PRIORITY_CLASS = client.V1PriorityClass(
    api_version="scheduling.k8s.io/v1",
    metadata=_metadata(name="default-vald-agent-ngt-priority"),
    value=int(1e9),
    preemption_policy="Never",
    global_default=False,
    description="A priority class for Vald agent.",
)
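
One thing to keep in mind with module-level objects such as `CONFIG_MAP`: they are shared, so code that assigns to their attributes also changes the template seen by later callers. A hedged sketch of copying the template before filling it in is shown below. It uses a stand-in dataclass rather than `client.V1ConfigMap`, `render_config_map` is a hypothetical helper, and it assumes the real model objects can be deep-copied in the same way.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeConfigMap:
    """Stand-in for kubernetes.client.V1ConfigMap (only `data` matters here)."""

    name: str
    data: Dict[str, str] = field(default_factory=dict)


# Module-level template, analogous to CONFIG_MAP in config.py.
CONFIG_MAP = FakeConfigMap(name="vald-agent-ngt-config", data={"config.yaml": ""})


def render_config_map(config_yaml: str) -> FakeConfigMap:
    """Return a filled-in copy and leave the shared template untouched."""
    configmap = copy.deepcopy(CONFIG_MAP)
    configmap.data = {"config.yaml": config_yaml}
    return configmap
```

With this shape, repeated benchmark runs in one process each start from the same empty template.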
100 changes: 100 additions & 0 deletions engine/clients/vald/configure.py
@@ -0,0 +1,100 @@
import kubernetes as k8s
import time
import yaml
from benchmark.dataset import Dataset
from engine.base_client.configure import BaseConfigurator
from engine.base_client.distances import Distance
from engine.clients.vald.config import (
    POD_DISRUPTION_BUDGET,
    CONFIG_MAP,
    SERVICE,
    STATEFUL_SET,
    PRIORITY_CLASS,
)


def _delete(f1, f2, resource, namespace="default"):
    res = f1(namespace, field_selector=f"metadata.name={resource.metadata.name}")
    if len(res.items) > 0:
        f2(resource.metadata.name, namespace)


class ValdConfigurator(BaseConfigurator):
    DISTANCE_MAPPING = {
        Distance.L2: "L2",
        Distance.DOT: "COS",
        Distance.COSINE: "COS",
    }

    def __init__(self, host, collection_params: dict, connection_params: dict):
        super().__init__(host, collection_params, connection_params)

        k8s.config.load_kube_config(connection_params["kubeconfig"])

    def clean(self):
        api_client = k8s.client.ApiClient()
        policy_api = k8s.client.PolicyV1Api(api_client)
        _delete(
            policy_api.list_namespaced_pod_disruption_budget,
            policy_api.delete_namespaced_pod_disruption_budget,
            POD_DISRUPTION_BUDGET,
        )
        core_api = k8s.client.CoreV1Api(api_client)
        _delete(
            core_api.list_namespaced_config_map,
            core_api.delete_namespaced_config_map,
            CONFIG_MAP,
        )
        _delete(
            core_api.list_namespaced_service,
            core_api.delete_namespaced_service,
            SERVICE,
        )
        apps_api = k8s.client.AppsV1Api(api_client)
        _delete(
            apps_api.list_namespaced_stateful_set,
            apps_api.delete_namespaced_stateful_set,
            STATEFUL_SET,
        )
        scheduling_api = k8s.client.SchedulingV1Api(api_client)
        res = scheduling_api.list_priority_class(
            field_selector=f"metadata.name={PRIORITY_CLASS.metadata.name}"
        )
        if len(res.items) > 0:
            scheduling_api.delete_priority_class(PRIORITY_CLASS.metadata.name)


13% of developers fix this issue

W293: blank line contains whitespace

❗❗ 3 similar findings have been found in this PR

File Path                        Line Number
engine/clients/vald/parser.py    9
engine/clients/vald/parser.py    12
engine/clients/vald/parser.py    15


        time.sleep(10) # TODO: using watch

11% of developers fix this issue

E261: at least two spaces before inline comment
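
The `TODO: using watch` on the `time.sleep` call could be addressed roughly as sketched here. The waiting logic is kept separate from the Kubernetes client so it can run without a cluster. `wait_for`, `is_deleted`, and `is_ready` are hypothetical helper names, and the fake events only mimic the `{"type": ..., "object": ...}` dicts that `kubernetes.watch.Watch().stream(...)` yields.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Mapping

Event = Mapping[str, Any]


def wait_for(events: Iterable[Event], predicate: Callable[[Event], bool]) -> bool:
    """Consume watch events until `predicate` holds.

    Returns False if the stream ends first, for example on a watch timeout.
    """
    for event in events:
        if predicate(event):
            return True
    return False


def is_deleted(event: Event) -> bool:
    """True once the watched resource has been removed (for `clean`)."""
    return event["type"] == "DELETED"


def is_ready(replicas: int) -> Callable[[Event], bool]:
    """Build a predicate that is true once `replicas` pods are ready (for `recreate`)."""

    def check(event: Event) -> bool:
        status = getattr(event["object"], "status", None)
        ready = getattr(status, "ready_replicas", None) or 0
        return event["type"] != "DELETED" and ready >= replicas

    return check


def fake_event(kind: str, ready=None) -> dict:
    """Test double shaped like a watch event for a StatefulSet."""
    status = SimpleNamespace(ready_replicas=ready)
    return {"type": kind, "object": SimpleNamespace(status=status)}


# With the real client the stream might be created like this (not executed here):
#   w = k8s.watch.Watch()
#   stream = w.stream(
#       apps_api.list_namespaced_stateful_set,
#       "default",
#       field_selector="metadata.name=vald-agent-ngt",
#       timeout_seconds=300,
#   )
#   wait_for(stream, is_ready(4))
#   w.stop()
```

If the resource is already gone when the watch starts, no `DELETED` event arrives, so a `timeout_seconds` bound on the stream (or a list call beforehand) is still needed.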




    def recreate(self, dataset: Dataset, collection_params):
        api_client = k8s.client.ApiClient()
        configmap = CONFIG_MAP
        with open(collection_params["base_config"]) as f:
            cfg = yaml.safe_load(f)
        configmap.data = {
            "config.yaml": yaml.safe_dump(
                {
                    **cfg,
                    **collection_params["ngt_config"],
                    **{
                        "dimension": dataset.config.vector_size,
                        "distance_type": self.DISTANCE_MAPPING[

10% of developers fix this issue

reportGeneralTypeIssues: Argument of type "str" cannot be assigned to parameter "__key" of type "Distance" in function "getitem"
  "str" is incompatible with "Distance"



                            dataset.config.distance
                        ],
                    },
                }
            )
        }

        policy_api = k8s.client.PolicyV1Api(api_client)
        policy_api.create_namespaced_pod_disruption_budget(
            "default", POD_DISRUPTION_BUDGET
        )
        core_api = k8s.client.CoreV1Api(api_client)
        core_api.create_namespaced_config_map("default", configmap)
        core_api.create_namespaced_service("default", SERVICE)
        apps_api = k8s.client.AppsV1Api(api_client)
        apps_api.create_namespaced_stateful_set("default", STATEFUL_SET)
        scheduling_api = k8s.client.SchedulingV1Api(api_client)
        scheduling_api.create_priority_class(PRIORITY_CLASS)

        time.sleep(30) # TODO: using watch

11% of developers fix this issue

E261: at least two spaces before inline comment
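
Regarding the `reportGeneralTypeIssues` finding on `self.DISTANCE_MAPPING[dataset.config.distance]` in `recreate`: the checker sees the dataset's distance as a plain `str` while the mapping is keyed by `Distance`. One possible way to make the conversion explicit is sketched below. The `Distance` class here is a stand-in whose member values are assumed, and `vald_distance` is a hypothetical helper.

```python
from enum import Enum
from typing import Dict, Union


class Distance(str, Enum):
    """Stand-in for engine.base_client.distances.Distance (values are assumed)."""

    L2 = "l2"
    COSINE = "cosine"
    DOT = "dot"


# Same entries as ValdConfigurator.DISTANCE_MAPPING, with an explicit type.
DISTANCE_MAPPING: Dict[Distance, str] = {
    Distance.L2: "L2",
    Distance.DOT: "COS",
    Distance.COSINE: "COS",
}


def vald_distance(raw: Union[str, Distance]) -> str:
    """Normalize a distance given as string or enum member, then look it up.

    Distance(raw) raises ValueError for a value that is not a known member.
    """
    return DISTANCE_MAPPING[Distance(raw)]
```

The call site would then read `"distance_type": vald_distance(dataset.config.distance)`, and an unsupported distance fails with a `ValueError` instead of a `KeyError`.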



17 changes: 17 additions & 0 deletions engine/clients/vald/parser.py
@@ -0,0 +1,17 @@
from typing import Any, List, Optional
from engine.base_client import IncompatibilityError
from engine.base_client.parser import BaseConditionParser, FieldValue


class ValdParser(BaseConditionParser):
    def build_condition(self, and_subfilters: List[Any] | None, or_subfilters: List[Any] | None) -> Any | None:
        raise IncompatibilityError

    def build_exact_match_filter(self, field_name: str, value: FieldValue) -> Any:
        raise IncompatibilityError

    def build_range_filter(self, field_name: str, lt: FieldValue | None, gt: FieldValue | None, lte: FieldValue | None, gte: FieldValue | None) -> Any:
        raise IncompatibilityError

    def build_geo_filter(self, field_name: str, lat: float, lon: float, radius: float) -> Any:
        raise IncompatibilityError
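
The `X | None` annotations above are evaluated when the class is defined, and `List[Any] | None` only works on Python 3.10 or newer. A sketch of the same signatures written with `Optional`, which also uses the import that Lift flags as unused and keeps lines short, could look like this. `IncompatibilityError` and `FieldValue` are stand-ins with assumed definitions, and the base class is left out so the block runs on its own.

```python
from typing import Any, List, Optional, Union

# Stand-in for engine.base_client.parser.FieldValue (definition assumed).
FieldValue = Union[str, int, float]


class IncompatibilityError(Exception):
    """Stand-in for engine.base_client.IncompatibilityError."""


class ValdParserSketch:
    """Hypothetical variant of ValdParser without the project base class."""

    def build_condition(
        self,
        and_subfilters: Optional[List[Any]],
        or_subfilters: Optional[List[Any]],
    ) -> Optional[Any]:
        # Filtering is not supported in this client, so every builder refuses.
        raise IncompatibilityError

    def build_range_filter(
        self,
        field_name: str,
        lt: Optional[FieldValue],
        gt: Optional[FieldValue],
        lte: Optional[FieldValue],
        gte: Optional[FieldValue],
    ) -> Any:
        raise IncompatibilityError
```

Callers can catch `IncompatibilityError` to skip filtered searches for this engine.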