Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing similarity index bugs #171

Merged
merged 4 commits into from
Jan 12, 2024
Merged

Fixing similarity index bugs #171

merged 4 commits into from
Jan 12, 2024

Conversation

brimoor
Copy link
Contributor

@brimoor brimoor commented Jan 10, 2024

Fixes issues ID'd in #168 and #170

mongodb backend

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart",
    max_samples=10,
    dataset_name="zzz",
    drop_existing_dataset=True,
)

view = dataset[:5]

# optional: precompute embeddings
model = foz.load_zoo_model("clip-vit-base32-torch")
embeddings = view.compute_embeddings(model)
view.set_values("embeddings", embeddings.tolist())

index = fob.compute_similarity(
    view,  # although not recommended, this does work
    model="clip-vit-base32-torch",
    backend="mongodb",
    brain_key="img_sim",
    embeddings="embeddings",
)

print(index.total_index_size)  # 5
print(index.index_size)  # 5

# index is created even though embeddings already existed
assert index._index

# issues warnings for skipped IDs
embeddings, sample_ids, _ = index.compute_embeddings(dataset[3:6])
index.add_to_index(embeddings, sample_ids, overwrite=False, warn_existing=True)

print(index.total_index_size)  # 6
print(index.index_size)  # 5

# skips existing but no warnings
embeddings, sample_ids, _ = index.compute_embeddings(dataset[5:])
index.add_to_index(embeddings, sample_ids, overwrite=False)

print(index.total_index_size)  # 10
print(index.index_size)  # 5

del_view = dataset[4:6]
index.use_view(del_view)

print(index.total_index_size)  # 10
print(index.index_size)  # 2

# works even though some IDs are not in the index's original view
index.remove_from_index(sample_ids=del_view.values("id"))

print(index.total_index_size)  # 8
print(index.index_size)  # 0

index.use_view(dataset)

print(index.total_index_size)  # 8
print(index.index_size)  # 8

assert index.ready

# issues a warning rather than raising an error
puppies = dataset.take(5).sort_by_similarity("puppies")
# The MongoDB backend does not yet support views; the full index will instead be queried, which may result in fewer matches in your current view

sklearn backend

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart",
    max_samples=10,
    dataset_name="zzz",
    drop_existing_dataset=True,
)

view = dataset[:5]

index = fob.compute_similarity(
    view,  # although not recommended, this does work
    model="clip-vit-base32-torch",
    backend="sklearn",
    brain_key="img_sim",
    embeddings="embeddings",
)

print(index.total_index_size)  # 5
print(index.index_size)  # 5

# issues warnings for skipped IDs
embeddings, sample_ids, _ = index.compute_embeddings(dataset[3:6])
index.add_to_index(embeddings, sample_ids, overwrite=False, warn_existing=True)

print(index.total_index_size)  # 6
print(index.index_size)  # 5

# skips existing but no warnings
embeddings, sample_ids, _ = index.compute_embeddings(dataset[5:])
index.add_to_index(embeddings, sample_ids, overwrite=False)

print(index.total_index_size)  # 10
print(index.index_size)  # 5

del_view = dataset[4:6]
index.use_view(del_view)

print(index.total_index_size)  # 10
print(index.index_size)  # 2

# works even though some IDs are not in the index's original view
index.remove_from_index(sample_ids=del_view.values("id"))

print(index.total_index_size)  # 8
print(index.index_size)  # 0

index.use_view(dataset)

print(index.total_index_size)  # 8
print(index.index_size)  # 8

puppies = dataset.take(5).sort_by_similarity("puppies")

Copy link
Contributor

@allenleetc allenleetc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brimoor in #169, issues 1 2 3 are resolved but issue 0 still errors, guessing due to

Do we want to try to resolve in this PR?

@brimoor
Copy link
Contributor Author

brimoor commented Jan 11, 2024

@allenleetc ah yes, you are correct! Added your fix in bd2fffe 👍

Copy link
Contributor

@allenleetc allenleetc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@brimoor brimoor merged commit c5b5f12 into develop Jan 12, 2024
4 checks passed
@brimoor brimoor deleted the similarity-bugs branch January 12, 2024 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants