Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eager Empty Bucket Checking #148

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Eager Empty Bucket Checking #148

wants to merge 7 commits into from

Conversation

Gillgamesh
Copy link
Contributor

Implements the following changes:

  1. The function effective_size() tracks the number of rows in a sketch column that are at or below a nonzero bucket. That is, it tracks (index of deepest nonempty bucket) + 1.
  2. There is an optional flag to enable the tracking of a nonempty_buckets bit array. This has the effect of making effective_size() take a constant number of instructions by clz. The flag is updating on calls to the three merge functions and to update().
  3. sample() and exhaustive_sample() have been updated to only check a small constant number of buckets below effective_size() for each column. This is based off a number of theoretical observations about the concentration of good buckets near the "best" bucket, on both the right and left. This should speed up calls to them by a 2-4x factor without sacrafices to the success probability.
  4. merge() and range_merge() now use effective_size() to not bother merging in buckets that are definitely zero.
  5. Similarly, serialize() and the deserialize constructor also use effective_size() to shrink the space needed for storing and sending sketches.

The performance testing needs to be done thorouhgly, but this should slightly speed up merging, signifcantly speed up querying, and slightly slow down updating.

@etwest
Copy link
Collaborator

etwest commented Mar 28, 2024

Since this pull request is modifying the Sketch class which is essential to correctness and performance I think we should be extremely thorough in our review. This means we may end up making a lot of critiques/suggestions. Just want to set that expectation before we start.

include/bucket.h Outdated Show resolved Hide resolved
include/bucket.h Outdated Show resolved Hide resolved
@@ -125,6 +125,18 @@ class Sketch {

std::mutex mutex; // lock the sketch for applying updates in multithreaded processing


/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line up * and provide a little more documentation. Specify that this operates per column and that all non-empty buckets are above this cutoff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't think of one

src/sketch.cpp Outdated
@@ -18,6 +27,14 @@ Sketch::Sketch(vec_t vector_len, uint64_t seed, size_t _samples, size_t _cols) :
buckets[i].alpha = 0;
buckets[i].gamma = 0;
}

#ifdef EAGER_BUCKET_CHECK
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to have two forms of serialization:

  1. Direct "no-copy" serialization where we pull the bucket array data out directly.
  2. Compressed serialization which leverages the known column sizes to be more data size efficient.

To pull this off we need the buckets and flags to be stored contiguously. Suggest allocating a few extra buckets that are then used for the flags.

NOTE: We will all have to think about the right way to "receive" these serialized forms on the other side.

src/sketch.cpp Show resolved Hide resolved
src/sketch.cpp Outdated
@@ -78,6 +118,13 @@ void Sketch::update(const vec_t update_idx) {
size_t bucket_id = i * bkt_per_col + depth;
likely_if(depth < bkt_per_col) {
Bucket_Boruvka::update(buckets[bucket_id], update_idx, checksum);
#ifdef EAGER_BUCKET_CHECK
likely_if(!Bucket_Boruvka::is_empty(buckets[bucket_id])) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unlikely_if(is_empty)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigate the performance of the following:

unlikely_if(is_empty)
  set_bit()
update()
unlikely_if(is_empty)
  clear_bit()

src/sketch.cpp Outdated
return {buckets[bucket_id].alpha, GOOD};

for (size_t col = first_column; col < first_column + cols_per_sample; ++col) {
int row = effective_size(col)-1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int(effective_size(col))

src/sketch.cpp Outdated
{
// first, check for emptyness
Bucket *current_row = buckets + (col_idx * bkt_per_col);
if (Bucket_Boruvka::is_empty(buckets[num_buckets - 1]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would drop this or move it into the #else. It's a more expensive call than the clzll

}
#ifdef EAGER_BUCKET_CHECK
unlikely_if(nonempty_buckets[col_idx] == 0) return 0;
return (uint8_t)((sizeof(unsigned long long) * 8) - __builtin_clzll(nonempty_buckets[col_idx]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a convenient way to use ctzll? It's about 3 times faster than clzll

@@ -17,7 +17,11 @@

constexpr uint64_t KB = 1024;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls clean this file. (i.e. get rid of the HashSet etc.)

}
BENCHMARK(BM_Std_Set_Hash_Iterator)->RangeMultiplier(2)->Range(1, 1 << 14);

BENCHMARK_MAIN();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add newline

@@ -5,6 +5,15 @@
#include <vector>
#include <cassert>


inline static void set_bit(vec_t &t, int position) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest switching the position of first bucket to most significant bit.
t |= 1 << (sizeof(vec_t) * 8 - position)

@DanielDeLayo DanielDeLayo removed their request for review September 6, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants