-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for compressor plugins #585
base: main
Are you sure you want to change the base?
Conversation
24319b2
to
c0655bb
Compare
@lucagiac81 please rebase |
c0655bb
to
ce92cf2
Compare
@@ -995,6 +1140,43 @@ void MutableCFOptions::RefreshDerivedOptions(int num_levels, | |||
max_file_size[i] = target_file_size_base; | |||
} | |||
} | |||
|
|||
if (compressor == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently discovered issue to be addressed as part of the review:
when using an option string specifying the (pre-existing) “compression” option instead of the (new) “compressor” option, the "compression" option is not applied.
This is caused by the following:
- When the base MutableCFOptions is created, compression defaults to kSnappyCompression . This leads to compressor being set to SnappyCompressor in RefreshDerivedOptions.
- When the option string is then parsed, the “compression” option won’t override the existing “compressor”, as it’s not null.
- In other words, the code can’t differentiate whether the SnappyCompressor selection came explicitly from the user or from the default settings.
This can be addressed by
- Finding a way to eliminate the compression/compressor option duplication (best solution), or
- having a separate variable for a "derived" compressor in MutableCFOptions, which is computed from the compressor/compression options. In this way, user selection is never overwritten by default settings.
A test must also be added to detect the bug and confirm the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the current status of this issue? It looks like a critical issue to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a solution using the "derived" compressor approach described above (and a test to reproduce the issue and validate the fix). Now that #397 is merged, I'll try to look for a better solution before pushing an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lucagiac81 thank you
#397 should be able to help solve this problem. Using it, we can set the compressor as an option and not serialize the compression+opts if the Compressor is not set. Once your PR is merged, I suggest you open another issue to outline that issue, preferably with an associated unit test. Then I can help work on a fix. |
util/compressor.cc
Outdated
CreateIfMatches<ZSTDNotFinalCompressor>(id, result)) { | ||
return Status::OK(); | ||
} else { | ||
#ifndef ROCKSDB_LITE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ROCKSDB_LITE is deprecated, should not be used. But its removal can be handled in a separate PR (there are other leftovers in speedb code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missed when I removed ROCKSDB_LITE support. I'll remove it and submit an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed any leftover ROCKSDB_LITE support from the files modified by this PR (in compressor.cc and block_based_table_builder.cc).
ce92cf2
to
636da34
Compare
636da34
to
0f12bbe
Compare
Code is rebased on latest main based RocksDB 8.6.7 (the previous code was based on RocksDB 8.1.1). Fix vector option parsing
Add test for option override issue
Separate derived compression settings
I also tried to fix the bug using PR397 (with kUseBaseAddress), but I could not cover all cases.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
This is a follow-up to PR #584, adding support for external compressor plugins.
The first commit in this PR will match PR #584 until that is merged.
It is equivalent to PR6717 in RocksDB.
The Compressor and related classes are made part of the public API to allow development of plugins and to expose compressors in options.
Compressor plugins can be used to easily integrate new compression algorithms into Speedb. They can also implement compression techniques tailored to specific types of data. For example, if the values in a database are of numeric type (e.g., arrays of integers) with particular distributions (e.g., limited range within each block), the values could be compressed using a lightweight compression algorithm implemented as a plugin (such as frame-of-reference or delta encoding).
Options for Compressors
New options are added to support plugin compressors.
For example, compression was previously configured by compression (of type CompressionType) and compression_opts (of type CompressionOptions). This PR adds a compressor option (pointer to Compressor) to specify a compressor object (which includes type and options).
This approach was followed for the following options:
The existing CompressionType/CompressionOptions options are preserved for backward compatibility.
If the user doesn't specify compressors (leaving them null), the CompressionType/CompressionOptions options are used as before. Otherwise, compressors override the existing options.
A new constant kPluginCompression is defined in CompressionType for plugin compressors. The SST properties block stores information about the specific compressor in the compression_name field. This is used to instantiate a suitable compressor when opening the SST.
Option String Examples
Built-in compressor (existing options)
compression=kZSTD;compression_opts={level=1}
Built-in compressor (new options)
compressor={id=ZSTD;level=1}
Plugin compressor (new options)
compressor={id=my_compressor;my_option1=value1;my_option2=value2}
Options Object Example
Built-in compressor (existing options)
Built-in compressor (new options)
Plugin compressor (new options)
db_bench
For db_bench, compression_type and individual compression options (such as compression_level) were left unchanged for backward compatibility.
compression_type is still used with plugin compressors to specify their name. Other compressor options can be passed using compressor_options.
Built-in compressor (existing options)
--compression_type=zstd --compression_level=1
Built-in compressor (new options)
--compression_type=zstd --compressor_options="level=1"
Plugin compressor (new options)
--compression_type=my_compressor --compressor_options="my_option1=value1;my_option2=value2"
Limitations/Future Work
Compressor plugins are currently not supported for
These limitations will be addressed by future PRs.