A BST (binary search tree) written in Rust that supports efficient query and teardown scenarios, i.e. the typical usage pattern is to build a master copy of the tree, then
- clone the master copy to a new tree
- tear the tree down with a series of delete-range operations (and do something with the retrieved items), interspersed with range queries
- rinse, repeat
Two data structures are currently implemented: TeardownTree and IntervalTeardownTree (an augmented Interval Tree), both
with conventional Map and Set interfaces.
The tree does not use any kind of self-balancing and does not support an insert operation.
The tree is pointer-free, meaning that nodes do not store explicit pointers to their children. The only thing we
store in a node is your data. This is similar to how binary heaps work: all nodes in the tree reside in an array, the
root always at index 0, and given a node with index i, its left/right children are found at indices 2*i+1 and 2*i+2.
Thus no dynamic memory allocation or deallocation is required. This makes it possible to implement a fast
clone operation: instead of traversing the tree, allocating and copying each node individually, we are able to
allocate the whole array in a single call and efficiently copy the entire content. The tree also supports a refill
operation (currently only implemented for T: Copy), which copies the contents of the master tree into self
without allocating at all.
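The layout described above can be illustrated with a minimal sketch. It is not the crate's actual implementation: the helper names (left, right, contains) are hypothetical, the nodes are plain u32 values in a Vec, and the flags the real tree keeps for removed items are ignored. It only shows the index arithmetic and why clone and refill reduce to bulk copies.

```rust
/// Index of the left child of node `i` (the root is at index 0).
fn left(i: usize) -> usize {
    2 * i + 1
}

/// Index of the right child of node `i`.
fn right(i: usize) -> usize {
    2 * i + 2
}

/// Hypothetical helper (not part of the crate): search a BST stored in
/// level order. An index past the end of the slice means "no such child".
fn contains(nodes: &[u32], key: u32) -> bool {
    let mut i = 0;
    while i < nodes.len() {
        if key == nodes[i] {
            return true;
        }
        i = if key < nodes[i] { left(i) } else { right(i) };
    }
    false
}

fn main() {
    // The BST
    //        4
    //      /   \
    //     2     6
    //    / \   / \
    //   1   3 5   7
    // stored in level order:
    let master: Vec<u32> = vec![4, 2, 6, 1, 3, 5, 7];

    // Clone: one allocation plus one bulk copy, no per-node traversal.
    let mut work = master.clone();
    assert!(contains(&work, 5));
    assert!(!contains(&work, 8));

    // Simulate a teardown by overwriting the working copy...
    work.fill(0);
    // ...then refill it from the master without allocating
    // (copy_from_slice requires T: Copy and equal lengths).
    work.copy_from_slice(&master);
    assert_eq!(work, master);
}
```

Because the children are located by arithmetic alone, the array can be copied byte for byte and the copy is immediately a valid tree.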
As for the delete-range operation, we use a custom algorithm running in O(k + log n) time, where k is the number
of items deleted (and returned) and n is the initial size of the tree. Detailed description.
An exhaustive automated test for delete-range has been written and is found in lib.rs. I have tested all trees
up to the size n=10. All the other supported operations have been tested extensively for every variation of the tree (we
use slightly different algorithms for IntervalTree and for the filtered variants). In general, I feel the quality is
already pretty good. If you find any bugs, please open an issue.
The library has been optimized for speed based on profiling. A significant amount of unsafe code is employed. Every occurrence of unsafe code has been carefully reviewed, and most have been annotated with comments elaborating on safety.
[dependencies]
teardown_tree = "0.6.6"
extern crate teardown_tree;
- Install Rust and Cargo (any recent version will do, stable or nightly).
git clone https://github.com/kirillkh/rs_teardown_tree.git
cd rs_teardown_tree/benchmarks
cargo run --release
All query and delete operations work in O(k + log n) time, where k is the number of items queried/deleted/returned
and n is the initial size of the tree. Note that if you have already deleted m items, the next delete_range
operation will still take O(k + log n), not O(k + log(n-m)) time as in many other data structures. However, this
distinction only matters in practice when deletes are interspersed with inserts, and TeardownTree does not support inserts.
The amount of memory consumed by a TeardownTree built with n items, each of size s, is n*s + n + 2*log_2(n) + z
bytes, where z is a small constant. (The first term is the size of an array holding just your data; the second is the
size of an array of flags that are unset for removed items; the third is the size of two auxiliary arrays used
internally by the delete_range algorithm.)
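As a worked instance of the formula above, the following sketch evaluates it for given n and s. The function name estimated_bytes is hypothetical, the value of z is not stated in the text and is therefore passed in by the caller, and rounding log_2(n) up to an integer is an assumption made here for illustration.

```rust
/// Evaluate n*s + n + 2*log_2(n) + z from the text.
/// Assumptions: `z` is supplied by the caller (its value is not given),
/// and log_2(n) is rounded up to the nearest integer.
fn estimated_bytes(n: usize, s: usize, z: usize) -> usize {
    // ceil(log2(n)) computed with integer arithmetic; 0 for n <= 1.
    let log2n = if n <= 1 {
        0
    } else {
        (usize::BITS - (n - 1).leading_zeros()) as usize
    };
    n * s + n + 2 * log2n + z
}

fn main() {
    // 1,000,000 items of 8 bytes each, ignoring the constant z:
    // 8,000,000 (data) + 1,000,000 (flags) + 2 * 20 (auxiliary arrays).
    let bytes = estimated_bytes(1_000_000, 8, 0);
    assert_eq!(bytes, 9_000_040);
    println!("estimated size: {} bytes", bytes);
}
```

For any realistic n the logarithmic term is negligible, so the overhead beyond the raw data is roughly one byte per item.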
The first set of benchmarks compares TeardownTree::delete_range() against:
- BTreeSet::remove() in Rust's standard library
- Treap
- SplayTree
- TeardownSet::delete(), which deletes a single element
- UnbalancedBST, a pointer-based Binary Search Tree that uses an efficient delete_range() algorithm
I made straightforward modifications to Treap and SplayTree in order to add support for delete_range. However,
BTreeSet lacks an equivalent operation (it has an O(log n) split, but not merge, see Rust #34666); therefore
BTreeSet::remove() is used instead.
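One way to emulate a range deletion on BTreeSet using only remove() is sketched below. Whether the benchmark code does exactly this is an assumption; the function name delete_range_via_remove is hypothetical. Only the standard-library calls BTreeSet::range and BTreeSet::remove are used.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Remove every key in `range` from `set` and return the removed keys in
/// ascending order, using one `remove()` call per key.
fn delete_range_via_remove(set: &mut BTreeSet<usize>, range: Range<usize>) -> Vec<usize> {
    // Collect the keys first: the set cannot be mutated while it is
    // borrowed by the range iterator.
    let keys: Vec<usize> = set.range(range).copied().collect();
    for k in &keys {
        set.remove(k);
    }
    keys
}

fn main() {
    let master: BTreeSet<usize> = (0..10).collect();

    // Work on a clone so the master copy stays intact for the next round.
    let mut work = master.clone();
    let removed = delete_range_via_remove(&mut work, 3..6);

    assert_eq!(removed, vec![3, 4, 5]);
    assert_eq!(work.len(), 7);
    assert_eq!(master.len(), 10);
}
```

Each remove() call costs O(log n), so deleting k items this way takes O(k log n) rather than the O(k + log n) of a dedicated delete_range.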
As the graph above shows, on my machine the whole refill/teardown sequence on a tree of 1,000,000 u64 items (we refill the
tree with elements from the master copy, then delete 1000 items at a time until the tree is empty) is ~20 times faster
with TeardownTreeSet<usize>::delete_range() than with BTreeSet<usize>::remove(). It also consumes 45% less memory.
Another set of benchmarks exposes the overhead of several variations of the data structure and its algorithms.
For details, please see the benchmarks page.
- No insert operation (yet?..)
- The storage is not deallocated until the structure is dropped.
- Fine print regarding complexity: delete_range works in O(k + log n) time, where n is the initial size of the tree, not its current size (if you have already deleted m items, the next delete_range operation will still take O(log n) time in the worst case, not necessarily O(log(n-m))). The same applies to the other logarithmic operations.
- Performance is sensitive to the size of your data. Starting from a certain size, it is faster to use a TeardownSet<Box<(Key,Value)>> or store the key-value pairs separately and use external handles as keys (e.g. TeardownSet<INDEX_INTO_EXTERNAL_VEC> or TeardownSet<&MyKey>). It's probably a good idea to run some benchmarks to know what's best in your case.