Offloading B+ Tree Insertion |
Databases use the B+ tree data structure to store database records, then perform read and write operations on those stored records. However, write operations, in other words B+ tree insertions, can be expensive. So in write-heavy workloads, write operations must be carefully optimized.
In some workloads, write operations come periodically in large chunks which I call packets. The naive approach to these packets is writing each of the records one by one into the existing B+ tree that contains the database's records. However, as mentioned, B+ tree insertion is expensive, especially when an insertion forces the B+ tree to grow and shift records around because it runs out of space. A database server might end up restructuring the B+ tree several times while writing the records of just one of these packets. This takes valuable CPU time away from read operations, which are happening constantly and concurrently. Furthermore, in this simplified model of a database system, B+ trees are not necessarily well parallelized, so a packet operation that slowly writes records one at a time will completely lock out any read operations that should have been happening at the same time.
One solution to this problem is to take the packet of records and simply put it into its own B+ tree. This takes only a short time, and read operations can resume immediately. Read operations now have to check both the original B+ tree and this new temporary B+ tree when searching for an element.
When another packet comes, it also gets its own B+ tree. As more packets arrive, at some point the database has to merge the B+ trees together for efficiency reasons: every new B+ tree means that read operations have to search through one more B+ tree to find the element they are looking for. This is not a big problem, because B+ trees are sorted, so a merging algorithm is very efficient: we take the records from two of these B+ trees, merge them together, then bulk-load them into a new B+ tree. An easy way to formalize this merging is to set an upper bound on the total number of B+ trees (say 16); when we reach that bound, we merge pairs of B+ trees so we end up with 8.
The program is written in C++ and built using the Meson build system. It runs four threads at the same time:
After sending a packet to the
The
Packets are sent periodically, according to a timer. All threads have a start time and finish working after some amount of time passes, usually one second. This is done by taking the difference between the current time and the start time in nanoseconds, using the
A read and a write call on the B+ tree wrapper object cannot happen at the same time: we cannot modify or add to the vector of B+ trees while reading from it, much less while merging B+ trees together. So, both methods are guarded by a lock. The
Records have random
The B+ tree code is located in the
The threads send packets and read operations to one another using MoodyCamel's ReaderWriterQueue (https://github.com/cameron314/readerwriterqueue). This queue is fast and lock-free for exactly one reader thread and one writer thread, which is exactly what this program requires.
The
Currently, there are three tests located in the
- `nsLimit`: nanosecond limit,
- `nsInterval`: nanosecond interval, how often packets are sent,
- `packetQueueCapacity`, `commandQueueCapacity`, `idQueueCapacity`: ReaderWriterQueue capacities, not very important,
- `P`: records per packet, important,
- `N`: payload size per packet, not important,
- `B`: max number of B+ trees before merging, and
- `USE_BTREE`: if disabled, the B+ tree wrapper object's read and write methods do nothing.
- `SIMPLE`: if enabled, there is only one B+ tree in the wrapper object, and records are inserted one at a time from incoming packets.
In each of the upcoming graphs, there will be three lines: no B+ tree
operations, simple B+ tree object, and vector B+ tree object. The x-axis
will be
As
For very low values of
No B+ Tree will have the best performance. I do not see a realistic scenario where the upfront overhead cost of Simple B+ Tree becomes larger than the cost Vector B+ Tree pays to read through multiple B+ trees. Simple B+ Tree's throughput will always be much higher than Vector B+ Tree's. At high
Higher
As
Again, No B+ Tree will have the best performance, and Simple B+ Tree's throughput will be higher than Vector B+ Tree's. Vector B+ Tree will probably fall even further behind Simple B+ Tree than in the first experiment, since having more records per packet is what benefits Vector B+ Tree.
Again, higher
As
Again, No B+ Tree will have the best performance, and Simple B+ Tree's throughput will be higher than Vector B+ Tree's. Vector B+ Tree will get closer to Simple B+ Tree with more
Again, higher
Missing because no data points for
It is clear that the Vector B+ Trees object performs much worse than the Simple B+ Trees object. It makes no sense that throughput decreased for higher