Custom (De-)Serialization Functions
For all standard layout, non-polymorphic aggregate types, you do not have to write a custom function. However, for 3rd-party structs, structs with constructors, RAII structs that manage memory, etc., you need to either write a cista_members function or overload the serialize and deserialize functions.
If your type is not a standard layout, non-polymorphic aggregate (e.g. because of inheritance or custom constructors), it is sometimes sufficient to implement a cista_members member function:
#include <cstdio>
#include <tuple>
#include "cista.h"
struct A { int _a, _b; };
struct B : A {
// Option 1: reuse serialization from parent.
auto cista_members() { return std::tie(*static_cast<A*>(this), _c, _d); }
// Option 2: list all members from this class and its parent classes.
// auto cista_members() { return std::tie(_a, _b, _c, _d); }
int _c, _d;
};
int main() {
B b{1, 2, 3, 4};
auto const buf = cista::serialize(b);
auto const ptr = cista::deserialize<B>(buf);
printf("%d %d %d %d\n", ptr->_a, ptr->_b, ptr->_c, ptr->_d);
}
cista_members() is required to return a tuple of references to all members that should be serialized/deserialized by cista. As shown in the example above, it is possible to repeat members from parent classes or to reuse the parent's serialization/deserialization functions.
Implementing cista_members() is sufficient if
- your custom serialize function would only call serialize for each member and
- your custom deserialize function would only call deserialize for each member.
These cases can be implemented more conveniently using cista_members.
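To illustrate why a tuple of references is enough for the simple per-member case, here is a minimal C++17 sketch. The helpers for_each_member and sum_members are hypothetical names for this illustration and are not part of cista; the sketch only shows how generic code can visit every member exposed by cista_members(), not how cista does it internally.

```cpp
#include <tuple>

struct A {
  int _a, _b;
};

struct B : A {
  // Same idea as Option 2 above: list the members of this class and its parent.
  auto cista_members() { return std::tie(_a, _b, _c, _d); }
  int _c, _d;
};

// Hypothetical helper (not part of cista): calls fn once for every member
// referenced by the tuple that cista_members() returns.
template <typename T, typename Fn>
void for_each_member(T& value, Fn&& fn) {
  std::apply([&](auto&... member) { (fn(member), ...); },
             value.cista_members());
}

// Example visitor: adds up all int members of a B.
inline int sum_members(B& b) {
  int sum = 0;
  for_each_member(b, [&](int& member) { sum += member; });
  return sum;
}
```

A per-member serialize or deserialize call could be plugged in where the visitor is, which is the situation described in the two bullet points above.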
If you need more complex logic (for example, because your struct contains pointers, or allocates memory in the constructor and releases it in the destructor), you need to write custom serialization and deserialization logic as described in the next sections.
By default, every value gets copied raw, byte by byte. For each type you would like to serialize with a custom function, you need to overload the following function for your type.
This function will be called with
- The serialization context (described below). It provides functions to write to the buffer and to translate pointers to offsets.
- A pointer to the original (i.e. not the serialized!) value of your struct.
- The offset to which the value of YourType has been copied. You can use this information to adjust certain members of YourType. For example: ctx.write(pos + offsetof(cista::string, h_.ptr_), cista::convert_endian<Ctx::MODE>(start)). This overwrites the pointer contained in cista::string with the offset that the bytes were copied to by calling start = ctx.write(orig->data(), orig->size(), 1).
template <typename Ctx>
void serialize(Ctx&, YourType const*, cista::offset_t const);
The Ctx parameter is templated to support different serialization targets (e.g. file and buffer). Ctx provides the following members:
struct serialization_context {
/**
* Writes the values at [ptr, ptr + size[ to
* the end of the serialization target buffer.
* Adjusts for alignment if needed and returns
* the new (aligned) offset the value was written to.
*
* Appends to the buffer (resize).
*
* \param ptr points to the data to write
* \param size number of bytes to write
* \param alignment the alignment to consider
* \return the alignment adjusted offset
*/
offset_t write(void const* ptr, offset_t const size,
offset_t alignment = 0);
/**
* Overrides the value at `pos` with value `val`.
*
* Note: does not append to the buffer.
* The position `pos` needs to exist already
* and provide enough space to write `val`.
*
* \param pos the position to write to
* \param val the value to copy to position `pos`
*/
template <typename T>
void write(offset_t const pos, T const& val);
};
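The interplay of the two write overloads can be sketched with a toy, buffer-backed context. Everything below (toy_context, blob, serialize_blob, and the local offset_t alias) is a hypothetical stand-in written for this illustration and is not cista's implementation. In particular, the toy stores the payload offset relative to the start of the buffer, which may differ from the encoding cista uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using offset_t = std::int64_t;  // local stand-in for cista::offset_t

// Toy context (NOT cista's implementation) with the two write() overloads.
struct toy_context {
  // Appends [ptr, ptr + size) and returns the aligned offset it was written to.
  offset_t write(void const* ptr, offset_t size, offset_t alignment = 0) {
    auto pos = static_cast<offset_t>(buf_.size());
    if (alignment > 1 && pos % alignment != 0) {
      pos += alignment - pos % alignment;  // pad up to the next multiple
    }
    buf_.resize(static_cast<std::size_t>(pos + size));
    if (size != 0) {
      std::memcpy(buf_.data() + pos, ptr, static_cast<std::size_t>(size));
    }
    return pos;
  }

  // Overwrites existing bytes at pos; never grows the buffer.
  template <typename T>
  void write(offset_t pos, T const& val) {
    assert(pos >= 0 &&
           static_cast<std::size_t>(pos) + sizeof(T) <= buf_.size());
    std::memcpy(buf_.data() + pos, &val, sizeof(T));
  }

  std::vector<std::uint8_t> buf_;
};

// Hypothetical type that refers to out-of-line data.
struct blob {
  char const* data_;   // pointer in memory, offset in the serialized form
  std::int64_t size_;  // number of payload bytes
};

// Same shape as serialize(Ctx&, YourType const*, offset_t): pos is where the
// raw copy of *orig already lives in the buffer.
template <typename Ctx>
void serialize(Ctx& ctx, blob const* orig, offset_t pos) {
  // 1. Append the payload and remember where it ended up.
  auto const start = ctx.write(orig->data_, orig->size_, 1);
  // 2. Patch the pointer field of the copied struct with that offset.
  ctx.write(pos + static_cast<offset_t>(offsetof(blob, data_)),
            static_cast<std::intptr_t>(start));
}

inline std::vector<std::uint8_t> serialize_blob(blob const& b) {
  toy_context ctx;
  auto const pos = ctx.write(&b, sizeof(b), alignof(blob));  // raw byte copy
  serialize(ctx, &b, pos);
  return ctx.buf_;
}
```

The append-then-patch order is needed because the payload offset is only known after the payload has been written, whereas the struct itself has already been copied at that point.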
Important: As soon as you decide to implement your own custom serialization function, you are on your own and need to make sure that the serialization functions are called for all members of your type.
Example:
namespace data = cista::offset;
struct my_type {
// Constructor makes my_type non-aggregate,
// disabling cista's auto mode.
my_type(int);
// Member with pointers
// that need to be serialized.
data::vector<data::ptr<my_type>> paths_;
};
template <typename Ctx>
inline void serialize(Ctx & context, my_type const* el,
cista::offset_t const offset) {
using cista::serialize;
serialize(context, &el->paths_, offset + offsetof(my_type, paths_));
// call serialize on all members!
// always keep this up-to-date as you add member variables!
}
template <typename Ctx>
inline void deserialize(Ctx const& context, my_type* el) {
using cista::deserialize;
deserialize(context, &el->paths_);
// call deserialize on all members!
// always keep this up-to-date as you add member variables!
}
If you don't, you'll see issues as described in https://github.com/felixguendling/cista/issues/139
For all members that require custom serialization (including raw or pointers), call the serialize() function recursively.
For custom deserialization, there are two options:
Not all three functions described below (convert_endian_and_ptr, check_state and recurse) need to be implemented. There are generic default overloads for each of them.
Deserialization is split into three phases for every data structure. This is necessary because the cista::mode::DEEP_CHECK mode (which is required when reading untrusted data) uses a two-phase approach:
- The first phase descends recursively through the (tree-like) owning memory hierarchy.
- The second phase skips the endian and pointer conversions and instead goes through the (graph-like) data structure, following owning and non-owning pointers. This resembles a breadth-first search (BFS) on the stored data. In the second phase, only the recurse and check_state functions are applied.
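The difference between the two traversal shapes can be illustrated with a toy graph. The node layout and the functions descend_owning and bfs_check are hypothetical and exist only for this sketch; they are not cista's code. Nodes are referenced by index, owning edges form a tree, and non-owning links may introduce cycles.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Toy node: `owned` holds children this node owns (tree edges),
// `links` holds non-owning references (these may form cycles).
struct node {
  std::vector<std::size_t> owned;
  std::vector<std::size_t> links;
};

// Phase 1 shape: recursive descent along owning edges only. Each node is
// reached exactly once because ownership forms a tree.
template <typename Fn>
void descend_owning(std::vector<node> const& nodes, std::size_t root,
                    Fn&& convert) {
  convert(root);
  for (auto const child : nodes[root].owned) {
    descend_owning(nodes, child, convert);
  }
}

// Phase 2 shape: breadth-first search along owning and non-owning edges.
// The `seen` flags keep cycles from being followed forever.
template <typename Fn>
void bfs_check(std::vector<node> const& nodes, std::size_t root, Fn&& check) {
  std::vector<bool> seen(nodes.size(), false);
  std::queue<std::size_t> todo;
  seen[root] = true;
  todo.push(root);
  while (!todo.empty()) {
    auto const current = todo.front();
    todo.pop();
    check(current);
    auto const visit = [&](std::size_t next) {
      if (!seen[next]) {
        seen[next] = true;
        todo.push(next);
      }
    };
    for (auto const n : nodes[current].owned) { visit(n); }
    for (auto const n : nodes[current].links) { visit(n); }
  }
}
```

In this picture, the convert callback plays the role of the endian and pointer conversion, and the check callback plays the role of check_state.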
/**
* STEP 1: Restore Scalars & Pointers.
*
* Convert endian for each scalar field of YourType
* Use c.convert_endian(...) to do so.
*
* Convert raw pointers from the stored offset values
* to absolute pointers.
* This is only necessary in cista::raw mode.
*/
template <typename Ctx>
void convert_endian_and_ptr(Ctx const& c, YourType* el);
/**
* STEP 2: Status Check.
*
* Called after scalars and pointers have been
* restored. All members should be valid.
*
* Check if the data structure has a valid state.
* E.g. for pointers:
* they should either point to null
* or to a valid address within the buffer
* c.check_ptr() does this check.
*/
template <typename Ctx>
void check_state(Ctx const& c, YourType* el);
/**
* STEP 3: Recursive Call.
*
* For containers like unique_ptr, vector, etc.
* Tell the "main" method where to proceed
* with deserialization
*/
template <typename Ctx, typename Fn>
void recurse(Ctx& c, YourType* el, Fn&& fn);
If you do not need the cista::mode::DEEP_CHECK option, it is sufficient to implement a single deserialize function that combines all three functions described above. Use is_mode_enabled(Ctx::MODE, mode::UNCHECKED) to optionally disable all bounds checks.
template <typename Ctx, typename T>
inline void deserialize(Ctx const& c, std::vector<T>* el);
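As a rough sketch of how a compile-time mode flag can switch bounds checks off, consider the following toy. The enum toy_mode, its flag values, toy_ctx and read_index are assumptions made up for this illustration; cista's real mode enum and its is_mode_enabled may be defined differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flag values for illustration; cista's real enum may differ.
enum class toy_mode : std::uint8_t {
  NONE = 0,
  UNCHECKED = 1U << 0U,
  DEEP_CHECK = 1U << 1U
};

constexpr toy_mode operator|(toy_mode a, toy_mode b) {
  return static_cast<toy_mode>(static_cast<std::uint8_t>(a) |
                               static_cast<std::uint8_t>(b));
}

// True if `flag` is set in `in`.
constexpr bool is_mode_enabled(toy_mode in, toy_mode flag) {
  return (static_cast<std::uint8_t>(in) & static_cast<std::uint8_t>(flag)) != 0;
}

// Toy context carrying its mode as a compile-time constant, like Ctx::MODE.
template <toy_mode Mode>
struct toy_ctx {
  static constexpr toy_mode MODE = Mode;
};

// Returns false if the bounds check is enabled and fails; with UNCHECKED set,
// the check is compiled out and the index is always accepted.
template <typename Ctx>
bool read_index(Ctx const&, std::size_t index, std::size_t size) {
  if constexpr (!is_mode_enabled(Ctx::MODE, toy_mode::UNCHECKED)) {
    if (index >= size) {
      return false;
    }
  }
  return true;
}
```

Because the mode is a template constant, the unchecked variant pays no runtime cost for the skipped checks, which is why such a flag should only be used on trusted data.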
The following functions are provided by the Ctx type:
struct deserialization_context {
/**
* Computes and stores the absolute address value
* of a stored raw pointer.
* A raw pointer stores an offset relative to itself
* that will be converted to an absolute address.
*
* \param ptr the raw pointer
*/
template <typename Ptr>
void deserialize_ptr(Ptr** ptr) const;
/**
* Checks whether the pointer points to
* a valid memory address within the buffer
* where at least `size` bytes are available.
*
* \param el the memory address to check
* \param size the size to check for
* \throws if there are bytes outside the buffer
*/
template <typename T> void check_ptr(offset_ptr<T> el, size_t size = sizeof(T)) const;
template <typename T> void check_ptr(T* el, size_t size = sizeof(T)) const;
/**
* Checks whether a bool has a valid value (0 or 1).
* Accessing invalid values is undefined behaviour.
*
* \param b the boolean value to check
*/
static void check_bool(bool const& b);
/**
* Throws a std::runtime_error{msg}
* if `condition` is false.
*
* \param condition the condition to check
* \param msg the exception message
*/
void require(bool condition, char const* msg) const;
/**
* Does endian conversion if necessary
* depending on the mode.
* Apply this to all serialized scalar values.
*
* \param el scalar value to convert.
*/
template <typename T>
void convert_endian(T& el) const;
};
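To make the bounds check and the endian conversion more concrete, here is a small self-contained sketch. toy_checks and swap_bytes are hypothetical stand-ins for this illustration, not cista's implementation. The sketch assumes that a null pointer is accepted and that any other pointer must leave at least size bytes inside the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Toy stand-in for the checks described above (NOT cista's implementation).
struct toy_checks {
  std::uint8_t const* from_;  // first byte of the buffer
  std::uint8_t const* to_;    // one past the last byte of the buffer

  // Throws std::runtime_error{msg} if condition is false.
  void require(bool condition, char const* msg) const {
    if (!condition) {
      throw std::runtime_error{msg};
    }
  }

  // Null is accepted; otherwise [el, el + size) must lie inside [from_, to_).
  template <typename T>
  void check_ptr(T const* el, std::size_t size = sizeof(T)) const {
    if (el == nullptr) {
      return;
    }
    auto const pos = reinterpret_cast<std::uintptr_t>(el);
    auto const begin = reinterpret_cast<std::uintptr_t>(from_);
    auto const end = reinterpret_cast<std::uintptr_t>(to_);
    require(pos >= begin && pos <= end && size <= end - pos,
            "pointer out of bounds");
  }
};

// Reverses the byte order of a 32-bit value. This is what an endian
// conversion has to do when the stored and the native byte order differ.
inline std::uint32_t swap_bytes(std::uint32_t v) {
  return (v >> 24U) | ((v >> 8U) & 0x0000FF00U) | ((v << 8U) & 0x00FF0000U) |
         (v << 24U);
}
```

Writing the size comparison as size <= end - pos avoids the overflow that pos + size <= end could run into for very large size values.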
For all members that require custom deserialization (including raw or pointers), call the deserialize() function recursively.