Custom (De-)Serialization Functions

Felix Gündling edited this page Oct 9, 2019 · 21 revisions

For all standard-layout, non-polymorphic aggregate types, you do not have to write a custom function. However, when you have 3rd-party structs, structs with constructors, RAII structs that manage memory, etc., you need to override the serialize and deserialize functions.

Serialization

By default, every value gets copied raw, byte by byte. For each type you would like to serialize with a custom function, you need to override the following function for your type.

This function will be called with

  • The serialization context (described below). It provides functions to write to the buffer and to translate pointers to offsets.
  • A pointer to the original (i.e. not the serialized!) value of your struct.
  • The offset (pos) at which the raw copy of YourType has been written. You can use it to adjust members of the serialized copy. For example, c.write(pos + offsetof(cista::string, h_.ptr_), cista::convert_endian<Ctx::MODE>(start)) overwrites the pointer contained in cista::string with start, the offset the string's bytes were copied to by calling start = c.write(orig->data(), orig->size(), 1).
template <typename Ctx>
void serialize(Ctx&, YourType const*, cista::offset_t const);

The Ctx parameter is templated to support different serialization targets (e.g. file and buffer). Ctx provides the following members:

struct serialization_context {
  /**
   * Writes the values at [ptr, ptr + size[ to
   * the end of the serialization target buffer.
   * Adjusts for alignment if needed and returns
   * the new (aligned) offset the value was written to.
   *
   * Appends to the buffer (resize).
   *
   * \param ptr         points to the data to write
   * \param size        number of bytes to write
   * \param alignment   the alignment to consider
   * \return the alignment adjusted offset
   */
  offset_t write(void const* ptr, offset_t const size,
                 offset_t alignment = 0);

  /**
   * Overrides the value at `pos` with value `val`.
   *
   * Note: does not append to the buffer.
   * The position `pos` needs to exist already
   * and provide enough space to write `val`.
   *
   * \param pos  the position to write to
   * \param val  the value to copy to position `pos`
   */
  template <typename T>
  void write(offset_t const pos, T const& val);
};
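The interplay of the two write() overloads can be sketched with a toy context. Everything below is an assumption made for illustration: toy_context, blob and serialize_root are hypothetical names and not part of cista, the buffer is a plain std::vector, and the endian conversion mentioned above is left out. The sketch only shows the pattern of appending the owned bytes first and then patching the pointer field inside the raw copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using offset_t = std::int64_t;  // stand-in for cista::offset_t

// Toy in-memory context mirroring the two write() overloads described above.
// This is NOT cista's implementation.
struct toy_context {
  // Appends [ptr, ptr + size) at the next offset that is a multiple of
  // `alignment` and returns that offset.
  offset_t write(void const* ptr, offset_t size, offset_t alignment = 0) {
    auto pos = static_cast<offset_t>(buf_.size());
    if (alignment > 1 && pos % alignment != 0) {
      pos += alignment - pos % alignment;
    }
    buf_.resize(static_cast<std::size_t>(pos + size));
    if (size != 0) {
      std::memcpy(buf_.data() + pos, ptr, static_cast<std::size_t>(size));
    }
    return pos;
  }

  // Overwrites existing bytes at `pos` with `val`; never grows the buffer.
  template <typename T>
  void write(offset_t pos, T const& val) {
    std::memcpy(buf_.data() + pos, &val, sizeof(T));
  }

  std::vector<std::uint8_t> buf_;
};

// Hypothetical user type that owns a character buffer.
struct blob {
  char const* data_;
  std::uint32_t size_;
};

// Custom serialize: append the owned bytes, then replace the pointer field
// of the raw copy at `pos` with the offset the bytes were written to.
template <typename Ctx>
void serialize(Ctx& c, blob const* orig, offset_t const pos) {
  auto const start = c.write(orig->data_, orig->size_, 1);
  c.write(pos + static_cast<offset_t>(offsetof(blob, data_)),
          static_cast<std::intptr_t>(start));  // pointer-sized offset
}

// Assumed driver: copy the struct raw, then call the custom function.
template <typename Ctx>
offset_t serialize_root(Ctx& c, blob const& b) {
  auto const pos = c.write(&b, sizeof(b), alignof(blob));
  serialize(c, &b, pos);
  return pos;
}
```

After serialize_root(c, b), the buffer holds the raw copy of b followed by its characters, and the pointer slot of the copy contains the offset of the first character instead of a memory address.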

Deserialization

For custom deserialization, there are two options:

Split Deserialization Functions

Not all three functions need to be implemented. There are generic default overloads for every function.

Deserialization is split into three steps for every data structure. This is necessary because the cista::mode::DEEP_CHECK mode (which is required when reading untrusted data) uses a two-phase approach:

  • The first phase descends recursively through the (tree-like) owning memory hierarchy.
  • The second phase skips the endian and pointer conversions and instead traverses the (graph-like) data structure, following both owning and non-owning pointers. This resembles a breadth-first search (BFS) on the stored data. In the second phase, only the recurse and check_state functions are applied.
/**
 * STEP 1: Restore Scalars & Pointers.
 *
 * Convert endian for each scalar field of YourType
 * Use c.convert_endian(...) to do so.
 *
 * Convert raw pointers from the stored offset values
 * to absolute pointers.
 * This is only necessary in cista::raw mode.
 */
template <typename Ctx>
void convert_endian_and_ptr(Ctx const& c, YourType* el);
/**
 * STEP 2: Status Check.
 *
 * Called after scalars and pointers have been
 * restored. All members should be valid.
 *
 * Check if the data structure has a valid state.
 * E.g. pointers should either be null
 * or point to a valid address within the buffer.
 * c.check_ptr() performs this check.
 */
template <typename Ctx>
void check_state(Ctx const& c, YourType* el);
/**
 * STEP 3: Recursive Call.
 *
 * For containers like unique_ptr, vector, etc.
 * Tell the "main" method where to proceed
 * with deserialization.
 */
template <typename Ctx, typename Fn>
void recurse(Ctx& c, YourType* el, Fn&& fn);
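The three steps can be sketched for a small type as follows. This is a minimal illustration under stated assumptions, not cista's implementation: toy_context, int_box, image, deserialize_box and run_example are hypothetical names, the stored pointer is assumed to hold an offset from the start of the buffer, null handling during pointer translation is omitted, and convert_endian is a no-op because writer and reader are assumed to share one byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Toy context over the bytes [from_, to_). A stand-in, NOT cista's class.
struct toy_context {
  // No-op: the sketch assumes writer and reader share one byte order.
  template <typename T>
  void convert_endian(T&) const {}

  // Assumption: the pointer field holds an offset from the buffer start.
  template <typename Ptr>
  void deserialize_ptr(Ptr** ptr) const {
    auto const offset = reinterpret_cast<std::uintptr_t>(*ptr);
    *ptr = reinterpret_cast<Ptr*>(reinterpret_cast<std::uintptr_t>(from_) +
                                  offset);
  }

  // Accepts null or `size` bytes that lie completely inside the buffer.
  template <typename T>
  void check_ptr(T* el, std::size_t size = sizeof(T)) const {
    if (el == nullptr) {
      return;
    }
    auto const p = reinterpret_cast<std::uintptr_t>(el);
    auto const from = reinterpret_cast<std::uintptr_t>(from_);
    auto const to = reinterpret_cast<std::uintptr_t>(to_);
    if (p < from || p > to || to - p < size) {
      throw std::runtime_error("pointer outside of buffer");
    }
  }

  std::uint8_t* from_;
  std::uint8_t* to_;
};

// Hypothetical type with one scalar and one owning pointer.
struct int_box {
  std::int32_t id_;
  std::int32_t* value_;
};

// STEP 1: restore scalars and pointers.
template <typename Ctx>
void convert_endian_and_ptr(Ctx const& c, int_box* el) {
  c.convert_endian(el->id_);
  c.deserialize_ptr(&el->value_);
}

// STEP 2: the restored pointer must be null or inside the buffer.
template <typename Ctx>
void check_state(Ctx const& c, int_box* el) {
  c.check_ptr(el->value_);
}

// STEP 3: hand the owned child to the caller-supplied continuation.
template <typename Ctx, typename Fn>
void recurse(Ctx&, int_box* el, Fn&& fn) {
  if (el->value_ != nullptr) {
    fn(el->value_);
  }
}

// Assumed driver that applies the three steps in order.
template <typename Ctx>
void deserialize_box(Ctx& c, int_box* el) {
  convert_endian_and_ptr(c, el);
  check_state(c, el);
  recurse(c, el, [&](std::int32_t* child) { c.convert_endian(*child); });
}

// Hypothetical serialized image: the box followed by the value it owns.
struct image {
  int_box box_;
  std::int32_t payload_;
};

// Builds an image as a serializer might have left it, restores it and
// returns the value reached through the restored pointer.
inline std::int32_t run_example() {
  auto const off = static_cast<std::intptr_t>(offsetof(image, payload_));
  image img{{7, reinterpret_cast<std::int32_t*>(off)}, 42};
  auto* const base = reinterpret_cast<std::uint8_t*>(&img);
  toy_context c{base, base + sizeof(image)};
  deserialize_box(c, &img.box_);
  return *img.box_.value_;
}
```

Keeping the steps separate is what allows a second pass to call only check_state and recurse without converting pointers a second time.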

Single Deserialize Function

If you do not need the cista::mode::DEEP_CHECK option, it is sufficient to implement a single deserialize function that does the combination of all three functions described above. Use is_mode_enabled(Ctx::MODE, mode::UNCHECKED) to optionally disable all bounds checks.

template <typename Ctx, typename T>
inline void deserialize(Ctx const& c, std::vector<T>* el);
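How a mode flag can switch off bounds checks at compile time may be easier to see in a small sketch. The code below is an illustration under assumptions: mode, is_mode_enabled and toy_context are simplified stand-ins written for this example (the flag values are not taken from the library), int_array and accepts are hypothetical, and the pointer and endian conversion steps are omitted so that only the mode check remains.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Stand-ins for cista::mode and is_mode_enabled; the flag values are
// illustrative and not taken from the library.
enum class mode : std::uint32_t { NONE = 0U, UNCHECKED = 1U << 0U };

constexpr bool is_mode_enabled(mode const in, mode const flag) {
  return (static_cast<std::uint32_t>(in) & static_cast<std::uint32_t>(flag)) ==
         static_cast<std::uint32_t>(flag);
}

// Toy context over [from_, to_) whose mode is a compile-time constant.
template <mode Mode>
struct toy_context {
  static constexpr mode MODE = Mode;

  // Accepts null or `size` bytes that lie completely inside the buffer.
  template <typename T>
  void check_ptr(T* el, std::size_t size = sizeof(T)) const {
    if (el == nullptr) {
      return;
    }
    auto const p = reinterpret_cast<std::uintptr_t>(el);
    auto const from = reinterpret_cast<std::uintptr_t>(from_);
    auto const to = reinterpret_cast<std::uintptr_t>(to_);
    if (p < from || p > to || to - p < size) {
      throw std::runtime_error("pointer outside of buffer");
    }
  }

  std::uint8_t* from_;
  std::uint8_t* to_;
};

// Hypothetical array type; data_ is assumed to hold an absolute pointer.
struct int_array {
  std::int32_t* data_;
  std::uint32_t size_;
};

// Single deserialize function: the check is compiled out in UNCHECKED mode.
template <typename Ctx>
void deserialize(Ctx const& c, int_array* el) {
  if constexpr (!is_mode_enabled(Ctx::MODE, mode::UNCHECKED)) {
    c.check_ptr(el->data_, sizeof(std::int32_t) * el->size_);
  }
}

// Returns true if deserialize() accepts `el` for a buffer of `n` integers
// starting at `storage`.
template <mode Mode>
bool accepts(std::int32_t* storage, std::size_t n, int_array el) {
  auto* const base = reinterpret_cast<std::uint8_t*>(storage);
  toy_context<Mode> c{base, base + n * sizeof(std::int32_t)};
  try {
    deserialize(c, &el);
  } catch (std::runtime_error const&) {
    return false;
  }
  return true;
}
```

With a buffer of four integers, an array claiming five elements is rejected under mode::NONE but passes under mode::UNCHECKED, which is why the unchecked mode should only be used for trusted data.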

Both functions (deserialize_ptr() and check_ptr()) are provided by the deserialization_context:

struct deserialization_context {
  /**
   * Computes and stores the absolute address value
   * of a stored raw pointer.
   * A raw pointer stores an offset relative to itself
   * that will be converted to an absolute address.
   *
   * \param ptr  the raw pointer
   */
  template <typename Ptr>
  void deserialize_ptr(Ptr** ptr) const;

  /**
   * Checks whether the pointer points to
   * a valid memory address within the buffer
   * where at least `size` bytes are available.
   *
   * \param el    the memory address to check
   * \param size  the size to check for
   * \throws if there are bytes outside the buffer
   */
  template <typename T> void check_ptr(offset_ptr<T> el, size_t size = sizeof(T)) const;
  template <typename T> void check_ptr(T* el, size_t size = sizeof(T)) const;
};