Custom (De-)Serialization Functions

Felix Gündling edited this page Aug 3, 2022 · 21 revisions

For all standard-layout, non-polymorphic aggregate types, you do not have to write a custom function. However, for 3rd-party structs, structs with constructors, RAII structs that manage memory, and similar types, you need to provide your own serialize and deserialize functions.

Serialization

By default, every value is copied raw, byte by byte. For each type you would like to serialize with a custom function, you need to overload the following function for your type.

This function will be called with

  • The serialization context (described below). It provides functions to write to the buffer and to translate pointers to offsets.
  • A pointer to the original (i.e. not the serialized!) value of your struct.
  • The offset at which the value of YourType has been copied into the buffer. You can use this information to adjust certain members of YourType. For example, ctx.write(pos + offsetof(cista::string, h_.ptr_), cista::convert_endian<Ctx::MODE>(start)) overwrites the pointer contained in cista::string with start, the offset the string's bytes were copied to by calling start = ctx.write(orig->data(), orig->size(), 1).
template <typename Ctx>
void serialize(Ctx&, YourType const*, cista::offset_t const);

The Ctx parameter is templated to support different serialization targets (e.g. file and buffer). Ctx provides the following members:

struct serialization_context {
  /**
   * Writes the values at [ptr, ptr + size[ to
   * the end of the serialization target buffer.
   * Adjusts for alignment if needed and returns
   * the new (aligned) offset the value was written to.
   *
   * Appends to the buffer (resize).
   *
   * \param ptr         points to the data to write
   * \param size        number of bytes to write
   * \param alignment   the alignment to consider
   * \return the alignment adjusted offset
   */
  offset_t write(void const* ptr, offset_t const size,
                 offset_t alignment = 0);

  /**
   * Overrides the value at `pos` with value `val`.
   *
   * Note: does not append to the buffer.
   * The position `pos` needs to exist already
   * and provide enough space to write `val`.
   *
   * \param pos  the position to write to
   * \param val  the value to copy to position `pos`
   */
  template <typename T>
  void write(offset_t const pos, T const& val);
};
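
To make the two write overloads concrete, here is a minimal sketch of a buffer-backed context together with a custom serialize function. Everything in the toy namespace (toy_context, blob, serialize_blob) is a hypothetical stand-in written for this page, not cista's implementation. The sketch assumes a 64-bit platform, stores the payload offset as an absolute buffer offset, and skips endian conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace toy {  // hypothetical stand-ins, not part of cista

using offset_t = std::int64_t;

struct toy_context {
  // Appends [ptr, ptr + size) to the buffer, zero-padding up to
  // `alignment` first. Returns the aligned offset of the written bytes.
  offset_t write(void const* ptr, offset_t size, offset_t alignment = 0) {
    auto const curr = static_cast<offset_t>(buf_.size());
    auto const aligned = (alignment > 1 && curr % alignment != 0)
                             ? curr + (alignment - curr % alignment)
                             : curr;
    buf_.resize(static_cast<std::size_t>(aligned + size));
    if (size != 0) {
      std::memcpy(buf_.data() + aligned, ptr, static_cast<std::size_t>(size));
    }
    return aligned;
  }

  // Overwrites bytes that already exist at `pos`; never grows the buffer.
  template <typename T>
  void write(offset_t pos, T const& val) {
    assert(pos >= 0 &&
           static_cast<std::size_t>(pos) + sizeof(T) <= buf_.size());
    std::memcpy(buf_.data() + pos, &val, sizeof(T));
  }

  std::vector<std::uint8_t> buf_;
};

// Hypothetical type that refers to `len_` bytes stored elsewhere.
struct blob {
  char const* data_;
  offset_t len_;
};

// Assumption: the pointer field is wide enough to hold an offset_t.
static_assert(sizeof(char const*) == sizeof(offset_t), "64-bit only sketch");

template <typename Ctx>
void serialize(Ctx& ctx, blob const* orig, offset_t pos) {
  // 1. Append the payload the struct points to.
  offset_t const start = ctx.write(orig->data_, orig->len_, 1);
  // 2. Patch the pointer inside the already copied struct with that offset.
  ctx.write(pos + static_cast<offset_t>(offsetof(blob, data_)), start);
}

inline std::vector<std::uint8_t> serialize_blob(blob const& b) {
  toy_context ctx;
  // Raw copy of the struct itself, as the default behaviour would do.
  offset_t const pos = ctx.write(&b, static_cast<offset_t>(sizeof(b)),
                                 static_cast<offset_t>(alignof(blob)));
  serialize(ctx, &b, pos);
  return ctx.buf_;
}

}  // namespace toy
```

Calling toy::serialize_blob on a blob yields a buffer that holds the struct followed by its payload, with the bytes of data_ replaced by the payload's offset. The real context additionally handles endianness and the pointer representation of the selected mode.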

Important: As soon as you decide to implement your own custom serialization function, you are on your own and need to make sure that serialization functions are called for all members of your type.

Example:

namespace data = cista::offset;

struct my_type {
  my_type(int);  // makes my_type non-aggregate,
                 // disabling cista's auto mode
  data::vector<my_type*> paths_;  // member with pointers
                                  // that need to be serialized
};

template <typename Ctx>
inline void serialize(Ctx& context, my_type const* el, const cista::offset_t offset) {
  using cista::serialize;
  serialize(context, &el->paths_, offset + offsetof(my_type, paths_));
  // call serialize on all members!
  // always keep this up-to-date as you add member variables!
}

template <typename Ctx>
inline void deserialize(Ctx const& context, my_type* el) {
  using cista::deserialize;
  deserialize(context, &el->paths_);
  // call deserialize on all members!
  // always keep this up-to-date as you add member variables!
}

If you don't, you'll see issues as described in https://github.com/felixguendling/cista/issues/139

For all members that require custom serialization (including raw or offset pointers), call the serialize() function recursively.

Deserialization

For custom deserialization, there are two options:

Split Deserialization Functions

Deserialization is split into three steps for every data structure, each implemented by its own function. Not all three functions need to be implemented: there are generic default overloads for every function.

The split is necessary because in the cista::mode::DEEP_CHECK mode (which is required when reading untrusted data), a two-phase approach is used:

  • The first phase descends recursively through the (tree-like) owning memory hierarchy.
  • The second phase skips the endian and pointer conversions. Instead, it traverses the (graph-like) data structure, following both owning and non-owning pointers. This resembles a breadth-first search (BFS) on the stored data. In the second phase, only the recurse and check_state functions are applied.
/**
 * STEP 1: Restore Scalars & Pointers.
 *
 * Convert endian for each scalar field of YourType.
 * Use c.convert_endian(...) to do so.
 *
 * Convert raw pointers from the stored offset values
 * to absolute pointers.
 * This is only necessary in cista::raw mode.
 */
template <typename Ctx>
void convert_endian_and_ptr(Ctx const& c, YourType* el);
/**
 * STEP 2: Status Check.
 *
 * Called after scalars and pointers have been
 * restored. All members should be valid.
 *
 * Check if the data structure has a valid state.
 * E.g. for pointers:
 * they should either be null
 * or point to a valid address within the buffer.
 * c.check_ptr() does this check.
 */
template <typename Ctx>
void check_state(Ctx const& c, YourType* el);
/**
 * STEP 3: Recursive Call.
 *
 * For containers like unique_ptr, vector, etc.:
 * tell the "main" method where to proceed
 * with deserialization.
 */
template <typename Ctx, typename Fn>
void recurse(Ctx& c, YourType* el, Fn&& fn);
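
As an illustration of how the three steps divide the work, the following sketch implements them for a hypothetical range type with two scalar members and one invariant. The toy_ctx context and the toy_deserialize driver are assumptions made for this example. They only mimic the step order described above and are not how cista dispatches internally, in particular not in DEEP_CHECK mode.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

namespace toy {  // hypothetical stand-ins, not part of cista

// Minimal context: optional byte swap plus a throwing `require`.
struct toy_ctx {
  bool swap_;

  void convert_endian(std::uint32_t& v) const {
    if (swap_) {
      v = (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
          (v << 24);
    }
  }

  static void require(bool condition, char const* msg) {
    if (!condition) {
      throw std::runtime_error{msg};
    }
  }
};

// Hypothetical user type with the invariant begin_ <= end_.
struct range {
  std::uint32_t begin_;
  std::uint32_t end_;
};

// STEP 1: restore scalars (this type has no pointers to convert).
template <typename Ctx>
void convert_endian_and_ptr(Ctx const& c, range* el) {
  c.convert_endian(el->begin_);
  c.convert_endian(el->end_);
}

// STEP 2: validate the restored state.
template <typename Ctx>
void check_state(Ctx const& c, range* el) {
  c.require(el->begin_ <= el->end_, "range: begin_ > end_");
}

// STEP 3: nothing to recurse into, the type owns no children.
template <typename Ctx, typename Fn>
void recurse(Ctx&, range*, Fn&&) {}

// Hypothetical driver that applies the three steps in order.
template <typename Ctx>
void toy_deserialize(Ctx& c, range* el) {
  convert_endian_and_ptr(c, el);
  check_state(c, el);
  recurse(c, el, [](void*) {});
}

}  // namespace toy
```

For a type that owns other objects, recurse would call fn on each child so that the same steps are applied to it.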

Single Deserialize Function

If you do not need the cista::mode::DEEP_CHECK option, it is sufficient to implement a single deserialize function that does the combination of all three functions described above. Use is_mode_enabled(Ctx::MODE, mode::UNCHECKED) to optionally disable all bounds checks.

template <typename Ctx, typename T>
inline void deserialize(Ctx const& c, std::vector<T>* el);

The following functions are provided by the Ctx type:

struct deserialization_context {
  /**
   * Computes and stores the absolute address value
   * of a stored raw pointer.
   * A raw pointer stores an offset relative to itself
   * that will be converted to an absolute address.
   *
   * \param ptr  the raw pointer
   */
  template <typename Ptr>
  void deserialize_ptr(Ptr** ptr) const;

  /**
   * Checks whether the pointer points to
   * a valid memory address within the buffer
   * where at least `size` bytes are available.
   *
   * \param el    the memory address to check
   * \param size  the size to check for
   * \throws if there are bytes outside the buffer
   */
  template <typename T> void check_ptr(offset_ptr<T> el, size_t size = sizeof(T)) const;
  template <typename T> void check_ptr(T* el, size_t size = sizeof(T)) const;

  /**
   * Checks whether a bool has a valid value (0 or 1).
   * Accessing invalid values is undefined behaviour.
   *
   * \param b  the boolean value to check
   */
  static void check_bool(bool const& b);

  /**
   * Throws a std::runtime_error{msg}
   * if `condition` is false.
   *
   * \param condition  the condition to check
   * \param msg        the exception message
   */
  void require(bool condition, char const* msg) const;

  /**
   * Does endian conversion if necessary
   * depending on the mode.
   * Apply this to all serialized scalar values.
   *
   * \param el  scalar value to convert.
   */
  template <typename T>
  void convert_endian(T& el) const;
};
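
The checks listed above can be pictured with a small stand-alone sketch. The class toy_deserialization_context below is a hypothetical simplification written for this page: it covers only require, the raw-pointer check_ptr and convert_endian, and it selects byte swapping with an assumed bool template parameter instead of cista's mode flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

namespace toy {  // hypothetical stand-ins, not part of cista

template <bool SwapEndian>
struct toy_deserialization_context {
  toy_deserialization_context(std::uint8_t const* from,
                              std::uint8_t const* to)
      : from_{from}, to_{to} {}

  // Throws std::runtime_error{msg} if `condition` is false.
  static void require(bool condition, char const* msg) {
    if (!condition) {
      throw std::runtime_error{msg};
    }
  }

  // Accepts null; otherwise [el, el + size) must lie inside [from_, to_).
  template <typename T>
  void check_ptr(T const* el, std::size_t size = sizeof(T)) const {
    if (el == nullptr) {
      return;
    }
    auto const pos = reinterpret_cast<std::uintptr_t>(el);
    auto const from = reinterpret_cast<std::uintptr_t>(from_);
    auto const to = reinterpret_cast<std::uintptr_t>(to_);
    require(pos >= from && pos <= to && size <= to - pos,
            "pointer target outside buffer");
  }

  // Reverses the bytes of a scalar if swapping is enabled.
  template <typename T>
  void convert_endian(T& el) const {
    if (SwapEndian) {
      unsigned char bytes[sizeof(T)];
      std::memcpy(bytes, &el, sizeof(T));
      std::reverse(bytes, bytes + sizeof(T));
      std::memcpy(&el, bytes, sizeof(T));
    }
  }

  std::uint8_t const* from_;
  std::uint8_t const* to_;
};

}  // namespace toy
```

A custom deserialize function would call convert_endian on each scalar member first and check_ptr on each restored pointer afterwards, so that invalid input is rejected before any pointer is followed.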

For all members that require custom deserialization (including raw or offset pointers), call the deserialize() function recursively.