
Serialization Reference

Felix Gündling edited this page Nov 6, 2019 · 4 revisions

Data Structures

The following data structures exist in the namespaces cista::offset and cista::raw:

  • vector<T>: serializable version of std::vector<T>
  • string: serializable version of std::string
  • unique_ptr<T>: serializable version of std::unique_ptr<T>
  • hash_map<K, V>: serializable version of std::unordered_map (using Google's Swiss Table)
  • ptr<T>: serializable pointer. cista::raw::ptr<T> is just a T*, while cista::offset::ptr<T> is a specialized data structure that behaves mostly like a T* (it overloads ->, * and other operators).

Currently, they do not provide exactly the same interface as their std:: equivalents.
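To show why cista::offset::ptr<T> can live inside a buffer while a raw T* cannot, here is a minimal sketch of the general offset-pointer idea. toy_offset_ptr, record and relocate_bytes are hypothetical names, and this is not cista's implementation: the toy pointer simply stores the distance from its own address to its target, with an assumed sentinel value meaning null.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical toy offset pointer (NOT cista's implementation). It stores the
// distance from its own address to the target, so it stays valid when the
// bytes holding both the pointer and its target are moved together.
template <typename T>
struct toy_offset_ptr {
  static constexpr std::intptr_t kNull = INTPTR_MIN;  // assumed "null" sentinel
  std::intptr_t offset_{kNull};

  void set(T* target) {
    offset_ = (target == nullptr)
                  ? kNull
                  : reinterpret_cast<std::intptr_t>(target) -
                        reinterpret_cast<std::intptr_t>(this);
  }

  T* get() const {
    if (offset_ == kNull) {
      return nullptr;
    }
    return reinterpret_cast<T*>(reinterpret_cast<std::intptr_t>(this) + offset_);
  }

  T& operator*() const { return *get(); }
  T* operator->() const { return get(); }
};

// A record whose pointer refers to another member of the same record.
struct record {
  int value;
  toy_offset_ptr<int> ref;
};

// Copies the raw bytes of a record, mimicking a buffer that is loaded at a
// different address than the one it was written from.
inline record relocate_bytes(record const& src) {
  record dst{};
  std::memcpy(&dst, &src, sizeof(record));
  return dst;
}
```

After relocate_bytes(a), the copy's ref resolves to the copy's own value member, whereas a raw int* copied the same way would still point into the original record.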

Pointers

A cista::ptr<T> can only be null or point to a value stored in the serialized buffer. To point to a value within the serialized buffer, the offset at which that value was written must be known at serialization time.

There are three ways to index an address in order to serialize a pointer to it:

  • cista::unique_ptr<T>: Every cista::unique_ptr<T> will be indexed. Thus, pointing to values held by a cista::unique_ptr<T> is possible.
  • cista::indexed_vector<T>: Within a cista::indexed_vector<T>, every value can be referenced. This is more efficient than a cista::vector<cista::unique_ptr<T>>. However, cista::vector<T> and cista::indexed_vector<T> do not provide pointer stability after non-const operations such as resize or emplace_back.
  • cista::indexed<T>: To be able to point to the value of member variables, it is possible to use cista::indexed<T>. cista::indexed<T> inherits from T and thus can be used just like a T.
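The indexing requirement can be pictured as a table from source addresses to buffer offsets. The following sketch is an assumption for illustration only (toy_writer and its members are hypothetical, and null pointers, which cista does allow, are not handled): it records the offset of every indexed value it writes and can only translate pointers whose target appears in that table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <optional>
#include <vector>

// Hypothetical sketch (not cista's algorithm): indexed values have their
// source address recorded together with the offset they were written at.
// A pointer can only be serialized if its target is in that index, because
// the pointer is stored as the target's offset.
struct toy_writer {
  std::vector<std::uint8_t> buf;
  std::map<void const*, std::size_t> index;  // source address -> buffer offset

  // Appends an int and remembers where it was written.
  std::size_t write_indexed(int const& v) {
    auto const offset = buf.size();
    buf.resize(offset + sizeof(int));
    std::memcpy(buf.data() + offset, &v, sizeof(int));
    index.emplace(&v, offset);
    return offset;
  }

  // Translates a pointer into a buffer offset; nullopt if the target was
  // never indexed and therefore cannot be referenced from the buffer.
  std::optional<std::size_t> resolve(int const* p) const {
    auto const it = index.find(p);
    if (it == index.end()) {
      return std::nullopt;
    }
    return it->second;
  }
};
```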

An example using cista::indexed_vector<T> and cista::indexed<T>:

namespace data = cista::offset;

struct node;

struct edge {
  data::ptr<node> from_;
  data::ptr<node> to_;
};

struct node {
  uint32_t id_{0};
  data::vector<data::ptr<edge>> edges_;
  cista::indexed<data::string> name_;
};

struct graph {
  data::indexed_vector<node> nodes_;
  data::indexed_vector<edge> edges_;
  data::vector<data::ptr<data::string>> node_names_;
};
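Since cista::vector<T> and cista::indexed_vector<T> do not guarantee pointer stability, one safe ordering when building a structure like graph is to finish all insertions before storing any pointers. The sketch below shows that ordering with std::vector, std::string and raw pointers standing in for the data:: types (an assumption made so the example compiles on its own; node_names_ is omitted and build_graph is a hypothetical helper).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the structs above, using std types and raw pointers.
struct node;

struct edge {
  node* from_{nullptr};
  node* to_{nullptr};
};

struct node {
  std::uint32_t id_{0};
  std::vector<edge*> edges_;
  std::string name_;
};

struct graph {
  std::vector<node> nodes_;
  std::vector<edge> edges_;
};

// Builds a graph from (from, to) index pairs. All nodes and edges are
// inserted first; pointers are taken only afterwards, so no later insertion
// can reallocate the vectors and leave them dangling.
inline graph build_graph(
    std::vector<std::string> const& names,
    std::vector<std::pair<std::uint32_t, std::uint32_t>> const& links) {
  graph g;
  for (std::uint32_t i = 0; i < names.size(); ++i) {
    g.nodes_.push_back(node{i, {}, names[i]});
  }
  g.edges_.resize(links.size());  // final size, no further growth

  for (std::size_t i = 0; i < links.size(); ++i) {
    edge& e = g.edges_[i];
    e.from_ = &g.nodes_[links[i].first];
    e.to_ = &g.nodes_[links[i].second];
    e.from_->edges_.push_back(&e);
    e.to_->edges_.push_back(&e);
  }
  return g;
}
```

Moving the returned graph keeps the pointers valid because std::vector's move transfers its heap buffer; copying it would not.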

Serialization and Deserialization Functions

Mode

Serialization and deserialization must use the same mode. This can be ensured by storing the mode in a constexpr variable that is passed to both cista::serialize() and cista::deserialize().

The cista::mode enum provides the following values:

  • NONE - default mode (default values are listed below)
  • UNCHECKED - skip bounds checks for types (only affects deserialization)
  • WITH_VERSION - store the data structure version (8 bytes), default value: off
  • WITH_INTEGRITY - store a hash sum of the serialized data (8 bytes), default value: off
  • SERIALIZE_BIG_ENDIAN - use big endian format when serializing (default: little endian)
  • DEEP_CHECK - apply deep checking for security (only affects deserialization)
  • CAST - cast the buffer pointer directly (compile-time checks ensure that the buffer stays unmodified: no endian conversion and only offset pointer data structures)

The stored data structure version (cista::mode::WITH_VERSION) and hash sum (cista::mode::WITH_INTEGRITY) are checked at deserialization (if available).
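To make the integrity check concrete, here is a sketch of storing a checksum in front of a payload and verifying it when reading. FNV-1a is swapped in as a simple, well-known 64-bit hash; it is not necessarily the hash cista uses, and the [checksum][payload] layout as well as seal and verify are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// 64-bit FNV-1a, used only as a simple stand-in hash.
inline std::uint64_t fnv1a64(std::uint8_t const* data, std::size_t size) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = 0; i < size; ++i) {
    h ^= data[i];
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical layout: [8-byte checksum][payload].
inline std::vector<std::uint8_t> seal(std::vector<std::uint8_t> const& payload) {
  std::vector<std::uint8_t> out(sizeof(std::uint64_t) + payload.size());
  auto const h = fnv1a64(payload.data(), payload.size());
  std::memcpy(out.data(), &h, sizeof(h));
  if (!payload.empty()) {
    std::memcpy(out.data() + sizeof(h), payload.data(), payload.size());
  }
  return out;
}

// Recomputes the checksum over the payload and throws on mismatch.
inline void verify(std::vector<std::uint8_t> const& buf) {
  if (buf.size() < sizeof(std::uint64_t)) {
    throw std::runtime_error("buffer too small for checksum");
  }
  std::uint64_t stored = 0;
  std::memcpy(&stored, buf.data(), sizeof(stored));
  auto const actual =
      fnv1a64(buf.data() + sizeof(stored), buf.size() - sizeof(stored));
  if (stored != actual) {
    throw std::runtime_error("integrity check failed");
  }
}
```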

Note that if the integrity checksum and/or the data structure version were stored, the corresponding flags cannot be omitted at deserialization, because these fields affect where the actual data starts.
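A minimal sketch of why the flags must match, assuming for illustration only that the 8-byte version and the 8-byte checksum are placed in front of the data: the offset at which the data begins depends on both flags, so reading with different flags starts at the wrong byte. data_start is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: start of the actual data if an optional 8-byte
// version and an optional 8-byte checksum precede it (assumed layout).
constexpr std::size_t data_start(bool with_version, bool with_integrity) {
  return (with_version ? 8U : 0U) + (with_integrity ? 8U : 0U);
}

static_assert(data_start(false, false) == 0, "no header fields");
static_assert(data_start(true, true) == 16, "version + checksum");
```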

These values are bit flags and can be combined with the | operator.

Example:

constexpr auto const MODE = cista::mode::WITH_VERSION |
                            cista::mode::WITH_INTEGRITY |
                            cista::mode::DEEP_CHECK;
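The sketch below shows how such bit flags are typically combined and tested in C++. toy_mode, its numeric values and is_enabled are hypothetical and only mirror a few of the names above; they are not cista's definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag enum; the numeric values are illustrative only.
enum class toy_mode : std::uint32_t {
  NONE = 0U,
  UNCHECKED = 1U << 0U,
  WITH_VERSION = 1U << 1U,
  WITH_INTEGRITY = 1U << 2U,
  DEEP_CHECK = 1U << 3U,
};

// Combines two flag sets.
constexpr toy_mode operator|(toy_mode a, toy_mode b) {
  return static_cast<toy_mode>(static_cast<std::uint32_t>(a) |
                               static_cast<std::uint32_t>(b));
}

// True if every bit of `flag` is set in `m`.
constexpr bool is_enabled(toy_mode m, toy_mode flag) {
  return (static_cast<std::uint32_t>(m) & static_cast<std::uint32_t>(flag)) ==
         static_cast<std::uint32_t>(flag);
}

constexpr auto const MODE =
    toy_mode::WITH_VERSION | toy_mode::WITH_INTEGRITY | toy_mode::DEEP_CHECK;

static_assert(is_enabled(MODE, toy_mode::WITH_INTEGRITY), "flag is set");
static_assert(!is_enabled(MODE, toy_mode::UNCHECKED), "flag is not set");
```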

Serialization

The following functions can be used to serialize either to a std::vector<uint8_t> (default) or to an arbitrary serialization target.

  • std::vector<uint8_t> cista::serialize<mode Mode = mode::NONE, T>(T const&) serializes an object of type T and returns a buffer containing the serialized object.
  • void cista::serialize<mode const Mode = mode::NONE, Target, T>(Target&, T const&) serializes an object of type T to the specified target. Targets are either cista::buf<Buf> (where Buf can either be a simple std::vector<uint8_t> or a cista::mmap) or cista::file. Custom target structs should provide write functions as described here.
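As a rough picture of what a serialization target does, the following hypothetical toy_buf_target appends bytes at an aligned position, returns the offset where they were written, and can overwrite earlier bytes (for example to patch in a pointer's offset once it is known). It is not guaranteed to satisfy the interface cista expects from custom targets; the required write functions are the ones described in the documentation referenced above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical serialization target (not cista's target interface).
struct toy_buf_target {
  std::vector<std::uint8_t> buf;

  // Appends num_bytes from ptr after zero-padding the start offset up to a
  // multiple of alignment (a power of two; 0 or 1 means no alignment).
  // Returns the offset at which the bytes were written.
  std::size_t write(void const* ptr, std::size_t num_bytes,
                    std::size_t alignment = 0) {
    std::size_t start = buf.size();
    if (alignment > 1) {
      assert((alignment & (alignment - 1)) == 0);
      start = (start + alignment - 1) & ~(alignment - 1);
    }
    buf.resize(start + num_bytes);  // new bytes, including padding, are zero
    if (num_bytes != 0) {
      std::memcpy(buf.data() + start, ptr, num_bytes);
    }
    return start;
  }

  // Overwrites sizeof(T) already-written bytes at position pos.
  template <typename T>
  void write_at(std::size_t pos, T const& value) {
    assert(pos + sizeof(T) <= buf.size());
    std::memcpy(buf.data() + pos, &value, sizeof(T));
  }
};
```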

Deserialization

The following functions exist in cista::offset and cista::raw:

  • T* deserialize<T, cista::mode Mode = cista::mode::NONE, Container>(Container&) deserializes an object from a std::vector<uint8_t> or similar data structure. This function throws a std::runtime_error if the data is not well-formed.
  • T* deserialize<T, cista::mode Mode = cista::mode::NONE>(uint8_t* from, uint8_t* to) deserializes an object from a pointer range. This function throws a std::runtime_error if the data is not well-formed.
  • reinterpret_cast<T*>(ptr): If you are using offset mode and the machine's endianness matches the serialized one, you can also simply call reinterpret_cast<T*>(ptr).
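The difference between a checked deserialization and a plain cast can be sketched as follows. toy_deserialize, point and to_bytes are hypothetical: the function only validates the size and alignment of the range and then reinterprets it in place, whereas cista's deserialize performs its own checks (see the modes above) and throws if the data is not well-formed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical payload type for the example.
struct point {
  std::int32_t x;
  std::int32_t y;
};

// Hypothetical stand-in for a serializer: copies the object's bytes.
inline std::vector<std::uint8_t> to_bytes(point const& p) {
  std::vector<std::uint8_t> bytes(sizeof(point));
  std::memcpy(bytes.data(), &p, sizeof(point));
  return bytes;
}

// Hypothetical checked cast: validates the range, then reinterprets it in
// place without copying. A real deserializer must also validate any
// pointers and containers stored in the buffer.
template <typename T>
T* toy_deserialize(std::uint8_t* from, std::uint8_t* to) {
  if (from == nullptr || to < from ||
      static_cast<std::size_t>(to - from) < sizeof(T)) {
    throw std::runtime_error("buffer too small");
  }
  if (reinterpret_cast<std::uintptr_t>(from) % alignof(T) != 0) {
    throw std::runtime_error("buffer misaligned");
  }
  return reinterpret_cast<T*>(from);
}
```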