LIP: 0040
Title: Define state model and state root
Author: Alessandro Ricottone <alessandro.ricottone@lightcurve.io>
Discussions-To: https://research.lisk.com/t/state-model-and-state-root
Status: Active
Type: Informational
Created: 2021-05-22
Updated: 2024-01-04
Requires: 0039
The state of an interoperable chain in the Lisk ecosystem is maintained in a global state store. Entries of the state store are inserted in a sparse Merkle tree, the state tree. The whole state is thus authenticated by the tree Merkle root, the state root. In this LIP, we define the state model of a Lisk blockchain and the construction of the state tree from which the state root is calculated.
This LIP is licensed under the Creative Commons Zero 1.0 Universal.
A blockchain created with the Lisk SDK is organized in a modular way: each module registered in the chain defines its own state and the possible state transitions, i.e., the transactions defined within the module or the reducers that can be called by other modules.
The chain also maintains a global state store, a collection of key-value pairs defining the state of the blockchain. Following the modular architecture, the state store is further split into several module stores, collections of key-value pairs defined within the state specific to the module. This is achieved by imposing a specific format for keys in the global state store: Each key is given by the concatenation of the module store prefix, a substore prefix, and a store key. A module store is thus the collection of key-value pairs whose keys share the same module store prefix (which identifies the specific module store).
The state tree is the sparse Merkle tree built on top of the state store. Organizing the state of a blockchain in a Merkle tree allows to cryptographically authenticate the whole state with a single hash, the state root. The state root property is calculated at the end of the block processing as the Merkle root of the state tree and included in the block header. Information from the block header is then used to create a certificate and signed by the chain validators.
Certificates are posted to another chain as part of a cross-chain update transaction. The state root is used to authenticate the information contained in it. For example, a node can verify that the cross-chain messages in the cross-chain update payload have been included in the outbox of the sending chain by recomputing the state root signed in the certificate from the outbox root. This is done using a "witness", i.e. an array of hashes used to recompute the root starting from the known elements of the tree.
In the future, the state root could further facilitate the introduction of light clients. Light clients would only check the validity of block headers, without fully verifying every transaction in a block. The state root can then be used to verify the correctness of a portion of the state, such as a user balance, requested by the light client from a full node.
Furthermore, the state root could be used for faster synchronization: a node could download a snapshot of the blockchain at a block close to the tip, therefore synchronizing quickly to the last state, while in the background it downloads and verifies blocks in the past. In this case, the state root would be used to verify the validity of the snapshot.
Finally, the state root can be employed to improve the regenesis protocol. A node could download the state root from a trusted party and then check that the genesis block of a chain defines a state that corresponds to the trusted state root.
This LIP specifies the general format for the module stores, the way in which the key-value pairs are inserted in the state tree, and the calculation of the state root.
Key-value pairs of the global state store logically belong to a specific module, which defines the protocol logic responsible for updating, inserting, and deleting them. In order to group key-value pairs together, each key belonging to a certain module contains the module store prefix as a fixed-length prefix. The collection of key-value pairs belonging to a certain module, i.e. with the same module store prefix, forms the module store.
Similarly, within the module store we require the key to contain a fixed-length prefix called substore prefix. The substore prefix identifies a substore within the module store, i.e. a group of key-value pairs with the same scope and typically values in the same format. Within a module store, there is no a-priori separation between user and chain accounts, as they are treated as any generic key-value pair, and the separation is obtained by choosing a different substore prefix. Finally, we call the final part of the key, i.e. the key without the store and substore prefixes, the store key. The store key identifies different key-value pairs within a substore.
To summarize, keys of the state store are obtained by the concatenation of the store prefix, the substore prefix, and the store key.
Key-value pairs of the global state store are inserted in a sparse Merkle tree (introduced in LIP 0039), the state tree. The state root is then set to the root of the state tree.
Each key-value pair is inserted in the state tree as a new leaf node. Keys of the leaf nodes are obtained by concatenating the store prefix, the substore prefix, and the hash of the store key. Prepending the store prefix separates the state tree in smaller subtrees (one per module, see Figure 1). Similarly, the substore prefix ensures that a group of key-value pairs within one substore lies in the same subtree.
Store keys are hashed in order to randomize the key of the leaf node, and as a consequence its position. This effectively mitigates spam attacks that aim at creating many key-value pairs (e.g., accounts) in a certain key neighborhood (e.g., close to the target account address), in order to increase the length of inclusion proofs for that specific key (in our SMT implementation).
Given B
substores in a module store and N
entries in the substore, the module subtree will contain on average Log(B) + Log(N)
layers. The expected number of operations needed to validate a witness from a leaf to the state root is 8 * STORE_PREFIX_LENGTH
(for the store prefix), plus 8 * SUBSTORE_PREFIX_LENGTH
(for the substore prefix), plus Log(N)
for the module subtree. Empty hashes are not included in the witness, and the resulting witness size is therefore Log(M) + Log(B) + Log(N)
, where M
indicates the number of modules registered in the chain.
The module store prefix is calculated automatically from the first STORE_PREFIX_LENGTH
bytes of the hash of the module name, where the first bit of the store prefix is set to 0.
On the other hand, the Interoperability module store prefix is set to a constant, with the first bit set to 1.
This is done to separate the Interoperability module from other modules in the state tree and make the state tree unbalanced, to shorten inclusion proofs for the Interoperability module store (required for cross-chain update commands).
When a chain is started and modules are registered, collisions between different module store prefixes are checked and must resolved (by changing module names).
We define the following constants:
Name | Type | Value | Description |
---|---|---|---|
STORE_PREFIX_LENGTH |
uint32 | 4 | Length in bytes of a store prefix. |
SUBSTORE_PREFIX_LENGTH |
uint32 | 2 | Length in bytes of a substore prefix. |
Figure 1: The general structure of the state sparse Merkle tree for a Lisk blockchain using two application-specific modules. The state root is the Merkle root. Each module defines its own module store. The keys of the leaf nodes start with the store prefixes, so that each module subtree is separated from the others. Not all modules are depicted in this example.
The state root is the root of a sparse Merkle tree (specified in LIP 0039), the state tree. Each registered module of the chain defines the key-value entries of its own module store. Values in the module store can be arbitrary byte arrays, but they typically correspond to data structures serialized according to LIP 0027. The keys of the corresponding leaf nodes are given by the concatenation of the store prefix, substore prefix, and the (32 bytes long) SHA-256 hash of the store key. Store prefixes are bytes values of length STORE_PREFIX_LENGTH
. Substore prefixes are bytes values of length SUBSTORE_PREFIX_LENGTH
. This implies that a leaf-node key of the state tree has a fixed length of STORE_PREFIX_LENGTH + SUBSTORE_PREFIX_LENGTH + 32
bytes.
In summary, for an entry in the global state store identified by store prefix storePrefix
, substore prefix substorePrefix
, store key storeKey
, and store value storeValue
, the corresponding leaf node is added to the state tree using SMT.update(storePrefix + substorePrefix + hash(storeKey), hash(storeValue))
. This leaf node will have the following properties:
leaf.key = storePrefix + substorePrefix + sha256(storeKey),
leaf.value = sha256(storeValue),
leaf.hash = leafHash(leaf.key + leaf.value),
where the leafHash
function is defined in LIP 0039.
The state root at a certain height h
, stateRoot(h)
, is the root of the sparse Merkle tree computed from all key-value pairs of all module stores (as described above), after processing the block at height h
.
The module store prefix is calculated from the module names as follows.
def module_name_to_store_prefix(module_name: str) -> bytes:
store_prefix = sha256(module_name.encode())[:STORE_PREFIX_LENGTH]
# Set first bit of module_id to 0.
store_prefix[0] &= 0x7F
return store_prefix
This LIP is purely informational, hence it does not imply any incompatibilities per se.