Configurable Hash Functions #7111

illuzen · 2025-01-10T10:32:31Z

Is there an existing issue?

I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

This is not a support question.

Motivation

I want to change the internal hash functions of substrate to poseidon in order to make it zk-friendly. Currently it seems the only way to do that for Accounts, Events, Extrinsics, Blocks and entropy is to directly modify the frame-system pallet, which is not-ideal. Blake2 and Twox64Concat are hardcoded into several places despite almost always having access to T:Config, which allows configurability. So you could replace Blake2_256 with T::Hashing in most places.

Request

Please make the internal hash functions Blake2 and Twox64Concat configurable via setting Config in the runtime so I don't have to mod frame-system.

The one I'm not sure about is pub fn unique(entropy: impl Encode) -> [u8; 32], since it doesn't have access to the Config object. Maybe just put it in the pallet module.

Solution

	#[pallet::getter(fn account)]
	pub type Account<T: Config> = StorageMap<
		_,
	        T::Hashing,
		T::AccountId,
		AccountInfo<T::Nonce, T::AccountData>,
		ValueQuery,
	>;
	
		/// Map of block numbers to block hashes.
	#[pallet::storage]
	#[pallet::getter(fn block_hash)]
	pub type BlockHash<T: Config> =
		StorageMap<_, T::Hashing, BlockNumberFor<T>, T::Hash, ValueQuery>;

	/// Extrinsics data for the current block (maps an extrinsic's index to its data).
	#[pallet::storage]
	#[pallet::getter(fn extrinsic_data)]
	#[pallet::unbounded]
	pub(super) type ExtrinsicData<T: Config> =
		StorageMap<_, T::Hashing, u32, Vec<u8>, ValueQuery>;

	#[pallet::storage]
	#[pallet::unbounded]
	#[pallet::getter(fn event_topics)]
	pub(super) type EventTopics<T: Config> =
		StorageMap<_, T::Hashing, T::Hash, Vec<(BlockNumberFor<T>, EventIndex)>, ValueQuery>;

Are you willing to help with this request?

Yes!

The text was updated successfully, but these errors were encountered:

bkchr · 2025-01-10T11:50:18Z

The hash functions you are mentioning there are only used to calculate the storage keys. Why do you need to use a different hash function there?

I think this is a xyproblem. So, you want to use ZK, but for what exactly?

illuzen · 2025-01-10T11:57:52Z

The main use case is for EIP-7503 style ZK-Wormhole. This requires forming a zk-circuit out of state proofs, which, if I'm not mistaken, involves calculating storage keys.

Is there a good reason not to make these hash functions configurable? @bkchr

burdges · 2025-01-10T19:59:02Z

In principle, you could use any hash function you like for your chain's data, but parachain infrastructure needs substrate's regular storage and hash function. In practice, our DSL infrastructure makes doing your own thing difficult.

At least for our envisioned identity stuff, we'd want an append only Poseidon MMR for whole blocks, and then another Poseidon Merkle tree within blocks, but being append only makes this simpler than maintaining some actual state with updates.

We've much interest in making NOMT work in substrate, which basically brings exactly the same problems, so maybe one should figure out who is making progress there? JAM involves some new storage proposal too, but afaik it mnight lack some of the important optimizations in NOMT etc.

illuzen · 2025-01-11T02:13:17Z

Seems like, if you want to support both parachains and solochains, you could make a separate types for the solochain-only configs and just clearly label them as such. Maybe T::Hashing is too broad, could do something more specific like T::EventHashing and T::ExtrinsicHashing.

We will just fork frame-system if necessary. I don't see anything close to what Substrate has accomplished in terms of a build-your-own-blockchain-library.

gui1117 · 2025-01-11T14:44:13Z

What is the issue for NOMT usage in substrate, only the hash choice? Or the whole storage type architecture? (Key being: 128 bit for pallet name + 128 bit for storage item name + choosen hash and encoding for the key that can be very long as it can include the key itself)

I think we can bring more flexibility and configurability.

burdges · 2025-01-11T15:35:03Z

Afaik NOMT hasn't even optimized their hashing: Ideally, you'd invoke the bare blake3 compression function, which gives really nice SIMD implementations, and then provide some new leaf hash.

In substrate, there is a string based lookup so that related prefixes show up together, which reduces PoV size. In NOMT there is a single 32 byte (?) key, of which yes you can shave off a few bits, but mostly you'd want complex accounts records to be substrucutres, maybe subtrees, or maybe blake3 encoded bytes, since you can do a read or edit proof into blake3 hashed data, or maybe accounts should be bytes in a designated tree-like format ala JSON, but employ some structure aware hashing. It's lots of work doing this nicely.

We will just fork frame-system if necessary

Are you sure you do not need append only here?

illuzen · 2025-01-12T01:35:17Z

Are you sure you do not need append only here?

We are storing nullifiers on chain permanently (in the latest state) to prevent double-spends, which I would describe as append-only, and the sequence of finalized blocks is also append-only, but you seem to be talking about a different layer?

Using the Merkle-Patricia-Trie for storage doesn't seem to interfere with our use case, but I would be curious to hear more.

illuzen · 2025-01-14T07:36:03Z

I think we can bring more flexibility and configurability.

We're happy to support this if you can suggest the boundary conditions for the edits. @gui1117

bkchr · 2025-01-14T10:18:10Z

The main use case is for EIP-7503 style ZK-Wormhole. This requires forming a zk-circuit out of state proofs, which, if I'm not mistaken, involves calculating storage keys.

If you want to build a zk storage proof, you can compute the keys "offline" and input them into the prover. No need to compute them inside the circuit?

bkchr · 2025-01-14T10:21:16Z

Afaik NOMT hasn't even optimized their hashing: Ideally, you'd invoke the bare blake3 compression function, which gives really nice SIMD implementations, and then provide some new leaf hash.

I don't see how the request here relates to NOMT. We need the key structure that we have to support iterations, reconstructing the keys and similar. However, NOMT doesn't really care how the key is being build. The problem there right now is that they don't support iterating the storage, but this can probably be build on top of it. Still we are talking here about the hash that is being used to generate the KEYs and not the hash that is being used to build the trie.

illuzen added the I5-enhancement An additional feature request. label Jan 10, 2025

github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Jan 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configurable Hash Functions #7111

Configurable Hash Functions #7111

illuzen commented Jan 10, 2025

bkchr commented Jan 10, 2025

illuzen commented Jan 10, 2025

burdges commented Jan 10, 2025

illuzen commented Jan 11, 2025

gui1117 commented Jan 11, 2025

burdges commented Jan 11, 2025

illuzen commented Jan 12, 2025

illuzen commented Jan 14, 2025

bkchr commented Jan 14, 2025

bkchr commented Jan 14, 2025

Configurable Hash Functions #7111

Configurable Hash Functions #7111

Comments

illuzen commented Jan 10, 2025

Is there an existing issue?

Experiencing problems? Have you tried our Stack Exchange first?

Motivation

Request

Solution

Are you willing to help with this request?

bkchr commented Jan 10, 2025

illuzen commented Jan 10, 2025

burdges commented Jan 10, 2025

illuzen commented Jan 11, 2025

gui1117 commented Jan 11, 2025

burdges commented Jan 11, 2025

illuzen commented Jan 12, 2025

illuzen commented Jan 14, 2025

bkchr commented Jan 14, 2025

bkchr commented Jan 14, 2025