Darren Kellenschwiler (deggen@kschw.com), Deggen
Tone Engel (tone@kizmet.org), TonesNotes
Ty Everett (ty@projectbabbage.com)
Damian Orzepowski (damian.orzepowski@4chain.studio)
Arkadiusz Osowski (arkadiusz.osowski@4chain.studio)
We propose the BSV Unified Merkle Path format in both binary and JSON encoding optimized for generation by transaction processors, and also happens to be convenient for proof validating clients.
At a high level the format encodes a number of txids which all exist within one particular block, along with each of their merkle paths and the blockHeight.
The blockHeight is encoded first, followed by level 0 of the Merkle tree, which includes the txids of interest, and their corresponding siblings. Thereafter we encode each level of the tree thereafter, but only include branches of the tree which are required to calculate the Merkle root the txids which are of interest to us. For example if we only have one txid of interest, we will include it and its sibling, followed by one leaf per level of the tree.
This BRC is licensed under the Open BSV license.
BUMP Showcase can help form an understanding of how BUMP works to encode all necessary data.
Several formats have made their own improvements to the original format which was returned by a Bitcoin node via json-rpc method getmerkleproof
.
Improvements include:
- BRC-10 a TSC creation which was subsequently returned by the node's json-rpc method
getmerkleproof2
- BRC-11 removing the need for specifying targets, replacing with height to improve validation speed.
- BRC-58 removal of all extraneous data to minimize data size.
- BRC-61 introduction of a compound path encoding which allows representation of multiple paths within the same block.
The purpose of defining this new specification is to capture the incremental improvements under one spec which encapsulates the pros of each, and removes the cons. This new spec should allow:
- Inclusion of height makes lookup extremely fast while only adding maximum 9 bytes to the data size.
- Multiple paths can be expressed in the same data model.
- One format for everything, so that there is no need to convert from single to compound path.
- Size optimization allowing us to skip encoding of far right leaves when duplication of working hash would suffice.
The top level encoding specifies a block height and a tree height.
Field | Description | Size |
---|---|---|
block height | VarInt block height in which the transactions are encapsulated |
1-9 bytes |
tree height | The height of the Merkle Tree in this block, max 64 | 1 byte |
Thereafter the number of leaves at the top height is specified, and the leaves for this height follow.
Field | Description | Size |
---|---|---|
nLeaves | VarInt number of leaves at this height |
1-9 bytes |
leaves | Each leaf encoded in the format below. | sum of leaf sizes |
Once all leaves at this height have been specified, an implied increment of the height in the tree occurs and we specify the number of leaves in the next level up, and so on until we have specified the leaves at level (treeHeight - 1) at which point we stop. We do not need to encode the root hash as it is always calculable.
Field | Description | Size |
---|---|---|
offset | VarInt offset from left hand side within tree |
1-9 bytes |
flags | Flags can be 00 , or 01 , or 02 - detailed meaning in table below |
1 byte |
hash | A hash representing a txid, sibling hash, or a branch | 0 or 32 bytes |
The first flag is to indicate whether or not to duplicate the working hash or use the following data. The second flag indicates whether the hash is a relevant txid or just a sibling hash.
bits | byte | meaning |
---|---|---|
0000 0000 | 00 | data follows, not a client txid |
0000 0001 | 01 | nothing follows, duplicate working hash |
0000 0010 | 02 | data follows, and is a client txid |
fe8a6a0c000c04fde80b0011774f01d26412f0d16ea3f0447be0b5ebec67b0782e321a7a01cbdf7f734e30fde90b02004e53753e3fe4667073063a17987292cfdea278824e9888e52180581d7188d8fdea0b025e441996fc53f0191d649e68a200e752fb5f39e0d5617083408fa179ddc5c998fdeb0b0102fdf405000671394f72237d08a4277f4435e5b6edf7adc272f25effef27cdfe805ce71a81fdf50500262bccabec6c4af3ed00cc7a7414edea9c5efa92fb8623dd6160a001450a528201fdfb020101fd7c010093b3efca9b77ddec914f8effac691ecb54e2c81d0ab81cbc4c4b93befe418e8501bf01015e005881826eb6973c54003a02118fe270f03d46d02681c8bc71cd44c613e86302f8012e00e07a2bb8bb75e5accff266022e1e5e6e7b4d6d943a04faadcf2ab4a22f796ff30116008120cafa17309c0bb0e0ffce835286b3a2dcae48e4497ae2d2b7ced4f051507d010a00502e59ac92f46543c23006bff855d96f5e648043f0fb87a7a5949e6a9bebae430104001ccd9f8f64f4d0489b30cc815351cf425e0e78ad79a589350e4341ac165dbe45010301010000af8764ce7e1cc132ab5ed2229a005c87201c9a5ee15c0f91dd53eff31ab30cd4
fe8a6a0c00 // blockHeight (813706), VarInt
0c // treeHeight (12), byte
// Level 0, client TXIDs and sibling TXIDs (TXIDs required only to compute internal tree hash).
04 // nLeaves, VarInt
fde80b // offset, VarInt
00 // flags
11774f01d26412f0d16ea3f0447be0b5ebec67b0782e321a7a01cbdf7f734e30 // hash
fde90b // offset VarInt
02 // flags = CLIENT_TXID
004e53753e3fe4667073063a17987292cfdea278824e9888e52180581d7188d8 // hash
fdea0b // offset VarInt
02 // flags = CLIENT_TXID
5e441996fc53f0191d649e68a200e752fb5f39e0d5617083408fa179ddc5c998 // hash
fdeb0b // offset VarInt
01 // flags = DUPLICATE_WORKING_HASH
// Level 1, internal merkle tree hashes
02 // nLeaves, VarInt
fdf405 // offset, VarInt
00 // flags
0671394f72237d08a4277f4435e5b6edf7adc272f25effef27cdfe805ce71a81 // hash
fdf505 // offset VarInt
00 // flags
262bccabec6c4af3ed00cc7a7414edea9c5efa92fb8623dd6160a001450a5282 // hash
// Level 2, internal merkle tree hashes
01 // nLeaves VarInt at level 2
fdfb02 // offset VarInt
01 // flags = DUPLICATE_WORKING_HASH
// Level 3, internal merkle tree hashes
01 // nLeaves VarInt at level 3
fd7c01 // offset VarInt (three hundred and eighty)
00 // flags
93b3efca9b77ddec914f8effac691ecb54e2c81d0ab81cbc4c4b93befe418e85 // hash
// Level 4, internal merkle tree hashes
01 // nLeaves VarInt at level 4
bf // offset VarInt
01 // flags = DUPLICATE_WORKING_HASH
// Level 5, internal merkle tree hashes
01 // nLeaves VarInt at level 5
5e // offset VarInt
00 // flags
5881826eb6973c54003a02118fe270f03d46d02681c8bc71cd44c613e86302f8 // hash
// Level 6, internal merkle tree hashes
01 // nLeaves VarInt at level 6
2e // offset VarInt
00 // flags
e07a2bb8bb75e5accff266022e1e5e6e7b4d6d943a04faadcf2ab4a22f796ff3 // hash
// Level 7, internal merkle tree hashes
01 // nLeaves VarInt at level 7
16 // offset VarInt
00 // flags
8120cafa17309c0bb0e0ffce835286b3a2dcae48e4497ae2d2b7ced4f051507d // hash
// Level 8, internal merkle tree hashes
01 // nLeaves VarInt at level 8
0a // offset VarInt
00 // flags
502e59ac92f46543c23006bff855d96f5e648043f0fb87a7a5949e6a9bebae43 // hash
// Level 9, internal merkle tree hashes
01 // nLeaves VarInt at level 9
04 // offset VarInt
00 // flags
1ccd9f8f64f4d0489b30cc815351cf425e0e78ad79a589350e4341ac165dbe45 // hash
// Level 10, internal merkle tree hashes
01 // nLeaves VarInt at level 10
03 // offset VarInt
01 // flags = DUPLICATE_WORKING_HASH
// Level 11, internal merkle tree hashes
01 // nLeaves VarInt at level 11
00 // offset VarInt
00 // flags
af8764ce7e1cc132ab5ed2229a005c87201c9a5ee15c0f91dd53eff31ab30cd4 // hash
In the JSON encoding - we start with a height of a block in which transactions from BUMP are mined. A path Array index corresponds to the height within the Merkle tree, so we start with level 0 which includes all of the txid's of interest to a client in this block and the txid's of additional transactions required to begin the merkle root computation. Within each array element we contain an array of one or more leaves which are specified as a leaf.
Within the leaf itself we have am offset
- the only required parameter, along with optional hash
, txid
and duplicate
. The hash
is a hex string encoding reversed bytes of the hash at this position in the Merkle tree, the duplicate
true is a boolean and represents a "no data" for this position, this is to encode for the right hand side of the merkle tree. The expected behavior is for a parser to duplicate the working hash in this case, therefore no further data is required. A txid
boolean is included if true - to indicate whether the hash in question is considered a relevant txid to the receiving party, rather than just a sibling hash needed to calculate the root.
{
"blockHeight": 813706,
"path": [
[
{
"offset": 3048,
"hash": "304e737fdfcb017a1a322e78b067ecebb5e07b44f0a36ed1f01264d2014f7711"
},
{
"offset": 3049,
"txid": true,
"hash": "d888711d588021e588984e8278a2decf927298173a06737066e43f3e75534e00"
},
{
"offset": 3050,
"txid": true,
"hash": "98c9c5dd79a18f40837061d5e0395ffb52e700a2689e641d19f053fc9619445e"
},
{
"offset": 3051,
"duplicate": true
}
],
[
{
"offset": 1524,
"hash": "811ae75c80fecd27efff5ef272c2adf7edb6e535447f27a4087d23724f397106"
},
{
"offset": 1525,
"hash": "82520a4501a06061dd2386fb92fa5e9ceaed14747acc00edf34a6cecabcc2b26"
}
],
[
{
"offset": 763,
"duplicate": true
}
],
[
{
"offset": 380,
"hash": "858e41febe934b4cbc1cb80a1dc8e254cb1e69acff8e4f91ecdd779bcaefb393"
}
],
[
{
"offset": 191,
"duplicate": true
}
],
[
{
"offset": 94,
"hash": "f80263e813c644cd71bcc88126d0463df070e28f11023a00543c97b66e828158"
}
],
[
{
"offset": 46,
"hash": "f36f792fa2b42acfadfa043a946d4d7b6e5e1e2e0266f2cface575bbb82b7ae0"
}
],
[
{
"offset": 22,
"hash": "7d5051f0d4ceb7d2e27a49e448aedca2b3865283ceffe0b00b9c3017faca2081"
}
],
[
{
"offset": 10,
"hash": "43aeeb9b6a9e94a5a787fbf04380645e6fd955f8bf0630c24365f492ac592e50"
}
],
[
{
"offset": 4,
"hash": "45be5d16ac41430e3589a579ad780e5e42cf515381cc309b48d0f4648f9fcd1c"
}
],
[
{
"offset": 3,
"duplicate": true
}
],
[
{
"offset": 0,
"hash": "d40cb31af3ef53dd910f5ce15e9a1c20875c009a22d25eab32c11c7ece6487af"
}
]
]
}
Let's start by dumping this format as hex into a Buffer in JavaScript and parsing it into an object with a Buffer Reader. Then we can calculate the merkle root from any of the included txids.
const { createHash } = require('crypto')
const { Br, Bw } = require('bsv')
// Displaying hashes as hex strings in reverse byte order is a matter of convention with respect to txids.
// The functions below handle the conversions such that when we "hash()" something, we are running sha256 -
// digesting the reverse bytes of a hex string, and returning the reverse bytes encoded as a hex string.
const hexRevToBuf = (str) => Buffer.from(str, 'hex').reverse()
const bufRevToHex = (buf) => Buffer.from(buf.toString('hex'), 'hex').reverse().toString('hex')
const hash = (str) => bufRevToHex(createHash('sha256').update(createHash('sha256').update(hexRevToBuf(str)).digest()).digest())
function bumpHexToJSON(str) {
const reader = new Br()
reader.buf = Buffer.from(str, 'hex')
let blockHeight = reader.readVarIntNum()
let treeHeight = parseInt(reader.read(1).toString('hex'), 16)
let path = Array(treeHeight).fill(0).map(() => ([]))
let flags, offset, nLeavesAtThisHeight
for (let level = 0; level < treeHeight; level++) {
nLeavesAtThisHeight = reader.readVarIntNum()
while (nLeavesAtThisHeight) {
offset = reader.readVarIntNum()
flags = parseInt(reader.read(1).toString('hex'), 16)
const leaf = { offset }
if (flags & 1) {
leaf.duplicate = true
} else {
if (flags & 2) leaf.txid = true
leaf.hash = reader.read(32).reverse().toString('hex')
}
path[level].push(leaf)
nLeavesAtThisHeight--
}
path[level].sort((a, b) => a.offset - b.offset)
}
return { blockHeight, path }
}
function bumpJSONtoHex({ blockHeight, path }) {
const bw = new Bw()
bw.writeVarIntNum(blockHeight)
let treeHeight = path.length
bw.writeUInt8(treeHeight)
for (let level = 0; level < treeHeight; level++) {
let nLeaves = Object.keys(path[level]).length
bw.writeVarIntNum(nLeaves)
for (const leaf of path[level]) {
bw.writeVarIntNum(leaf.offset)
let flags = 0
if (!!leaf?.duplicate) flags |= 1
if (!!leaf?.txid) flags |= 2
bw.writeUInt8(flags)
if ((flags & 1) === 0)
bw.write(hexRevToBuf(leaf.hash))
}
}
return bw.toBuffer().toString('hex')
}
function calculateMerkleRootFromBUMP(bump, txid) {
// Find the index of the txid at the lowest level of the Merkle tree
const index = bump.path[0].find(l => l.hash === txid).offset
if (!index) throw Error(`The BUMP does not contain the txid: ${txid}`)
// Calculate the root using the index as a way to determine which direction to concatenate.
let workingHash = txid
bump.path.map((leaves, height) => {
const offset = index >> height ^ 1
const leaf = leaves.find(l => l.offset === offset)
if (!leaf) throw new Error(`We do not have a hash for this index at height: ${height}`)
if (leaf.duplicate) {
workingHash = hash(workingHash + workingHash)
} else if (offset % 2) {
workingHash = hash(leaf.hash + workingHash)
} else {
workingHash = hash(workingHash + leaf.hash)
}
})
return workingHash
}
const bump = bumpHexToJSON('fe8a6a0c000c04fde80b0011774f01d26412f0d16ea3f0447be0b5ebec67b0782e321a7a01cbdf7f734e30fde90b02004e53753e3fe4667073063a17987292cfdea278824e9888e52180581d7188d8fdea0b025e441996fc53f0191d649e68a200e752fb5f39e0d5617083408fa179ddc5c998fdeb0b0102fdf405000671394f72237d08a4277f4435e5b6edf7adc272f25effef27cdfe805ce71a81fdf50500262bccabec6c4af3ed00cc7a7414edea9c5efa92fb8623dd6160a001450a528201fdfb020101fd7c010093b3efca9b77ddec914f8effac691ecb54e2c81d0ab81cbc4c4b93befe418e8501bf01015e005881826eb6973c54003a02118fe270f03d46d02681c8bc71cd44c613e86302f8012e00e07a2bb8bb75e5accff266022e1e5e6e7b4d6d943a04faadcf2ab4a22f796ff30116008120cafa17309c0bb0e0ffce835286b3a2dcae48e4497ae2d2b7ced4f051507d010a00502e59ac92f46543c23006bff855d96f5e648043f0fb87a7a5949e6a9bebae430104001ccd9f8f64f4d0489b30cc815351cf425e0e78ad79a589350e4341ac165dbe45010301010000af8764ce7e1cc132ab5ed2229a005c87201c9a5ee15c0f91dd53eff31ab30cd4')
calculateMerkleRootFromBUMP(bump, '304e737fdfcb017a1a322e78b067ecebb5e07b44f0a36ed1f01264d2014f7711') // '57aab6e6fb1b697174ffb64e062c4728f2ffd33ddcfa02a43b64d8cd29b483b4'
calculateMerkleRootFromBUMP(bump, 'd888711d588021e588984e8278a2decf927298173a06737066e43f3e75534e00') // '57aab6e6fb1b697174ffb64e062c4728f2ffd33ddcfa02a43b64d8cd29b483b4'
calculateMerkleRootFromBUMP(bump, '98c9c5dd79a18f40837061d5e0395ffb52e700a2689e641d19f053fc9619445e') // '57aab6e6fb1b697174ffb64e062c4728f2ffd33ddcfa02a43b64d8cd29b483b4'
bumpJSONtoHex(bump)
A note on compounding multiple BUMPs together. The first check should always be the blockHeight - ensure it matches. The second check is the root. Each BUMP calculates its root, and if they don't match - you cannot combine them. If they match then the process is a simple inclusion of all leaves, dropping duplicates.
function combinePaths(one, two) {
if (one.blockHeight !== two.blockHeight)
throw Error('You cannot combine paths which do not have the same blockHeight.')
const txid1 = one.path[0].find(leaf => !!leaf?.hash).hash
const root1 = calculateMerkleRootFromBUMP(one, txid1)
const txid2 = two.path[0].find(leaf => !!leaf?.hash).hash
const root2 = calculateMerkleRootFromBUMP(two, txid2)
if (root1 !== root2)
throw Error('You cannot combine paths which do not have the same root.')
const combinedPath = []
for (let h = 0; h < one.path.length; h++) {
combinedPath.push([])
for (let l = 0; l < one.path[h].length; l++) {
combinedPath[h].push(one.path[h][l])
}
for (let l = 0; l < two.path[h].length; l++) {
if (!combinedPath[h].find(leaf => leaf.offset === two.path[h][l].offset)) {
combinedPath[h].push(two.path[h][l])
} else {
// Ensure that any elements which appear in both are not downgraded to a non txid.
if (!!two.path[h][l]?.txid)
combinedPath[h].find(leaf => leaf.offset === two.path[h][l]).txid = true
}
}
}
return { blockHeight: one.blockHeight, path: combinedPath }
}
Golang - go-bc