[DISCUSSION] - Comments on BUMP #89

kyuupichan · 2024-01-17T17:57:32Z

kyuupichan
Jan 17, 2024

BRC ID

74

Discussion

Thanks for creating ths spec; I found the TSC spec quite dissatisfactory and it seems others did too.

My understanding is that the BUMP format is intended for use when SPV wallets share merkle proofs with each other during transaction negotiation, and also for when SPV wallets receive proofs from miners or third party services. It is also likely a useful format for a wallet to save proofs in its own database for later use. My comments below are with this is mind.

After digesting the spec, my first impression was that there should be no need for the flags parameter if approached a little differently. Further, the spec does not cover ambiguities, such as if an offset appears twice in a level, and an unknown flag. The included sample code, and reference implementation at https://github.com/libsv/go-bc, does not appear to handle some problematic scenarios. I haven't learnt the go language so please be kind if I have misunderstood the code!

First, the flags parameter. I cannot understand the distinction between client and non-client txids. A valid BUMP should be considered a proof for all transaction hashes in the first level. The hashes in other levels are obviously not transaction hashes. So why distinguish? Secondly, if the block transaction count is provided (similarly to the height), then it is always known if a given hash should be duplicated or not, removing the need for that flag too. Specifying the tx count makes the "tree height" entry redundant, so perhaps it should replace it? That was my initial idea. I believe in all cases this would make a more compact format (certainly in JSON, and almost certainly in binary too). The block transaction count can be useful metadata for a wallet too.

Next, as BUMPs are being provided to wallets from untrusted third parties, an implementation should be able to detect malicious or erroneous data; indeed the entire BUMP data structure should be verifiable. Implementations are helped if the spec indicates how ambiguities and corner cases should be handled (I give a list of these and my suggested handling below). The sample code does not appear to handle leaves in the BUMP that are unnecessary for the proof (and therefore are unchecked) or the same offset being present more than once with different hashes.

The spec discusses a merge operation; this is useful for a wallet merging BUMPs for different groups of transactions in the same block. Consider the combinePaths function shown in the spec, merging two bumps. Suppose at least one of them is a malicious BUMP, that proves what it was supposed to prove, but has ambiguous or extraneous leaves included, so its malicious nature is not initially apparent. I believe that merging the BUMPs as shown, because of the problem of ambiguous and/or conflicting leaves overriding each other, is likely to result in a BUMP that no longer correctly proves the full set of transaction hashes that A and B originally correctly proved separately. Further, resolving / detecting this during the merge operation is non-trivial.

A final thing to worry about is that of "phantom" merkle branches, possible because of Satoshi's unfortunate decision to duplicate the last hash when there is an odd number in a level. For example, a block with 6 transactions, and a wallet asking for proof of inclusion of the last transaction, can be fooled by someone creating a BUMP for an 8-transaction block with the same first 6 transacactions, and tx 7 a duplicate of tx 5 and tx 8 a duplicate of tx 6. This fake BUMP would give the correct merkle root and therefore I believe be accepted by the sample code. An implementation should be able to detect if it is being lied to like this. The TSC spec discusses this point and how to detect it.

Here are scenarios I think a wallet should detect, and my suggested handling:

Offsets out of range in a level. REJECT
An offset appearing more than once in a level. REJECT if the hashes differ, ACCEPT redundant duplicates but remove them.
Presence of a phantom merkle branch. REJECT
Leaves in the path that do not contribute to proving any of the tx hashes in the base level. REJECT the bump, unless the leaves are redundant in that they match the combined hash of the two hashes in the level below, in which case ACCEPT them but remove them from the bump.
BUMP path of wrong depth: REJECT

In summary, I think it is best, and arguably necessary, for an implementation to fully validate the entirety of the data in the BUMP. I do not know, but I doubt, if that is possible with the current spec without a transaction count. If the transaction count is included, then it is possible to fairly easily verify everything in the BUMP and implement the handling I describe above.

In passing, as the 2nd point is sometimes missed, note a bump only proves inclusion of a tx in a block if

the merkle root of the proof matches the merkle root of a block header (at the claimed height if it helps lookup)
the tx is a valid bitcoin tx (so the hash isn't arbitrary), which proves the tree is not truncated

That leaves a final problem: how to prove the transaction count? The transaction count is proven (if the phantom branch check is implemented) by including the final transaction in the block in the proof of every BUMP. This renders the explicit tx count unnecessary in the format, as it is one plus the largest offset in the first row of the path, saving a few more bytes. However it adds more hashes to the BUMP if the final transaction is not one that was originally wanted (but likely only for a few levels). Considering the elimination of the flags, I suspect a BUMP in this format is not much different on average to a BUMP in the spec format (binary and JSON), and will compare increasingly favourably as the number of transactions in the BUMP rises.

To summarise, this is my suggested format - not too different to the current one:

block height (as now), followed by
the path: a list of levels. Each level: a varint count, followed by (offset, hash) pairs.

with the understanding that the last TX in the block is always included. Then the tx count is determined to be 1 plus the maximum offset in the first level of the path, which in turn determines the length of the path.

This suggestion has the benefit of being a simpler format, therefore likely a simpler implementation, and being robust to various attacks. It's main downside is it's not the existing spec, so not implemented in ref wallet, Arc, or typescript library 😃

I have written a fully-tested implementation in Python in the bitcoinX repo; the entire code is quite tight and about 270 lines. It includes all the verifications and checks I mention above, and can read / write to / from binary and JSON. It also includes code to create a BUMP given the tx hashes of a block and a desired subset. The merge operation is also implemented. See https://github.com/kyuupichan/bitcoinX/blob/master/bitcoinx/merkle.py

I will implement the spec as-is so I can compare sizes properly, but Daniel was pressuring me to comment more quickly :)

sirdeggen · 2024-01-18T02:19:26Z

sirdeggen
Jan 18, 2024
Maintainer

Thanks I really appreciate the thoughtful review. BUMP went through a good few versions before landing where it is. Your recommended change in summary is:

do more validation and checking (agree)
drop tree height
include last tx always

Correct?

In which case I will think about what new problems that may introduce 😁

0 replies

sirdeggen · 2024-01-18T02:23:45Z

sirdeggen
Jan 18, 2024
Maintainer

The client txid flag is an idea from @tonesnotes - which he may be better placed to articulate and defend.

0 replies

sirdeggen · 2024-01-18T02:26:18Z

sirdeggen
Jan 18, 2024
Maintainer

I don't think I'm concerned about phantom merkles -- my understanding is that the thing this would allow is for someone to fake the number of txids in a block. That's is not a number I concern myself with as a user.

Whether it's at index 6 or 8 doesn't matter to me so long as it creates the same root and is therefore de facto in that block.

0 replies

sirdeggen · 2024-01-18T02:28:16Z

sirdeggen
Jan 18, 2024
Maintainer

The duplicate offsets and redundant inclusions also don't concern me that much. What's the incentive for someone to send more than necessary? Is this just to keep things orderly and check things are kept to a minimum?

0 replies

kyuupichan · 2024-01-18T07:10:45Z

kyuupichan
Jan 18, 2024
Author

Your summary is correct. Not rejecting phantom merkles means accepting invalid BUMPs, which would then not be mergeable with other valid bumps, which utlimately becomes a hard-to-resolve conundrum for the wallet that accepted it. They are definitely a problem, and as some implementations would reject them, a wallet would risk being incompatible with other wallets if it accepted them and passed them on.
Regarding duplicates - essentially I'm saying that properly validating the current spec is more complex, and I'm not sure it's possible to the standard I'm suggesting it should be done. As this will likely become a base standard of wallets going forward, I think it's important to get it right from the start, and particularly to be immune to various attacks that the wallet would find hard to resolve later.

0 replies

tonesnotes · 2024-01-18T17:53:05Z

tonesnotes
Jan 18, 2024
Maintainer

The idea of client txid flags is that at the leaf level of the tree, roughly half the hashes are there only in support of the actual txid's of interest to the party that requested (and likely paid for) the BUMP. The BUMP was generated in response to a request for proofs for a collection of client txids.

The BUMP format should have some way for the paying client to reconcile the data received against outstanding txid proofs requested.

"Here's the data you requested, and just for fun, I'm randomly hiding your results among an equal number of results you don't care about."

0 replies

kyuupichan · 2024-01-18T18:05:31Z

kyuupichan
Jan 18, 2024
Author

But that's obvious ... it's the TX hashes you didn't request. The fact is the BUMP is a proof for all included tx hashes. You even have the offsets; the wallet knows the siblings are needed. The flag is redundant.

0 replies

kyuupichan · 2024-01-18T18:06:51Z

kyuupichan
Jan 18, 2024
Author

What does the wallet do if the flag for a client tx ID is on a non-client TX? What if it's TX ID is flagged non-client? I can't understand how this flag helps in any way, it just gives more things to validate and you knew the answer anyway. Remember that in general the other side is untrusted, and if BSV starts to take off, you can bet those against BSV will be trying to exploit any holes available in wallets and other infrastructure.

0 replies

tonesnotes · 2024-01-18T18:10:30Z

tonesnotes
Jan 18, 2024
Maintainer

With the flag, processing receipt of a BUMP is running through a list updating your database.
Without the flag, you must fist query your database for all txids without proofs, reconcile this list against the incoming list, then do the updates. Not my definition of redundant.

If the flag lies, then the update fails. And the client thinks about finding a transaction processor whose results don't lie.

0 replies

kyuupichan · 2024-01-18T18:12:33Z

kyuupichan
Jan 18, 2024
Author

A reasonable wallet would have pending and other stuff in memory (or would load it at startup); at least mine will.....

0 replies

tonesnotes · 2024-01-18T18:12:54Z

tonesnotes
Jan 18, 2024
Maintainer

I very much like adding the tx count.

But encoding it as the max offset -1 in the leaf level is a kludge. You're hiding the data and
adding unnecessary data.

0 replies

tonesnotes · 2024-01-18T18:15:21Z

tonesnotes
Jan 18, 2024
Maintainer

There are other kinds of wallets to consider. Certainly the client can do extra work to reduce the work, BUT THEY ARE THE CLIENT PAYING FOR A SERVICE. We've lived far too long in this mode where external data services are unmonetized.

0 replies

kyuupichan · 2024-01-18T18:17:25Z

kyuupichan
Jan 18, 2024
Author

The rationale is it's provable; everything in the BUMP in the format I propose is easily verified as fact (or rejected). Without the last hash, you're just taking their word for it, and as I tried to explain, this gives all kinds of future problems with merge operations. Such merge operations could be rejected when the bump is valid, and/or the new merged bump can no longer be a valid proof for what it used to be. Such problems are a nightmare for a wallet to deal with - it doesn't know which data is correct.

The other side can be another wallet. Your argument is essentially to wing it and hope for the best. We can do better than that. I don't want to be the support person for a wallet operating in that way.

0 replies

tonesnotes · 2024-01-18T18:17:36Z

tonesnotes
Jan 18, 2024
Maintainer

I'd also suggest we add a "Complete" flag.
If Complete, no offsets are included. Every level is complete sequential hashes at that level of the tree.

0 replies

tonesnotes · 2024-01-18T18:23:30Z

tonesnotes
Jan 18, 2024
Maintainer

You have a valid point that the tx count needs a validity check and yes, its hash and confirming the pattern of required hash duplications to confirm the merkle root seems solid. Please replace "kludge" with "elegant solution" in my previous comment ;-)

0 replies

kyuupichan · 2024-01-22T15:54:22Z

kyuupichan
Jan 22, 2024
Author

Transactions in the tree with duplicate hashes are unspendable. This has occurred at least twice in early coinbase txs and was the motivator for BIP30.

To clarify this - you mean the blockchain, not the tree, I think. The one that comes later is spendable - it overwrites the first, if not already spent. In the cases you cite, the prior UTXOs were not spent by the time the duplicates appeared, and hence are lost forever. Each later duplicate is spendable still.

0 replies

tonesnotes · 2024-01-22T16:07:41Z

tonesnotes
Jan 22, 2024
Maintainer

Correct. Thanks Neil.

Are you sure about the overwriting behavior. My take-away from long ago was that a transaction would not be valid today if it duplicated an existing utxo's txid. And of those specific duplicates, neither the original txid nor the duplicate were spendable.

0 replies

kyuupichan · 2024-01-22T17:10:37Z

kyuupichan
Jan 22, 2024
Author

I don't think there is code to check / enforce that - can you point to any? I wouldn't put it past Core to confiscate the money for the 2nd tx, though....

0 replies

sirdeggen · 2024-01-23T16:55:50Z

sirdeggen
Jan 23, 2024
Maintainer

think about it for a while

Thank you for your patience. I was in the mindset of only looking at data given without knowledge of whether the Merkle root was valid or not. Even then I agree it's possible.

Rephrasing to ensure I've understood: If it's not the last txid in the block, then there will always be at least one hash to right (odd offset) which is not a duplicate.

0 replies

tonesnotes · 2024-01-23T19:21:47Z

tonesnotes
Jan 23, 2024
Maintainer

Yes.

And here's one reason why having a provable tx count is valuable: If one purpose of the blockchain is to put events in order, then it is valuable to know the index of the last transaction in each block. With this information you can count and enumerate all events. Determine how many events lie between any two events. Even assign a unique "block time" value to each transaction that reflects both their approximate UTC time and exact order of occurrence.

0 replies

sirdeggen · 2024-01-24T14:23:49Z

sirdeggen
Jan 24, 2024
Maintainer

Summarizing to ensure I've understood:

The motivation for including final tx is that it renders the need to encode treeHeight and duplicate flags redundant. This saves one byte per branch, plus one for the tree height.

The point at which this would be a saving overall is something like --- napkin calculation ---
when there are 1,000 or so transactions included.

... this was in edit overnight and then we had our catch up call. I believe we're on the same page. Thanks!

0 replies

tonesnotes · 2024-01-24T16:38:59Z

tonesnotes
Jan 24, 2024
Maintainer

Not at all. The point is that including the final txid of the block would be the ONLY WAY for an SPV client to verify the number of transactions in each block and that this information is valuable in and of itself.

No change to the encoding is required. Flags remain unchanged.

It would be optional and only recommended when the BUMP is generated by a transaction processor. Specifically, it would not be recommended when BUMPs are generated for client-to-client SPV transaction exchange protocols.

0 replies

kyuupichan · 2024-01-24T17:46:20Z

kyuupichan
Jan 24, 2024
Author

Can you share your napkin calculation? I think it's a saving much earlier than that.

0 replies

sirdeggen · 2024-02-05T03:58:24Z

sirdeggen
Feb 5, 2024
Maintainer

I've added some more tests and checking based on the recommendations here to the ts-sdk. Thanks for your feedback Neil, I think that it's a much tighter data model now. For now I've kept the flag byte and treeHeight because I don't think the trade off is worth it.

I agree @tonesnotes I suspect that a Merkle Service could send a request body which is a list of txids, as well as a flag stating whether or not the final transaction in the block is required - since its txid is otherwise unknown.

I'll replace the current example code in the BRC with the ts-sdk implementation and go-sdk too once they're released, and add you as a co-author with the new stipulations around non essential hashes and so on.
I wonder if this would be a sufficient compromise from your perspective @kyuupichan?

Otherwise, I'd be happy to revisit the topic as part of the Merkle Service project development, as well as the potential py-sdk.

0 replies

kyuupichan · 2024-02-06T07:47:57Z

kyuupichan
Feb 6, 2024
Author

It doesn't seem anything really changed.

From experience with Bitcoin Unlimited, ElectrumX and ElectrumSV, people will write code to exploit any small vulnerability in BTC and more so non-BTC software. Implementations should be robust, and not accepting of arbitrary input. This BRC encourages shortcuts and problematic implementations; forming a kind of technical debt that I am quite sure will come back to bite many users and software developers later.

0 replies

sirdeggen · 2024-02-06T22:01:06Z

sirdeggen
Feb 6, 2024
Maintainer

The changes are that it only accepts offsets which are required to encode the given leaves of the tree, no extra data allowed, and any inclusions which do not hash to the same Merkle root are also excluded.

There's no enforcement of the final tx being included, but this is something which can be stipulated as an additional requirement when specifically being used within the context of a compound merkle path from a merkle service.

0 replies

sirdeggen · 2024-02-06T22:03:58Z

sirdeggen
Feb 6, 2024
Maintainer

If my requirement is that I need to be sure the txid is present in a particular merkle root, and I have the block headers, then the phantom merkle branch is of no concern to me.

Whether you've taken the time to fake a merkle tree with reflected hashes at the end doesn't really concern me, it would still confirm the presence of the txid in the block. I don't care how many transactions are in a given block, I'm not a miner.

0 replies

sirdeggen · 2024-02-06T22:04:48Z

sirdeggen
Feb 6, 2024
Maintainer

Or have I missed something? Possible that I'm not quite following your thinking.

0 replies

sirdeggen · 2024-02-06T22:05:52Z

sirdeggen
Feb 6, 2024
Maintainer

I'll update the BRC, to reflect the changes in the now published ts-sdk

0 replies

kyuupichan · 2024-02-07T10:04:13Z

kyuupichan
Feb 7, 2024
Author

You don't want to have fake merkle paths / bumps because your peers may reject them, then what do you do?

0 replies

[DISCUSSION] - Comments on BUMP #89

kyuupichan Jan 17, 2024

BRC ID

Discussion

Replies: 41 comments

sirdeggen Jan 18, 2024 Maintainer

sirdeggen Jan 18, 2024 Maintainer

sirdeggen Jan 18, 2024 Maintainer

sirdeggen Jan 18, 2024 Maintainer

kyuupichan Jan 18, 2024 Author

tonesnotes Jan 18, 2024 Maintainer

kyuupichan Jan 18, 2024 Author

kyuupichan Jan 18, 2024 Author

tonesnotes Jan 18, 2024 Maintainer

kyuupichan Jan 18, 2024 Author

tonesnotes Jan 18, 2024 Maintainer

tonesnotes Jan 18, 2024 Maintainer

kyuupichan Jan 18, 2024 Author

tonesnotes Jan 18, 2024 Maintainer

tonesnotes Jan 18, 2024 Maintainer

kyuupichan Jan 22, 2024 Author

tonesnotes Jan 22, 2024 Maintainer

kyuupichan Jan 22, 2024 Author

sirdeggen Jan 23, 2024 Maintainer

tonesnotes Jan 23, 2024 Maintainer

sirdeggen Jan 24, 2024 Maintainer

tonesnotes Jan 24, 2024 Maintainer

kyuupichan Jan 24, 2024 Author

sirdeggen Feb 5, 2024 Maintainer

kyuupichan Feb 6, 2024 Author

sirdeggen Feb 6, 2024 Maintainer

sirdeggen Feb 6, 2024 Maintainer

sirdeggen Feb 6, 2024 Maintainer

sirdeggen Feb 6, 2024 Maintainer

kyuupichan Feb 7, 2024 Author

kyuupichan
Jan 17, 2024

sirdeggen
Jan 18, 2024
Maintainer

sirdeggen
Jan 18, 2024
Maintainer

sirdeggen
Jan 18, 2024
Maintainer

sirdeggen
Jan 18, 2024
Maintainer

kyuupichan
Jan 18, 2024
Author

tonesnotes
Jan 18, 2024
Maintainer

kyuupichan
Jan 18, 2024
Author

kyuupichan
Jan 18, 2024
Author

tonesnotes
Jan 18, 2024
Maintainer

kyuupichan
Jan 18, 2024
Author

tonesnotes
Jan 18, 2024
Maintainer

tonesnotes
Jan 18, 2024
Maintainer

kyuupichan
Jan 18, 2024
Author

tonesnotes
Jan 18, 2024
Maintainer

tonesnotes
Jan 18, 2024
Maintainer

kyuupichan
Jan 22, 2024
Author

tonesnotes
Jan 22, 2024
Maintainer

kyuupichan
Jan 22, 2024
Author

sirdeggen
Jan 23, 2024
Maintainer

tonesnotes
Jan 23, 2024
Maintainer

sirdeggen
Jan 24, 2024
Maintainer

tonesnotes
Jan 24, 2024
Maintainer

kyuupichan
Jan 24, 2024
Author

sirdeggen
Feb 5, 2024
Maintainer

kyuupichan
Feb 6, 2024
Author

sirdeggen
Feb 6, 2024
Maintainer

sirdeggen
Feb 6, 2024
Maintainer

sirdeggen
Feb 6, 2024
Maintainer

sirdeggen
Feb 6, 2024
Maintainer

kyuupichan
Feb 7, 2024
Author