Investigation: failing faster when validating txs with missing inputs #1358

amesgen · 2025-01-11T15:48:34Z

In various common scenarios, the mempool will (re)apply a tx to a ledger state that does not contain all of the necessary inputs (usually because the tx has already been applied before). Therefore, it is desirable for tx (re)application to fail fast in these cases.

The purpose of this PR is just to summarize the effort to look into improvements here. Ideally, no modification to Consensus would be necessary in the end; Ledger should be able to do everything on their side.

Specifically, this PR tries two things:

Try to use reapplyTx instead of applyTx when the tx is already in the mempool. The idea here is to avoid crypto work that is unnecessary because tx validation will eventually fail due to missing inputs anyway.

This tx is a concrete example of a tx where this saves time.
Try to restrict the UTxO set before calling the ledger to the potentially needed inputs of the tx. That's exactly what UTxO HD does, so this PR branch is targeting the UTxO HD PR ("UTxO-HD targeting main", #1267).
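The two ideas above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Consensus or Ledger API: `Tx`, `restrict_utxo`, `check_witnesses`, `apply_tx`, `reapply_tx` and `try_add` are hypothetical Python stand-ins, and the ordering "crypto first, then input lookup" in the toy `apply_tx` is assumed purely so that the effect of the patch is visible.

```python
from dataclasses import dataclass


class MissingInputs(Exception):
    """Raised when a tx spends inputs that are not in the UTxO view."""


@dataclass(frozen=True)
class Tx:
    tx_id: str
    inputs: frozenset  # references to the outputs this tx spends


def restrict_utxo(utxo, tx):
    """Keep only the entries the tx could possibly need (second idea)."""
    return {ref: utxo[ref] for ref in tx.inputs if ref in utxo}


def check_witnesses(tx, stats):
    """Stand-in for signature/script checks; it only counts invocations."""
    stats["crypto_calls"] += 1


def check_inputs(view, tx):
    """Fail if any input of the tx is absent from the (restricted) view."""
    missing = {ref for ref in tx.inputs if ref not in view}
    if missing:
        raise MissingInputs(missing)


def apply_tx(view, tx, stats):
    """Toy full validation: crypto first, then inputs (assumed ordering)."""
    check_witnesses(tx, stats)
    check_inputs(view, tx)


def reapply_tx(view, tx, stats):
    """Toy revalidation: skips the crypto checks entirely."""
    check_inputs(view, tx)


def try_add(utxo, mempool_ids, tx, stats, patched):
    """Try to add tx; return True on success, False if it is rejected."""
    view = restrict_utxo(utxo, tx)
    # First idea: a tx that is already in the mempool is only revalidated.
    already_known = patched and tx.tx_id in mempool_ids
    validate = reapply_tx if already_known else apply_tx
    try:
        validate(view, tx, stats)
    except MissingInputs:
        return False
    for ref in tx.inputs:
        del utxo[ref]  # the inputs are now spent
    mempool_ids.add(tx.tx_id)
    return True
```

In this model, adding a tx once and then trying to add it again with `patched=True` rejects the second attempt without incrementing `stats["crypto_calls"]`, whereas with `patched=False` the second attempt pays for one more crypto check before failing on the missing inputs.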

To test/benchmark this, this PR modifies the --repro-mempool-and-forge db-analyser pass. When processing a block, it adds all of its transactions to the mempool, and then individually tries to add each tx again, measuring how long it takes until the tx is rejected by the mempool. The output data is stored in readd-txs.csv.

I ran db-analyser like this

cabal run db-analyser -- --v2-in-mem --db /path/to/db --no-snapshot-checksum-on-read \
  --repro-mempool-and-forge 1 --analyse-from 133660855 --num-blocks-to-process 100000 \
  cardano --config /path/to/config.json

(each run takes ~30min) and used the mean of two runs.

Plots are created using a very ad-hoc script.

Here, GHC mutator time is used. mut_{baseline,patched} refer to versions against current main (i.e. without UTxO HD), and mut_utxohd_{baseline,patched} to versions in this branch. _baseline means that the mempool hasn't been patched to use reapplyTx, whereas _patched means that it has.

[Plot 📈]

Short summary of the data (in microseconds):

       mut_baseline   mut_patched  mut_utxohd_baseline  mut_utxohd_patched
count  1.131739e+06  1.131739e+06         1.131739e+06        1.131739e+06
mean   3.425196e+02  2.497219e+02         3.629060e+02        2.856910e+02
std    1.934589e+02  1.584481e+02         1.612693e+02        1.332435e+02
min    1.725000e+02  1.140000e+02         2.020000e+02        1.490000e+02
25%    2.317500e+02  1.585000e+02         2.660000e+02        2.030000e+02
50%    2.925000e+02  1.945000e+02         3.215000e+02        2.375000e+02
75%    3.805000e+02  2.875000e+02         4.010000e+02        3.270000e+02
max    9.764000e+03  5.660500e+03         6.436000e+03        3.460500e+03

Correlation table:

                     mut_baseline  mut_patched  mut_utxohd_baseline  mut_utxohd_patched
mut_baseline             1.000000     0.821137             0.927224            0.769448
mut_patched              0.821137     1.000000             0.775456            0.913521
mut_utxohd_baseline      0.927224     0.775456             1.000000            0.793829
mut_utxohd_patched       0.769448     0.913521             0.793829            1.000000
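For readers who want to recompute tables of this shape from their own measurements, here is a minimal standard-library sketch. It is not the ad-hoc script mentioned above; `mean_of_runs`, `summarize` and `pearson` are hypothetical helper names, the per-tx timings are assumed to be already loaded as equally long lists of microsecond values, and the quartile interpolation method is an assumption.

```python
import math
import statistics


def mean_of_runs(run_a, run_b):
    """Average two runs element-wise (one timing per tx in each run)."""
    return [(a + b) / 2 for a, b in zip(run_a, run_b)]


def summarize(xs):
    """Count, mean, sample standard deviation, min, quartiles and max."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "count": len(xs),
        "mean": statistics.fmean(xs),
        "std": statistics.stdev(xs),
        "min": min(xs),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(xs),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Calling `summarize` on each of the four per-tx timing columns would give one column of the summary, and calling `pearson` on each pair of columns would fill in the correlation table.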

Thoughts:

The mempool patch definitely helps, both with and without UTxO HD, but only for certain txs.
UTxO HD seems to be slower when comparing means/medians.
UTxO HD seems to significantly help for certain txs, visible both in "mut_baseline vs mut_utxohd_baseline" and "mut_patched vs mut_utxohd_patched".
Eyeballing the plots suggests that there are significantly fewer big outliers with UTxO HD.
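To put rough numbers on these observations, the relative changes can be computed directly from the summary table. The sketch below hard-codes the mean, median and max rows from that table; `relative_change` and the dictionary names are hypothetical, and the max row reflects single extreme txs, so it should be read with caution.

```python
# Mean, median (50%) and max rows of the summary table, in microseconds.
MEAN = {
    "mut_baseline": 342.5196,
    "mut_patched": 249.7219,
    "mut_utxohd_baseline": 362.9060,
    "mut_utxohd_patched": 285.6910,
}
MEDIAN = {
    "mut_baseline": 292.5,
    "mut_patched": 194.5,
    "mut_utxohd_baseline": 321.5,
    "mut_utxohd_patched": 237.5,
}
MAX = {
    "mut_baseline": 9764.0,
    "mut_patched": 5660.5,
    "mut_utxohd_baseline": 6436.0,
    "mut_utxohd_patched": 3460.5,
}


def relative_change(before, after):
    """Signed relative change; negative values mean 'after' is faster."""
    return (after - before) / before


def compare(row, before, after):
    """Relative change between two variants of one summary-table row."""
    return relative_change(row[before], row[after])


if __name__ == "__main__":
    pairs = [
        ("mut_baseline", "mut_patched"),                # mempool patch, main
        ("mut_utxohd_baseline", "mut_utxohd_patched"),  # mempool patch, UTxO HD
        ("mut_baseline", "mut_utxohd_baseline"),        # UTxO HD, unpatched
        ("mut_patched", "mut_utxohd_patched"),          # UTxO HD, patched
    ]
    for before, after in pairs:
        for name, row in (("mean", MEAN), ("median", MEDIAN), ("max", MAX)):
            change = compare(row, before, after)
            print(f"{before} -> {after} ({name}): {change:+.1%}")
```

On the table's numbers, the mempool patch lowers the mean by roughly 27% on main and roughly 21% with UTxO HD. UTxO HD raises the mean by roughly 6% (unpatched) and 14% (patched), while the maximum drops by roughly 34% and 39% respectively.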

A next step would be to profile individual txs to understand why they are currently relatively "slow to fail". I did this already for a few txs (that's e.g. how I saw that the tx mentioned above is slow to fail due to crypto without the mempool patch) like this:

Create a ledger snapshot for the preceding block of the tx.
Run the db-analyser pass with --num-blocks-to-process 1, and make use of {start,stop}ProfTimer (see the code for an example).
Load the result into https://www.speedscope.app/ (requires +RTS -pj).

Co-authored-by: Nicolas Frisby <nick.frisby@iohk.io>

amesgen and others added 3 commits January 11, 2025 16:12

db-analyser: support V2 LedgerDB

669959d

db-analyser: benchmark failing to add txs

95f28d3

Mempool: fail faster if tx is already in the mempool

78555ee
