Investigation: failing faster when validating txs with missing inputs #1358

Draft: wants to merge 3 commits into base: utxo-hd-main

@amesgen amesgen commented Jan 11, 2025

In various common scenarios, the mempool will (re)apply a tx to a ledger state that does not contain all of the necessary inputs (usually because the tx has already been applied before). Therefore, it is desirable for tx (re)application to fail fast in these cases.

The purpose of this PR is just to summarize the effort to look into improvements here. Ideally, no modification to Consensus would be necessary in the end; Ledger should be able to do everything on their side.

Specifically, this PR tries two things:

  • Try to use reapplyTx instead of applyTx when the tx is already in the mempool. The idea here is to avoid crypto work that is unnecessary because tx validation will eventually fail due to missing inputs anyway.

    This tx is a concrete example of a tx where this saves time.

  • Try to restrict the UTxO set to the potentially needed inputs of the tx before calling the ledger. That's exactly what UTxO HD does, so the branch of this PR targets "UTxO-HD targeting main" (#1267).
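
A toy model of both ideas, written in Python rather than against the actual Ledger or Consensus APIs. `Tx`, `restrict_utxo`, `validate` and the `verify_witnesses` callback are hypothetical names; the sketch only illustrates checking input availability on a restricted UTxO set before doing any expensive work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical toy types; not the Ledger's real data model.
TxIn = str
UTxO = Dict[TxIn, int]  # input reference -> value


@dataclass(frozen=True)
class Tx:
    inputs: FrozenSet[TxIn]


class MissingInputs(Exception):
    """Raised when a tx refers to inputs that are not in the UTxO set."""


def restrict_utxo(utxo: UTxO, tx: Tx) -> UTxO:
    """Keep only the entries the tx could possibly consume."""
    return {i: utxo[i] for i in tx.inputs if i in utxo}


def validate(utxo: UTxO, tx: Tx, verify_witnesses: Callable[[Tx], bool]) -> None:
    """Fail fast: check the inputs first, run the expensive check afterwards."""
    small = restrict_utxo(utxo, tx)
    missing = tx.inputs.difference(small)
    if missing:
        # No crypto work has been done at this point.
        raise MissingInputs(sorted(missing))
    if not verify_witnesses(tx):
        raise ValueError("invalid witnesses")
```

In this model, re-validating a tx whose inputs were already consumed raises `MissingInputs` without ever calling `verify_witnesses`, which is the behaviour the two experiments aim for.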


To test/benchmark this, this PR modifies the --repro-mempool-and-forge db-analyser pass. When processing a block, it adds all of the block's transactions to the mempool, and then individually tries to add each tx again, measuring how long it takes until the mempool rejects it. The output data is stored in readd-txs.csv.
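
The measurement loop described above can be sketched as follows. This is a toy model in Python, not the db-analyser code: `ToyMempool`, `try_add` and the CSV column names are assumptions made for illustration, and `time.perf_counter_ns` stands in for the GHC mutator time used in the actual measurements.

```python
import csv
import io
import time
from typing import Iterable, List, Set, Tuple


class ToyMempool:
    """Hypothetical stand-in for the mempool: rejects txs it already holds."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def try_add(self, tx_id: str) -> bool:
        if tx_id in self._seen:
            return False  # rejected: the first copy already spent the inputs
        self._seen.add(tx_id)
        return True


def readd_timings(block_txs: Iterable[str]) -> List[Tuple[str, int]]:
    """Add every tx of a block, then time the rejection of each re-add."""
    txs = list(block_txs)
    mempool = ToyMempool()
    for tx in txs:
        mempool.try_add(tx)
    rows = []
    for tx in txs:
        start = time.perf_counter_ns()
        accepted = mempool.try_add(tx)
        elapsed_ns = time.perf_counter_ns() - start
        assert not accepted, "re-adding a tx is expected to fail"
        rows.append((tx, elapsed_ns))
    return rows


def to_csv(rows: List[Tuple[str, int]]) -> str:
    """Serialise the timings; the column names are made up for this sketch."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["tx_id", "elapsed_ns"])
    writer.writerows(rows)
    return out.getvalue()
```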

I ran db-analyser like this:

cabal run db-analyser -- --v2-in-mem --db /path/to/db --no-snapshot-checksum-on-read \
  --repro-mempool-and-forge 1 --analyse-from 133660855 --num-blocks-to-process 100000 \
  cardano --config /path/to/config.json

(takes ~30min) and used the mean of two runs.
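
Assuming "mean of two runs" means a per-tx average of the two measurements (the half-microsecond values in the summary below are consistent with that, but the script is not shown here), the combination step could look like this minimal sketch. The function name and the dictionary representation are hypothetical.

```python
from statistics import mean
from typing import Dict


def mean_of_runs(run_a: Dict[str, float], run_b: Dict[str, float]) -> Dict[str, float]:
    """Average the two measurements of every tx that appears in both runs."""
    common = run_a.keys() & run_b.keys()
    return {tx: mean((run_a[tx], run_b[tx])) for tx in common}
```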

Plots are created using a very ad-hoc script.

The plots use GHC mutator time. mut_{baseline,patched} refer to versions based on current main (i.e. without UTxO HD), and mut_utxohd_{baseline,patched} to versions based on this branch. The _baseline suffix means that the mempool has not been patched to use reapplyTx; _patched means that it has.

plot

Short summary of the data (in microseconds):

       mut_baseline   mut_patched  mut_utxohd_baseline  mut_utxohd_patched
count  1.131739e+06  1.131739e+06         1.131739e+06        1.131739e+06
mean   3.425196e+02  2.497219e+02         3.629060e+02        2.856910e+02
std    1.934589e+02  1.584481e+02         1.612693e+02        1.332435e+02
min    1.725000e+02  1.140000e+02         2.020000e+02        1.490000e+02
25%    2.317500e+02  1.585000e+02         2.660000e+02        2.030000e+02
50%    2.925000e+02  1.945000e+02         3.215000e+02        2.375000e+02
75%    3.805000e+02  2.875000e+02         4.010000e+02        3.270000e+02
max    9.764000e+03  5.660500e+03         6.436000e+03        3.460500e+03
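
The relative differences implied by the table above can be computed directly. The numbers are copied from the mean and 50% rows; the helper names are only for illustration.

```python
# Means and medians in microseconds, copied from the summary table.
mean_us = {
    "mut_baseline": 342.5196,
    "mut_patched": 249.7219,
    "mut_utxohd_baseline": 362.9060,
    "mut_utxohd_patched": 285.6910,
}
median_us = {
    "mut_baseline": 292.5,
    "mut_patched": 194.5,
    "mut_utxohd_baseline": 321.5,
    "mut_utxohd_patched": 237.5,
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


def overhead_pct(reference: float, other: float) -> float:
    """Percentage by which `other` is larger than `reference`."""
    return (other - reference) / reference * 100


# Effect of the mempool patch on the mean.
print(round(reduction_pct(mean_us["mut_baseline"], mean_us["mut_patched"]), 1))  # 27.1
print(round(reduction_pct(mean_us["mut_utxohd_baseline"], mean_us["mut_utxohd_patched"]), 1))  # 21.3

# Effect of the mempool patch on the median.
print(round(reduction_pct(median_us["mut_baseline"], median_us["mut_patched"]), 1))  # 33.5
print(round(reduction_pct(median_us["mut_utxohd_baseline"], median_us["mut_utxohd_patched"]), 1))  # 26.1

# Mean overhead of UTxO HD relative to main, without and with the patch.
print(round(overhead_pct(mean_us["mut_baseline"], mean_us["mut_utxohd_baseline"]), 1))  # 6.0
print(round(overhead_pct(mean_us["mut_patched"], mean_us["mut_utxohd_patched"]), 1))  # 14.4
```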

Correlation table:

                     mut_baseline  mut_patched  mut_utxohd_baseline  mut_utxohd_patched
mut_baseline             1.000000     0.821137             0.927224            0.769448
mut_patched              0.821137     1.000000             0.775456            0.913521
mut_utxohd_baseline      0.927224     0.775456             1.000000            0.793829
mut_utxohd_patched       0.769448     0.913521             0.793829            1.000000
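
The table looks like a pairwise Pearson correlation of the per-tx timings; that is an assumption, since the ad-hoc script is not shown here. For reference, a minimal standard-library sketch of the coefficient for one pair of columns:

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sample is constant.
    return cov / sqrt(var_x * var_y)
```

Read this way, the high values for mut_baseline vs mut_utxohd_baseline (0.927) and mut_patched vs mut_utxohd_patched (0.914) say that per-tx timings with and without UTxO HD move together more closely than timings with and without the mempool patch.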

Thoughts:

  • The mempool patch definitely helps, both with and without UTxO HD, but only for certain txs.
  • UTxO HD seems to be slower when comparing means/medians.
  • UTxO HD seems to significantly help for certain txs, visible both in "mut_baseline vs mut_utxohd_baseline" and "mut_patched vs mut_utxohd_patched".
  • Eyeballing seems to indicate that there are significantly fewer big outliers with UTxO HD.

A next step would be to profile individual txs to understand why they are currently relatively "slow to fail". I did this already for a few txs (that's e.g. how I saw that the tx mentioned above is slow to fail due to crypto without the mempool patch) like this:

  • Create a ledger snapshot for the preceding block of the tx.
  • Run the db-analyser pass with --num-blocks-to-process 1, and make use of {start,stop}ProfTimer (see the code for an example).
  • Load the result into https://www.speedscope.app/ (requires +RTS -pj).
