[BUG] - Mainnet epoch 528: Missed block at slot 143012062 with error TraceBlockFromFuture + Missed block at slot 143094566 with no information in node log. #6058

Open
asnakep opened this issue Dec 20, 2024 · 5 comments
Labels: needs triage


asnakep commented Dec 20, 2024

Cardano Mainnet Blocks Production

Mainnet epoch 528: Missed block at slot 143012062 with error TraceBlockFromFuture + Missed block at slot 143094566 with no information in BP node log.

Good morning, IntersectMBO Team,

We encountered two missed blocks during our epoch 528 schedule. After thorough investigation, we ruled out the most common causes:

  1. Server time synchronization issue: Chrony NTP reported no errors or issues.
  2. High CPU usage: CPU utilization was only ~6% during the error occurrence.
  3. Other potential system errors: No anomalies were found in dmesg or syslog.

1. First missed block: Slot 143012062 - Timestamp: 2024-12-19 03:19:13 UTC - Error: TraceBlockFromFuture

In the attached BP node log (see linked JSON file and screenshots), we observe the following sequence:

TraceStartLeadershipCheck for slot 143012061 is logged.
Subsequent entries detail block fetch and adoption events for slot 143012063.
Finally, TraceStartLeadershipCheck for slot 143012062 appears but is immediately followed by the error TraceBlockFromFuture.

Added comments in attached log:
"TraceStartLeadershipCheck for slot: 143012061"
"All entries about block fetch from slot 143012063"
"Missed block at slot 143012062. Note: Leadership check is logged 1.69 seconds in the past at 03:19:13.000Z, while the previous entry is from 03:19:14.690Z."
"Error: TraceBlockFromFuture for slot 143012062"
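The slot numbers and timestamps quoted above can be cross-checked with a short script. This is a minimal sketch, assuming the commonly used mainnet Shelley-era constants (the era starting at slot 4492800 on 2020-07-29T21:44:51Z, with 1-second slots); the helper names `slot_to_utc` and `lateness_seconds` are hypothetical and not part of cardano-node or cardano-cli.

```python
from datetime import datetime, timezone

# Assumption: mainnet Shelley-era constants (1-second slots from this point on).
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_UNIX = 1_596_059_091  # 2020-07-29T21:44:51Z


def slot_to_utc(slot: int) -> datetime:
    """Return the wall-clock start of a Shelley-era mainnet slot (hypothetical helper)."""
    if slot < SHELLEY_START_SLOT:
        raise ValueError("slot predates the Shelley era; 1-second slots do not apply")
    unix_time = SHELLEY_START_UNIX + (slot - SHELLEY_START_SLOT)
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


def lateness_seconds(slot: int, observed_iso: str) -> float:
    """Seconds between the start of a slot and a log timestamp such as '...Z'."""
    observed = datetime.fromisoformat(observed_iso.replace("Z", "+00:00"))
    return (observed - slot_to_utc(slot)).total_seconds()


if __name__ == "__main__":
    print(slot_to_utc(143012062).isoformat())  # first missed slot
    print(slot_to_utc(143094566).isoformat())  # second missed slot
    # How far behind the slot start was the preceding log entry?
    print(lateness_seconds(143012062, "2024-12-19T03:19:14.690Z"))
```

Under these assumptions the two slots map to 2024-12-19 03:19:13 UTC and 2024-12-20 02:14:17 UTC, and the preceding log entry is 1.69 seconds past the start of slot 143012062, which matches the figures reported above.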


2. Second missed block: Slot 143094566 - Timestamp: 2024-12-20 02:14:17 UTC - Error: No information in logs

Unlike the first missed block, this one has no direct log entries. The only trace of it is an absence: the logs show no TraceStartLeadershipCheck for the affected slot (143094566), even though such checks are present for all surrounding slots.
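A missing leadership check like this one can be found by scanning the log for gaps in the slot sequence. The sketch below is illustrative only: it assumes TraceStartLeadershipCheck entries use the same JSON layout as other Forge entries, with the trace name under `data.val.kind` and the slot under `data.val.slot`. That layout is an assumption and should be checked against the real log. The sample lines are synthetic, built only to exercise the function.

```python
import json
from typing import Iterable, Iterator, List


def leadership_check_slots(lines: Iterable[str]) -> Iterator[int]:
    """Yield slots from TraceStartLeadershipCheck entries.

    Assumption: the trace name is at data.val.kind and the slot at data.val.slot.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        data = entry.get("data") if isinstance(entry, dict) else None
        val = data.get("val") if isinstance(data, dict) else None
        if isinstance(val, dict) and val.get("kind") == "TraceStartLeadershipCheck":
            slot = val.get("slot")
            if isinstance(slot, int):
                yield slot


def missing_slots(slots: Iterable[int]) -> List[int]:
    """Return every slot between the first and last seen slot that has no check."""
    present = set(slots)
    if not present:
        return []
    return [s for s in range(min(present), max(present) + 1) if s not in present]


# Synthetic sample (NOT real node output): checks for three of four consecutive slots.
sample_lines = [
    json.dumps({"data": {"val": {"kind": "TraceStartLeadershipCheck", "slot": s}}})
    for s in (143094564, 143094565, 143094567)
]

if __name__ == "__main__":
    print(missing_slots(leadership_check_slots(sample_lines)))
```

For a real log, the lines would come from the open log file instead of `sample_lines`. A gap is only meaningful if the log covers the whole period without rotation.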

  • Screenshot with highlighted parts from BP log
    image

  • Pool: pool1xs34q2z06a46nk7hl48d27dj5gzc6hh9trugw2ehs9ajsevqffx
  • OS Name: Ubuntu
  • OS Version: 24.04.1 LTS
  • Node version:
    cardano-node 10.1.3 - linux-x86_64 - ghc-8.10
    git rev 36871ba
  • CLI version:
    cardano-cli 10.1.1.0 - linux-x86_64 - ghc-8.10
    git rev 01bda2e

Thank you,
kind regards,
Manuel

asnakep added the needs triage label Dec 20, 2024
amesgen (Member) commented Jan 6, 2025

Thanks for the great report!

One potential reason for these could be a concurrent garbage collection. Do you see a TookSnapshot event nearby?

asnakep (Author) commented Jan 7, 2025

Hello, IntersectMBO Team,

I just checked the BP log and found a TookSnapshot event approximately 4 seconds before the TraceBlockFromFuture event.

{"app":[],"at":"2024-12-19T03:19:10.456Z","data":{"enclosedTime":{"tag":"RisingEdge"},"kind":"TraceSnapshotEvent.TookSnapshot","snapshot":{"kind":"snapshot"},"tip":"RealPoint (SlotNo 142969741) 54c7a620b6e5c87b5cbe099a0f836f19debe82ebd6dedf0cf6cfd7d78b0339e9"},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"926","sev":"Info","thread":"316"}

{"app":[],"at":"2024-12-19T03:19:14.708Z","data":{"credentials":"Cardano","val":{"current slot":143012062,"kind":"TraceBlockFromFuture","tip":143012063}},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"926","sev":"Error","thread":"334"}
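The gap between the two entries above can be measured directly from the log lines. This is a minimal sketch that parses the two lines exactly as pasted; the helper names `parse_at` and `trace_kind` are hypothetical, and the lookup of `kind` in two places reflects only the layout visible in these two entries.

```python
import json
from datetime import datetime

# The two log lines, copied verbatim from the BP node log excerpt.
TOOK_SNAPSHOT_LINE = '{"app":[],"at":"2024-12-19T03:19:10.456Z","data":{"enclosedTime":{"tag":"RisingEdge"},"kind":"TraceSnapshotEvent.TookSnapshot","snapshot":{"kind":"snapshot"},"tip":"RealPoint (SlotNo 142969741) 54c7a620b6e5c87b5cbe099a0f836f19debe82ebd6dedf0cf6cfd7d78b0339e9"},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"926","sev":"Info","thread":"316"}'
BLOCK_FROM_FUTURE_LINE = '{"app":[],"at":"2024-12-19T03:19:14.708Z","data":{"credentials":"Cardano","val":{"current slot":143012062,"kind":"TraceBlockFromFuture","tip":143012063}},"env":"10.1.3:36871","host":"snake-co","loc":null,"msg":"","ns":["cardano.node.Forge"],"pid":"926","sev":"Error","thread":"334"}'


def parse_at(entry: dict) -> datetime:
    """Parse the 'at' timestamp of a log entry into an aware datetime."""
    return datetime.fromisoformat(entry["at"].replace("Z", "+00:00"))


def trace_kind(entry: dict):
    """Return the trace name: under data.kind (ChainDB entry) or data.val.kind (Forge entry)."""
    data = entry.get("data", {})
    return data.get("kind") or data.get("val", {}).get("kind")


snapshot = json.loads(TOOK_SNAPSHOT_LINE)
future = json.loads(BLOCK_FROM_FUTURE_LINE)

# Seconds from the TookSnapshot entry to the TraceBlockFromFuture error.
gap_seconds = (parse_at(future) - parse_at(snapshot)).total_seconds()

# How many slots the chain tip was ahead of the slot being checked.
val = future["data"]["val"]
slots_behind = val["tip"] - val["current slot"]

if __name__ == "__main__":
    print(trace_kind(snapshot), "->", trace_kind(future))
    print("seconds between entries:", gap_seconds)
    print("tip ahead of current slot by:", slots_behind)
```

For these two entries the gap is 4.252 seconds, and the tip (143012063) is one slot ahead of the slot being checked (143012062).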

Thank you,
cheers,
Manuel

amesgen (Member) commented Jan 7, 2025

Thanks, that seems quite likely to be the reason for the first missed slot. Reducing the performance impact of ledger snapshots is tracked in IntersectMBO/ouroboros-consensus#868, and things should improve fairly soon with the integration of Ledger's MemPack (IntersectMBO/cardano-ledger#4811).

asnakep (Author) commented Jan 7, 2025

Cool, thank you. Let's close this one.

cheers,
Manuel

asnakep closed this as completed Jan 7, 2025
asnakep (Author) commented Jan 8, 2025

Good Morning,

I noticed that this issue (#6058) is mentioned in IntersectMBO/ouroboros-consensus#868.

I have reopened it, but feel free to close it if reopening is not needed.

kind regards,
Manuel

asnakep reopened this Jan 8, 2025