fix(Player/SpellQueue): bandaid crashfix #21103

Merged
merged 2 commits into from
Jan 10, 2025


sogladev
Member

@sogladev sogladev commented Jan 6, 2025

Changes Proposed:

This PR proposes changes to:

  • Core (units, players, creatures, game systems).
  • Scripts (bosses, spell scripts, creature scripts).
  • Database (SAI, creatures, etc).

If the spell queue contains an invalid spell, clear the queue and return.
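
A minimal sketch of the guard described above, written as standalone C++ rather than the actual core code. The names `PendingCast`, `IsKnownSpell`, and `ProcessFront`, as well as the spell ids, are hypothetical and exist only for this illustration; the real check in the core may look different.

```cpp
#include <cstdint>
#include <deque>
#include <unordered_set>

// Hypothetical stand-in for a queued cast request (not the core's type).
struct PendingCast
{
    std::uint32_t spellId;
};

// Hypothetical spell table: only these ids count as valid in this sketch.
inline bool IsKnownSpell(std::uint32_t spellId)
{
    static const std::unordered_set<std::uint32_t> known{ 133, 116, 2136 };
    return known.count(spellId) != 0;
}

// Returns true if the front request was consumed, false if the queue was
// empty or was discarded because its front entry was invalid.
inline bool ProcessFront(std::deque<PendingCast>& queue)
{
    if (queue.empty())
        return false;

    if (!IsKnownSpell(queue.front().spellId))
    {
        // Band-aid: stop trusting the remaining entries and drop them all.
        queue.clear();
        return false;
    }

    // A real implementation would attempt the cast here.
    queue.pop_front();
    return true;
}
```

This only guards against entries whose values are invalid; it assumes the container object itself is still intact.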

Issues Addressed:

  • Closes

SOURCE:

The changes have been validated through:

  • Live research (checked on live servers, e.g. Classic WotLK, Retail, etc.)
  • Sniffs (remember to share them with the open source community!)
  • Video evidence, knowledge databases or other public sources (e.g forums, Wowhead, etc.)
  • The changes promoted by this pull request come partially or entirely from another project (cherry-pick). Cherry-picks must be committed using the proper --author tag in order to be accepted, thus crediting the original authors, unless otherwise unable to be found

Tests Performed:

This PR has been:

  • Tested in-game by the author.
  • Tested in-game by other community members/someone else other than the author/has been live on production servers.
  • This pull request requires further testing and may have edge cases to be tested.

How to Test the Changes:

  • This pull request can be tested by following the reproduction steps provided in the linked issue
  • This pull request requires further testing. Provide steps to test your changes. If it requires any specific setup, e.g. multiple players, please specify it as well.

Known Issues and TODO List:

  • [ ]
  • [ ]

How to Test AzerothCore PRs

When a PR is ready to be tested, it will be marked as [WAITING TO BE TESTED].

You can help by testing PRs and writing your feedback here on the PR's page on GitHub. Follow the instructions here:

http://www.azerothcore.org/wiki/How-to-test-a-PR

REMEMBER: when testing a PR that changes something generic (i.e. a part of code that handles more than one specific thing), the tester should not only check that the PR does its job (e.g. fixing spell XXX) but especially check that the PR does not cause any regression (i.e. introducing new bugs).

For example: if a PR fixes spell X by changing a part of code that handles spells X, Y, and Z, we should not only test X, but we should test Y and Z as well.

@github-actions github-actions bot added the CORE (Related to the core) and file-cpp (Used to trigger the matrix build) labels Jan 6, 2025
@Nyeriah
Member

Nyeriah commented Jan 6, 2025

cc @walkline

@walkline
Contributor

walkline commented Jan 6, 2025

From this crashlog, we see that the spell cast request has invalid values. The more interesting question is how the values became invalid.

I saw some other recent crashlogs - one and two - which make me think that something bad is happening, such as using a dangling pointer (using a deleted player) or a race condition (e.g., double parallel updates of the same map or instance).
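
To illustrate the dangling-pointer hypothesis in general C++ terms: a minimal sketch, assuming the owner were held by `std::shared_ptr` so that observers could use `std::weak_ptr` to detect a deleted owner before touching its queue. `Caster` and `QueueHandle` are hypothetical names; this is not how the core manages players, and it does not address the race-condition hypothesis.

```cpp
#include <cstdint>
#include <deque>
#include <memory>

// Hypothetical owner of a spell queue.
struct Caster
{
    std::deque<std::uint32_t> spellQueue;
};

// Hypothetical handle kept by code that may outlive the caster.
struct QueueHandle
{
    std::weak_ptr<Caster> owner;

    // Returns the number of entries cleared, or -1 if the owner no longer
    // exists. lock() yields an empty shared_ptr once the owner is destroyed,
    // so the freed object is never dereferenced.
    long ClearQueue() const
    {
        if (auto caster = owner.lock())
        {
            const long cleared = static_cast<long>(caster->spellQueue.size());
            caster->spellQueue.clear();
            return cleared;
        }
        return -1;
    }
};
```

With a raw pointer there is no equivalent check: a pointer to a deleted object cannot be told apart from a valid one by inspecting it.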

@Grimdhex
Contributor

Grimdhex commented Jan 6, 2025

Agree with walkline. It makes no real sense to have an invalid spell here. There is probably a deeper issue.

@sogladev also, why clear the queue instead of ignoring the invalid entry? (Simple question, I don't remember all the details of your system.)

@sogladev
Member Author

sogladev commented Jan 6, 2025

Agree with walkline. It makes no real sense to have an invalid spell here. There is probably a deeper issue.

@sogladev also, why clear the queue instead of ignoring the invalid entry? (Simple question, I don't remember all the details of your system.)

If the front of the std::deque contains invalid data, then the queue's behavior is undefined, so it's better to clear the entire thing than to assume the next element is valid.

I also suspect that when the std::deque contains invalid data, the pointer to the next element is invalid as well and points to undefined memory. In that case, I'm not sure what happens when .clear() is called 😆
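
For comparison, a minimal sketch of the two policies being discussed, skipping the invalid front entry versus clearing the whole queue. `QueuedSpell`, `IsValid`, and both helper functions are hypothetical, and treating id 0 as "invalid" is an assumption made only for this example. Both helpers are well-defined only while the deque object itself is intact; if its internal state has been overwritten, any member call, including `pop_front()` and `clear()`, is undefined behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical queue entry (not the core's type).
struct QueuedSpell
{
    std::uint32_t spellId;
};

// Assumption for this sketch: id 0 marks an invalid entry.
inline bool IsValid(const QueuedSpell& spell)
{
    return spell.spellId != 0;
}

// Policy 1: drop only the invalid entries at the front and keep the rest.
// Returns the number of entries dropped.
inline std::size_t SkipInvalidFront(std::deque<QueuedSpell>& queue)
{
    std::size_t dropped = 0;
    while (!queue.empty() && !IsValid(queue.front()))
    {
        queue.pop_front();
        ++dropped;
    }
    return dropped;
}

// Policy 2: treat one invalid front entry as a sign that nothing in the
// queue can be trusted and discard everything.
// Returns the number of entries dropped.
inline std::size_t ClearIfFrontInvalid(std::deque<QueuedSpell>& queue)
{
    if (queue.empty() || IsValid(queue.front()))
        return 0;

    const std::size_t dropped = queue.size();
    queue.clear();
    return dropped;
}
```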

@Nyeriah
Member

Nyeriah commented Jan 10, 2025

Suddenly this crash became quite frequent. Do we think this will help ease the crashes?

@sogladev
Member Author

Suddenly this crash became quite frequent. Do we think this will help ease the crashes?

It can't make it crash more often.

@Nyeriah Nyeriah merged commit 5bc20a1 into azerothcore:master Jan 10, 2025
13 checks passed
@walkline
Contributor

But it wouldn’t solve the issue either, it would crash on .clear().

@Takenbacon
Contributor

From the vague research I did earlier in the week it shouldn't even be possible for spellInfo to be null in the container to begin with (under normal circumstances). Once I'm done with my work week I'll try to spend a little more time on it if possible.

@Nyeriah
Member

Nyeriah commented Jan 12, 2025

@Takenbacon
Contributor

Yes, it didn't help: https://gist.github.com/Nyeriah/bbcd980360e462731503bb23f1c79c4c

I spent a little time on it tonight, and while there are a couple of issues, I saw nothing that would lead to a crash. I'd be curious to try disabling the spell queue (it's a config option) and see whether similar crashes still occur. That would tell us whether the problem is specific to the spell queue or whether the spell queue container is just "unlucky" and being corrupted by something else.

@sudlud
Member

sudlud commented Jan 13, 2025

So for the latest crashlog Nyeriah provided, this bandaid PR actually caught the error and then crashed on SpellQueue.clear(); inside the bandaid fix 😅

We could probably use another "safer" way to clear SpellQueue to a working state until the root cause has been found.

@Takenbacon
Contributor

So for the latest crashlog Nyeriah provided, this bandaid PR actually caught the error and then crashed on SpellQueue.clear(); inside the bandaid fix 😅

We could probably use another "safer" way to clear SpellQueue to a working state until the root cause has been found.

Not possible. If an element in the container, or even the player object itself, points to invalid memory, all paths will inevitably lead to a segfault once that memory is accessed during deallocation or use. The only solution is to fix the underlying issue; it's just a matter of narrowing down where it is or what causes it. That is significantly more difficult to do from a crashlog alone, without being able to check a memory dump in gdb.

Labels
CORE (Related to the core), file-cpp (Used to trigger the matrix build), Ready to be Reviewed
7 participants