Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An extra delete request during reindexing has been removed. #444

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

resendersphoenix
Copy link

During reindexing, a request is made to clear the index, and then each record (which no longer exists) is deleted again, which slows down the process (reindexing takes hours instead of minutes)

During reindexing, a request is made to clear the index, and then each record (which no longer exists) is deleted again, which slows down the process
@tw2113
Copy link
Member

tw2113 commented Dec 14, 2024

Hi @resendersphoenix

Can you open up an issue with a description and everything that you can manage to provide for us to make sure we're understanding what you're seeing and what this PR is meant to solve?

Thanks.

@resendersphoenix
Copy link
Author

The problem is similar #372
My problem is the same - very long indexing because you are sequentially deleting objects one per request (even if they are definitely not there anymore)

Once again, I analysed what happens when reindexing from wp-cli and the admin-panel
I think my fixes break the gradual reindex from wp-cli.
The right thing to do is to wait for your approval of #442 and #443 because it is also related.
And then I'll add the fixes because they depend on what you decide there.

@tw2113
Copy link
Member

tw2113 commented Dec 17, 2024

Understood on those parts, and I'm not outright saying no on this or anything like that, but will want some time to think things through for what you're proposing and if I'm seeing potential better and more appropriate routes.

One thing to keep in mind, is that records and posts aren't always 1:1. If you have an extremely long post, that spans like 5 records, based on max size settings and whatnot, then the one $this->delete_item( $post, $should_wait ); that you're wrapping, should be getting rid of all 5 records in one go, if logic serves me correct.

Regardless, leaving this open for the moment while we continue to explore and think everything through as best we can.

Handling variations in specifying the ‘algolia_clear_index_if_existing’ filter and reindexing by ‘specific_ids’
@resendersphoenix
Copy link
Author

resendersphoenix commented Dec 17, 2024

Of course, I have seen that one object can be stored as several parts.
But if the index is completely empty (because it was just created or cleared), it doesn't matter.
And the main problem is not parts of records (they are really deleted in batches), but the fact that non-existent records are deleted sequentially (when reindexing with a full cleanup or during the first indexing).

You can see - I added a parameter that contains information about whether the index has already been completely cleaned or not. And used it when analysing the need to delete an item.

This is now very easy to combine with the --clear wp-cli parameter fix:
just add $clear_index_on_page_one to the $already_cleared formation in re_index and decide what is more important - the $clear_index_on_page_one parameter or the algolia_clear_index_if_existing filter

I checked with my fixes - full reindexing (with clear) even in batches of 1000 goes very fast (from wp-admin panel)

@tw2113
Copy link
Member

tw2113 commented Dec 19, 2024

will keep all of that in mind and definitely also work to recreate for testing purposes as well, as we evaluate things. Also trying to not rush into things too hastily as that has bitten us in the proverbial butt in the past too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants