pageserver: upload/delete race #10283

erikgrinaker · 2025-01-05T18:42:03Z

Uploads and deletes are racy in the upload queue, which can lead to delayed deletes removing newly uploaded files. This will be true even after upload reordering in #10218 ensures conflicting operations are executed in order, because deletions are async -- they're only submitted to the deletion queue:

neon/pageserver/src/tenant/remote_timeline_client.rs

Lines 2103 to 2111 in a77e87a

    self.deletion_queue_client
        .push_layers(
            self.tenant_shard_id,
            self.timeline_id,
            self.generation,
            delete.layers.clone(),
        )
        .await
        .map_err(|e| anyhow::anyhow!(e))

A lot of complexity has been added to deal with this, see e.g. #9844.

We should instead make deletes synchronous, such that the upload queue can properly order these and avoid races. We can either continue using the deletion queue, but wait for the deletion to execute before completing the task, or simply run the deletion operation synchronously.
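To illustrate the first option (keep a deletion queue, but wait for the deletion to execute before completing the task), here is a minimal std-only sketch. It is not the pageserver implementation: `Storage`, `DeleteRequest`, `Op`, `spawn_deletion_queue` and `run_upload_queue` are hypothetical names, remote storage is modelled as an in-memory set, and threads with channels stand in for the async tasks.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for remote storage: the set of layer file names present.
type Storage = Arc<Mutex<BTreeSet<String>>>;

/// A deletion request plus a channel the worker uses to signal completion.
struct DeleteRequest {
    layers: Vec<String>,
    done: Sender<()>,
}

/// Operations in a toy upload queue (assumed shape, for illustration only).
enum Op {
    Upload(String),
    Delete(Vec<String>),
}

/// Spawns a toy deletion queue worker that executes requests in FIFO order
/// and acknowledges each one after the files have been removed.
fn spawn_deletion_queue(storage: Storage) -> Sender<DeleteRequest> {
    let (tx, rx) = channel::<DeleteRequest>();
    thread::spawn(move || {
        for req in rx {
            {
                let mut files = storage.lock().unwrap();
                for layer in &req.layers {
                    files.remove(layer);
                }
            }
            // Ignore send errors: the caller may have stopped waiting.
            let _ = req.done.send(());
        }
    });
    tx
}

/// Runs operations in order. A delete only completes once the deletion queue
/// has acknowledged it, so a later upload of the same file cannot be removed
/// by a delayed delete.
fn run_upload_queue(ops: Vec<Op>, storage: &Storage, deletions: &Sender<DeleteRequest>) {
    for op in ops {
        match op {
            Op::Upload(name) => {
                storage.lock().unwrap().insert(name);
            }
            Op::Delete(layers) => {
                let (done_tx, done_rx) = channel();
                deletions
                    .send(DeleteRequest { layers, done: done_tx })
                    .expect("deletion queue worker exited");
                // Block until the deletion has actually executed.
                done_rx.recv().expect("deletion queue dropped the request");
            }
        }
    }
}
```

Running upload, delete, upload for the same layer name through `run_upload_queue` leaves the re-uploaded file in place, because the second upload cannot start before the delete is acknowledged.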

When we do this, we should also document the invariants around this, in a doc comment and/or an RFC, with diagrams.

jcsp · 2025-01-06T09:50:36Z

So with reordering, I guess we now don't really care if the delete operation itself is slower (because it waits for flush), because other operations will usually just skip past it, right?

We would need to care about runtime of deletions at some points that we do a wait_completion though, like in some shutdown paths, so maybe those paths will need to kick the deletion queue if the timeline client has any deletions pending

erikgrinaker · 2025-01-06T09:58:50Z

> So with reordering, I guess we now don't really care if the delete operation itself is slower (because it waits for flush), because other operations will usually just skip past it, right?

Yes, except for any conflicting operations (same file).

I'm not sure to what extent we need to use the deletion queue at all, or if we can just submit delete requests directly. But I'm assuming we use it for good reason.

> We would need to care about runtime of deletions at some points that we do a wait_completion though, like in some shutdown paths, so maybe those paths will need to kick the deletion queue if the timeline client has any deletions pending

How long does it take for the deletion queue to process deletions? Can we bound it to something reasonable like 5 seconds?

jcsp · 2025-01-06T11:11:58Z

> I'm not sure to what extent we need to use the deletion queue at all, or if we can just submit delete requests directly. But I'm assuming we use it for good reason.

The deletion queue has two purposes (via 025-generation-numbers.md):

- Batching generation validation requests to the controller (pageservers need to effectively ask permission before doing a deletion).
- Batching deletion requests to S3.

In practice, the batches are often pretty small, and this optimization has limited absolute benefit, but the queue makes it easier to reason about scale and things being "cheap": we don't have to worry about how storage controller (or DB) load would spike if a pageserver decided to execute lots of deletions all of a sudden, because they're nicely batched.
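As a rough illustration of why batching keeps controller load predictable, the sketch below groups pending deletions so that each (tenant, generation) pair would need only one validation request, however many layers are queued. The `PendingDelete` type and `batch_for_validation` function are hypothetical and do not reflect the real deletion queue's data structures.

```rust
use std::collections::BTreeMap;

/// Hypothetical record of a layer waiting to be deleted.
#[derive(Clone, Debug, PartialEq)]
struct PendingDelete {
    tenant: String,
    generation: u32,
    layer: String,
}

/// Groups pending deletions by (tenant, generation). Each map entry would
/// correspond to a single validation request, regardless of how many layers
/// it covers.
fn batch_for_validation(pending: Vec<PendingDelete>) -> BTreeMap<(String, u32), Vec<String>> {
    let mut batches: BTreeMap<(String, u32), Vec<String>> = BTreeMap::new();
    for d in pending {
        batches
            .entry((d.tenant, d.generation))
            .or_insert_with(Vec::new)
            .push(d.layer);
    }
    batches
}
```

With this shape, a burst of a thousand deletions for one tenant still maps to one entry, which is the property the queue is meant to provide.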

One could imagine a world where we collapse the generation validation logic into RemoteTimelineClient, so that it can essentially do its own batching, but that's a little awkward because generations are per tenant and clients are per-timeline. There's a tradeoff between the separation of concerns (status quo, deletion queue is its own little subsystem), and net LoC (it would be terser to build it all into one queue).

> How long does it take for the deletion queue to process deletions? Can we bound it to something reasonable like 5 seconds?

We could, but 5s would still be too long for something like a timeline shutdown during live migration.
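For reference, a bounded wait on a deletion acknowledgement could be sketched as follows. It assumes the acknowledgement arrives over a std `mpsc` channel, which is an illustration only. `DeleteWait` and `wait_for_delete` are hypothetical names, and the caller chooses the bound, so a shutdown path could pass something much shorter than 5 seconds.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Outcome of waiting for a deletion acknowledgement (hypothetical type).
#[derive(Debug, PartialEq)]
enum DeleteWait {
    Completed,
    TimedOut,
    QueueGone,
}

/// Waits at most `bound` for the deletion queue to acknowledge a delete.
fn wait_for_delete(done: &Receiver<()>, bound: Duration) -> DeleteWait {
    match done.recv_timeout(bound) {
        Ok(()) => DeleteWait::Completed,
        // The deletion has not executed within the bound; the caller decides
        // whether to kick the queue, retry, or give up.
        Err(RecvTimeoutError::Timeout) => DeleteWait::TimedOut,
        // The sending side was dropped without acknowledging.
        Err(RecvTimeoutError::Disconnected) => DeleteWait::QueueGone,
    }
}
```

Returning a distinct `TimedOut` value rather than an error leaves the policy (for example, kicking the deletion queue during shutdown) to the caller.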

erikgrinaker added the c/storage/pageserver (Component: storage: pageserver) and t/bug (Issue Type: Bug) labels Jan 5, 2025

erikgrinaker self-assigned this Jan 5, 2025

skyzh mentioned this issue Jan 6, 2025

pageserver: reorder upload queue when possible #10218

Open
