Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] Merger: pass the destination in the ctor rather than in Merge() #17473

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

silverweed
Copy link
Contributor

@silverweed silverweed commented Jan 21, 2025

RNTupleMerger should share the lifetime of its destination file to better support incremental merging, e.g.

// open the destination file once
auto destination = CreatePageSink();
RNTupleMerger merger { *destination };
auto opts = RNTupleMergeOptions{};
opts.fMergingMode = ENTupleMergingMode::kUnion;
// keep merging stuff into it as it arrives (e.g. from the network)
while (auto result = AcquireSources()) {
  merger.Merge(result->sources, opts);
}

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

@silverweed silverweed self-assigned this Jan 21, 2025
@silverweed silverweed requested a review from jblomer as a code owner January 21, 2025 15:31
RNTupleMerger();
/// Creates a RNTupleMerger with the given destination.
/// `destination` must outlive the RNTupleMerger.
explicit RNTupleMerger(RPageSink &destination);
Copy link
Member

@pcanal pcanal Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't passing a std::shared_ptr be a better indication of the required lifetime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, a) because it's clear that the merger should have a strictly smaller lifetime than the Sink it's operating on (whereas the shared_ptr would blur this concept) and, more importantly b) our API always returns PageSinks/Sources as unique_ptr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the RNTupleMerger maybe own the sink via a unique_ptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, but how would the user get back ownership if they need it when they are done with it? If we add a RetrieveSink() method (or whatever) we leave the Merger in an invalid state..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the user should ever get it back; as we discussed the merger will soon own the model, so I'm not sure the user could even do something useful with the sink except merging in more files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the unique_ptr solves the problem.

because it's clear that the merger should have a strictly smaller lifetime than the Sink

Yes for a use case where the user still need access to the object (and thus unique_ptr is not an option), capturing the reference is 'silently' sharing the ownership/usage of the object and will lead to user error. We are trying to move away from implicitness. i.e. should have a strictly smaller lifetime might be true but can not be enforced nor checked by the compiler (nor simple static analysis tools).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pcanal so we have an agreement on the unique_ptr right? Can this thread be resolved?

Copy link

github-actions bot commented Jan 21, 2025

Test Results

    17 files      17 suites   3d 21h 12m 48s ⏱️
 2 687 tests  2 664 ✅ 23 💤 0 ❌
44 030 runs  44 030 ✅  0 💤 0 ❌

Results for commit 2c80009.

♻️ This comment has been updated with latest results.

This will allow to do special logic in case of incremental merging.

With this change, the merger acquires ownership of the destination sink
(which is the only sound thing to do in case of incr. merging due to
the merger's need to also have ownership of the sink's model)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants