-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parallel pull mode #256
Conversation
Added results from tests (see PR description) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few quick comments from a first glance - will try to find time to read through the actual interesting bits e.g. the parallelized bits later :)
src/sync.rs
Outdated
use remote_command::RemoteCommandResult; | ||
|
||
#[derive(Debug, PartialEq, Clone)] | ||
pub enum PushResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to make this a type synonym instead so you can reuse all the functionality of Result
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or actually, how about something like:
pub enum PushResult { | |
pub struct PushResult { | |
pub duration: Duration, | |
pub result: Result<(), String>, | |
} |
I think this would reflect the intent more accurately, looking at fn push()
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason to not make a synonym was to better support extension with additional fields, but I think best version would be to combine custom structs with raw Result
pub struct PushOk {
pub duration: Duration
}
pub struct PushErr {
pub duration: Duration
}
Result<PushOk, PushErr>
I def feel weird that enums can't have named fields, structs seem to provide more protection from accidental value mismatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you meant... PushEr
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would still be better to have a product type at the top since you'll always want the duration, either via a named struct as I suggested above or an (anonymous) tuple. This also lets you punt on custom structs for the Ok/Err cases until you have extra values there - the compiler will enable safe refactoring later. This lets you avoid adding too much complexity; e.g., the execute_rsync(&mut command)
on master which is currently in your PR:
match execute_rsync(&mut command) {
Err(reason) => Err(PushErr {
duration: start_time.elapsed(),
message: reason,
}),
Ok(_) => Ok(PushOk {
duration: start_time.elapsed()
}),
}
could be just
(start_time.elapsed(), execute_rsync(&mut command))
Similarly, code like
let remote_command_duration = match remote_command_result {
Err(err) => err.duration,
Ok(ok) => ok.duration
};
``` goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like a product top-level type but I'd also like to immediately be able to distinguish its subtype (if needed)
In Kotlin (best example I have on mind) I'd do this:
sealed class PushResult(val duration: Duration) {
data class Ok(override val duration: Duration): PushResult(duration)
data class Err(override val duration: Duration): PushResult(duration)
}
This way it's still an enum (sealed class in Kotlin) while also containing always-present data in each case.
In Rust I guess I could do something like this:
pub enum PushResult {
Ok(Duration),
Err(Duration)
}
pub trait _PushResult {
fn duration(&self) -> &Duration;
}
pub impl _PushResult for PushResult {
fn duration(&self) -> &Duration {
match self {
PushResult::Err(duration) => duration,
PushResult::Ok(duration) => duration
}
}
}
wdyt?
@aneeshusa thanks for review, that was really helpful 👍 I've addressed all your comments, don't want to waste more of your time but of course you're welcome to review again :) |
args.command.clone(), | ||
config.clone(), | ||
sync::project_dir_on_remote_machine(&local_dir_absolute_path.clone()), | ||
2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 2
is bugging me (perhaps more than it should). Any reason execute_remote_command
doesn't return the bus
directly? This will also let you get rid of the pop
s and their accompanying unwraps
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah… I can't return Bus
(that's what I've tried first until I got compiler yelling at me)
It's owned by closure that I pass to thread::spawn
in execute_remote_command
, there is an upstream issue about similar problem jonhoo/bus#19
src/main.rs
Outdated
Err(_) => exit_with_error(&format!("\nFailure: took {}.", format_duration(duration)), 1), | ||
_ => println!("\nSuccess: took {}.", format_duration(duration)) | ||
Err(_) => { | ||
match pull_result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branches are currently identical, can you get rid of the match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, lost some compile-time safety though, but not too bad
.unwrap(); | ||
|
||
match remote_command_result { | ||
Err(ref err) => eprintln!("\nExecution failed: took {}.\nPulling...", format_duration(err.duration)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Pulling...
part of this message seems a bit misleading in the parallel case, as I'd expect one last incremental sync. Consider having separate messaging for the serial vs parallel cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's incremental, it's nontheless potentially long
In many cases remote command produces file(s) at the very end, thus parallel pulling might not be parallel
@@ -11,3 +11,5 @@ authors = ["Artem Zinnatullin <artem.zinnatullin@gmail.com>", "Artur Dryomov <ar | |||
[dependencies] | |||
yaml-rust = "0.4.2" | |||
linked-hash-map = "0.5.1" | |||
crossbeam-channel = "0.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used it in a while, but any reason not to use std::sync::mpsc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I swapped to crossbeam
because I needed multi-receiver support, but then even that wasn't enough so I've also added bus
Decided to keep crossbeam for performance lol (nanoseconds matter hey!), we'll be stripping final binary #242 so it shouldn't matter which one we're using from the binary size point
src/sync.rs
Outdated
use remote_command::RemoteCommandResult; | ||
|
||
#[derive(Debug, PartialEq, Clone)] | ||
pub enum PushResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would still be better to have a product type at the top since you'll always want the duration, either via a named struct as I suggested above or an (anonymous) tuple. This also lets you punt on custom structs for the Ok/Err cases until you have extra values there - the compiler will enable safe refactoring later. This lets you avoid adding too much complexity; e.g., the execute_rsync(&mut command)
on master which is currently in your PR:
match execute_rsync(&mut command) {
Err(reason) => Err(PushErr {
duration: start_time.elapsed(),
message: reason,
}),
Ok(_) => Ok(PushOk {
duration: start_time.elapsed()
}),
}
could be just
(start_time.elapsed(), execute_rsync(&mut command))
Similarly, code like
let remote_command_duration = match remote_command_result {
Err(err) => err.duration,
Ok(ok) => ok.duration
};
``` goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems fine, but per #256 (comment) I see a lot of types being added with an unclear benefit - would love to this feature implemented with fewer lines of code :)
@artem-zinnatullin, rebase? Also expecting a lot of issues with this mode in practice 🤔 |
Rebased, all issues that we will encounter should be solvable |
@@ -0,0 +1 @@ | |||
type-complexity-threshold = 450 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error: very complex type used. Consider factoring parts into `type` definitions
--> src/sync.rs:93:47
|
93 | let (pull_finished_tx, pull_finished_rx): (Sender<Result<PullOk, PullErr>>, Receiver<Result<PullOk, PullErr>>) = unbounded();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `-D clippy::type-complexity` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity
error: very complex type used. Consider factoring parts into `type` definitions
--> src/sync.rs:110:47
|
110 | let (pull_finished_tx, pull_finished_rx): (Sender<Result<PullOk, PullErr>>, Receiver<Result<PullOk, PullErr>>) = unbounded();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity
error: aborting due to 2 previous errors
😭
Closes #188.
Finally, here is the implementation of new pull mode:
parallel
In this mode Mainframer pulls files from remote machine in parallel to the remote command execution.
This mode dramatically improves total execution time.
With
serial
pull mode (1.x, 2.x behavior) user had to wait for all files to be pulled after the remote command finished.Now with
parallel
pull mode, most of the files will be already pulled by the time remote command finishes and after it finishes last incremental pull will be done (usually ~1 second).Results from
tests/test_pulls_big_files.sh
(creates 30 64MB files):1.6x faster overall execution time!
vs