-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add missing functions and docs for LazyFrame #49
base: main
Are you sure you want to change the base?
Conversation
Thanks, but just to be clear, don't expect to see this huge change merged immediately as it is very difficult to review. |
R/lazyframe-frame.R
Outdated
#' | ||
#' df$clear(n = 5) | ||
lazyframe__clear <- function(n = 0) { | ||
pl$DataFrame(schema = self$schema)$clear(n)$lazy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs a small adjustment since we don't have schema
but .schema_overrides
in pl$DataFrame()
but that's the idea, yes:
(Note that this doesn't work because $clear()
is not implemented for DataFrame yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/rust/src/lazyframe/serde.rs
Outdated
#[savvy] | ||
fn deserialize_lf(json: &str) -> Result<PlRLazyFrame> { | ||
let lp = serde_json::from_str::<DslPlan>(json).map_err(|_| { | ||
let msg = "could not deserialize input into a LazyFrame"; | ||
RPolarsErr::Other(msg.to_string()) | ||
})?; | ||
let out = LazyFrame::from(lp); | ||
Ok(<PlRLazyFrame>::from(out)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a clone of Python Polars' ones, is it?
The function should be the same as Python Polars.
Of course, it must be an associated function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I think I had copied this from r-polars actually, so it should be modified/removed.
Deserialization is a bit tricky in terms of API and I think that's why we have deviated with this deserialize_lf()
in r-polars. In py-polars they have:
pl.DataFrame.deserialize(...)
pl.LazyFrame.deserialize(...)
and I'm not sure neo-r-polars support something like this since pl$DataFrame
just returns the body of the function. I wouldn't mind removing this serde file for now, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since pl$DataFrame
is a function, something like pl$DataFrame$foo
cannot be realized in R.
However, this is a different thing: whether the function on the R side is something like deserialize_lf
, an associated function can be used on the Rust side (extendr binds the associated functiona to the R side in a strange way, but with savvy, it is not a problem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand and I think this might be tricky to implement. I have removed this function for now: it's not essential and I don't want it to block the rest of the PR. I added a todo and we can address this later in a more focused PR.
@eitsupi just checking in to know if you were able to review parts of this PR already. I know this one is big, I could maybe split it in 2 or 3 if it helps. I'll also try to fix the wasm build errors. Do you know if there's a way to mock this build locally or if it can only be done in CI? |
Sorry for the late review. I have not read through most of the individual methods.
This would be relatively easy to run locally as it builds on Docker, but I have not tried it. |
I'm not convinced about this for several reasons. First, it increases the number of arguments to browse in the docs, and maybe more statistics will be added in the future so it would be annoying to have 5-6 arguments only about statistics. Second, we use lists in other params as well (e.g. Other packages like Regarding the rust side, I'm also not a fan of the current |
@eitsupi Thoughts on the previous comment? |
Sorry, I forgot to comment. Your opinion makes sense, and I think it's a good idea to pass the elements to Rust that decompose the list on the R side. |
I've added a We can discuss this further but if so I'd rather put a TODO on this and address this in a subsequent PR so we can focus on the other methods in this PR. Once this one is merged I can move on with the tests and the DataFrame methods (which I will do in smaller pieces to avoid massive PRs like this one). |
Same changes as in #31, just without the tests