-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why are active bindings and methods redefined in wrap()
?
#3
Comments
Regarding the |
I understand the concern and have noted it in the README. Lines 216 to 222 in 130ea55
This is because it is different from the way Python (and R6 etc.) stores methods, causing R Polars to not be able to copy the behavior of Python Polars. Without such a mechanism, it is not possible to implement features such as: |
It certainly seems possible, but copying the environment doesn't seem easy and I don't know how much of a performance advantage it would be. If I understand correctly, what I am doing here is the same thing that R6 and savvy are doing, and if there are performance issues, they may not adopt these approaches, I think. |
One place where this approach can be useful is where the following Datatype properties are registered in the bindings. neo-r-polars/R/datatypes-classes.R Lines 22 to 41 in 130ea55
neo-r-polars/src/rust/src/datatypes.rs Lines 91 to 170 in 130ea55
|
I don't know about the perf but it would probably be possible (and cleaner) with |
If it improves performance, then copying env seems like a good way to go, but I am not sure if it will work properly because of the processing we are doing to update neo-r-polars/R/series-series.R Lines 55 to 59 in ddb0a2b
If overwriting Lines 1026 to 1030 in ddb0a2b
Line 1106 in ddb0a2b
|
Sorry, I won't have enough time in the next weeks to think about this in detail. Let's keep going like this for now but I'd like to revisit this before we make the switch in |
Yes, of course it would have to be after implementing a large number of methods to make a clear difference in performance. |
If you want to avoid copying, extendr's way might be more preferable. It prepares an environment and extract the method from it. There's no strong reason I didn't follow this way in savvy. I just felt directly using environment is good in tab-completion. (If I would implement, I would store the operation on R's side and pass it to Rust's side lazily, though.) |
Thank you. at <- arrow::as_arrow_table(mtcars)
at$AddColumn
#> function (i, new_field, value)
#> Table__AddColumn(self, i, new_field, value)
#> <environment: 0x55e07fc9dc10> at2 <- rlang::env_clone(at)
at2$AddColumn
#> function (i, new_field, value)
#> Table__AddColumn(self, i, new_field, value)
#> <environment: 0x55e07fc9dc10> at2
#> <environment: 0x55e07fdbac68> Created on 2024-09-06 with reprex v2.1.1 In any case, I don't think anyone is concerned about R6 performance in the arrow package, so I don't know if this is really a performance issue. |
A simple experiment: comparing the process with and without going through bench::mark(
without_wrap = {
neopolars:::PlRSeries$new_i32("", 1:10^5)$cast(neopolars::pl$Int64$`_dt`, TRUE)$cast(neopolars::pl$String$`_dt`, TRUE)
},
with_wrap = {
neopolars::as_polars_series(1:10^5)$cast(neopolars::pl$Int64, TRUE)$cast(neopolars::pl$String, TRUE)
},
check = FALSE,
min_iterations = 30
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 without_wrap 48.4ms 51.9ms 18.8 2.08MB 0
#> 2 with_wrap 49ms 51.9ms 18.6 741.02KB 0 Created on 2024-09-09 with reprex v2.1.1 |
Just a note: if we move to S7, S7 does not allow environment-based objects (i.e., those with reference semantics) (it seems that this may be allowed in the future RConsortium/S7#290, but it seems incompatible with the functional OOP philosophy), so we need to go back to custom method implementations for the One of the reasons I wanted to move away from custom |
There clearly seems to be a performance hit due to the way library(polars)
system.time({
for (i in 1:300) {
polars::pl$col("x")$cast(pl$String)$mean()$sum()$std()
}
})
#> user system elapsed
#> 0.353 0.017 0.369
library(neopolars)
#>
#> Attaching package: 'neopolars'
#> The following objects are masked from 'package:polars':
#>
#> as_polars_df, as_polars_lf, as_polars_series, is_polars_df,
#> is_polars_dtype, is_polars_lf, is_polars_series, pl
system.time({
for (i in 1:300) {
neopolars::pl$col("x")$cast(pl$String)$mean()$sum()$std()
}
})
#> user system elapsed
#> 2.724 0.002 2.727 (Note: this is with Of course this is not an actual usecase, but I think it's very possible to have this kind of scenario, for example chaining a lot of |
Thank you for profiling. |
It's my first exploration of
neo-polars
so far so I'm just putting some remarks here.I'm surprised by the block of
makeActiveBinding
and thelapply()
call below:neo-r-polars/R/dataframe-frame.R
Lines 67 to 89 in eaa014f
Correct me if I'm wrong, but
wrap()
is called in every single function, so this code basically stores the active bindings and the list of methods multiple times if we chain several functions together. This seems very wasteful and potentially quite a perf hit. The current way we do this in r-polars is to store functions in specific environments only once when we load the package. Why isn't this possible here? I suppose you thought about that and found some issues implementing it?(I don't mean this as a criticism, I know this is a massive undertaking, just wondering if you can explain so that I can maybe help with that)
The text was updated successfully, but these errors were encountered: