-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving efficiency in Crop Definitions #9372
Improving efficiency in Crop Definitions #9372
Conversation
In the PR on this (#9052) we have the following suggestions:
This set of combinations in this new R code took 6 minutes 58 seconds to run with the new (it timed out previously)
I've tried to run it three different times on the old code. On this third time it's been running for over an hour so I'm quite satisfied this has helped somewhat. |
@lilyclements here is what I tried, on your new version - with the Zambia data - I made it 312 years. I got this error. I have not yet checked the old version with these data - I'll do that now. |
@rdstern thanks for this! The issue is because I've put in "years" as a hardcoded variable! Oops. You have "s_year" so it can't find this "year" since it's not called "year". I've now fixed this so hopefully that error should be fixed :) |
@lilyclements ok, very well done, it now works. I tried - for comparison - with the same values as for 0.8.1 above. Now it works. Great. But: b) And just a teensie inconvenience in the summary data frame! First I really like that you have the with and without in the same summary sheet. That's a great improvement. |
@rdstern thanks for this - the problems are now hopefully fixed and so it is ready for you to look at again |
@lilyclements it still works, but I have a few buts: The results are not the same! With the 3 conditions I think you always get zero and that doesn't seem quite right either! |
a) I've put it into the VB code now b) It was a different column order in the "crops_def" to "crop_props" data frames before, so I've put it to be the same column order in them both now (station - day - length - rain, but this can be very easily changed) c) Agreed on the station point. It used to be station then year then planting day, length, rain. (So you would have all your Saltpond - 1944's, then all your Saltpond 1945's, etc) This feels less intuitive to me, but, I haven't seen this dialog in use so you have a much better understanding! d) You're absolutely right on the issue that it is no longer giving the same results as the previous version. That is something I accidentally added earlier today when fixing something else. I've now sorted that, so this should now be giving the same results as the previous version. @rdstern this is now ready for review again |
@lilyclements this is great. I have spent a long time however, looking into the results. They are often the same and they are similar when they are different. I wonder why? I suspect any year when either is missing (or even when the seasonal total is missing) should simply be omitted from the calculations - indeed maybe you do that now? I'll check tomorrow what that does to the calculations. |
@lilyclements it seems to work (almost) fine now! And it is faster. It took 23 seconds for 45 conditions on 5 stations compared to over 50 seconds using version 0.8.1. Lundazi is interesting. It is always slightly different, and is the worst of the stations in terms of current missing data: There are two other points perhaps for later: a) When the start and end are finished we should assess how to use them in this dialog. What should we do if there is no start? We could think of that as a failed year, so include that year, and it always fails. Or we could report that issue separately and omit those years from these combined risks. I assume that's what we do now. |
@rdstern thanks for outlining where you are seeing these differences - this difference in the values should now be fixed! The issue was occurring when the rainfall in the entire year for that station was missing. My mistake, and fixed it now! This is ready for re-review. I agree it would be very good to have a discussion to point (a). To point (b), I'm not sure how fixable it is in it's current state. Perhaps someone who knows the VB code better would be able to say if this is just a very easy fix, or if we do need to have two data frames as you suggest. A side note, if you were interested: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lilyclements that looks great now. I am approving.
@N-thony please can you check and merge if ok.
@N-thony when you are checking, there is one small item that maybe for you to change. Here is the output window after running the dialog:
Now I like the fact there is a brief report in the output window, when the main results are the 2 new data frames. But there are now also 2 lines at the bottom about undo, and I wonder if those are needed? They seem to me to be a distraction. If they are needed, then I have no problem with them coming in the log window, but do they need also to be here?
Thanks
@rdstern can you confirm this? |
Linked to PR #9052
After discussions with EPICSA on this, we wanted to be able to have this code (a) run quicker and (b) have the option to only produce probability (risk) tables. This is because we might have an instance where the user looks for a large number of combinations and at the moment this does not work in R-Instat. This was reported first by @Vitalis95 here
This code improves the efficiency:
return_crop_definitions
and is defaultTRUE
since that is how it works in R-Instat at the moment.Follow up to this is to introduce a checkbox for "Return Crop Definitions" in the dialog in R-Instat. @rdstern does this sound like a good suggestion to you? I am hoping this will help us have the option to just return those probability tables, and then exporting to EPICSA will be much quicker.
Run times for 3213 combinations was 5 minutes 37 seconds
Run times for this is now 59 seconds.
I'm currently trying for 39442 combinations. This before would time out so I am not sure how long this used to take, but it took at least an hour. I will update with the new time when I have it.