Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we change MultiVector(layout, arr) to no longer copy arr? #282

Open
eric-wieser opened this issue Mar 12, 2020 · 9 comments
Open

Can we change MultiVector(layout, arr) to no longer copy arr? #282

eric-wieser opened this issue Mar 12, 2020 · 9 comments
Labels

Comments

@eric-wieser
Copy link
Member

eric-wieser commented Mar 12, 2020

Almost all of our operations create an extra copy (see #280). The fix proposed there is to opt-in for copy-free construction (spelt layout.MultiVector(value, copy=False)). However, it seems that every single construction within clifford would want this optin, which would make it seem better as a default.


If we made not copying the default, there are two types of code that would start doing the wrong thing:

  • Code which modifies arrays to create multiple multivectors

    def mv_range(n):
        mvs = []
        buffer = np.zeros(32)
        for i in range(n):
            buffer[0] = i
            mvs.append(layout.MultiVector(buffer))
        return mvs  # if the above does not copy, these end up all the same!
  • Code which modifies multivectors sharing the same array

    def mv_range(n):
        mvs = []
        buffer = np.zeros(32)
        for i in range(n):
            mv = layout.MultiVector(buffer)
            mv[()] = i
            mvs.append(mv)
        return mvs  # if the above does not copy, these end up all the same!

Both of these are easy to fix by using buffer.copy() instead of buffer, but:

  • Users upgrading may not know they need to do this
  • It's still possible to write this type of code by accident.

Does the benefit of not having copy=False sprinked throughout tools justify the cost of the potential breakage here?

@eric-wieser eric-wieser pinned this issue Mar 12, 2020
@arsenovic
Copy link
Member

what are the performance benefits?

@eric-wieser
Copy link
Member Author

eric-wieser commented Mar 12, 2020

20% less time spent on multiplication, 30% less time on addition. Possibly a little less, since the benchmark in that PR folds in another change too.

But remember, the question isn't whether we want these benefits- it's whether we want user (or clifford.tools) code to automatically get the same benefit, or if we want to force that code to opt in by making this change everywhere:

-layout.MultiVector(my_optimized_function(...))
+layout.MultiVector(my_optimized_function(...), copy=False)

@arsenovic
Copy link
Member

right. but the motivation behind this change is performance, correct?

@eric-wieser
Copy link
Member Author

eric-wieser commented Mar 12, 2020

That's one way of looking at it, We can get the performance boost either by:

  1. Adding copy=False on 75 lines of our code, and telling users to do the same to their code. If they don't listen, all that happens is that user code is slower.
  2. Changing the behavior to not copy by default, and warning users via our docs that the behavior has changed. If they don't listen, then their code might not do the right thing any more.

My hunch is that the number of users using layout.MultiVector(some_array) is approximately zero anyway, so the only point that matters is that choosing 1 over 2 saves us some code.

@arsenovic
Copy link
Member

right.
users who are constructing MV's manually like this probably care about performance, whould you agree? in this case i think breaking things is ok, and putting a warning in docs is good.

@hugohadfield
Copy link
Member

Yeah I agree, by far the main reason to be manually constructing multivectors in this way is for performance and in that case making the copy = false default makes sense

@hugohadfield
Copy link
Member

hugohadfield commented Mar 12, 2020

Here are some reasons why I (and hence maybe users) build multivectors manually:

  • io with other GA software/files
  • to apply a matrix to the multivector array directly
  • to apply jitted functions to the multivectors
  • to interface with non GA libraries such as scipy.optimize

I don't know if any of those scenarios are likely to cause trouble

@hugohadfield
Copy link
Member

Having thought about this, I think the copy=True default makes more sense. I think it is important to remember our position in the GA software ecosystem and goals of the project. In most of the places I have seen Clifford in use our users are students (eg. in Brno) or mathematicians/applied scientists who are getting started with GA and want everything to just work and want to play around with a few algebras etc, not write super performant code (at first anyway). If we change it to not copy by default we introduce a higher probability of writing bugs, especially for inexperienced users.

Of course we would in general love to have fast code. Users who want higher performance for some bits of their code can change to using copy=False for those sections. In doing this they make a conscious decision and, hopefully, will be aware of the safety tradeoff inherent in their choice.

So overall on reflection I think we should just add the sprinklings of copy=False into the tools, its not a big pain for us at all, but a new user getting behaviour they don't expect is definitely something that would sit badly with them.

@arsenovic arsenovic unpinned this issue Jan 4, 2021
@donhatch
Copy link
Contributor

Another possibility to consider: remove the default; that is, force all callers to make a conscious decision by saying copy=True or copy=False.

Benefits:

  • there are no "if users don't listen" bugs or performance implications: it's impossible to not listen since code that omits it simply won't compile
  • there are no "if users forget the semantics" bugs or performance implications: it's almost impossible to forget since it's always explicit at the call site.
  • less for users to have to keep in mind and keep track of (personally, my brain will never retain information about an arbitrary default like this, so as long as there's a default, I'll have to keep looking it up an unlimited number of times into the future, which is not my idea of fun, and also I'll neglect to do it and get it wrong some percentage of the time, leading back to the "if users forget the semantics" bugs and performance problems)
  • (perhaps best of all) the fact that users (both external and within the library) will have to make a pass over their code to add the explicit directive will likely uncover bugs and performance problems that will then be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants