Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with deepMap #3266

Open
wants to merge 14 commits into
base: develop
Choose a base branch
from
Open

Performance issues with deepMap #3266

wants to merge 14 commits into from

Conversation

dvd101x
Copy link
Collaborator

@dvd101x dvd101x commented Sep 16, 2024

Hi Jos,

This PR addresses issue #3262 and introduces improvements to our recursion functions which increases performance in general:

Key changes:

  • collection.deepMap and collection.deepForEach: are optimized to run the callback with only one argument. Thus avoiding the use of matrix.map.

Other changes:

  • Fixed collection.deepMap: Uses skipZeros argument
  • Performance Boost: Recursion functions now use the correct number of arguments, clone the index only when necessary, and separate map and forEach recursions for improved speed.

Areas for Future Consideration:

  • Merging Recursion Functions: Explore merging the recursion functions in collection and array for potential code consolidation and simplification.
  • Argument Handling: Consider a more robust approach to determining the correct number of arguments for regular functions. (not func.length)
  • skipZeros for BigNumbers: In this PR it only works for regular numbers.
  • Array recursion: The recurse functions in array are set up in a way that they can also work for matrices. Thus the duplicated first and second argument for arrays (array, array, callback) but for matrices is (matrix.valueOf(), matrix, callback).

While I previously suggested discussing these changes before submitting a PR, I decided to proceed directly without breaking changes and to facilitate further discussion based on the concrete implementation.

Please review the changes and provide feedback. I'm open to discussing potential modifications or alternative approaches.

Copy link

@john9384 john9384 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@josdejong
Copy link
Owner

This looks really promising @dvd101x ! Some thoughts:

  1. I expect that the most important performance difference is between one callback argument and more than one. Maybe we can handle the cases of 2 and 3 arguments the same way to reduce the amount of code?
  2. In #3251, @Galm007 has created a single _forEach method used by both map and forEach methods to deduplicate the logic. It may be worth trying to let deepMap use deepForEach under the hood to deduplicate logic of this PR too. What do you think?
  3. Can you use index.slice() instead of [...index]? I tried it out and it is a bit faster.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Sep 18, 2024

Thanks Jos!

  1. Will test performance and report back.
  2. I did some tests previously and it affects speed whether we use map or forEach, using forEach and pushing into an array was slower than map. I will check Speed up the map() and forEach() functions in DenseMatrix.js by over 300% #3251 and try to come up with something.
  3. OK

@dvd101x
Copy link
Collaborator Author

dvd101x commented Sep 20, 2024

While working on tests I found something very interesting. If a typed function is wrapped in a regular function it seems that the map and forEach function gains significant speed.

abs(array)                                 x 585,307 ops/sec ±1.17% (97 runs sampled)
...
map(array, abs)                            x 131,558 ops/sec ±0.37% (98 runs sampled)
map(array, (x)=>abs(abs))                  x 515,884 ops/sec ±0.54% (97 runs sampled)
map(array, abs.signatures.number)          x 580,084 ops/sec ±1.20% (96 runs sampled)

It might be due to the try catch that happens on each step (only for typed functions) and if it is possible to run it at a higher level.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Sep 21, 2024

In the latest version some big improvements were made regarding speed, even in a few cases surpassing abs(array)

By running benchmarks on a MacBook Air M1 with 8 GB the following results were found comparing develop vs this.

deepMapFix

Note: When a particular signature is mapped, map and abs(matrix) show similar results, probably eliminating the issue with deepMap.

deepForEachFix

Note: There are more benefits in forEach due to the current implementation using mostly the same map functions without returning their value and this implementation supplying separate recurse functions.

The next step is to try to reduce the lines of code while trying to maintain performance as previously commented.

Also I will try merging the array.deepMap and collection.deepMap and merging array.deepForEach and collection.deepForEach with the expectation of closing the gap between mapping a matrix and an array.

@josdejong
Copy link
Owner

😎 thanks for the updates

@josdejong
Copy link
Owner

@dvd101x I've just merged #3251, which improves DenseMatrix.map and DenseMatrix.forEach. There is a merge conflict, and I think we should figure out how to dedupe any deepmap and deepForeach logic, since that overlaps with DenseMatrix.map and DenseMatrix.forEach. Maybe we should move (part of?) the logic of DenseMatrix._forEach into deepMap or deepForEach? I'm not sure what will be handiest.

For reference: I compared the current develop against the current deepMap-2 (without the latest changes from develop)

# develop

abs(genericMatrix)                         x 344,632 ops/sec ±0.50% (96 runs sampled)
abs(array)                                 x 630,263 ops/sec ±0.90% (96 runs sampled)
abs(numberMatrix)                          x 320,922 ops/sec ±1.79% (87 runs sampled)
genericMatrix.map(abs)                     x 152,504 ops/sec ±0.68% (97 runs sampled)
numberMatrix.map(abs)                      x 152,673 ops/sec ±0.82% (96 runs sampled)
map(genericMatrix, abs)                    x 150,779 ops/sec ±1.11% (97 runs sampled)
map(numberMatrix, abs)                     x 149,533 ops/sec ±1.25% (90 runs sampled)
map(array, abs)                            x 50,960 ops/sec ±0.92% (96 runs sampled)
map(array, abs.signatures.number)          x 74,270 ops/sec ±0.95% (95 runs sampled)
genericMatrix.map(abs.signatures.number)   x 417,552 ops/sec ±1.12% (92 runs sampled)
numberMatrix.map(abs.signatures.number)    x 417,524 ops/sec ±1.26% (93 runs sampled)

# deepMap-2 (without the latest develop)

abs(genericMatrix)                         x 244,629 ops/sec ±1.79% (93 runs sampled)
abs(array)                                 x 829,277 ops/sec ±0.55% (95 runs sampled)
abs(numberMatrix)                          x 249,694 ops/sec ±0.41% (96 runs sampled)
genericMatrix.map(abs)                     x 228,617 ops/sec ±0.48% (98 runs sampled)
numberMatrix.map(abs)                      x 169,613 ops/sec ±0.53% (98 runs sampled)
map(genericMatrix, abs)                    x 227,495 ops/sec ±0.38% (98 runs sampled)
map(numberMatrix, abs)                     x 168,517 ops/sec ±0.58% (97 runs sampled)
map(array, abs)                            x 292,857 ops/sec ±0.51% (94 runs sampled)
map(array, abs.signatures.number)          x 825,040 ops/sec ±0.49% (97 runs sampled)
genericMatrix.map(abs.signatures.number)   x 253,256 ops/sec ±0.64% (97 runs sampled)
numberMatrix.map(abs.signatures.number)    x 176,966 ops/sec ±1.32% (94 runs sampled)
map(array, abs + idx)                      x 412,065 ops/sec ±0.87% (94 runs sampled)
genericMatrix.map(abs + idx)               x 189,221 ops/sec ±0.57% (96 runs sampled)
numberMatrix.map(abs + idx)                x 147,322 ops/sec ±0.55% (97 runs sampled)
map(array, abs + idx + arr)                x 397,918 ops/sec ±0.58% (98 runs sampled)
genericMatrix.map(abs + idx + matrix)      x 130,700 ops/sec ±0.53% (92 runs sampled)
numberMatrix.map(abs + idx + matrix)       x 110,057 ops/sec ±0.73% (96 runs sampled)

@dvd101x
Copy link
Collaborator Author

dvd101x commented Sep 27, 2024

Hi Jos,

Thanks for the update

In this PR the intention was to see how much performance could be gained while maintaining recursion, as I understand the main improvement of #3251 is this new algorithm that eliminates recursion. I wasn't sure at the beginning but I'm glad that even with all the attempts to fix the issues with deepMap the algorithm in #3251 is still faster.

I agree that we should merge some of the logic in a way that arrays can also benefit from this new algorithm (if possible).

I did test using forEach as the main algorithm for map but the difference seems significant, I think if we test both approaches the benefit in splitting the logic could be shown also in develop.

I'm still reviewing the new algorithm and trying to understand how does it manages the differences when a callback has one, two or three arguments and it's effect in performance.

I'm also not sure what is the best way to proceed. I can try something for the following weeks.

@josdejong
Copy link
Owner

Maybe it is easiest to start in a new PR and split the task in two steps:

  1. Unify logic: extract DenseMatrix._forEach into the deepForEach utility function and use that from DenseMatrix.map, DenseMatrix.forEach, deepMap.
  2. Optimize callback: add the optimizations again to not provide the callback index when not needed (mostly copying the logic from this PR).

@dvd101x
Copy link
Collaborator Author

dvd101x commented Oct 1, 2024

Maybe you are right. I found another bottleneck in this code and it made the recursion functions faster, but they were still slower than develop.

I will attempt first unify logic into a utility, maybe the option of skip zeros will have to be removed or implemented with a special callback.

The next step is that the unification eventually is lost, when the dense matrix is stored as a flat array, I assume it might be faster even with the index conversions.

I think there is another area of opportunity for the optimize callback utility, including an option to also simplify when the callback has only one signature with the right amount of arguments.

Please let me give it a try and eventually close this PR.

@josdejong
Copy link
Owner

Sounds good, thanks!

@dvd101x
Copy link
Collaborator Author

dvd101x commented Nov 7, 2024

Hi Jos,

Just to let you know, I did have some progress with this, but couldn't make it work for all tests.

I was thinking of some options to move forwards:

  • Leaving the matrix functions as such and make some faster recursion on the array functions and deepMap and deepForEach in collections.
  • Updating the current branch I have (including the failing tests) and request for help.
  • Keep working on it and maybe have the solution by the end of this year (2024).

Ideally I don't think it's needed for all to use the same logic, as I think the main goal is for matrices to use flat arrays internally (as tensorflowjs does).

@josdejong
Copy link
Owner

Thanks for the update David! Good to hear you're planning to work on this next month.

@dvd101x
Copy link
Collaborator Author

dvd101x commented Jan 19, 2025

Hi Jos,

This branch maintains the current mapping algorithm for matrices but for arrays it implements the fixes in recursion functions and addresses the issues in deep map.

Here are the results.
map

It is currently passing tests, but I would like to add more benchmarks for map and forEach address a reduction of logic following your comment regarding

Maybe we can handle the cases of 2 and 3 arguments the same way to reduce the amount of code?

Previously we discussed to unify the logic for DenseMatrix._forEach which I might be able to resolve in the future, but I would like to include this proposal as such I case it's desirable.

Unify logic: extract DenseMatrix._forEach into the deepForEach utility function and use that from DenseMatrix.map, DenseMatrix.forEach, deepMap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants