Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Implement Object2DoubleOpenHashMap.mergeDouble to avoid double-hashing #336

Open
mhansen opened this issue Nov 18, 2024 · 2 comments

Comments

@mhansen
Copy link
Contributor

mhansen commented Nov 18, 2024

Hi, I was recently optimising some hashmap heavy code. We got a big speedup from moving from HashMap<Object, Double> to Object2DoubleOpenHashMap, so first let me say thank you for fastutil, it's great.

We noticed an opportunity in the profiling afterwards though. I expected mergeDouble to have an optimisation to hash the input once, then find the position of the data, then replace the data in that position. Like HashMap.compute.

But in the profile I see that mergeDouble is implemented only in the interface as Object2DoubleMap.mergeDouble, which doesn't know about hashing, and is implemented in terms of calling double getDouble(Object) then put(Object, double) This ends up hashing the Object twice, and finding the hashmap position twice (including calling .equals to see if it's the right position).

It's not a big deal or anything; we're more than happy with the performance improvements delivered. But I thought I'd report it in case it's an idea you like. Here's a flamegraph:

image

@vigna
Copy link
Owner

vigna commented Nov 18, 2024

Mmmmh. There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace? In any case, that wouldn't solve your problem, in the sense that fastutil internal functions find and insert do not carry the hash. You must understand fastutil was designed more than 20 years ago, when compound operations such as merge and compute were not in the interface. It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once. It's a good suggestion, I just don't know when I'll find the bandwidth to implement it (but your are free to send PRs 😂; just joking, I know it's complex code).

@mhansen
Copy link
Contributor Author

mhansen commented Nov 18, 2024

Thank you for this background!

I'll go off what I see in the javadoc as it's a little hard to link to the generated code.

There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace?

I see there's an implementation for Object2DoubleOpenHashMap.merge(K k, double v, BiFunction<? super Double,? super Double,? extends Double> remappingFunction): https://javadoc.io/static/it.unimi.dsi/fastutil/8.5.15/it/unimi/dsi/fastutil/objects/Object2DoubleOpenHashMap.html#merge(K,double,java.util.function.BiFunction)

That's the boxing function that takes Double.

But the primitive mergeDouble(K key, double value, DoubleBinaryOperator remappingFunction) function is listed under "Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2DoubleMap". So there isn't a specialization/override of mergeDouble for Object2DoubleOpenHashMap.

It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once.

I'm OK with this way to do it; but I was thinking that we could keep the current definition of find what returns an int, and use that returned position to update in-place. insert looks like it takes a position too?

I have no expectation that you get around to it, soon or ever. I might be able to suggest a PR if we can figure out that it's feasible.

mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024
mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024
mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024
mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024
mhansen added a commit to mhansen/fastutil that referenced this issue Nov 20, 2024
Previously, we were hashing the key 3x:
- get
- containsKey
- put

Now, we override mergeDouble with a hashamp-specific implementation that
hashes the key once.

Towards vigna#336
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants