Feature Request: Implement Object2DoubleOpenHashMap.mergeDouble to avoid double-hashing #336

mhansen · 2024-11-18T02:15:36Z

Hi, I was recently optimising some hashmap-heavy code. We got a big speedup from moving from HashMap<Object, Double> to Object2DoubleOpenHashMap, so first let me say thank you for fastutil, it's great.

We noticed an opportunity in the profiling afterwards though. I expected mergeDouble to have an optimisation to hash the input once, then find the position of the data, then replace the data in that position. Like HashMap.compute.

But in the profile I see that mergeDouble is implemented only in the interface, as Object2DoubleMap.mergeDouble, which doesn't know about hashing and is implemented by calling double getDouble(Object) and then put(Object, double). This ends up hashing the Object twice and finding the hashmap position twice (including calling .equals to check that it's the right position).
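To make the cost pattern concrete without depending on fastutil, here is a minimal sketch using java.util.HashMap as a stand-in. It counts hashCode() calls for a get-then-put update versus a single merge call. The class and method names (DoubleHashingDemo, CountingKey, hashCallsForGetThenPut, hashCallsForMerge) are made up for this illustration, and the counts assume OpenJDK's HashMap, which hashes the key once per get, put, or merge call.

```java
import java.util.HashMap;
import java.util.Map;

class DoubleHashingDemo {
    // Hypothetical key type that counts how often it is hashed.
    static final class CountingKey {
        static int hashCalls = 0;
        final String name;

        CountingKey(String name) { this.name = name; }

        @Override public int hashCode() { hashCalls++; return name.hashCode(); }

        @Override public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
        }
    }

    // Update via get followed by put: each call hashes the key separately.
    static int hashCallsForGetThenPut() {
        Map<CountingKey, Double> map = new HashMap<>();
        CountingKey key = new CountingKey("a");
        map.put(key, 1.0);
        CountingKey.hashCalls = 0;          // only count the update itself
        Double old = map.get(key);
        map.put(key, old == null ? 2.0 : old + 2.0);
        return CountingKey.hashCalls;
    }

    // Update via merge: the map locates the entry once and updates it in place.
    static int hashCallsForMerge() {
        Map<CountingKey, Double> map = new HashMap<>();
        CountingKey key = new CountingKey("a");
        map.put(key, 1.0);
        CountingKey.hashCalls = 0;          // only count the update itself
        map.merge(key, 2.0, Double::sum);
        return CountingKey.hashCalls;
    }

    public static void main(String[] args) {
        System.out.println("get+put hashCode calls: " + hashCallsForGetThenPut());
        System.out.println("merge   hashCode calls: " + hashCallsForMerge());
    }
}
```

The same counting-key trick could be pointed at an Object2DoubleOpenHashMap to check how many times mergeDouble hashes the key in a given fastutil version.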

It's not a big deal or anything; we're more than happy with the performance improvements delivered. But I thought I'd report it in case it's an idea you like. Here's a flamegraph:


vigna · 2024-11-18T08:08:00Z

Mmmmh. There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace? In any case, that wouldn't solve your problem, in the sense that fastutil's internal functions find and insert do not carry the hash. You must understand that fastutil was designed more than 20 years ago, when compound operations such as merge and compute were not in the interface. It is right that in the present situation it would be better to have find and insert accept a hash, so that compound operations compute it just once. It's a good suggestion, I just don't know when I'll find the bandwidth to implement it (but you are free to send PRs 😂; just joking, I know it's complex code).

mhansen · 2024-11-18T09:26:14Z

Thank you for this background!

I'll go off what I see in the javadoc as it's a little hard to link to the generated code.

> There is a merge implementation for hash maps, so I'm wondering why it isn't called. Don't you see it in a stack trace?

I see there's an implementation for Object2DoubleOpenHashMap.merge(K k, double v, BiFunction<? super Double,? super Double,? extends Double> remappingFunction): https://javadoc.io/static/it.unimi.dsi/fastutil/8.5.15/it/unimi/dsi/fastutil/objects/Object2DoubleOpenHashMap.html#merge(K,double,java.util.function.BiFunction)

That's the boxing function that takes Double.

But the primitive mergeDouble(K key, double value, DoubleBinaryOperator remappingFunction) function is listed under "Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2DoubleMap". So there isn't a specialization/override of mergeDouble for Object2DoubleOpenHashMap.

> It is right that in the present situation it would be better to have find and insert accepting a hash, so that compound operations compute it just once.

I'm OK with doing it this way, but I was thinking that we could keep the current definition of find, which returns an int, and use that returned position to update in place. insert looks like it takes a position too?
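As a rough sketch of that idea (not fastutil's actual code), here is a tiny open-addressing map where find returns either the key's slot or an encoded free slot, and mergeDouble reuses that position so the key is hashed and probed only once. Everything here is hypothetical and simplified for illustration: the class name TinyObject2DoubleMap, the -(pos + 1) encoding for "not found", the mix function, linear probing, and the lack of null-key and removal support.

```java
import java.util.function.DoubleBinaryOperator;

class TinyObject2DoubleMap<K> {
    // A null entry marks an empty slot; null keys are not supported in this sketch.
    private Object[] keys = new Object[16];
    private double[] values = new double[16];
    private int size;

    private static int mix(int h) { return h ^ (h >>> 16); }

    /** Returns the slot holding k, or -(pos + 1) where pos is the free slot k would occupy. */
    private int find(Object k) {
        int mask = keys.length - 1;
        int pos = mix(k.hashCode()) & mask;   // the only hashCode() call per lookup
        while (keys[pos] != null) {
            if (keys[pos].equals(k)) return pos;
            pos = (pos + 1) & mask;           // linear probing
        }
        return -(pos + 1);
    }

    /** Stores a new entry at a free slot previously reported by find. */
    private void insert(int pos, K k, double v) {
        keys[pos] = k;
        values[pos] = v;
        if (++size > keys.length / 2) rehash();   // keep the table at most half full
    }

    /** Doubles the table; stored keys are hashed again here, but only on growth. */
    private void rehash() {
        Object[] oldKeys = keys;
        double[] oldValues = values;
        keys = new Object[oldKeys.length * 2];
        values = new double[oldKeys.length * 2];
        int mask = keys.length - 1;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] == null) continue;
            int pos = mix(oldKeys[i].hashCode()) & mask;
            while (keys[pos] != null) pos = (pos + 1) & mask;
            keys[pos] = oldKeys[i];
            values[pos] = oldValues[i];
        }
    }

    /** One find call, then either an insert at the reported slot or an in-place update. */
    double mergeDouble(K k, double v, DoubleBinaryOperator remappingFunction) {
        int pos = find(k);
        if (pos < 0) {
            insert(-pos - 1, k, v);
            return v;
        }
        return values[pos] = remappingFunction.applyAsDouble(values[pos], v);
    }

    /** Returns the stored value, or 0.0 if the key is absent. */
    double getDouble(Object k) {
        int pos = find(k);
        return pos < 0 ? 0.0 : values[pos];
    }
}
```

With this shape, a call such as map.mergeDouble(key, 1.0, Double::sum) costs one hash and one probe sequence whether or not the key is already present.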

I have no expectation that you get around to it, soon or ever. I might be able to suggest a PR if we can figure out that it's feasible.

Previously, we were hashing the key 3x:
- get
- containsKey
- put

Now, we override mergeDouble with a hashmap-specific implementation that hashes the key once.

Towards vigna#336

mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024

OpenHashMap.mergeX: Avoid double-hashing key

2308679

Towards vigna#336

mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024

OpenHashMap.mergeX: Avoid double-hashing key

0659a61

Towards vigna#336

mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024

OpenHashMap.mergeX: Avoid double-hashing key

a61950b

Towards vigna#336

mhansen added a commit to mhansen/fastutil that referenced this issue Nov 18, 2024

OpenHashMap.mergeX: Avoid double-hashing key

02c5414

Towards vigna#336

mhansen mentioned this issue Nov 19, 2024

OpenHashMaps.mergePRIMITIVE: Avoid double-hashing key #337

Merged
