Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Introduce TypeKey instead of String keys for named types #590

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

frankdavid
Copy link
Contributor

@frankdavid frankdavid commented Dec 18, 2024

Overview
We can achieve serious performance improvements by introducing a new TypeKey struct instead of String for candid types represented by Type::Var. When deserializing a candid message, we need to store information about these types (e.g. in the TypeEnv struct). Previously, we used BTreeMap with String keys. This is slow and a significant amount of time is spent looking up types by name.

Optimization 1: distinguish indexed (integer) keys from String keys
When decoding type information from the message header in binary_parser.rs, the types are indexed. Previously these indexed types were assigned a String key by calling format!("table{index}"). Working with integer keys is much faster.

Optimization 2: use HashMap instead of BTreeMap
Looking up values in a HashMap is much faster (especially if the keys are small).

Optimization 3: precompute hash of String-based TypeKeys

Breaking changes
This is a breaking change:

  • In Type::Var(var) var now has type TypeKey instead of String. Calling var.as_str() returns &str and var.to_string() returns a String. The string representation of indexed variables remains table{index} to maintain compatibility with previous versions.
  • TypeEnv now contains a HashMap instead of BTreeMap. Code that relied on the iteration order of the map (e.g. env.0.iter()) should make use of the newly added TypeEnv::to_sorted_iter() method which returns types sorted by their keys.

@frankdavid frankdavid requested a review from a team as a code owner December 18, 2024 14:04
Copy link

github-actions bot commented Dec 18, 2024

Name Max Mem (Kb) Encode Decode
blob 4_224 20_459_217 ($\textcolor{green}{-0.00\%}$) 12_081_559 ($\textcolor{green}{-0.02\%}$)
btreemap 75_456 4_022_698_926 ($\textcolor{green}{-0.00\%}$) 15_635_820_900 ($\textcolor{green}{-0.22\%}$)
nns 192 ($\textcolor{red}{50.00\%}$) 1_826_780 ($\textcolor{green}{-19.08\%}$) 3_111_783 ($\textcolor{green}{-43.33\%}$)
nns_list_proposal 1_088 6_716_864 ($\textcolor{green}{-4.76\%}$) 60_218_792 ($\textcolor{green}{-26.47\%}$)
option_list 128 7_140_253 ($\textcolor{green}{-0.01\%}$) 24_538_182 ($\textcolor{green}{-4.70\%}$)
text 6_336 20_455_518 ($\textcolor{red}{0.00\%}$) 17_839_609 ($\textcolor{red}{0.00\%}$)
variant_list 128 7_141_877 ($\textcolor{green}{-0.00\%}$) 23_393_906 ($\textcolor{green}{-4.31\%}$)
vec_int16 16_704 168_582_716 ($\textcolor{red}{0.00\%}$) 1_006_655_081 ($\textcolor{green}{-6.25\%}$)
  • Parser cost: 18_056_769 ($\textcolor{green}{-3.89\%}$)
  • Extra args: 2_347_729 ($\textcolor{green}{-27.54\%}$)
Click to see raw report

---------------------------------------------------

Benchmark: blob
  total:
    instructions: 32.54 M (-0.01%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 20.46 M (-0.00%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 12.08 M (-0.02%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 38.30 M (0.00%) (change within noise threshold)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 20.46 M (0.00%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 17.84 M (0.00%) (change within noise threshold)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 1.18 B (improved by 5.40%)
    heap_increase: 261 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 168.58 M (0.00%) (change within noise threshold)
    heap_increase: 261 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 1.01 B (improved by 6.25%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 19.66 B (-0.18%) (change within noise threshold)
    heap_increase: 1179 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 4.02 B (-0.00%) (change within noise threshold)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 15.64 B (-0.22%) (change within noise threshold)
    heap_increase: 1020 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 31.68 M (improved by 3.68%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 7.14 M (-0.01%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 24.54 M (improved by 4.70%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 30.54 M (improved by 3.33%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 7.14 M (-0.00%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 23.39 M (improved by 4.31%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 23.82 M (improved by 12.99%)
    heap_increase: 3 pages (regressed by 50.00%)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    instructions: 18.06 M (improved by 3.89%)
    heap_increase: 3 pages (regressed by 50.00%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 1.83 M (improved by 19.08%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 3.11 M (improved by 43.33%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 66.94 M (improved by 24.75%)
    heap_increase: 17 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    instructions: 6.72 M (improved by 4.76%)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    instructions: 60.22 M (improved by 26.47%)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.35 M (improved by 27.54%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------
Successfully persisted results to canbench_results.yml

lwshang
lwshang previously approved these changes Jan 6, 2025
@crusso
Copy link
Contributor

crusso commented Jan 16, 2025

This looks ok to me. However, I do recall @chenyan-dfinity being averse to introducing breaking changes to the API but I'm not sure how serious a problem this is.

Another approach might have been to introduce a second, indexed Var constructor (IndexedVar), used by the binary decoder to index into an array, since the binary decoder should actually know the size of the type table. This would be even more efficient than using a hash table, I expect.

@frankdavid
Copy link
Contributor Author

The TypeEnv data structure is indexed by the TypeKey (named or indexed), so just a new TypeInner variant would have not sufficed.

I did benchmark using a combination of a HashMap (for named types) + a Vec (for indexed types) to implement TypeEnv/TypeMap but it resulted in negligible performance gains and made the code quite a bit more complex so I decided against it.

@frankdavid frankdavid requested a review from a team as a code owner January 17, 2025 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants