Type-Api support and validation speedup #218

Draft
wants to merge 29 commits into
base: master
from

Conversation

Pfeil
Member

@Pfeil Pfeil commented Aug 28, 2024

I replaced the Guava cache with an asynchronous, parallel cache (there is no real async in Java) that should perform better. Some tests still fail because the details field is missing in some exception bodies. I am not sure why this happens, but it is probably simple to fix, and the current state is enough for some speedup experiments.

This PR will stay open until we have a stable and significant performance gain (validation time down to roughly 25% or so) and have picked all the low-hanging fruit.

This requires some refactorings in very old parts of the Typed PID Maker, where I want to get rid of a lot of code.

  • Whether we really achieve a speed gain here has yet to be tested.
    • We currently get a 2x to 3x improvement on my notebook over Wi-Fi, which is quite poor considering that I experimented with different thread pool sizes.
  • Fix tests
    • The issue is likely that the exceptions are wrapped into CompletionExceptions. Fix should be easy.
  • Instead of waiting for the profile, start by validating each attribute's value independently. This should give some speedup and ensure that we check additional properties in any case.
  • Instead of using the JSON schemas in dtr-test, evaluate if the type API (current deployment or its current version) can replace it for us.
  • If one type is a profile, also check for mandatory attributes,
    • and for "allowAdditionalAttributes" being set. This is not supported by the EOSC DTR, so we do not support it per profile until a DTR with support for it comes up.
      • If no additionalAttributesAllowed value is available, assume it to be true in the schema. The Typed PID Maker configuration should not prevent this. The configuration should only lift this restriction (allow even though not allowed), not restrict it further (forbid instead of allowing).
    • Allow overriding allowAdditionalAttributes to true via configuration file or API parameter. TODO: double check that it only overrides to true, not to false!
  • Use type-api as well as the legacy schemas from dtr-test for validation. Make sure it can be more easily extended to further schema generators.
  • Check if the name of the validation strategy shall be changed, e.g. to "EmbeddedProfilesValidationStrategy"
  • remove implicit profile validation and only rely on attribute validation
  • do profile validation only explicitly, e.g. create?dryrun=true&profile=a&profile=b
  • ensure we have good validation tests
  • evaluate switching to https://github.com/networknt/json-schema-validator for json schema validation. Metastore uses it, and the current library is in maintenance mode (though, a fork exists).
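For the "Fix tests" item, a minimal sketch of how a `CompletionException` wrapper could be removed before an exception reaches the code that builds the error response. `FutureUnwrap` and `joinUnwrapped` are hypothetical names, not part of the Typed PID Maker, and it is still an assumption that the wrapping is what drops the details field.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical helper for illustration only.
final class FutureUnwrap {
    private FutureUnwrap() {}

    /**
     * Waits for the future and rethrows the original unchecked cause
     * instead of the CompletionException that join() wraps around it.
     */
    static <T> T joinUnwrapped(CompletableFuture<T> future) {
        try {
            return future.join();
        } catch (CompletionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            // Checked or missing cause: keep the wrapper as it is.
            throw e;
        }
    }
}
```

With such a helper, an exception thrown inside an async cache loader would reach the caller with its original type, so existing exception handlers would still match it.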

  • For faster and statistically more reliable testing, add an option to disable the cache from application.properties and document it in the application.properties file.
  • Set up a proper benchmarking suite to get comparable results. Use hurl 5 for that. Proposal: We have some testing scripts for the Docker container. Replace them with hurl and add a new benchmark test. Update the README with how to run the tests and which benchmark parameters we would like to use for continuous comparison.
  • Make some tests on a wired connection to figure out how we can achieve the best performance.
  • We can do some analysis, maybe with a flame graph (like here) or VisualVM.
  • After the Java 21 update (merged, we can start here!), we can try the new virtual thread executor
  • Lazy-Type-Loading: We collect more information than we might need (all profile attributes, even though we may only need a few of them). We can improve here: Profiles do not need to know their subtypes; we just have to compare the PIDs and request the types as required. This should result in minimal validation cost and will potentially distribute the need for cache refreshes across several requests.
  • Profile-Preloading: Define a set of PIDs which will regularly be updated (and on application startup). For example, search for profiles containing the setting which indicates which is the profile attribute, or just resolve all DTR profiles.
  • The code surely requires some structural improvements.
  • Fix test coverage
  • merge/rebase main into this branch for proper testing
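The lazy-type-loading idea can be sketched with plain JDK classes: a type is only requested when a record actually refers to its PID, and concurrent requests for the same PID share one load. `LazyTypeCache` and the `String`-based loader are assumptions made for this illustration, not the cache used in this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch: types are plain strings here to keep it self-contained.
final class LazyTypeCache {
    private final ConcurrentHashMap<String, CompletableFuture<String>> entries =
            new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final Executor executor;

    LazyTypeCache(Function<String, String> loader, Executor executor) {
        this.loader = loader;
        this.executor = executor;
    }

    /** Returns the cached future for this PID, starting a load only on first access. */
    CompletableFuture<String> get(String pid) {
        return entries.computeIfAbsent(
                pid,
                key -> CompletableFuture.supplyAsync(() -> loader.apply(key), executor));
    }
}
```

Note that this sketch never evicts entries and keeps failed loads cached; a real cache would need expiry and refresh on top.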

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 7 times, most recently from 6bb28b6 to aad4408 Compare August 28, 2024 23:20
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from aad4408 to efb4bf5 Compare August 29, 2024 12:53
@coveralls

coveralls commented Aug 29, 2024

Pull Request Test Coverage Report for Build #421

Details

  • 101 of 136 (74.26%) changed or added relevant lines in 4 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.5%) to 71.805%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/main/java/edu/kit/datamanager/pit/domain/Operations.java 12 18 66.67%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/TypingService.java 0 7 0.0%
src/main/java/edu/kit/datamanager/pit/typeregistry/impl/TypeRegistry.java 72 80 90.0%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/EmbeddedStrictValidatorStrategy.java 17 31 54.84%
Files with Coverage Reduction New Missed Lines %
src/main/java/edu/kit/datamanager/pit/pitservice/impl/EmbeddedStrictValidatorStrategy.java 1 57.58%
src/main/java/edu/kit/datamanager/pit/pitservice/impl/TypingService.java 2 31.17%
Totals Coverage Status
Change from base Build #420: -0.5%
Covered Lines: 871
Relevant Lines: 1213

💛 - Coveralls

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 3 times, most recently from a38cdee to b5fba64 Compare August 29, 2024 15:02
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from b5fba64 to f976bdd Compare August 29, 2024 15:30
@Pfeil Pfeil added the maintenance Not a bug, but should be done. label Oct 11, 2024
@Pfeil

This comment was marked as resolved.

@Pfeil Pfeil self-assigned this Nov 8, 2024
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from 6e64ea4 to d4317c8 Compare November 16, 2024 00:49
@Pfeil Pfeil changed the title Validation speedup experiments Type-Api support and validation speedup Nov 16, 2024
@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 3 times, most recently from 4b57404 to 6b07c6f Compare November 19, 2024 19:53
- support for records without profiles
- support for records with multiple profiles
- support for multiple profile attribute keys/types
- support for additional attributes
- in general, attribute validation and profile validation are now separate tasks
@Pfeil

This comment was marked as outdated.

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from 530b249 to d0c231a Compare January 3, 2025 18:20
@Pfeil
Member Author

Pfeil commented Jan 3, 2025

next steps:

  • merge master in here
  • have some proper unit tests for the new code
    • test resolving schemas and attributes
    • test resolving profiles
    • test attribute validation, implemented in AttributeInfo
    • test profile validation, implemented in RegisteredProfile
    • test updated validation strategy
  • use test coverage to detect some unused classes and remove them (domain classes)
  • fix tests (remaining failed tests: 2)
  • increase test coverage

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from d0c231a to 85db2a0 Compare January 7, 2025 09:35
@Pfeil
Member Author

Pfeil commented Jan 8, 2025

Note: Something is wrong with the CI.

  • On all my local machines, the tests run, fail, and end. The CI seems to run into some infinite loop on PID creation, which I cannot reproduce locally.
  • Both the local tests and the CI use Java 21.
  • Because the tests do not end, I cannot get any error reports.
  • It always fails in this test (last lines in output):
e.k.datamanager.pit.web.CustomPidsTest   : Started CustomPidsTest in 3.542 seconds (process running for 63.61)
CustomPidsTest > testCrateCustomPidWhenFeatureDisabled() STANDARD_OUT
    2025-01-08T17:16:36.800Z  INFO 4724 --- [    Test worker] e.k.d.p.web.impl.TypingRESTResourceImpl  : Creating PID
  • The test uses the in-memory PID system, so it should not be an issue with external PID systems, except for validation. Validation needs external services.
  • The test uses a custom PID, but disables the feature. So we would search forever for a PID that does not exist yet if, for some reason, we never found one.
  • It seems to loop/wait somewhere in the validation process.

Ideas to fix this:

  • Delete recently used GitHub caches -> did not work out
  • AFAIK there are limitations to the CI in PRs. Maybe the CI definition of the main branch is being executed and does not work any more for some reason? I remember there were some security reasons. Maybe we need to update the main CI definitions?
  • Make sure we run the CI with log level TRACE next time and see how far we get. We may need to add further trace output then and repeat...

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch 7 times, most recently from 0b8163a to 7349a0c Compare January 10, 2025 17:28
@Pfeil
Member Author

Pfeil commented Jan 10, 2025

I changed the async executor to be single threaded, and now it "hangs" here:

TypeApiTest > queryAttributeInfoOfSimpleType() STANDARD_OUT
    2025-01-10T17:41:57.609Z TRACE 2035 --- [    Test worker] e.k.d.pit.typeregistry.impl.TypeApi      : Querying attribute info for 21.T11148/b8457812905b83046284
    2025-01-10T17:41:57.611Z TRACE 2035 --- [pool-2-thread-1] e.k.d.pit.typeregistry.impl.TypeApi      : Loading attribute 21.T11148/b8457812905b83046284 to cache.
    2025-01-10T17:41:57.613Z TRACE 2035 --- [    Test worker] e.k.d.pit.typeregistry.impl.TypeApi      : Finished querying attribute info for 21.T11148/b8457812905b83046284

That gives no clue at all. But I now seem to be able to reproduce the issue locally: all tests I checked run forever. I think this is because of the way "async" works in Java: with blocking tasks, it cannot work on a single thread. And that, in turn, is because I spawn new tasks from existing ones, wait for a task to finish here and there, etc.

Is this the same issue as in the CI, though? If so, I guess it is the fault of the Java implementation?


Turns out: no. A single-thread executor cannot set a blocked task aside to run another one, which means you can quickly get into deadlocks, depending on the complexity of your tasks. A task that spawns more tasks on the same executor and needs their results before finishing will not work, because the subtasks stay queued behind the task that is waiting for them. But this is what I currently do. This is why other issues appeared on both sides. It was not directly related to the CI issue, though.
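The starvation described above can be reproduced with plain JDK classes. In this sketch, `NestedTaskDemo` and `outerCompletes` are made-up names; the outer task submits an inner task to the same executor and blocks until it is done.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not taken from the PR.
final class NestedTaskDemo {
    private NestedTaskDemo() {}

    /** Returns true if the outer task finished within the timeout, false if it starved. */
    static boolean outerCompletes(ExecutorService executor, long timeoutMillis) {
        CompletableFuture<Integer> outer = CompletableFuture.supplyAsync(() -> {
            // The inner task is queued on the same executor as the outer task.
            CompletableFuture<Integer> inner = CompletableFuture.supplyAsync(() -> 21, executor);
            try {
                // Blocks this worker thread until the inner task has run.
                return inner.get() * 2;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("inner task did not complete", e);
            }
        }, executor);
        try {
            outer.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Interrupts a worker that is still blocked, so the JVM can exit.
            executor.shutdownNow();
        }
    }
}
```

Called with `Executors.newSingleThreadExecutor()`, the outer task occupies the only worker and the inner task never starts, so the call times out. With `Executors.newFixedThreadPool(2)`, a second worker picks up the inner task and the outer task finishes.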

The solution was to switch the CI from the OpenJDK distribution "zulu" to "temurin". Temurin is also what we use in our Docker container. The virtual thread executor of Zulu seems to behave differently from the one in the OpenJDK that Homebrew provides. In any case, I am planning some additional changes after cleaning up my WIP mess:

  • use different executors for each cache
  • double check that there is no future that will implicitly be created with some default executor. Not sure if such a thing is possible, but I want to check if I explicitly defined the executor for each async task that I create. I believe that in my case, all futures come from the async caches, but I'll need to check.
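On the second point: implicit executors are indeed possible. `CompletableFuture` methods ending in `Async` that are called without an executor argument use a default executor (normally `ForkJoinPool.commonPool()`), even if the previous stage ran on a custom one. A small sketch to make this visible through thread names; `ExplicitExecutorDemo` and the thread name `type-cache-worker` are made up for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not taken from the PR.
final class ExplicitExecutorDemo {
    private ExplicitExecutorDemo() {}

    /** Returns the names of the threads that ran three differently configured stages. */
    static String[] run() {
        ExecutorService cacheExecutor = Executors.newFixedThreadPool(2, runnable -> {
            Thread thread = new Thread(runnable, "type-cache-worker");
            thread.setDaemon(true);
            return thread;
        });
        try {
            // Executor passed explicitly: runs on the custom pool.
            String explicit = CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), cacheExecutor)
                    .join();
            // Dependent stage with explicit executor: also on the custom pool.
            String chainedExplicit = CompletableFuture
                    .supplyAsync(() -> "x", cacheExecutor)
                    .thenApplyAsync(x -> Thread.currentThread().getName(), cacheExecutor)
                    .join();
            // Dependent stage without executor: falls back to the default async executor.
            String implicit = CompletableFuture
                    .supplyAsync(() -> "x", cacheExecutor)
                    .thenApplyAsync(x -> Thread.currentThread().getName())
                    .join();
            return new String[] {explicit, chainedExplicit, implicit};
        } finally {
            cacheExecutor.shutdown();
        }
    }
}
```

Searching the code for `Async(` calls with a single argument would be one way to find stages that silently leave the intended executor.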

@Pfeil Pfeil force-pushed the validation-speedup-experiments branch from 7349a0c to 865261f Compare January 10, 2025 18:36
Labels
maintenance Not a bug, but should be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable using the Type API instead of Cordra to get a profiles JSON Schema.
2 participants