Dacha cache poisoning #1187

Closed
1 of 5 tasks
rickybasra opened this issue Nov 1, 2024 · 5 comments · Fixed by #1188

Comments

@rickybasra

Describe the bug
Experiencing a cache poisoning issue with the data returned from Edge. We are getting different versions of features returned every few GET requests. Edge should always return the correct, latest version of the feature, but it intermittently flips back to an old version.

Which area does this issue belong to?

  • FeatureHub Admin Web app
  • SDK
  • SDK examples
  • Documentation
  • Other

To Reproduce
Steps to reproduce the behavior:

  1. Run 3 pods for each service: edge, mr, dacha2, nats.
  2. Go to the FeatureHub Admin Web app.
  3. Kill the NATS pods and check that they have been recreated.
  4. Toggle a feature.
  5. Use the REST API to get the feature numerous times. The version and value should always match those in the MR console.
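
Step 5 can be automated with a small polling loop. The following is a minimal sketch, not part of FeatureHub: `fetch_features` is a hypothetical callable standing in for whatever performs the GET against Edge and returns the list of feature dicts for the environment. The request URL is deliberately left out, since it depends on the deployment.

```python
from collections import Counter
from typing import Callable, Iterable


def poll_feature(
    fetch_features: Callable[[], Iterable[dict]],  # hypothetical: wraps the GET to Edge
    key: str,
    attempts: int = 20,
) -> Counter:
    """Call fetch_features repeatedly and count each (version, value) seen for key."""
    seen: Counter = Counter()
    for _ in range(attempts):
        for feature in fetch_features():
            if feature.get("key") == key:
                seen[(feature.get("version"), feature.get("value"))] += 1
    return seen


def is_consistent(seen: Counter) -> bool:
    """True when every poll returned the same (version, value) pair."""
    return len(seen) <= 1
```

If `is_consistent` returns False after toggling the feature once and waiting for it to settle, more than one version is being served, which is the behaviour reported here.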

Expected behavior
Each time you retrieve the feature it should give back the correct value and version.

Screenshots
It returns either this payload:
{
"id": "2f55cdfd-e3cc-4ff4-9326-5c959d7404e2",
"key": "rb-test-20240726-feature-01",
"l": false,
"version": 16,
"type": "BOOLEAN",
"value": true,
"strategies": []
},
or this one:
{
"id": "2f55cdfd-e3cc-4ff4-9326-5c959d7404e2",
"key": "rb-test-20240726-feature-01",
"l": false,
"version": 17,
"type": "BOOLEAN",
"value": false,
"strategies": []
},
The response keeps alternating between the two every few refreshes.
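
The two payloads differ in `version` (16 versus 17), so a stale response can be spotted by tracking the highest version seen so far. This is a minimal sketch, assuming the version of a feature only ever increases; the function name is illustrative and it expects each payload as a clean JSON string.

```python
import json


def stale_responses(raw_payloads):
    """Return indexes of payloads whose version is lower than one already seen."""
    highest = None
    stale = []
    for index, raw in enumerate(raw_payloads):
        version = json.loads(raw)["version"]
        if highest is not None and version < highest:
            # An older version after a newer one: this response was stale.
            stale.append(index)
        else:
            highest = version
    return stale
```

Note that a stale response can only be recognised once a newer version has been observed, so the very first response is never flagged.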

Versions

  • FeatureHub version [e.g. 1.8.0-RC]
  • Browser [chrome]
  • Bruno

You can get the version of the FeatureHub container by running docker ps command.
Please include the OS and what version of the OS and Docker you're running.

Additional context
Slack thread discussing the issue
https://anyways-labs.slack.com/archives/C0150T7AF25/p1722508085063969

@rvowles
Contributor

rvowles commented Nov 4, 2024

So we have ascertained that this is likely happening because the service isn't being destroyed by a health check failure, yet there is enough of a lapse in the connection that the Dacha instance loses events. The plan is to add an option that causes Dacha to dump its cache if it detects a loss of connection to NATS, and not to reinstate caching until the connection has been re-established.
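
The idea can be illustrated with a small sketch. This is not Dacha2's implementation: the class, its method names, and `load_from_source` (a stand-in for a request to MR) are all hypothetical, and the event-driven cache updates that would normally arrive while connected are omitted.

```python
from typing import Callable, Dict


class DisconnectAwareCache:
    """Serves from cache only while the event stream is connected."""

    def __init__(self, load_from_source: Callable[[str], dict]):
        self._load = load_from_source  # hypothetical stand-in for a call to MR
        self._cache: Dict[str, dict] = {}
        self._connected = True

    def on_disconnect(self) -> None:
        # Events may be missed from here on, so cached entries can no longer be trusted.
        self._connected = False
        self._cache.clear()

    def on_reconnect(self) -> None:
        # Caching resumes; the cache starts empty and refills lazily from the source.
        self._connected = True

    def get(self, key: str) -> dict:
        if not self._connected:
            return self._load(key)  # pass through without caching
        if key not in self._cache:
            self._cache[key] = self._load(key)
        return self._cache[key]
```

The design choice is to trade extra load on the source during an outage for the guarantee that nothing cached before the disconnect is served afterwards.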

@rickybasra
Author

Sounds good @rvowles. Do you have an ETA for when this feature would be ready for us to use?

@rvowles
Contributor

rvowles commented Nov 15, 2024

FWIW, we have some folks with big clusters running on NATS on k8s and they haven't reported this issue, so I am wondering if there is a configuration issue with your setup. Part of the change for this is that it will log the discovered servers on a connection event. When this happens, please check that all of your servers are being discovered by the jnats client. There may be an interconnection issue in your NATS deployment.

rvowles added a commit that referenced this issue Nov 15, 2024
As Dacha2 heavily relies on NATS for its state, this allows
the unexpected dissolution of a NATS cluster to trigger a Dacha2
instance to drop being a cache and simply pass through requests to
MR until the NATS connection is reestablished.

#1187
@rvowles
Contributor

rvowles commented Nov 18, 2024

This has been pushed into the pre-release 1.8.2-rc of FeatureHub now available on DockerHub - could you try it out please? If it is fit for purpose, we'll cut the release and make it generally available. Check the PR for the docs on variants you may wish to try.

@gavinwoolley

I've pulled 1.8.2-rc and added dacha2.streaming.disconnect-behaviour: use-passthrough to application.properties via the Helm env var setting.
I've confirmed those values are mounted in the running container.
However, none of the extra logging information you've added seems to be exposed.
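
For anyone repeating the mount check, a minimal sketch of reading the setting back out of the mounted file contents follows. `read_property` is an illustrative helper, not a FeatureHub tool; it handles only the basic `key=value` and `key: value` forms of Java-style properties and ignores escapes and line continuations.

```python
def read_property(properties_text: str, name: str):
    """Return the value of name from Java-style .properties text, or None."""
    result = None
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        # Java .properties accepts '=' or ':' as the key/value separator.
        for index, char in enumerate(line):
            if char in "=:":
                key, value = line[:index].strip(), line[index + 1:].strip()
                break
        else:
            continue  # no separator on this line
        if key == name:
            result = value  # later definitions override earlier ones
    return result
```

Passing the text of application.properties and the name `dacha2.streaming.disconnect-behaviour` should return `use-passthrough` if the value is mounted as described.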

rvowles added a commit that referenced this issue Jan 13, 2025
* NATS disconnection notification support for Dacha2

As Dacha2 heavily relies on NATS for its state, this allows
the unexpected dissolution of a NATS cluster to trigger a Dacha2
instance to drop being a cache and simply pass through requests to
MR until the NATS connection is reestablished.

#1187

* support default keep cache until reconnect

3 participants