orleans 3.0 compatibility #5
base: master
Conversation
Hmm, will check those failing tests tomorrow |
Something weird happens with the AppVeyor build:
It is compiled under |
hi, I will allocate some time today and try to resolve this |
No luck? Maybe I can help somehow? |
Not sure either, but it is not a good sign. I am not able to tell with confidence whether there is a concurrency issue in the Silo, in the RmqStreamProvider, or in the tests; and thus I cannot merge this until it is resolved, because it is a red flag. |
The latest stable (2.4.3) is available; v3 is still an RC, so I will wait with that... maybe by that time the issues with the tests will be resolved; they are pretty random :( and it seems like some issue with the Orleans cluster. I will resolve the conflicts later. |
@zitmen can you point me to the issues you are talking about, so I could try checking them myself? I've run these tests locally many times without issues. And regarding 3.0 being an RC: the release notes claim "This release is ready for production use." So there is a low chance of any changes coming, and if any issues exist, better to try to fix them now =) |
The following issue appears randomly on AppVeyor. I am not able to reproduce it locally. It fails 1 out of 3 times, which is not good. I am not sure if it is caused by a timeout, one of the silos crashing, or something else. I hope to find some more time this week to look into this further. But feel free to investigate yourself. Thanks.
https://ci.appveyor.com/project/zitmen65687/orleans-streams-rabbitmqstreamprovider/builds/28237257 |
Why did you make your own IQueueAdapterCache? Is Orleans' default SimpleQueueAdapterCache not enough for your needs? Or does it act somehow differently? I've replaced your cache with Orleans' built-in one in my pull request. I don't really think it can be the root of this issue, but I need to mention it. |
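For context, the cache is plugged in through the provider's IQueueAdapterFactory. Below is a minimal sketch of what replacing a custom cache with the built-in SimpleQueueAdapterCache might look like; the RabbitMqAdapterFactory name and constructor shape are assumptions for illustration (this is not the code from the PR), and the RabbitMQ-specific members are stubbed:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Orleans.Configuration;
using Orleans.Providers.Streams.Common;
using Orleans.Streams;

// Illustration-only factory; the real provider's factory differs.
public class RabbitMqAdapterFactory : IQueueAdapterFactory
{
    private readonly SimpleQueueAdapterCache _cache;
    private readonly HashRingBasedStreamQueueMapper _mapper;

    public RabbitMqAdapterFactory(
        string providerName,
        SimpleQueueCacheOptions cacheOptions,
        HashRingStreamQueueMapperOptions mapperOptions,
        ILoggerFactory loggerFactory)
    {
        // Orleans' stock cache: one SimpleQueueCache per queue,
        // sized by cacheOptions.CacheSize.
        _cache = new SimpleQueueAdapterCache(cacheOptions, providerName, loggerFactory);
        _mapper = new HashRingBasedStreamQueueMapper(mapperOptions, providerName);
    }

    public IQueueAdapterCache GetQueueAdapterCache() => _cache;

    public IStreamQueueMapper GetStreamQueueMapper() => _mapper;

    // The RabbitMQ-specific members are out of scope for this sketch.
    public Task<IQueueAdapter> CreateAdapter() =>
        throw new NotSupportedException("Provider-specific.");

    public Task<IStreamFailureHandler> GetDeliveryFailureHandler(QueueId queueId) =>
        throw new NotSupportedException("Provider-specific.");
}
```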
I don't know if they fixed it or not, but it was broken and not reliable. Too many race conditions. |
Hmm, it has been unchanged since 2017, but it is used in the Azure Queue, GCP, and AWS SQS stream providers... |
That does not mean it is correct. I don't know what guarantees the other providers have, but I tested it extensively with acking and it did not work correctly. That was the reason this provider was created in the first place; otherwise I could have used the contrib one. |
I didn't say it is correct, only that it is widely used without any issues, and maybe in real-world scenarios (inside the Orleans scheduler) all these races you're talking about do not exist at all? |
you may be right |
From my side, I can tell that we've been running a forked version of your provider (with fixed compatibility for 2.3+ and SimpleQueueAdapterCache) for about 2 months or so in production, under a load of around 500 msg/s, without any issues. Or maybe we're just missing some logs and don't know that we have issues? :) |
Ah, I remember another modification we made: that stupid 'fix', because we did not have time to dig deeper to its root cause. Without it, messages were delivered only to the first subscriber. |
Re-checked that |
So I've looked into the SimpleQueueCache. I'm not seeing problematic race conditions, because SimpleQueueCache is supposed to be called only from the PersistentStreamPullingAgent's thread. It is also fully synchronous, so it is not like there can be asynchronous reentrancy race conditions either. But I do see some things in it.

Starting out first with

A lesser consideration is that it can only ack messages once they are no longer needed in the cache. Because the messages from different virtual streams are combined into the cache, and it only releases whole cache buckets at a time, a slow consumer on one stream could substantially delay releasing messages for other streams, which in turn means delaying acking those messages back to RabbitMQ. I don't think this one is usually a terribly big deal, especially since Azure Queues acts the same way (although in its case it never needs to nack messages, with them being requeued based on a timeout instead). The
|
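The ack-timing concern above can be made concrete. In Orleans' pulling model, the adapter receiver's MessagesDeliveredAsync is invoked only after messages have been purged from the cache, so that is the earliest point a RabbitMQ provider could ack. A hedged sketch, assuming illustration-only IRabbitMqChannel and IRabbitMqBatchContainer abstractions in place of the real RabbitMQ.Client plumbing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Orleans.Streams;

// Illustration-only abstractions standing in for RabbitMQ.Client plumbing.
public interface IRabbitMqChannel { void Ack(ulong deliveryTag); }
public interface IRabbitMqBatchContainer : IBatchContainer { ulong DeliveryTag { get; } }

public class RabbitMqAdapterReceiver : IQueueAdapterReceiver
{
    private readonly IRabbitMqChannel _channel;

    public RabbitMqAdapterReceiver(IRabbitMqChannel channel) => _channel = channel;

    public Task Initialize(TimeSpan timeout) => Task.CompletedTask;

    public Task<IList<IBatchContainer>> GetQueueMessagesAsync(int maxCount) =>
        throw new NotSupportedException("Provider-specific; not part of this sketch.");

    // The pulling agent calls this only after the cache has released
    // (purged) the messages, so with a bucketed cache like SimpleQueueCache
    // a slow stream can delay acks for every stream sharing the cache.
    public Task MessagesDeliveredAsync(IList<IBatchContainer> messages)
    {
        foreach (var message in messages.OfType<IRabbitMqBatchContainer>())
        {
            _channel.Ack(message.DeliveryTag);
        }
        return Task.CompletedTask;
    }

    public Task Shutdown(TimeSpan timeout) => Task.CompletedTask;
}
```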
@KevinCathcart you nailed it. My memory of all the reasoning is quite blurry because it was a long time ago, but you are right that the main issue with the simple cache was the (n)acking and the fact that it was very unreliable in our application. My implementation was intended to behave exactly the same as any other RMQ client, because I was porting an external service, which triggered silo processing based on RMQ notifications, into Orleans streaming. Also, the application was required to work as close to real time as possible, so this whole issue with buckets was significant. |
Orleans 2.3 introduced a breaking change in SiloPersistentStreamConfigurator (in this PR), so older versions of this provider began throwing errors on start:
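For reference, a rough sketch of the registration surface after that change, going through AddPersistentStreams and its ISiloPersistentStreamConfigurator callback; the "RMQProvider" name and the factory resolution are assumptions, while the Orleans types and extension methods are from the framework:

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Orleans.Configuration;
using Orleans.Hosting;
using Orleans.Streams;

public static class RabbitMqSiloBuilderExtensions
{
    // "RMQProvider" and the factory resolution are placeholders.
    public static ISiloHostBuilder AddRabbitMqStreams(this ISiloHostBuilder builder) =>
        builder.AddPersistentStreams(
            "RMQProvider",
            (services, name) => services.GetRequiredService<IQueueAdapterFactory>(),
            configurator =>
            {
                // Since 2.3, per-provider settings hang off the
                // ISiloPersistentStreamConfigurator callback instead of
                // the older configuration entry points.
                configurator.ConfigurePullingAgent(ob => ob.Configure(options =>
                {
                    options.GetQueueMsgsTimerPeriod = TimeSpan.FromMilliseconds(100);
                }));
            });
}
```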