-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transactions/Atomic writes in the docstore #3501
Comments
@sandeepvinayak I'm the designer and implementer of docstore. I plan to review your document soon. We're not opposed to the change, we just need to review it carefully. Thanks for your patience. |
thank you @jba !Really appreciate your help on this. Looking forward to getting the feedback on the doc. |
@jba Hope you are doing great, just wanted to followup if you got a chance to review the doc. |
I've read the doc. I asked for commenter permission, but maybe it's better to discuss here. The hardest part is not the API, it is making sure we have uniform semantics across all providers. For example, consider a transaction that includes a Put followed by a Get on the same document. Does the Get see the value in the Put, or does it see the value before the transaction started? I would hope the former, but I don't know if every provider guarantees that. What about limitations in the transaction implementations of various providers? For example, Firestore requires that all reads happen before all writes (making the situation I describe above impossible). Doesn't that mean that all docstore transactions must have the same limitation? If not, how will you run write-before-read transactions on Firestore? Transactions often need to act on earlier values. For example, if you want to change the value of a field atomically based on its current value, you need a transaction like
How can an API based on the ActionList API deal with that? |
If you look at the solution 3 in the document (preferred one), it guarantees the same ordering what docstores support without transaction (after reads, concurrent reads and before reads) based on where the user put them in the action list. In this proposal we are not attempting to solve the reads as part of transactions (aws do provide transactional get items), but we are only trying to solve the writes in atomic way. In your example: coll.Actions().Get(doc).Update(doc1, mods).Update(doc2, mods).SetAtomicWrites(true).Do(ctx) The Get will be executed before write transaction because it was in actionlist before the Update doc1. |
Then I wouldn't call this feature "transactions". I also wouldn't use ActionLists to represent it, because having the reads mixed in is misleading. There should be a separate API just for writes. This feature seems limited compared to real transactions. As you point out, all the providers have some sort of transaction mechanism. And read-modify-write transactions are quite useful and common. I think it should be generalized to full transactions. Otherwise, if we add transactions later, we'll have two similar features, batched atomic writes and real transactions. |
I was thinking in the way to have closest to the existing semantics in docstore. Basically, the users can do atomic writes and then read the latest version by doing Get on the same actionList, just like we do today without atomic writes.
Sure, but in NoSQL world, it's not common have these semantics for the transactions. Most of the provides provide Read/Write transactions separately. For example: DynamoDB provides TransactWriteItems and TransactGetItems and we cannot have the same item in both the transactions at the same time, it would be a conflict. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html Therefore, I believe we need to align with the providers and provide write/read transactions separately. The alternative is to not let the user mix reads in the transaction writes if you feel strong about it. What do you think ? |
But that read is not guaranteed to get the latest write in the list. It would be equivalent to doing the read after the call to
OK, I get it now. I think we should have |
I believe it's not out of order, if we look at the code here, there is some sort of ordering guarantee, we don't completely shuffle the order https://github.com/google/go-cloud/blob/master/docstore/driver/util.go#L58 If read happens after write on the same key, it will stay after that write in afterGets. Likewise for beforeGets Just want to make sure we are on the same page but if you still think we should complete separate out the AtomicWrites and is a lesser confusion, we can do that. I have no strong opinion over choosing either of these two approaches, both serves the purpose of AtomicWrites which is the problem we are trying to solve. |
You are right. That is just a helper function, but the doc for ActionList says
That is also consistent with making the writes atomic. Will this API work for all providers?
The main difference from yours is that it only applies to subsequent writes, not the ones before. |
This will work. To confirm, we will have normal writes and atomic writes in a same ActionList, which is close to my first solution in doc but in a different form of apis exposed. But I like it. Instead of SetAtomicWrites, we are giving more flexibility to users where they can have concurrent writes and atomic writes in the same ActionList. So, for code changes, I can start with driver and aws in the first iteration and then other providers in the follow up iterations. |
Also (actually more important to me), we avoid the confusion where an AtomicWrites late in the list affects writes before it. Your coding plan sounds good. How will you test that writes are really happening atomically? I don't see any easy way to check that. What is your plan to test on all providers? |
I will write the conformance tests with atomic writes and then get the documents part of atomic writes and make sure it reads the latest values for all of them. I know it still doesn't guarantee that the writes happened atomically and but I will also write some unit tests to make sure the writes are grouped in an expected manner before it call the provider api(s) to execute operations. What do you think ? |
Yes, I think showing the grouping is right is far simpler and about as good as we can get. |
Awesome! thanks @jba ! I will begin to create PRs in next couple days for your review, thanks again! |
@sandeepvinayak just checking: you aren't waiting for me, are you? |
@jba Thanks for following up! No, I’m not blocked on anything; I was just juggling other ta sks. I’ll send out a PR by the end of the coming weekend Nov 16. |
@jba Following up on this if you got a chance to take a look at the PR ? |
Sorry. I saw it but haven't had time. |
Sure, @jba ! please let me know if there is anything needs to be addressed, thanks! |
@jba don't mean to rush but do you think you will be able to take a look at the PR in next couple of days, I am asking because we want to have this feature in our fork sooner but prefers to port it from the open source so we remain in sync, thanks! |
Transactions/Atomic writes in docstore
@eliben @vangent We are trying to add the transactions support/atomic writes in the docstore and this is the high level idea of how can we support that in go-cloud. https://docs.google.com/document/d/1UVj1kmwDfrs5qm8r7X1p4fAFmsdPEeBHcvWJ8zWF1dY/edit?tab=t.0
The PR is just at a high level to support the idea.
Let me know what you think about it and we can help to drive it further.
Edit:
The document is extended with more alternative solutions and solution 3 looks like the best one. Please note that the PR for idea was for solution 1. And if we agree we can provide the updated and completed PR.
The text was updated successfully, but these errors were encountered: