Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transactions/Atomic writes in the docstore #3501

Open
sandeepvinayak opened this issue Oct 11, 2024 · 23 comments
Open

Transactions/Atomic writes in the docstore #3501

sandeepvinayak opened this issue Oct 11, 2024 · 23 comments

Comments

@sandeepvinayak
Copy link
Contributor

sandeepvinayak commented Oct 11, 2024

Transactions/Atomic writes in docstore

@eliben @vangent We are trying to add the transactions support/atomic writes in the docstore and this is the high level idea of how can we support that in go-cloud. https://docs.google.com/document/d/1UVj1kmwDfrs5qm8r7X1p4fAFmsdPEeBHcvWJ8zWF1dY/edit?tab=t.0

The PR is just at a high level to support the idea.
Let me know what you think about it and we can help to drive it further.

Edit:
The document is extended with more alternative solutions and solution 3 looks like the best one. Please note that the PR for idea was for solution 1. And if we agree we can provide the updated and completed PR.

@sandeepvinayak
Copy link
Contributor Author

@eliben @vangent Wanted to follow up on this one, do you think you would be able to review the proposed changes to move it forward ?

@jba
Copy link
Contributor

jba commented Oct 17, 2024

@sandeepvinayak I'm the designer and implementer of docstore. I plan to review your document soon. We're not opposed to the change, we just need to review it carefully. Thanks for your patience.

@sandeepvinayak
Copy link
Contributor Author

thank you @jba !Really appreciate your help on this. Looking forward to getting the feedback on the doc.

@sandeepvinayak
Copy link
Contributor Author

@jba Hope you are doing great, just wanted to followup if you got a chance to review the doc.

@jba
Copy link
Contributor

jba commented Oct 22, 2024

I've read the doc. I asked for commenter permission, but maybe it's better to discuss here.

The hardest part is not the API, it is making sure we have uniform semantics across all providers. For example, consider a transaction that includes a Put followed by a Get on the same document. Does the Get see the value in the Put, or does it see the value before the transaction started? I would hope the former, but I don't know if every provider guarantees that.

What about limitations in the transaction implementations of various providers? For example, Firestore requires that all reads happen before all writes (making the situation I describe above impossible). Doesn't that mean that all docstore transactions must have the same limitation? If not, how will you run write-before-read transactions on Firestore?

Transactions often need to act on earlier values. For example, if you want to change the value of a field atomically based on its current value, you need a transaction like

doc := Get(id)
doc.field = doc.field * 3 + 1
Put(doc)

How can an API based on the ActionList API deal with that?

@sandeepvinayak
Copy link
Contributor Author

If you look at the solution 3 in the document (preferred one), it guarantees the same ordering what docstores support without transaction (after reads, concurrent reads and before reads) based on where the user put them in the action list. In this proposal we are not attempting to solve the reads as part of transactions (aws do provide transactional get items), but we are only trying to solve the writes in atomic way.

In your example:

coll.Actions().Get(doc).Update(doc1, mods).Update(doc2, mods).SetAtomicWrites(true).Do(ctx)

The Get will be executed before write transaction because it was in actionlist before the Update doc1.
Please let me know there is anything missing. I also added you in the doc.

@jba
Copy link
Contributor

jba commented Oct 22, 2024

we are only trying to solve the writes in atomic way

Then I wouldn't call this feature "transactions". I also wouldn't use ActionLists to represent it, because having the reads mixed in is misleading. There should be a separate API just for writes.

This feature seems limited compared to real transactions. As you point out, all the providers have some sort of transaction mechanism. And read-modify-write transactions are quite useful and common. I think it should be generalized to full transactions. Otherwise, if we add transactions later, we'll have two similar features, batched atomic writes and real transactions.

@sandeepvinayak
Copy link
Contributor Author

sandeepvinayak commented Oct 23, 2024

because having the reads mixed in is misleading

I was thinking in the way to have closest to the existing semantics in docstore. Basically, the users can do atomic writes and then read the latest version by doing Get on the same actionList, just like we do today without atomic writes.

read-modify-write transactions are quite useful and common

Sure, but in NoSQL world, it's not common have these semantics for the transactions. Most of the provides provide Read/Write transactions separately. For example: DynamoDB provides TransactWriteItems and TransactGetItems and we cannot have the same item in both the transactions at the same time, it would be a conflict. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transaction-apis.html

Therefore, I believe we need to align with the providers and provide write/read transactions separately.

The alternative is to not let the user mix reads in the transaction writes if you feel strong about it.

What do you think ?

@jba
Copy link
Contributor

jba commented Oct 23, 2024

I was thinking in the way to have closest to the existing semantics in docstore. Basically, the users can do atomic writes and then read the latest version by doing Get on the same actionList, just like we do today without atomic writes.

But that read is not guaranteed to get the latest write in the list. It would be equivalent to doing the read after the call to ActionList.Do. With normal ActionLists that doesn't cause confusion, because every action has the same status: possibly out of order, non-atomic. It's mixing them that is confusing.

DynamoDB provides TransactWriteItems and TransactGetItems and we cannot have the same item in both the transactions at the same time...

OK, I get it now.

I think we should have Collection.AtomicWrites() *ActionList. An ActionList created that way does atomic writes and panics if Get is called on it. Otherwise it behaves like a regular ActionList. Does that work?

@sandeepvinayak
Copy link
Contributor Author

sandeepvinayak commented Oct 23, 2024

With normal ActionLists that doesn't cause confusion, because every action has the same status: possibly out of order,

I believe it's not out of order, if we look at the code here, there is some sort of ordering guarantee, we don't completely shuffle the order

https://github.com/google/go-cloud/blob/master/docstore/driver/util.go#L58

If read happens after write on the same key, it will stay after that write in afterGets. Likewise for beforeGets

Just want to make sure we are on the same page but if you still think we should complete separate out the AtomicWrites and is a lesser confusion, we can do that. I have no strong opinion over choosing either of these two approaches, both serves the purpose of AtomicWrites which is the problem we are trying to solve.

@jba
Copy link
Contributor

jba commented Oct 24, 2024

You are right. That is just a helper function, but the doc for ActionList says

A Get and a write may refer to the same document. Each write may be paired with only one Get in this way. The Get and write will be executed in the order specified in the list: a Get before a write will see the old value of the document; a Get after the write will see the new value if the service is strongly consistent, but may see the old value if the service is eventually consistent.

That is also consistent with making the writes atomic.

Will this API work for all providers?

// AtomicWrites causes all following writes in the list to execute atomically.
func (*ActionList) AtomicWrites() *ActionList

The main difference from yours is that it only applies to subsequent writes, not the ones before.

@sandeepvinayak
Copy link
Contributor Author

This will work. To confirm, we will have normal writes and atomic writes in a same ActionList, which is close to my first solution in doc but in a different form of apis exposed.

But I like it. Instead of SetAtomicWrites, we are giving more flexibility to users where they can have concurrent writes and atomic writes in the same ActionList.

So, for code changes, I can start with driver and aws in the first iteration and then other providers in the follow up iterations.
We can hold the portable API simple change for users to consume until we complete all the providers. Does that sounds good?

@jba
Copy link
Contributor

jba commented Oct 24, 2024

we are giving more flexibility to users

Also (actually more important to me), we avoid the confusion where an AtomicWrites late in the list affects writes before it.

Your coding plan sounds good.

How will you test that writes are really happening atomically? I don't see any easy way to check that.

What is your plan to test on all providers?

@sandeepvinayak
Copy link
Contributor Author

How will you test that writes are really happening atomically? I don't see any easy way to check that.

I will write the conformance tests with atomic writes and then get the documents part of atomic writes and make sure it reads the latest values for all of them. I know it still doesn't guarantee that the writes happened atomically and but I will also write some unit tests to make sure the writes are grouped in an expected manner before it call the provider api(s) to execute operations. What do you think ?

@jba
Copy link
Contributor

jba commented Oct 24, 2024

Yes, I think showing the grouping is right is far simpler and about as good as we can get.

@sandeepvinayak
Copy link
Contributor Author

Awesome! thanks @jba ! I will begin to create PRs in next couple days for your review, thanks again!

@jba
Copy link
Contributor

jba commented Nov 11, 2024

@sandeepvinayak just checking: you aren't waiting for me, are you?

@sandeepvinayak
Copy link
Contributor Author

@jba Thanks for following up! No, I’m not blocked on anything; I was just juggling other ta sks. I’ll send out a PR by the end of the coming weekend Nov 16.

@sandeepvinayak
Copy link
Contributor Author

@jba Sorry, I missed to update here, the PR for portable API and dynamo is ready for review since last week
#3500
Please take a look. Thanks!

@sandeepvinayak
Copy link
Contributor Author

@jba Following up on this if you got a chance to take a look at the PR ?

@jba
Copy link
Contributor

jba commented Jan 4, 2025

Sorry. I saw it but haven't had time.

@sandeepvinayak
Copy link
Contributor Author

Sure, @jba ! please let me know if there is anything needs to be addressed, thanks!

@sandeepvinayak
Copy link
Contributor Author

@jba don't mean to rush but do you think you will be able to take a look at the PR in next couple of days, I am asking because we want to have this feature in our fork sooner but prefers to port it from the open source so we remain in sync, thanks!

jba pushed a commit that referenced this issue Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants