Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRDB: Initial setup + migrations #2581

Draft
wants to merge 10 commits into
base: grdb-feature-branch
Choose a base branch
from

Conversation

leandroalonso
Copy link
Member

@leandroalonso leandroalonso commented Dec 20, 2024

Important: these changes are targeting a feature branch. I don't like them, but I see no reason for having the GRDB library on the app until we're ready to test it.

This initial PR adds two things:

  1. GRDB library
  2. Database setup + migrations

Additional details

My plan for doing this is as follows:

  1. Use a feature branch until we have the whole data layer ready to be switched to GRDB
  2. While the above is not true, we'll have FMDB and GRDB working at the same time. This is not ideal but will help us test the GRDB changes
  3. After all the work is done, we'll have two layers: one using FMDB and the other using GRDB. More about this below.

FMDB and GRDB at the same time: how and why

My plan is basically to duplicate the whole data layer so we have one part working with FMDB and the other with GRDB. Code duplication is not great but in this case, I think it will guarantee us a safe environment for test and flexibility when switching to GRDB.

As with every feature we develop, we have the grdb flag, which will control whether GRDB is enabled or not. However, at the beginning of the app GRDB or FMDB will be chosen. And this will persist until the app is killed. Once initializing it again, it will, one more time, use GRDB or FMDB. This allows us to avoid scenarios such as switching the flag remotely and apps changing their database framework in runtime, something that we do not want.

Also, duplicating the code and classes will make it easier to delete code in the future. We can delete entire methods and classes as opposed to ifs.

Once we have the whole data layer in GRDB, what's the plan?

My plan is:

  1. To start enabling it only for development. Then we can start using it in our day to day of regular work;
  2. Then, we can enable the app for us and use the app normally
  3. Then, for our TestFlight users
  4. If everything goes well, we can enable that for a portion of users
  5. And finally, we can enable for everyone
  6. And last, but not least, we can remove FMDB and the FMDB code

To test

The feature flag is already enabled

  1. Do a clean install of the app
  2. Use the app normally, everything should work because the database setup and migrations were successful
  3. You can check the .sqlite file and look at the tables/columns created

Checklist

  • I have considered if this change warrants user-facing release notes and have added them to CHANGELOG.md if necessary.
  • I have considered adding unit tests for my changes.
  • I have updated (or requested that someone edit) the spreadsheet to reflect any new or changed analytics.

@leandroalonso leandroalonso added the [Type] Tech Debt Applies to issues involving upgrades or refactoring to maintain or enhance the codebase. label Dec 20, 2024
@leandroalonso leandroalonso added this to the Future milestone Dec 20, 2024
@leandroalonso leandroalonso requested a review from a team as a code owner December 20, 2024 18:48
@leandroalonso leandroalonso requested review from SergioEstevao, bjtitus and danielebogo and removed request for a team December 20, 2024 18:48
@dangermattic
Copy link
Collaborator

2 Warnings
⚠️ Modules/DataModel/Package.swift was changed without updating its corresponding Package.resolved. Please resolve the Swift packages in Xcode.
⚠️ This PR is larger than 500 lines of changes. Please consider splitting it into smaller PRs for easier and faster reviews.

Generated by 🚫 Danger

@@ -24,6 +55,564 @@ class DatabaseHelper {
}
}

// GRDB upgradeIfRequired
private class func upgradeIfRequired(schemaVersion: inout Int32, dbPool: DatabasePool) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have a lot of code added in these branches. These are not new codes, just the same queries adapted to GRDB and FMDB.

I did look at maybe having some "magic" that would not require these changes but GRDB and FMDB works differently and return different results, thus making that quite hard. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it looks that those changes are needed, could we use this change to create some protocol to abstract the DB operations, and then have FMDB implementation ( the current one) and another GRDB implementation?

@danielebogo
Copy link
Contributor

Thanks @leandroalonso for setting this up! I really look forward to have the new layer working in our app!

Just a few thoughts after my first test:

Code duplication is not great but in this case, I think it will guarantee us a safe environment for test and flexibility when switching to GRDB

We need to be careful with this, especially when it's about keeping both data model updated. Right now the app doesn't fully work as the new data model is missing the podcastHTMLDescription column in the Podcast model.
I'm wondering what we can do to avoid this in future any time we have to update a model.

We will have a lot of code added in these branches. These are not new codes, just the same queries adapted to GRDB and FMDB.

Related to this I'm wondering if (as mentioned by @SergioEstevao above) we can take this as an opportunity to refactor a bit the persisted layer and introduce something that can switch between the 2 framework underneath without impacting any other part in future. This should also reduce the duplication, keeping the possibility to delete a whole part we want.

@leandroalonso
Copy link
Member Author

leandroalonso commented Dec 23, 2024

Related to this I'm wondering if (as mentioned by @SergioEstevao above) we can take this as an opportunity to refactor a bit the persisted layer and introduce something that can switch between the 2 framework underneath without impacting any other part in future. This should also reduce the duplication, keeping the possibility to delete a whole part we want.

This was my initial idea. :)

Things are not simple though. I'll try to explain why I landed to this solution.

FMDB and GRDB are inherently different

The way the database calls work is different between the two frameworks. Let's take this example from FMDB:

try db.executeQuery("PRAGMA busy_timeout = 10000", values: nil).close()

            var startingSchemaVersion: Int32 = 0

            let rs = try db.executeQuery("PRAGMA user_version", values: nil)
            if rs.next() { startingSchemaVersion = rs.int(forColumnIndex: 0) }
            rs.close()

            var newSchemaVersion = startingSchemaVersion
            upgradeIfRequired(schemaVersion: &newSchemaVersion, db: db)

            if newSchemaVersion != startingSchemaVersion {
                try db.executeUpdate("PRAGMA user_version = \(newSchemaVersion)", values: nil)
            }

And the exact same code using GRDB:

            try dbPool.write { db in
                try db.execute(sql: "PRAGMA busy_timeout = 10000")
            }

            var startingSchemaVersion: Int32 = 0
            var newSchemaVersion: Int32 = 0
            try dbPool.read { db in

                let rows = try Row.fetchCursor(db, sql: "PRAGMA user_version")
                if let row = try? rows.next() { startingSchemaVersion = row["user_version"] }

                newSchemaVersion = startingSchemaVersion

                upgradeIfRequired(schemaVersion: &newSchemaVersion, dbPool: dbPool)
            }

            try dbPool.write { db in
                if newSchemaVersion != startingSchemaVersion {
                    try db.execute(sql: "PRAGMA user_version = \(newSchemaVersion)")
                }
            }

There are a few fundamental differences there:

  1. FMDB is one big block. It reads some stuff, then writes, then writes again if needed
  2. GRDB has contexts. You cannot write inside a read block. That's why GRDB's code breaks this into 3 different blocks.

Can we create some write and read methods and abstract both frameworks inside them? I believe so.

But then we have another challenge: GRDB and FMDB produce different results when reading. FMDB returns FMResultSet while GRDB returns Row (which is a dictionary). These types are not interchangeable so we would need to convert both to something else — or maybe convert the Row to a Dictionary.

The big challenge here is regarding FMDB, it needs to set types very explicitly: Eg.:

podcast.secondaryColor = rs.string(forColumn: "secondaryColor")

So I wonder if we can know what's the type in advance. It's something I would need to check out because I'm not sure — and if we can't infer the difference, then we need to be explicit, thus not being able to make a generic layer.

What's our main goal?

Is our goal to refactor the data layer so it's more generic? Or is our goal to move to GRDB as quickly as possible? or maybe is it both?

The only thing I'll point out is that we don't know what we don't know. Maybe GRDB won't work for us for a reason that we don't know yet.

That's why I landed on this quick and dirty concept: so we could use/test GRDB soon.

Please don't get me wrong, I get both your and @SergioEstevao's feedback and I agree with it. Duplication is not good, you already made a case of a property that was missing and we surely don't want to be changing code in two different places. However, I think that for a more smart-generic solution, we'll need more time — thus we need to transform this into a proper project.

In the meantime, I'll change this PR to a draft and submit another one (also as a draft) with the same idea presented here. I'll use the rest of my time to see if we can achieve some layer of abstraction. And then we can discuss where to go from there. :)

@leandroalonso leandroalonso marked this pull request as draft December 23, 2024 16:17
@leandroalonso
Copy link
Member Author

leandroalonso commented Dec 23, 2024

@SergioEstevao @danielebogo to summarize the biggest challenge, take a look at this: https://github.com/Automattic/pocket-casts-ios/blob/trunk/Modules/DataModel/Sources/PocketCastsDataModel/Public/Model/Podcast%2BfromDatabase.swift

To have a generic layer we need to convert both GRDB and FMDB returns into something a generic layer can work with.

The big challenge here is FMDB because FMResultSet requires you to use methods which clearly state the types. GRDB is a bit more clever because it allows us to do something like:

podcast.lastColorDownloadDate = row["lastColorDownloadDate"]

Any help on solving this is appreciated. :)

Nevermind, I have an idea. :)))

@SergioEstevao
Copy link
Contributor

From what I see on the codebase We are passing around the DBQueue object and then execute commands like:

executeQuery or execute then on the cases of reads we read from the result to the our model objects.

My idea was to abstract those methods in a protocol that will do these commands on each specific layer.

As you say above we have the issue of converting from ResultSets to the model objects. Could this be handle with another abstraction that implements the reading from the most common base types?

readString, readInt, readBool etc...?

@leandroalonso
Copy link
Member Author

leandroalonso commented Dec 23, 2024

executeQuery or execute then on the cases of reads we read from the result to our model objects.

@SergioEstevao there are differences though.

You mentioned "From what I see on the codebase We are passing around the DBQueue object and then execute commands". Yes, but this is always encapsulated in a inDatabase block. FMDB needs that while GRDB needs you to be very specific if you are reading or writing, so I'm still not sure about how we can abstract this in a layer. On FMDB you can have everything inside one block, on GRDB not.

So how do we do in the example that I shared which is a write/read/write? Open multiple dbQueue.inDatabase? One for each method call? Or something else?

One thing I don't want to do is to rewrite the database layer in ways that will introduce different behaviors — such as this one I mentioned, opening multiple dbQueue.inDatabase. This is fundamentally different from executing everything at once because if I'm not mistaken each block will go to the serial queue, so things can become even slower with FMDB.

The return of the database layer is something I've figured out but the main difference between each database framework is the tricky part.

@groue
Copy link

groue commented Dec 24, 2024

FMDB and GRDB are inherently different

GRDB author, here. GRDB is a direct descendant of FMDB, and I would not say that they are inherently different. On the contrary, both share a common way to access the database, which is the "database access methods": inDatabase in FMDB vs. write or read in GRDB.

  • FMDatabaseQueue -> DatabaseQueue or DatabasePool
  • FMDatabase -> Database.

There are a few fundamental differences there:

  • FMDB is one big block. It reads some stuff, then writes, then writes again if needed
  • GRDB has contexts. You cannot write inside a read block. That's why GRDB's code breaks this into 3 different blocks.

Again, not really. You should probably carefully preserve those "blocks" (i.e. database access units) in your translation. At first you can just replace all inDatabase with write. For exact compatibility it should be writeWithoutTransaction, because write wraps your database statements in a transaction (usually a good idea). Eventually, you can replace some writes with read: this will help DatabasePool perform concurrent reads, if your app needs this. But make sure you preserve those blocks, or you will introduce changes in the way your app serializes database access through its FMDatabaseQueue/DatabasePool.

You may also have a look at (now archived) http://github.com/groue/GRDBObjc, "FMDB-compatible bindings to GRDB.swift". It contains some compatibility notes that may be interesting.

@leandroalonso
Copy link
Member Author

leandroalonso commented Jan 2, 2025

@groue thank you so much for your input!

GRDB is a direct descendant of FMDB

I didn't know that. I guess some of my assumptions were wrong. Which is amazing. :)

this will help DatabasePool perform concurrent reads, if your app needs this.

This is something that we'd like to have day 1 of using GRDB. To be fair, it's one of the main reasons why we're looking into making this switch.

That's why I was under the impression that to take advantage of it we would have to use write/read. Which changes slightly how the two frameworks behave, as if FRDB we don't really need to specific which kind of transaction it is.

We'll take a look at the link you shared, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Tech Debt Applies to issues involving upgrades or refactoring to maintain or enhance the codebase.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants