Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: read replica db for findByCid #1212

Conversation

samika98
Copy link
Contributor

@samika98 samika98 commented May 17, 2024

[Read Replica] - #[https://linear.app/3boxlabs/issue/WS2-3238/create-a-separate-read-replica-to-improve-cas-performace]

Description

We noticed Knex connection pools getting exhausted during high load after some time. In order to scale CAS we have to introduce a separate read replica to handle the read calls. This will distribute the load between the replicas giving us better throughput

This PR has env vars defined to access the CAS-API : https://github.com/3box/ceramic-infra/pull/798/files#diff-7a1b7f1deef12acc800844bd1061eecf14fc08ded251a2363f32325a3210fcea

How Has This Been Tested?

TODO : Saw decreased latency monitored via data dog for read queries. Earlier read queries would take ~3-4 minutes under high load. We observe time in order of milli seconds for reads with this change

References:

Update ts for typed-inject compatibility : microsoft/TypeScript#37400. For some reason this bug was still present on ts 5.0.4 but not on the latest version which is 5.4.5
https://www.npmjs.com/package/typed-inject

Copy link

linear bot commented May 17, 2024

@samika98 samika98 requested a review from smrz2001 May 17, 2024 19:48
@samika98 samika98 force-pushed the feature/ws2-3238-create-a-separate-read-replica-to-improve-cas-performace branch from 802d040 to b82f094 Compare May 22, 2024 15:44
@samika98 samika98 requested review from stephhuynh18 and 3benbox May 22, 2024 17:39
@codecov-commenter
Copy link

codecov-commenter commented May 22, 2024

Codecov Report

Attention: Patch coverage is 80.55556% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 78.20%. Comparing base (bbca649) to head (8e74e93).
Report is 2 commits behind head on develop.

Files Patch % Lines
src/services/request-service.ts 66.66% 5 Missing ⚠️
src/app.ts 0.00% 1 Missing ⚠️
src/db-connection.ts 90.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1212      +/-   ##
===========================================
- Coverage    78.44%   78.20%   -0.25%     
===========================================
  Files           47       48       +1     
  Lines         1935     1964      +29     
  Branches       309      314       +5     
===========================================
+ Hits          1518     1536      +18     
- Misses         417      428      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@samika98 samika98 marked this pull request as ready for review May 23, 2024 20:23
Copy link

@christianlavoie christianlavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this affect existing metrics? Are there existing db query centric metrics and how are they affected with the two connection pools?

@samika98
Copy link
Contributor Author

How does this affect existing metrics? Are there existing db query centric metrics and how are they affected with the two connection pools?
@christianlavoie the only db metric we have currently in CAS is db_serialiazation_errors, this should not be affected by the two connection pools. Other metrics on AWS RDS should get affected, we would see more reads on the replica and decline of reads in the main db. See this dashboard for AWS metrics which can change

@samika98 samika98 requested a review from christianlavoie May 28, 2024 15:58
Copy link

@christianlavoie christianlavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I don't know the codebase enough to feel comfortable approving.

if (!request) {
throw new RequestDoesNotExistError(cid)
request = await this.requestRepository.findByCid(cid)
logger.debug(`Request not found in replica db for ${cid}, fetching from main_db`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: probably should go before line 61

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this!

throw new RequestDoesNotExistError(cid)
request = await this.requestRepository.findByCid(cid)
logger.debug(`Request not found in replica db for ${cid}, fetching from main_db`)
Metrics.count(METRIC_NAMES.REPLICA_DB_REQUEST_NOT_FOUND, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to move this above line 61 (if line 61 errors, the metric won't be counted)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for below

@@ -69,6 +69,38 @@ export async function createDbConnection(dbConfig: Db = config.db): Promise<Knex
return connection
}

export async function createReplicaDbConnection(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of having a separate function can we reuse the one above , I like the explicit key value pairs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also does this also need to do migrations (in case of future)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having two separate functions. So that if in the future we want to disable a read replica, it would be easier to add that as a config, and add a check in app.ts. I like having a function do just one thing.
No we don't need any migrations, rds takes care of that

@samika98 samika98 merged commit e8b6558 into develop May 31, 2024
5 checks passed
@samika98 samika98 deleted the feature/ws2-3238-create-a-separate-read-replica-to-improve-cas-performace branch May 31, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants