Elastic Bulk Document Creation #2772

Merged on Jan 18, 2024 (118 commits)

@elipe17 elipe17 commented Dec 7, 2023

Summary of Changes

  • Fixed an issue where Elastic documents were not being created because Django's bulk model creation does not send the save signals that trigger document indexing
  • Bypassed Django signals and manually bulk created Elastic documents to minimize I/O
  • Added the Kibana container
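
For readers unfamiliar with why bulk indexing reduces I/O: Elasticsearch's _bulk endpoint accepts many documents in a single newline-delimited JSON request instead of one request per document. The following is a minimal sketch of building such a request body, not the code from this PR. The function name and the RecordType field are hypothetical and used only for illustration.

```python
import json
from typing import Iterable, Mapping


def build_bulk_index_body(index: str, docs: Iterable[Mapping]) -> str:
    """Return an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an action line naming the target
    index (and the document id, if one is provided), then the source.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if "id" in doc:
            action["index"]["_id"] = str(doc["id"])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records; the field names are illustrative only.
records = [
    {"id": 1, "RecordType": "T1"},
    {"id": 2, "RecordType": "T1"},
]
body = build_bulk_index_body("tanf_t1_submissions", records)
```

Two documents here produce one four-line request body, where per-document indexing would have issued two separate requests.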

How to Test

cd tdrs-frontend && docker-compose up
cd tdrs-backend && docker-compose up --build
  • Open http://localhost:3000/ and sign in.
  • Submit any number of data files
  • Verify documents exist in Elastic by running the following command in your terminal: curl -H "Content-Type: application/json" -X GET "localhost:9200/<INDEX_NAME>/_count", where <INDEX_NAME> is an index name such as tanf_t1_submissions.
  • Another good verification is to submit ADS.E2J.NDM1.TS53_fake.txt and verify that it takes minutes, not hours, to parse and validate. It should take around 3 to 5 minutes.
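
The count check can also be scripted. This is a minimal sketch using only the Python standard library; the helper names (count_url, parse_count, fetch_count) are hypothetical, and fetch_count assumes the Elastic container is reachable at localhost:9200 as in the curl command above.

```python
import json
import urllib.request


def count_url(index: str, host: str = "http://localhost:9200") -> str:
    """Build the _count URL for an index."""
    return f"{host}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def fetch_count(index: str, host: str = "http://localhost:9200") -> int:
    """Query a running Elasticsearch instance (containers must be up)."""
    with urllib.request.urlopen(count_url(index, host)) as resp:
        return parse_count(resp.read().decode("utf-8"))


# A response of the shape the _count endpoint returns.
sample = '{"count":1,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}'
print(parse_count(sample))  # prints 1
```

With the containers running, fetch_count("tanf_t1_submissions") should return a non-zero value after a data file has been submitted.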

Deliverables

More details on how the deliverables herein are assessed are included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • Parsing is still as fast as possible
  • Document creation actually happens now
  • adpennington approved

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

@@ -107,7 +114,8 @@ def evaluate_trailer(datafile, trailer_count, multiple_trailer_errors, is_last_l
 def rollback_records(unsaved_records, datafile):
     """Delete created records in the event of a failure."""
     logger.info("Rolling back created records.")
-    for model in unsaved_records:
+    for document in unsaved_records:
+        model = document.Django.model
         num_deleted, models = model.objects.filter(datafile=datafile).delete()
out of curiosity i ran the big fake_rollback file and noticed the delete requests going crazy

Screen.Recording.2023-12-20.at.10.39.15.AM.mov

but the db and elastic eventually both end up in a consistent state.

this may be a topic for another ticket, but implementing some sort of bulk_delete may be helpful. existing solutions might not honor the signals needed for elastic, though. https://stackoverflow.com/a/36935536

Author

Agreed, our transaction-less rollback is a pain point for sure. It would be interesting to test out the raw_delete.

have to be careful with raw_delete though!
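
One way to picture the bulk delete idea discussed in this thread is to generate Elastic delete actions explicitly, in batches, instead of relying on per-record signals. This is a sketch under stated assumptions, not a tested rollback: the action dictionaries follow the format accepted by the bulk helper in the official Python Elasticsearch client, while the helper names chunked and delete_actions, the batch size, and the id range are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` ids."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def delete_actions(index: str, ids: Iterable[int]) -> Iterator[dict]:
    """Yield one bulk delete action per document id."""
    for doc_id in ids:
        yield {"_op_type": "delete", "_index": index, "_id": doc_id}


# Hypothetical ids; in a real rollback they would come from the
# records created for the data file being rolled back.
batches = [
    list(delete_actions("tanf_t1_submissions", batch))
    for batch in chunked(range(1, 6), 2)
]
```

Each batch could then be sent as a single bulk request, so five documents cost three requests here instead of five. The database rows would still need to be deleted separately, which is where the caution about raw_delete applies.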

    ):
        self.model = model
        self.document = document
this change makes me nervous because of the potential side-effects, but you've covered all the cases i've thought of so far

@jtimpe jtimpe left a comment

Good catch and nice fix!

@George-Hudson George-Hudson left a comment

LGTM


datafile = fields.ObjectField(properties={
    'pk': fields.IntegerField(),
    'id': fields.IntegerField(),
only thing i would say here is that id is not always the pk and we've run into cases in the past where this assumption has broken down. everything works in this case, though

@ADPennington ADPennington added the Deploy with CircleCI-qasp label (Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI) Jan 10, 2024
Collaborator

@ADPennington ADPennington left a comment

@elipe17 test results below. I'm not seeing the large file fully parse locally or in the dev environment, so I'm unable to check those bulk create results. I assume that's beyond the scope of this ticket, so no reason to hold this up!

apennington@HHSLBDSWL73 MINGW64 ~/GitHub/RAFT/TANF-app (elastic-bulk-doc-creation)
$ curl -H "Content-Type: application/json" -X GET "localhost:9200/tanf_t1_submissions/_count"

{"count":1,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

# large file parsing timed out locally and apps crashed, potentially due to memory issues

dev env results for large file submission

2024-01-11T16:17:00.42-0500 [APP/PROC/WEB/0] ERR [2024-01-11 21:17:00,425: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:173 exited with 'signal 9 (SIGKILL)'
2024-01-11T16:17:00.43-0500 [APP/PROC/WEB/0] ERR [2024-01-11 21:17:00,437: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 2.')
2024-01-11T16:17:00.43-0500 [APP/PROC/WEB/0] ERR Traceback (most recent call last):
2024-01-11T16:17:00.43-0500 [APP/PROC/WEB/0] ERR File "/home/vcap/deps/1/python/lib/python3.10/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
2024-01-11T16:17:00.43-0500 [APP/PROC/WEB/0] ERR raise WorkerLostError(
2024-01-11T16:17:00.43-0500 [APP/PROC/WEB/0] ERR billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 2.

@ADPennington ADPennington added the Ready to Merge label and removed the raft review, QASP Review, and Deploy with CircleCI-qasp labels Jan 11, 2024
@andrew-jameson andrew-jameson merged commit d5a44ff into develop Jan 18, 2024
12 checks passed
@andrew-jameson andrew-jameson deleted the elastic-bulk-doc-creation branch January 18, 2024 15:28
7 participants