Elastic Bulk Document Creation (#2772)
* Added formatting for header and autofit columns

* Formatted the headers

* added year/month to the columns

* Added constants - translation column

* added friendly names to T1 and T2

* added friendly name to m1 and m2

* added friendly name to m3

* added friendly_name to t3

* added friendly_name to t4 and t5

* added friendly_name to t7

* correct missing friendly_name

* correction on failing tests

* added friendly name to excel report

* linting

* linting

* linting

* delete constants.py

* added test for json field in error model

* linting

* linting

* linting

* 2599-added friendly name to postparsing validators

* refining the validator tests

* added returning fields names to validators

* added friendly_name to error field

* linting

* corrections on views/tests

* corrections for fields

* failing test corrected

* failing test corrected

* correcting test failures

* linting

* corrected the excel file generator

* removed excessive space in validator

* linting

* linting

* added m6

* lint

* corrected new line break

* refactored validator logic

* linting and correction on t1

* friendly_name correction from comments

* friendly_name correction

* corrected failing test for m5

* refactor the field_json creation DRY

* - Added Kibana config

* friendly_name corrections

* linting and cleaning errors

* linting

* correction on friendly_names

* corrected friendly_name for test_util

* corrected child care - number of months

* fixed a few more typos and some spacing. (#2767)

* fixed a few more typos and some spacing.

* fixed linting issues

* missed a spot.

---------

Co-authored-by: George Hudson <ghudson@teamraft.com>

* - Added basic security to Kibana/Elastic
- Added setup container to init elastic users, roles, and passwords

* - Remove debug code

* - change provider name

* - Updating settings to reference environment variables

* - Add elastic dependency

* - Fix network issue

* - Added bulk creation of elastic indices

* - Updated schemas to reference model based off of elastic document

* - Remove password auth from elastic/kibana

* - Fix tests

* - Fix lint

* - remove debug print

* Changes for fully local development
 - Enables direct frontend/backend communication sans Login.gov/Cloud.gov
 - Drives off new DEVELOPMENT env var
 - Pre-configures and disables frontend auth functionality
 - Testing based on new dev user
   - Install via web: ./manage.py generate_dev_user

* Reorganized front end logic on REACT_APP_DEVAUTH env var

* Reorganized backend logic on REACT_APP_DEVAUTH env var

* added is_superuser and is_staff attrs to dev user

* - Fix lint errors

* - Renaming variables to clarify things

* - fix lint

* DevAuth feature redesign inspired by Cypress
 - Initializing frontend w/POST /login/cypress: {devEmail, local-cypress-token}
 - Changed REACT_APP_DEVAUTH to provide the email of the desired dev user
 - Modified CustomAuthentication.authenticate to handle both known use cases
 - Added stt_id=31 to the initial dev user
 - Disabled ES disk threshold checking for local dev which blocked ES startup
 - Removed DevAuthentication and other now unnecessary code

* Fixed CustomAuthentication.authenticate return val for login.py use case

* Fixed CustomAuthentication.authenticate logging for login.py use case

* Removed unneeded permissions import

* Updates to REACT_APP_DEVAUTH env var settings
 - Enabled with an email address value
 - Disabled by default

* Restored support for CustomAuthentication.authenticate username keyword

* Modified CustomAuthentication.authenticate comment to satisfy flake8

* commit

* Revert "Modified CustomAuthentication.authenticate comment to satisfy flake8"

This reverts commit 761e4eb.

* Revert "Restored support for CustomAuthentication.authenticate username keyword"

This reverts commit 4bf8957.

* Revert "Updates to REACT_APP_DEVAUTH env var settings"

This reverts commit 7fc2a09.

* Revert "Removed unneeded permissions import"

This reverts commit c18383f.

* Revert "Fixed CustomAuthentication.authenticate logging for login.py use case"

This reverts commit 2b9b46f.

* Revert "Fixed CustomAuthentication.authenticate return val for login.py use case"

This reverts commit 97a0cf6.

* Revert "DevAuth feature redesign inspired by Cypress"

This reverts commit 1497d4a.

* Revert "commit"

This reverts commit a284856.

* Revert "added is_superuser and is_staff attrs to dev user"

This reverts commit 6ffbee8.

* Revert "Reorganized backend logic on REACT_APP_DEVAUTH env var"

This reverts commit 7fd7b4d.

* Revert "Reorganized front end logic on REACT_APP_DEVAUTH env var"

This reverts commit 32a4671.

* Revert "Changes for fully local development"

This reverts commit 556221b.

* asdf

* - Adding integration tests for elastic bulk doc creation

* Revert "asdf"

This reverts commit 26455b4.

* - fix lint

* fasdf

* - Added usage of document to tribal

* - Fixing error

---------

Co-authored-by: Mo Sohani <msohani@goraft.tech>
Co-authored-by: raftmsohani <97037188+raftmsohani@users.noreply.github.com>
Co-authored-by: George Hudson <georgehudson78@gmail.com>
Co-authored-by: George Hudson <ghudson@teamraft.com>
Co-authored-by: Thomas Tignor <thomas.tignor@QP9VN4FgnorRaft.fios-router.home>
Co-authored-by: Thomas Tignor <thomas.tignor@QP9VN4F4RH-thomastignor-Raft.local>
Co-authored-by: Andrew <84722778+andrew-jameson@users.noreply.github.com>
8 people authored Jan 18, 2024
1 parent af10e1d commit d5a44ff
Showing 35 changed files with 252 additions and 145 deletions.
15 changes: 14 additions & 1 deletion tdrs-backend/docker-compose.yml
@@ -45,12 +45,25 @@ services:
# Copy in the Localstack setup script to configure any buckets needed
- ../scripts/localstack-setup.sh:/docker-entrypoint-initaws.d/localstack-setup.sh

kibana:
image: elastic/kibana:7.17.10
ports:
- 5601:5601
environment:
- xpack.security.encryptionKey="something_at_least_32_characters"
- xpack.security.session.idleTimeout="1h"
- xpack.security.session.lifespan="30d"
volumes:
- ./kibana.yml:/usr/share/kibana/config/kibana.yml
depends_on:
- elastic

elastic:
image: elasticsearch:7.17.6
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- logger.discovery.level=debug
- xpack.security.enabled=false
ports:
- 9200:9200
- 9300:9300
2 changes: 2 additions & 0 deletions tdrs-backend/kibana.yml
@@ -0,0 +1,2 @@
elasticsearch.hosts: ["http://elastic:9200"]
server.host: kibana
32 changes: 20 additions & 12 deletions tdrs-backend/tdpservice/parsers/parse.py
@@ -75,22 +75,29 @@ def parse_datafile(datafile):

return errors


def bulk_create_records(unsaved_records, line_number, header_count, batch_size=10000, flush=False):
"""Bulk create passed in records."""
if (line_number % batch_size == 0 and header_count > 0) or flush:
logger.debug("Bulk creating records.")
try:
num_created = 0
num_expected = 0
for model, records in unsaved_records.items():
num_expected += len(records)
num_created += len(model.objects.bulk_create(records))
if num_created != num_expected:
logger.error(f"Bulk create only created {num_created}/{num_expected}!")
num_db_records_created = 0
num_expected_db_records = 0
num_elastic_records_created = 0
for document, records in unsaved_records.items():
num_expected_db_records += len(records)
created_objs = document.Django.model.objects.bulk_create(records)
num_elastic_records_created += document.update(created_objs)[0]
num_db_records_created += len(created_objs)
if num_db_records_created != num_expected_db_records:
logger.error(f"Bulk Django record creation only created {num_db_records_created}/" +
f"{num_expected_db_records}!")
elif num_elastic_records_created != num_expected_db_records:
logger.error(f"Bulk Elastic document creation only created {num_elastic_records_created}/" +
f"{num_expected_db_records}!")
else:
logger.info(f"Created {num_created}/{num_expected} records.")
return num_created == num_expected, {}
logger.info(f"Created {num_db_records_created}/{num_expected_db_records} records.")
return num_db_records_created == num_expected_db_records and \
num_elastic_records_created == num_expected_db_records, {}
except DatabaseError as e:
logger.error(f"Encountered error while creating datafile records: {e}")
return False, unsaved_records
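The batching guard and create-and-verify loop above can be sketched independently of Django and the Elastic document classes. Everything below is an illustrative stand-in (the callables replace `model.objects.bulk_create` and `document.update(...)[0]`), not the project's actual API:

```python
def should_flush(line_number, header_count, batch_size=10000, flush=False):
    """Mirror the guard in bulk_create_records: flush every batch_size
    lines once a header has been parsed, or when a flush is forced."""
    return (line_number % batch_size == 0 and header_count > 0) or flush


def create_and_verify(unsaved_records, create_db_records, index_documents):
    """Create records per document type, then check that the database and
    the Elastic index both received every record we expected."""
    num_db = num_expected = num_indexed = 0
    for document, records in unsaved_records.items():
        num_expected += len(records)
        created = create_db_records(records)     # stand-in for model.objects.bulk_create
        num_indexed += index_documents(created)  # stand-in for document.update(...)[0]
        num_db += len(created)
    # Success only when DB rows and indexed documents both match expectations.
    return num_db == num_expected == num_indexed
```

Verifying both counts separately is what lets the real code log whether the Django bulk create or the Elastic indexing fell short.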
@@ -127,7 +134,8 @@ def evaluate_trailer(datafile, trailer_count, multiple_trailer_errors, is_last_l
def rollback_records(unsaved_records, datafile):
"""Delete created records in the event of a failure."""
logger.info("Rolling back created records.")
for model in unsaved_records:
for document in unsaved_records:
model = document.Django.model
num_deleted, models = model.objects.filter(datafile=datafile).delete()
logger.debug(f"Deleted {num_deleted} records of type: {model}.")

@@ -218,7 +226,7 @@ def parse_datafile_lines(datafile, program_type, section, is_encrypted):
if record:
s = schema_manager.schemas[i]
record.datafile = datafile
unsaved_records.setdefault(s.model, []).append(record)
unsaved_records.setdefault(s.document, []).append(record)

all_created, unsaved_records = bulk_create_records(unsaved_records, line_number, header_count,)
unsaved_parser_errors, num_errors = bulk_create_errors(unsaved_parser_errors, num_errors)
8 changes: 4 additions & 4 deletions tdrs-backend/tdpservice/parsers/row_schema.py
@@ -12,13 +12,13 @@ class RowSchema:

def __init__(
self,
model=dict,
document,
preparsing_validators=[],
postparsing_validators=[],
fields=[],
quiet_preparser_errors=False
quiet_preparser_errors=False,
):
self.model = model
self.document = document
self.preparsing_validators = preparsing_validators
self.postparsing_validators = postparsing_validators
self.fields = fields
@@ -90,7 +90,7 @@ def run_preparsing_validators(self, line, generate_error):

def parse_line(self, line):
"""Create a model for the line based on the schema."""
record = self.model()
record = self.document.Django.model() if self.document is not None else dict()

for field in self.fields:
value = field.parse_value(line)
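The dict fallback in `parse_line` (used by the header schema, which passes `document=None`) can be illustrated with a minimal stand-in. The field layout here is invented for the example:

```python
class MiniSchema:
    """Toy analogue of RowSchema: parse fixed-width fields from a line,
    building either a plain dict (no document) or a model-like object."""

    def __init__(self, document=None, fields=()):
        self.document = document
        self.fields = fields  # (name, start, end) triples -- invented layout

    def parse_line(self, line):
        # Mirrors the new fallback: real schemas build a Django model
        # instance via self.document.Django.model(); the header schema
        # has no document, so it gets a plain dict.
        record = self.document.Django.model() if self.document is not None else dict()
        for name, start, end in self.fields:
            value = line[start:end]
            if isinstance(record, dict):
                record[name] = value
            else:
                setattr(record, name, value)
        return record
```

Keying each field off the document rather than the model is what lets the parser later bulk-index the same objects it bulk-created.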
2 changes: 1 addition & 1 deletion tdrs-backend/tdpservice/parsers/schema_defs/header.py
@@ -7,7 +7,7 @@


header = RowSchema(
model=dict,
document=None,
preparsing_validators=[
validators.hasLength(
23,
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m1.py
@@ -5,12 +5,12 @@
from tdpservice.parsers.fields import Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.ssp import SSP_M1
from tdpservice.search_indexes.documents.ssp import SSP_M1DataSubmissionDocument

m1 = SchemaManager(
schemas=[
RowSchema(
model=SSP_M1,
document=SSP_M1DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(150),
],
11 changes: 5 additions & 6 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m2.py
@@ -6,13 +6,13 @@
from tdpservice.parsers.fields import TransformField, Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.ssp import SSP_M2
from tdpservice.search_indexes.documents.ssp import SSP_M2DataSubmissionDocument


m2 = SchemaManager(
schemas=[
RowSchema(
model=SSP_M2,
document=SSP_M2DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(150),
],
@@ -78,7 +78,7 @@
result_field='EDUCATION_LEVEL',
result_function=validators.or_validators(
validators.isInStringRange(1, 16),
validators.isInStringRange(98, 99)
validators.isInStringRange(98, 99),
),
),
validators.if_then_validator(
@@ -367,8 +367,7 @@
required=False,
validators=[
validators.or_validators(
validators.isInLimits(0, 16),
validators.isInLimits(98, 99)
validators.isInLimits(0, 16), validators.isInLimits(98, 99)
)
]
),
@@ -414,7 +413,7 @@
validators.or_validators(
validators.isInLimits(1, 4),
validators.isInLimits(6, 9),
validators.isInLimits(11, 12)
validators.isInLimits(11, 12),
)
]
),
6 changes: 3 additions & 3 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m3.py
@@ -6,10 +6,10 @@
from tdpservice.parsers.fields import TransformField, Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.ssp import SSP_M3
from tdpservice.search_indexes.documents.ssp import SSP_M3DataSubmissionDocument

first_part_schema = RowSchema(
model=SSP_M3,
document=SSP_M3DataSubmissionDocument(),
preparsing_validators=[
validators.notEmpty(start=19, end=60),
],
@@ -315,7 +315,7 @@
)

second_part_schema = RowSchema(
model=SSP_M3,
document=SSP_M3DataSubmissionDocument(),
quiet_preparser_errors=True,
preparsing_validators=[
validators.notEmpty(start=60, end=101),
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m4.py
@@ -5,12 +5,12 @@
from tdpservice.parsers.fields import Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.ssp import SSP_M4
from tdpservice.search_indexes.documents.ssp import SSP_M4DataSubmissionDocument

m4 = SchemaManager(
schemas=[
RowSchema(
model=SSP_M4,
document=SSP_M4DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(66),
],
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m5.py
@@ -6,13 +6,13 @@
from tdpservice.parsers.fields import TransformField, Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.ssp import SSP_M5
from tdpservice.search_indexes.documents.ssp import SSP_M5DataSubmissionDocument


m5 = SchemaManager(
schemas=[
RowSchema(
model=SSP_M5,
document=SSP_M5DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(66),
],
16 changes: 5 additions & 11 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m6.py
@@ -6,10 +6,10 @@
from ...fields import Field, TransformField
from ...row_schema import RowSchema
from ... import validators
from tdpservice.search_indexes.models.ssp import SSP_M6
from tdpservice.search_indexes.documents.ssp import SSP_M6DataSubmissionDocument

s1 = RowSchema(
model=SSP_M6,
document=SSP_M6DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(259),
],
@@ -170,7 +170,7 @@
)

s2 = RowSchema(
model=SSP_M6,
document=SSP_M6DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(259),
],
@@ -331,7 +331,7 @@
)

s3 = RowSchema(
model=SSP_M6,
document=SSP_M6DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(259),
],
@@ -492,10 +492,4 @@
)


m6 = SchemaManager(
schemas=[
s1,
s2,
s3
]
)
m6 = SchemaManager(schemas=[s1, s2, s3])
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/ssp/m7.py
@@ -5,7 +5,7 @@
from ...row_schema import RowSchema
from ...transforms import calendar_quarter_to_rpt_month_year
from ... import validators
from tdpservice.search_indexes.models.ssp import SSP_M7
from tdpservice.search_indexes.documents.ssp import SSP_M7DataSubmissionDocument

schemas = []

@@ -20,7 +20,7 @@
for i in range(1, 31):
schemas.append(
RowSchema(
model=SSP_M7,
document=SSP_M7DataSubmissionDocument(),
quiet_preparser_errors=i > 1,
preparsing_validators=[
validators.hasLength(247),
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/tanf/t1.py
@@ -4,13 +4,13 @@
from tdpservice.parsers.fields import Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.tanf import TANF_T1
from tdpservice.search_indexes.documents.tanf import TANF_T1DataSubmissionDocument


t1 = SchemaManager(
schemas=[
RowSchema(
model=TANF_T1,
document=TANF_T1DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(156),
],
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/tanf/t2.py
@@ -6,13 +6,13 @@
from tdpservice.parsers.fields import TransformField, Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.tanf import TANF_T2
from tdpservice.search_indexes.documents.tanf import TANF_T2DataSubmissionDocument


t2 = SchemaManager(
schemas=[
RowSchema(
model=TANF_T2,
document=TANF_T2DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(156),
],
6 changes: 3 additions & 3 deletions tdrs-backend/tdpservice/parsers/schema_defs/tanf/t3.py
@@ -6,11 +6,11 @@
from tdpservice.parsers.fields import TransformField, Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.tanf import TANF_T3
from tdpservice.search_indexes.documents.tanf import TANF_T3DataSubmissionDocument


child_one = RowSchema(
model=TANF_T3,
document=TANF_T3DataSubmissionDocument(),
preparsing_validators=[
validators.notEmpty(start=19, end=60),
],
@@ -313,7 +313,7 @@
)

child_two = RowSchema(
model=TANF_T3,
document=TANF_T3DataSubmissionDocument(),
quiet_preparser_errors=True,
preparsing_validators=[
validators.notEmpty(start=60, end=101),
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/tanf/t4.py
@@ -5,13 +5,13 @@
from tdpservice.parsers.fields import Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.tanf import TANF_T4
from tdpservice.search_indexes.documents.tanf import TANF_T4DataSubmissionDocument


t4 = SchemaManager(
schemas=[
RowSchema(
model=TANF_T4,
document=TANF_T4DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(71),
],
4 changes: 2 additions & 2 deletions tdrs-backend/tdpservice/parsers/schema_defs/tanf/t5.py
@@ -6,13 +6,13 @@
from tdpservice.parsers.fields import TransformField, Field
from tdpservice.parsers.row_schema import RowSchema
from tdpservice.parsers import validators
from tdpservice.search_indexes.models.tanf import TANF_T5
from tdpservice.search_indexes.documents.tanf import TANF_T5DataSubmissionDocument


t5 = SchemaManager(
schemas=[
RowSchema(
model=TANF_T5,
document=TANF_T5DataSubmissionDocument(),
preparsing_validators=[
validators.hasLength(71),
],