Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor dsaps to use s3 #37

Closed
wants to merge 8 commits into from
Closed

Conversation

jonavellecuerdo
Copy link

What does this PR do?

A few sentences describing the overall goals of the pull request's commits.
Why are we making these changes? Is there more work to be done to fully
achieve these goals?

Helpful background context

Describe any additional context beyond what the PR accomplishes if it is likely
to be useful to a reviewer.

Delete this section if it isn't applicable to the PR.

How can a reviewer manually see the effects of these changes?

Explain how to see the proposed changes in the application if possible.

Delete this section if it isn't applicable to the PR.

What are the relevant tickets?

Screenshots (if appropriate)

Delete this section if it isn't applicable to the PR.

Requires Database Migrations?

YES | NO

Includes new or updated dependencies?

YES | NO

@jonavellecuerdo jonavellecuerdo force-pushed the refactor-dsaps-to-use-s3 branch from 5df4372 to aa8acf3 Compare March 19, 2024 16:58
@jonavellecuerdo jonavellecuerdo changed the base branch from main to s3 March 19, 2024 16:59
Why these changes are being introduced:
* Consolidate S3 functions/methods into its own client class
and effectively distinguish current Client class as DSpaceClient.

How this addresses that need:
* Introduces a new class called S3Client to dsaps.models
* Replace helpers.create_file_list with S3Client.list_objects
* Pull out 's3_client' from Client

Side effects of this change:
* None

Relevant ticket(s):
* TBD
Why these changes are being introduced:
* The list of file objects returned by S3Client.list_objects
may need to be filtered based on prefix (i.e., a "subfolder" in
the S3 bucket). Additionally, the file names, identifiers, and
file object prefixes will need to be parsed from the returned
file object to create a Bitstream.

How this addresses that need:
* Add helper.filter_files_by_prefix to get files from specified
S3 bucket subfolders (i.e., prefixes)
* Update Item.bitstreams_in_directory to use new helper method

Side effects of this change:
* None

Relevant ticket(s):
* TBD
Why these changes are being introduced:
* Create a Config object to hold run configurations,
applying current standards for config management

How this addresses that need:
* Add Config module
* Migrate logger configs to new module and clean up cli.py
* Create method for loading source-specific configs

Side effects of this change:
* None

Relevant ticket(s):
* TBD
@jonavellecuerdo jonavellecuerdo force-pushed the refactor-dsaps-to-use-s3 branch from aa8acf3 to 1625a2f Compare March 19, 2024 17:00
@jonavellecuerdo jonavellecuerdo self-assigned this Mar 19, 2024
@jonavellecuerdo
Copy link
Author

Closing for now. Will reference for ideas re: optimizations!

@jonavellecuerdo jonavellecuerdo deleted the refactor-dsaps-to-use-s3 branch March 21, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant