Commit
note updates, mostly about cloud storage, ref #2
phette23 committed Aug 24, 2023
1 parent ceaa478 commit fe4b54e
Showing 3 changed files with 38 additions and 4 deletions.
33 changes: 33 additions & 0 deletions notes/configure.md
@@ -39,6 +39,39 @@ To give an account admin permissions, run: `pipenv run invenio roles add <email>

See the [SAML Integration](https://inveniordm.docs.cern.ch/customize/authentication/#saml-integration) documentation.

## Storage

Invenio works with Amazon S3. We use a Google Storage Bucket with some interoperability considerations.

- Use the appropriate Google Cloud project (e.g. staging versus prod)
- Under Cloud Storage > Buckets, create a storage bucket with Standard storage class and no public access. Invenio runs requests for files through the application, so we can have private items.
- @TODO should we use Autoclass instead of Standard? Is it worth it? Pending research.
- @TODO Object protection measures. If we use, for instance, object versioning do we need fewer backups?
- Under IAM > Service Accounts, create a service account with no project-level permissions and no user access. Then go to the bucket you created > Permissions > Grant Access, enter the service account, and give it the Storage Object Admin role.
- Create an [HMAC key](https://cloud.google.com/storage/docs/authentication/hmackeys) for the service account and save the key and secret to Dashlane (**this is the only time the secret is shown**)
- Add S3 storage configuration to invenio.cfg (see below)

```python
# Invenio-Files-Rest
# ==================
FILES_REST_STORAGE_FACTORY='invenio_s3.s3fs_storage_factory'

# Invenio-S3
# ==========
# Replace BUCKET_NAME with the name of the bucket created above
S3_ENDPOINT_URL='https://storage.googleapis.com/BUCKET_NAME'
S3_ACCESS_KEY_ID='HMAC key'
S3_SECRET_ACCESS_KEY='HMAC secret'

# Allow S3 endpoint in the CSP rules
# (APP_DEFAULT_SECURE_HEADERS must already be defined earlier in invenio.cfg)
APP_DEFAULT_SECURE_HEADERS['content_security_policy']['default-src'].append(
    S3_ENDPOINT_URL
)
```
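
Since the HMAC secret lives in Dashlane, one option is to keep it out of invenio.cfg and read it from the environment instead. The following is a minimal sketch of that idea, not something this commit does: the `s3_settings` helper and the `GCS_ENDPOINT_URL`, `GCS_HMAC_KEY`, and `GCS_HMAC_SECRET` variable names are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical environment variable names; nothing in Invenio defines these.
REQUIRED = ("GCS_ENDPOINT_URL", "GCS_HMAC_KEY", "GCS_HMAC_SECRET")


def s3_settings(env=os.environ):
    """Build the three Invenio-S3 settings from environment variables.

    Raises RuntimeError listing any variable that is unset or empty, so a
    misconfigured deployment fails at startup rather than on first upload.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "S3_ENDPOINT_URL": env["GCS_ENDPOINT_URL"],
        "S3_ACCESS_KEY_ID": env["GCS_HMAC_KEY"],
        "S3_SECRET_ACCESS_KEY": env["GCS_HMAC_SECRET"],
    }
```

In invenio.cfg the returned values could then be assigned to `S3_ENDPOINT_URL`, `S3_ACCESS_KEY_ID`, and `S3_SECRET_ACCESS_KEY` in place of the literal strings above.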

The .invenio file also has `file_storage = S3` but that file might just be used when invenio-cli bootstraps a new instance.

@TODO When you choose S3 storage during `invenio-cli init` you get a Minio service too; we need to [follow the steps](https://inveniordm.docs.cern.ch/customize/s3/#set-your-minio-credentials) to change the admin account credentials and hook it up to the Google Storage Bucket.

## Custom Fields

Simplest: https://inveniordm.docs.cern.ch/customize/custom_fields/records/
2 changes: 1 addition & 1 deletion notes/develop.md
@@ -23,7 +23,7 @@ Invenio initializes fixtures (basically, the static app_data files) asynchronous

Once running, visit https://127.0.0.1:5000 in a web browser. **Note**: The server is using a self-signed SSL certificate, so your browser will issue a warning that you will have to bypass.

The super admin is vault@cca.edu with password "password", this comes from app_data/users.yaml.
The super admin is vault@cca.edu with password "password"; this comes from app_data/users.yaml. You may need to activate the admin account with `invenio users activate vault@cca.edu`.

## Theme & Templates

7 changes: 4 additions & 3 deletions notes/run.md
@@ -10,6 +10,7 @@ This document is about managing a running Invenio instance. See **Getting Starte
| Elasticsearch | http://localhost:9200/_cat/indices?v |
| Postgres db | localhost:5432 | username, password, & db name are all "invenio-vault", run `./notes/code-samples/dbconnect`
| pgAdmin (db) | http://127.0.0.1:5050/login | credentials "ephetteplace@cca.edu/invenio-vault" or look in docker-services.yml
| Minio | http://localhost:9001/browser | credentials "CHANGE_ME/CHANGE_ME"
| API | https://127.0.0.1:5000/api/records | same port as app if running locally

The Postgres database is another service but is not exposed; use pgAdmin to interact with it.
@@ -20,11 +21,11 @@ You may need to set the postgres host to "host.docker.internal" e.g. in docker/p

If you're running the app locally, the main URLs (for website and REST API) are on localhost:5000. If you run the fully containerized app, you do not need the port, and the website, background worker, and API are all in different containers. Each of these three has the application code, but there are no static files for the worker & API.

## Elasticsearch vs. OpenSearch
## OpenSearch vs. Elasticsearch

The project is transitioning from Elasticsearch (licensing concerns) to OpenSearch (AWS fork of ES). We may want to stick with ES anyways. @TODO confirm ES will be supported going forward
The project is transitioning from Elasticsearch to OpenSearch (AWS fork of ES with more permissive licensing). ES will not be supported in a future version of InvenioRDM.

The instructions on ES docker configuration are outdated, link to this ES which is more current https://www.elastic.co/guide/en/elasticsearch/reference/7.9/docker.html#docker-prod-prerequisites but `docker-machine` is deprecated and even after installing it I'm not able to successfully run `docker-machine ssh`
I've been able to ignore the extra setup instructions on configuring Docker to work with ES, which were outdated and did not work anyway.

## Setup Troubles
