Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📐 Add ADR proposals #3107

Merged
merged 35 commits into from
Feb 15, 2024
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
6421207
:memo: Change 000 to use page page.title
bagg3rs Jan 17, 2024
0acbd58
Add ADR-007 and ADR-008 for consideration
bagg3rs Jan 17, 2024
ad1f39e
Update QuickSight documentation and add AWS Bedrock for language mode…
bagg3rs Jan 19, 2024
f6ea2c7
Update documentation and use separate AWS accounts for data storage
bagg3rs Jan 23, 2024
bbde7f5
Update formatting
bagg3rs Jan 23, 2024
cc31be8
Add SCP and OU info
bagg3rs Jan 27, 2024
0c7330d
Refactor AWS account structure for improved governance and security
bagg3rs Jan 27, 2024
78dc87d
Update vendor or partner access in ADR-011
bagg3rs Jan 27, 2024
8780929
Merge branch 'main' into add-adrs
bagg3rs Jan 27, 2024
62108e5
📝 ⬆️ ⬇️ move and 🔥 words
bagg3rs Jan 27, 2024
3d37b42
🔥 remove temp file
bagg3rs Jan 27, 2024
8f0027b
📝 clear up context
bagg3rs Jan 27, 2024
7b05da8
📝 tidy up
bagg3rs Jan 27, 2024
74e918d
🔄 accept bedrock 🤔 and formatting and clarity for quicksight
bagg3rs Jan 28, 2024
d03f740
🔄 clean up the mess a little more
bagg3rs Jan 28, 2024
eaffe9e
spelling
bagg3rs Jan 28, 2024
2516b30
Update ADR-008 AWS Bedrock documentation
bagg3rs Jan 28, 2024
f7c052b
fix linter issues
bagg3rs Jan 30, 2024
fcd9477
Bedrock status ✅ -> 🤔
bagg3rs Jan 31, 2024
9744ea6
Update ADR-009: Use AWS SageMaker for analytical tooling
bagg3rs Feb 2, 2024
218af22
Update ADR-009 to include benefits of using AWS SageMaker for analyti…
bagg3rs Feb 2, 2024
83edb30
Update tooling from EKS to AWS SageMaker for improved efficiency and …
bagg3rs Feb 2, 2024
7488003
Update ADR-009 remove duplicate consequences
bagg3rs Feb 3, 2024
06d5a5c
clarify
bagg3rs Feb 3, 2024
07ce3d8
⬆️ update review date
bagg3rs Feb 5, 2024
b5c5bc9
✅ accepted identity and updated consequences
bagg3rs Feb 5, 2024
6e024ee
spelling
bagg3rs Feb 5, 2024
07f9607
update review dates and rename Azure to EntraID
bagg3rs Feb 6, 2024
ba51c22
🚨 fix space
bagg3rs Feb 6, 2024
a922406
📝 Review and Update ADR-007 @julialawrence
bagg3rs Feb 12, 2024
ad67304
📝 Update ADR-009 to use AWS SageMaker for analytical tooling
bagg3rs Feb 13, 2024
09442f5
Update ADR-008: Add information about Amazon Bedrock
bagg3rs Feb 15, 2024
52f2b74
Add data sovereignty issue note to ADR-008 AWS Bedrock
bagg3rs Feb 15, 2024
68bdeef
Update ADR-009: Use AWS SageMaker for analytical tooling
bagg3rs Feb 15, 2024
51e6dd7
Merge branch 'main' into add-adrs
julialawrence Feb 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ last_reviewed_on: 2023-08-17
review_in: 6 months
---

# ADR-000 Record Architecture Decisions
# <%= current_page.data.title %>

## Status

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
---
owner_slack: "#data-platform-notifications"
title: ADR-007 Use AWS QuickSight-for-data-visualisation
last_reviewed_on: 2024-01-17
review_in: 6 months
---

# <%= current_page.data.title %>

## Status

🤔 Proposed

## Context

We do not offer a managed data visualisation and reporting tool. Users need to build and run these applications themselves using [R](https://en.wikipedia.org/wiki/R_(programming_language)) or [Python](https://en.wikipedia.org/wiki/Python_(programming_language)). [PowerBI](https://en.wikipedia.org/wiki/Microsoft_Power_BI) comes part of our Microsoft 365 subscription, but connecting to data on our platform requires additional [infrastructure](https://docs.aws.amazon.com/whitepapers/latest/using-power-bi-with-aws-cloud/connecting-the-microsoft-power-bi-service-to-aws-data-sources.html).

If we offered AWS QuickSight, we can reduce our current support burden from new deployments, and give new and existing users simpler visualisation and reporting capabilities.

## Decision

- _proposed - We will offer AWS QuickSight to our users. QuickSight is fully managed and can be integrated into our identity management system._

## Consequences

### General consequences

- Users can build and share dashboards from data stored on our platform
- Operates on a pay-as-you-go [pricing](https://aws.amazon.com/quicksight/pricing/) model, which means we are billed based on actual usage
- QuickSight is designed to be user-friendly, but users might face issues when dealing with more advanced or complex use cases
- We will need to start a QuickSight community for users to help and share their experiences and knowledge
- There is already a public [QuickSight community](https://community.amazonquicksight.com/) and MoJ can get immersion days and free training for our users

### Advantages

- Serverless BI service, meaning we do not need to patch or maintain and [security and compliance](https://docs.aws.amazon.com/quicksight/latest/user/QS-compliance.html) is maintained by AWS
- User friendly interface and extensive online training materials, we won't need to produce extensive documentation to support, AWS provides many resources for building and sharing dashboards
- Reduced operational cost and complexity for users to create reports and visualisations
- Cost transparency, the total cost of ownership and management of RShiny and other hosted solutions is hard to calculate

### Disadvantages

- Creating [dashboards as code](https://github.com/aws-samples/amazon-quicksight-assets-as-code-sample) does not support all QuickSight resources and visual types
37 changes: 37 additions & 0 deletions docs/source/documentation/adrs/adr-008-aws-bedrock.html.md.erb
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
owner_slack: "#data-platform-notifications"
title: ADR-008 AWS Bedrock
last_reviewed_on: 2024-01-17
review_in: 6 months
---

# <%= current_page.data.title %>

## Status

🤔 Proposed

## Context

Our users want to explore and leverage [large language model](https://en.wikipedia.org/wiki/Large_language_model) (LLM) for various use cases. Our platform lacks the resources required to run these models.

## Decision

We will offer [Amazon Bedrock](https://aws.amazon.com/bedrock/) to our users.
Amazon Bedrock is fully managed large language model, which offers many foundation models which be customised privately using techniques such as fine tuning and [retrieval-augmented generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html).

## Consequences

### General consequences

- Bedrock provides pre-trained models with limited ability to customise or tune the models
- Bedrock pricing is based on usage and can vary significantly month-to-month depending on your application's traffic and costs could spike unexpectedly. Usage is metered and billed per inference request based on factors like model used, input length, and response length
- Bedrock models are accessed via an API using AWS authentication keys

### Advantages

- Serverless access to large language models meaning that our platform and users don't need to manage and maintain infrastructure

### Disadvantages

- Limited model selection, Bedrock offers a few pre-trained models and new models can take time to reach all AWS regions
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
---
owner_slack: "#data-platform-notifications"
title: ADR-009 Use AWS Sagemaker for analytical tooling
last_reviewed_on: 2024-01-17
review_in: 6 months
---

# <%= current_page.data.title %>

## Status

🤔 Proposed

## Context

Our user want features not available on our existing platform. SageMaker provides a managed service for these tools and provides instances with higher resources and GPU to aid research.

## Decision

- _proposed - We will look to offer [Amazon SageMaker](https://aws.amazon.com/bedrock/) to our users_

## Proposal Consequences

- Leverage managed services like SageMaker Studio
- SageMaker cost is based on usage and can vary significantly month-to-month depending on your application's usage, instance type and costs could spike unexpectedly
- Improved Security, workloads will run in isolation mode (without access to internet) to further secure sensitive data
- Users can make use of services when available and require development of front and backend service
- Greater choice in open source Foundation Models and VPC isolation for highly sensitive data
- Reduced operational cost and complexity
- Agility and change readiness, additional analytical services can be offered when available
- Better cost transparency, we will understand our tooling compute costs
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
---
owner_slack: "#data-platform-notifications"
title: ADR-010 Documentation
last_reviewed_on: 2024-01-17
review_in: 6 months
---

# <%= current_page.data.title %>

## Status

🤔 Proposed

## Context

We need to document how our platform is built, and provide guidance where needed for our users.

We have many places to store documents and this creates a challenge for our existing and new members of our team.

Google Workspaces is being retired in favour or Office365 so we need an alternative for google docs.

## Decision

The following locations will be used for documenting all things related to the Data Platform.

### Team and Technical

>Team information, ways of working and ADRs should be stored in the open in our technical documentation [here](https://technical-documentation.data-platform.service.justice.gov.uk/)
>Documentation directly relating to code should be stored in a `README.MD` next to the code in its repository

### Sensitive information

>Sensitive information, or information on users of the platform which should be stored in our internal repository [here](https://github.com/ministryofjustice/data-platform-internal-documentation)

### Diagrams

>_diagrams_

## Consequences

>Since the majority of our documentation and guidance is published in the open, we need to ensure that we do not publish any sensitive details or user data in text or screenshots.
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
---
owner_slack: "#data-platform-notifications"
julialawrence marked this conversation as resolved.
Show resolved Hide resolved
title: ADR-011 Use separate AWS accounts for data domains and products
last_reviewed_on: 2024-01-17
review_in: 6 months
---

# <%= current_page.data.title %>

## Status

🤔 Proposed

## Context

The Data Platform will need to provide a secure location to store and share data to those who have been granted access. The use of a multi-account strategy will give the Data Platform a scalable storage architecture which adheres to the [AWS Well-Architected Framework](https://aws.amazon.com/architecture/well-architected/) pillars on operational excellence, security, reliability, and cost optimisation.

**/tldr**
Our current architecture is overly permissive in design and makes understanding responsibility and cost difficult.

Using separate AWS (Amazon Web Services) accounts for storing data will serve several purposes for MoJ, each contributing to improved governance, security and manageability.

## Decision

- _proposed_

## Proposal Consequences

### General consequences

- A shift in ownership and responsibility of cloud resources back to the teams that own the data
- We will need to understand what account owners need outside of single sign on, and account bootstrap
- Cost will be visible to owners and aligns with the Technology Code of Practice point 12, [make your service sustainable](https://www.gov.uk/guidance/the-technology-code-of-practice#make-your-technology-sustainable)
- Align with [NCSC cloud security guidance](https://www.ncsc.gov.uk/collection/cloud/the-cloud-security-principles/principle-3-separation-between-customers) on separation between customers (in our case domains) to defend against another customer having e.g. malicious code execution
- We will need to work with Modernisation Platform on improving our ability to offer the dispensing accounts, ensure that we do no impact their own operations or support
- We will define [Service Control Policies](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html) against [AWS Organizations](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_introduction.html)
- We will need to add functionality to our platform for users to request and manage data requests
- We will be able to give teams access to a project or temporary accounts for research (this could include other managed analytical tooling) which then can be securely closed down with all associated resources
- Observability of data is simplified for account owners

Using separate AWS (Amazon Web Services) accounts for storing data will serve several purposes for MoJ, each contributing to improved governance, security, manageability, and efficiency.

Other reasons for using separate AWS accounts for data storage:

1. **Security Isolation:**
- **Data Segmentation:** Different types of data may have varying sensitivity levels. By using separate accounts, you can isolate highly sensitive data from less critical information, reducing the risk of unauthorised access or data breaches.
- **Access Control:** AWS Identity and Access Management (IAM) allows fine-grained control over who can access resources within an AWS account. Using separate accounts allows for better control and segregation of access permissions, limiting potential security vulnerabilities.
2. **Compliance Requirements:**
- **Regulatory Compliance:** Certain industries and regions have specific regulatory requirements regarding data storage and processing. Using separate AWS accounts can help you adhere to these compliance standards by providing clear boundaries and controls around data.
3. **Resource Management:**
- **Isolation of Resources:** Different business units or projects within an organisation may require their own set of AWS resources. Using separate accounts makes it easier to manage and isolate these resources, preventing interference or resource contention.
- **Resource Scaling:** Each AWS account has its own resource limits and can be independently scaled. This allows for better resource optimisation and avoids the risk of reaching account-wide limits.
4. **Cost Management:**
- **Billing and Budgeting:** AWS provides detailed billing reports for each account. By using separate accounts, you can better track and allocate costs to specific projects, teams, or departments. This facilitates more accurate budgeting and financial management. Tags provide some of these capabilities but are limited in their scope as they cannot be applied to all resources.
5. **Disaster Recovery:**
- **Isolation for Redundancy:** In the event of a disaster, having data stored in separate AWS accounts can act as a form of redundancy. If one account experiences issues, the others may remain unaffected, providing a level of data resilience.
6. **Third-Party Access:**
- **Vendor or Partner Access:** If external vendors or partners need access to specific data or services, setting up a separate account for them. can facilitate controlled and secure access without compromising other data in that account. if further restrictions on data access is required [AWS Clean Rooms](https://docs.aws.amazon.com/clean-rooms/latest/userguide/what-is.html) can be explored
7. **Ownership**
- **Responsibility:** We need for our users to take responsibility for storing data, and to meet point 12 of the The Technology Code of Practice of [Make your technology sustainable](https://www.gov.uk/guidance/the-technology-code-of-practice#make-your-technology-sustainable) and to inform our users of the cost associated with storing data, which in our current architecture is very difficult to deduct.
7 changes: 6 additions & 1 deletion docs/source/documentation/adrs/adr-index.html.md.erb
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,15 @@ To understand why we are recording decisions and how we are doing it, please see
| ADR-000 | ✅ | [Record Architecture Decisions](/documentation/adrs/adr-000-record-architecture-decisions.html) |
| ADR-001 | ✅ | [Use Cloud Platform for hosting infrastructure](/documentation/adrs/adr-001-use-cloud-platform-for-hosting-infrastructure.html) |
| ADR-002 | ✅ | [Use Modernisation Platform for hosting infrastructure unsuitable for Cloud Platform](/documentation/adrs/adr-002-use-moderisation-platform-for-hosting-infrastructure-not-suitable-for-cloud-platform.html) |
| ADR-003 | 🤔 | [Use AzureAD for Identity and Access Managment](/documentation/adrs/adr-003-use-azuread-for-identity-and-access-management.html) |
| ADR-003 | 🤔 | [Use AzureAD for Identity and Access Management](/documentation/adrs/adr-003-use-azuread-for-identity-and-access-management.html) |
| ADR-004 | ✅ | [Data should be pushed into Data Products](/documentation/adrs/adr-004-data-should-be-pushed-into-data-products.html) |
| ADR-005 | ✅ | [Use AWS Secrets Manager for Secrets](/documentation/adrs/adr-005-use-aws-secrets-manager-for-secrets.html) |
| ADR-006 | ✅ | [Use GOV.UK Eleventy Plugin for user documentation and front door](/documentation/adrs/adr-006-use-gov-uk-eleventy-plugin-for-user-documentation-and-front-door.html) |
| ADR-007 | 🤔 | [Use AWS Quicksight for data visualisation](/documentation/adrs/adr-007-use-aws-quicksight-for-data-visualisation.html) |
| ADR-008 | 🤔 | [AWS Bedrock](/documentation/adrs/adr-008-aws-bedrock.html) |
| ADR-009 | 🤔 | [Use AWS SageMaker for analytical tooling](/documentation/adrs/adr-009-use-aws-sagemaker-for-analytical-tooling.html) |
| ADR-010 | 🤔 | [Documentation](/documentation/adrs/adr-010-documentation.html) |
| ADR-011 | 🤔 | [Use separate AWS accounts for data domains and products](/documentation/adrs/adr-011-use-separate-aws-accounts-for-data.html) |

**Statuses:**

Expand Down
Loading