Commit

Documentation: correct grammar & rewording for better readability (part 1) (#2781)
alexott authored Oct 19, 2023
1 parent cff5d80 commit 6f70477
Showing 12 changed files with 191 additions and 190 deletions.
36 changes: 18 additions & 18 deletions docs/guides/aws-e2-firewall-hub-and-spoke.md
@@ -4,13 +4,13 @@ page_title: "Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data

# Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection

You can provision multiple Databricks workspaces with Terraform and where many Databricks workspaces are deployed, we recommend a hub and spoke topology reference architecture, powered by AWS Transit Gateway. The hub will consist of a central inspection and egress virtual private cloud (VPC), while the Spoke VPC houses federated Databricks workspaces for different business units or segregated teams. In this way, you create your own version of a centralized deployment model for your egress architecture, as is recommended for large enterprises. For more information please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html).
You can provision multiple Databricks workspaces with Terraform, and where many Databricks workspaces are deployed, we recommend a hub and spoke topology reference architecture powered by AWS Transit Gateway. The hub will consist of a central inspection and egress virtual private cloud (VPC), while the Spoke VPC houses federated Databricks workspaces for different business units or segregated teams. In this way, you create your version of a centralized deployment model for your egress architecture, as is recommended for large enterprises. For more information, please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html).

![Data Exfiltration](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-exfiltration-replace-1.png)

## Provider initialization for E2 workspaces

This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret and `databricks_account_id` which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup.. If you are using AWS Firewall to block most traffic but allow the URLs that Databricks needs to connect to please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document.
This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret, and `databricks_account_id`, which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. If you use AWS Firewall to block most traffic but allow the URLs to which Databricks needs to connect, please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document.

```hcl
variable "client_id" {}
Expand Down Expand Up @@ -85,7 +85,7 @@ Before [managing workspace](workspace-management.md), you have to create:
- [Databricks E2 workspace](aws-workspace.md#databricks-e2-workspace)
- [Host and Token outputs](aws-workspace.md#provider-configuration)

> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within it's own dedicated terraform module of your environment. Usually this module creates VPC and IAM roles as well.
> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within its own dedicated terraform module of your environment. Usually, this module creates VPC and IAM roles as well.
```hcl
terraform {
@@ -120,7 +120,7 @@ The very first step is Hub & Spoke VPC creation. Please consult [main documentat

### Spoke VPC for Databricks Workspace

First step is to create Spoke VPC which houses federated Databricks workspaces for different business units or segregated teams.
The first step is to create a Spoke VPC, which houses federated Databricks workspaces for different business units or segregated teams.

![SpokeVPC](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-spoke-vpc.png)

@@ -198,8 +198,8 @@ Security groups must have the following rules:

***Ingress (inbound):***:

- Allow TCP on all ports when traffic source uses the same security group
- Allow UDP on all ports when traffic source uses the same security group
- Allow TCP on all ports when the traffic source uses the same security group
- Allow UDP on all ports when the traffic source uses the same security group

```hcl
/* VPC's Default Security Group */
@@ -260,7 +260,7 @@ resource "databricks_mws_networks" "this" {

### VPC Endpoint for Spoke VPC

For STS, S3 and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network, for more direct connections and reduced cost compared to AWS global endpoints.
For STS, S3, and Kinesis, it's important to create a VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network for more direct connections and reduced cost compared to AWS global endpoints.

```hcl
/* Create VPC Endpoint */
@@ -307,7 +307,7 @@ module "vpc_endpoints" {

### Hub VPC

The hub will consist of a central inspection and egress virtual private cloud (VPC). We're going to create a central inspection/egress VPC, which once we’ve finished should look like this:
The hub will consist of a central inspection and egress virtual private cloud (VPC). We're going to create a central inspection/egress VPC, which, once we’ve finished, should look like this:

![HubVPC](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-hub-vpc.png)

@@ -385,7 +385,7 @@ resource "aws_nat_gateway" "hub_nat" {

### Route Tables for Hub

Next, we're going to create route tables for Hub VPC subnets, NAT gateway, Internet Gateway and add some routes.
Next, we will create route tables for Hub VPC subnets, NAT gateway, and Internet Gateway and add some routes.

```hcl
/* Routing table for hub private subnet */
@@ -467,8 +467,8 @@ resource "aws_main_route_table_association" "set-worker-default-rt-assoc" {
## AWS Transit Gateway

Now that our spoke and inspection/egress VPCs are ready to go, all you need to do is link them all together, and AWS Transit Gateway is the perfect solution for that.
First, we're going to create a Transit Gateway and link our Databricks data plane via TGW subnets.
All of the logic that determines what routes are going via a Transit Gateway is encapsulated within Transit Gateway Route Tables. We’re going to create some TGW routes tables for our Hub & Spoke networks.
First, we will create a Transit Gateway and link our Databricks data plane via TGW subnets.
All of the logic that determines what routes are going via a Transit Gateway is encapsulated within Transit Gateway Route Tables. We will create some TGW route tables for our Hub & Spoke networks.

![TransitGateway](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-tgw.png)

@@ -518,7 +518,7 @@ resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {

### Route Table Configurations for Transit Gateway

The Transit Gateway should be set up and ready to go, now all that needs to be done is update the route tables in each of the subnets so that traffic flows through it.
The Transit Gateway should be set up and ready to go. Now, all that needs to be done is update the route tables in each subnet so traffic flows through it.

```hcl
# Create Route to Internet
@@ -559,7 +559,7 @@ Once [VPC](#vpc) is ready, we're going to create AWS Network Firewall for your V

### AWS Firewall Rule Groups

First we're going to create a Firewall Rule group for accessing hive metastore and public repositories.
First, we will create a Firewall Rule group for accessing hive metastore and public repositories.

```hcl
/*Firewall Rule group for accessing hive metastore and public repositories*/
@@ -589,7 +589,7 @@ resource "aws_networkfirewall_rule_group" "databricks_fqdns_rg" {
```

As next step, we're going to create Firewall Rule group that allows control plane traffic from the VPC.
As the next step, we will create a Firewall Rule group that allows control plane traffic from the VPC.

```hcl
locals {
@@ -636,7 +636,7 @@ resource "aws_networkfirewall_rule_group" "allow_db_cpl_protocols_rg" {
```

Next, we're going to create basic deny rules to cater for common firewall scenarios such as preventing the use of protocols like SSH/SFTP, FTP and ICMP.
Next, we will create basic deny rules to cater for common firewall scenarios, such as preventing the use of protocols like SSH/SFTP, FTP, and ICMP.
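
As a rough illustration of what such deny rules can look like, here is a minimal sketch. It is not the guide's own code, which this diff elides. The resource name, the capacity value, and the `drop_protocols` local are hypothetical, and the `rule_option` form with separate `keyword` and `settings` arguments assumes a recent AWS provider version.

```hcl
locals {
  # Hypothetical map of signature ID => protocol to drop.
  drop_protocols = {
    "1" = "ICMP"
    "2" = "FTP"
    "3" = "SSH"
  }
}

resource "aws_networkfirewall_rule_group" "deny_protocols_example" {
  name        = "deny-protocols-example" # assumed name
  description = "Drop ICMP, FTP, and SSH"
  capacity    = 100 # assumed capacity
  type        = "STATEFUL"

  rule_group {
    rules_source {
      # One stateful DROP rule per protocol in the map above.
      dynamic "stateful_rule" {
        for_each = local.drop_protocols
        content {
          action = "DROP"
          header {
            protocol         = stateful_rule.value
            source           = "ANY"
            source_port      = "ANY"
            direction        = "ANY"
            destination      = "ANY"
            destination_port = "ANY"
          }
          # Each stateful rule needs a unique signature ID.
          rule_option {
            keyword  = "sid"
            settings = [stateful_rule.key]
          }
        }
      }
    }
  }
}
```

Keying the map by signature ID keeps each rule's `sid` stable when protocols are added or removed, which a list index would not.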

```hcl
/* Firewall Rule group for dropping ICMP, FTP, SSH*/
@@ -681,7 +681,7 @@ resource "aws_networkfirewall_rule_group" "deny_protocols_rg" {

### AWS Network Firewall Policy

Now we can create AWS Firewall Policy and include stateful firewall rule groups created in previous steps.
Now, we can create an AWS Firewall Policy and include stateful firewall rule groups created in previous steps.

```hcl
resource "aws_networkfirewall_firewall_policy" "egress_policy" {
@@ -705,7 +705,7 @@ resource "aws_networkfirewall_firewall_policy" "egress_policy" {

### AWS Firewall

Next step is to create an AWS Network Firewall with the Firewall Policy we defined in the previous step.
The next step is to create an AWS Network Firewall with the Firewall Policy we defined in the previous step.

```hcl
/* Create Firewall*/
@@ -736,7 +736,7 @@ data "aws_vpc_endpoint" "firewall" {
```

Finally, AWS Network Firewall is now deployed and configured, all you need to do now is route traffic to it.
Finally, the AWS Network Firewall is now deployed and configured; all you need to do now is route traffic to it.

```hcl
/* Add Route from Nat Gateway to Firewall */
30 changes: 15 additions & 15 deletions docs/guides/aws-e2-firewall-workspace.md
@@ -4,15 +4,15 @@ page_title: "Provisioning AWS Databricks E2 with a AWS Firewall"

# Provisioning AWS Databricks E2 with a AWS Firewall

You can provision multiple Databricks workspaces with Terraform. This example shows how to deploy a Databricks workspace into a VPC which uses AWS Network firewall to manage egress out to the public network. For smaller Databricks deployments this would be our recommended configuration. For larger deployments see [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](aws-e2-firewall-hub-and-spoke.md).
You can provision multiple Databricks workspaces with Terraform. This example shows how to deploy a Databricks workspace into a VPC, which uses an AWS Network firewall to manage egress out to the public network. For smaller Databricks deployments, this is our recommended configuration; for larger deployments, see [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](aws-e2-firewall-hub-and-spoke.md).

For more information please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html).
For more information, please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html).

![Data Exfiltration_Workspace](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-workspace.png)

## Provider initialization for E2 workspaces

This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret and `databricks_account_id` which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. If you are using AWS Firewall to block most traffic but allow the URLs that Databricks needs to connect to please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document.
This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret, and `databricks_account_id`, which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. If you are using AWS Firewall to block most traffic but allow the URLs that Databricks needs to connect to, please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document.

```hcl
variable "client_id" {}
@@ -83,7 +83,7 @@ Before [managing workspace](workspace-management.md), you have to create:
- [Databricks E2 workspace](aws-workspace.md#databricks-e2-workspace)
- [Host and Token outputs](aws-workspace.md#provider-configuration)

> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within it's own dedicated terraform module of your environment. Usually this module creates VPC and IAM roles as well.
> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within its own dedicated terraform module of your environment. Usually, this module creates VPC and IAM roles as well.
```hcl
terraform {
@@ -206,8 +206,8 @@ Security groups must have the following rules:

***Ingress (inbound):*** Required for all workspaces (these can be separate rules or combined into one):

- Allow TCP on all ports when traffic source uses the same security group
- Allow UDP on all ports when traffic source uses the same security group
- Allow TCP on all ports when the traffic source uses the same security group
- Allow UDP on all ports when the traffic source uses the same security group

```hcl
/* VPC's Default Security Group */
@@ -268,7 +268,7 @@ resource "databricks_mws_networks" "this" {

### Route Tables

Next, we're going to create route tables for VPC subnets, NAT gateway, Internet Gateway and add some routes.
Next, we will create route tables for VPC subnets, NAT gateway, and Internet Gateway and add some routes.

```hcl
/* Routing table for private subnet */
@@ -349,7 +349,7 @@ resource "aws_main_route_table_association" "set-worker-default-rt-assoc" {

### VPC Endpoints

For STS, S3 and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network, for more direct connections and reduced cost compared to AWS global endpoints.
For STS, S3, and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network for more direct connections and reduced cost compared to AWS global endpoints.

```hcl
module "vpc_endpoints" {
@@ -396,11 +396,11 @@ module "vpc_endpoints" {

## AWS Network Firewall

Once [VPC](#vpc) is ready, create AWS Network Firewall for your VPC that restricts outbound http/s traffic to an approved set of Fully Qualified Domain Names (FQDNs).
Once [VPC](#vpc) is ready, create an AWS Network Firewall for your VPC that restricts outbound http/s traffic to an approved set of Fully Qualified Domain Names (FQDNs).

### AWS Firewall Rule Groups

First we are going to create a Firewall Rule group for accessing hive metastore and public repositories.
First, we will create a Firewall Rule group for accessing hive metastore and public repositories.

```hcl
resource "aws_networkfirewall_rule_group" "databricks_fqdns_rg" {
@@ -428,7 +428,7 @@ resource "aws_networkfirewall_rule_group" "databricks_fqdns_rg" {
}
```

As the next step, we are going to create a Firewall Rule group that allows control plane traffic from the VPC.
As the next step, we will create a Firewall Rule group that allows control plane traffic from the VPC.

```hcl
resource "aws_networkfirewall_rule_group" "allow_db_cpl_protocols_rg" {
@@ -474,7 +474,7 @@ locals {
}
```

Finally, we are going to add some basic deny rules to cater for common firewall scenarios such as preventing the use of protocols like SSH/SFTP, FTP and ICMP.
Finally, we will add some basic deny rules to cater for common firewall scenarios, such as preventing the use of protocols like SSH/SFTP, FTP, and ICMP.

```hcl
/* Firewall Rule group for dropping ICMP, FTP, SSH*/
@@ -519,7 +519,7 @@ resource "aws_networkfirewall_rule_group" "deny_protocols_rg" {

### AWS Network Firewall Policy

First, we are going to create AWS Firewall Policy and include stateful firewall rule groups created in previous steps.
First, we will create an AWS Firewall Policy and include stateful firewall rule groups created in previous steps.

```hcl
resource "aws_networkfirewall_firewall_policy" "egress_policy" {
@@ -543,7 +543,7 @@ resource "aws_networkfirewall_firewall_policy" "egress_policy" {

### AWS Firewall

As the next step, we are going to create an AWS Network Firewall with the Firewall Policy that we defined in the previous step.
As the next step, we will create an AWS Network Firewall with the Firewall Policy we defined in the previous step.

```hcl
resource "aws_networkfirewall_firewall" "exfiltration_firewall" {
@@ -573,7 +573,7 @@ data "aws_vpc_endpoint" "firewall" {
```

Finally, AWS Network Firewall is now deployed and configured, all you need to do now is route traffic to it.
Finally, the AWS Network Firewall is now deployed and configured - all you need to do now is route traffic to it.

```hcl
/* Add Route from Nat Gateway to Firewall */