WilliamWsyHK/AzureDatabricksTrainingWorkspacePreparation-Public

Introduction

This repository contains Terraform code to deploy an Azure Databricks workspace for training purposes.

Resources to be created by this script

  1. Microsoft Entra ID Users and Groups (region-agnostic)
    • Instructors
    • Students
  2. Azure Storage Account for Databricks Unity Catalog (region-specific)
    • Important! Each Azure region can host only one Databricks Unity Catalog metastore. If you want to reuse an existing Unity Catalog metastore, change the Terraform code accordingly.
  3. Azure Databricks Workspace (region-specific)
  4. Azure Databricks Clusters
    • Instructors' Clusters
      • Data Engineering
      • Machine Learning
    • Students' Clusters
      • Data Engineering
      • Machine Learning
  5. Azure Databricks Training Materials ((c) Databricks)
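
As an illustration of the cluster resources listed above, a minimal sketch of a single training cluster using the databricks/databricks provider might look like the following; the cluster name, Spark runtime version, and node type are assumptions, not the values used in this repository.

# Minimal sketch of one training cluster (hypothetical values; the repository
# defines separate Data Engineering and Machine Learning clusters for
# instructors and students).
resource "databricks_cluster" "student_data_engineering" {
  cluster_name            = "students-data-engineering" # assumed name
  spark_version           = "13.3.x-scala2.12"          # assumed runtime version
  node_type_id            = "Standard_DS3_v2"           # assumed Azure VM size
  autotermination_minutes = 30

  autoscale {
    min_workers = 1
    max_workers = 2
  }
}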

Required Azure resources and access

  1. An Azure Service Principal granted the Microsoft Graph API permissions below.
    • Domain.Read.All
    • Group.ReadWrite.All
    • User.ReadWrite.All
  2. An Azure Subscription with the resource providers below registered.
    • Microsoft.Compute
    • Microsoft.Databricks
    • Microsoft.ManagedIdentity
    • Microsoft.Storage
  3. The Azure Service Principal from step 1 must have access to manage resources in the Azure Subscription from step 2.
  4. A Databricks account on Azure (see the links in the Reference section), already created by following the linked documentation.
  5. A Databricks group named Databricks Unity Catalog Administrators (created separately from this project).
  6. The Azure Service Principal has been added to the Databricks account.

Preparing secrets.tfvars for deploying with a Service Principal

region = "<Azure region>"
tenant_id = "<Azure tenant ID>"
subscription_id = "<Azure subscription ID that contains all resources>"
client_id = "<Azure client (app) ID>"
client_secret = "<Azure client (app) secret>"
databricks_account_id = "<Azure Databricks account ID>"
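
For reference, a minimal sketch of how these values could be declared and wired into the providers is shown below; the variable names match secrets.tfvars, but the actual provider configuration in this repository may differ.

variable "region" {}
variable "tenant_id" {}
variable "subscription_id" {}
variable "client_id" {}
variable "client_secret" { sensitive = true }
variable "databricks_account_id" {}

# Assumed provider wiring; the repository code is the authoritative version.
provider "azurerm" {
  features {}
  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
}

provider "azuread" {
  tenant_id     = var.tenant_id
  client_id     = var.client_id
  client_secret = var.client_secret
}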

Deployment Steps

  1. Install the Azure CLI (az) and Terraform.
  2. Log in with the Azure CLI: run az login --service-principal -u <app-id> -p <password-or-cert> --tenant <tenant-id>
  3. cd into the correct sub-folder first, e.g. cd ./20231101
  4. Install the Terraform providers: run terraform init
  5. Review the planned changes: run terraform plan -var-file='<file>.tfvars' -out='<file>.tfplan'
  6. Deploy the infrastructure: run terraform apply '<file>.tfplan'
  7. To remove the whole deployment, run terraform plan -destroy -var-file='<file>.tfvars' -out='<file-destroy>.tfplan' and then terraform apply '<file-destroy>.tfplan'

Caveats

In the eastasia region, there is an issue creating the Unity Catalog metastore directly with Terraform, so it must be created manually on the Databricks account page and then imported with terraform import -var-file='<file>.tfvars' module.databricks.databricks_metastore.this '<metastore_id>'
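
For context, the resource behind module.databricks.databricks_metastore.this might be declared roughly as follows; this is a sketch with assumed values, and the module in this repository is the authoritative definition.

# Hypothetical shape of the metastore resource targeted by the import above.
resource "databricks_metastore" "this" {
  name          = "training-metastore" # assumed name
  region        = var.region
  storage_root  = "abfss://unity-catalog@<storage-account>.dfs.core.windows.net/" # assumed container and account
  force_destroy = true
}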

Databricks users

The user list can be modified to suit your needs, e.g. the number of users required. Since this repository is intended for creating a training workspace, the users are divided into two groups, Instructors and Students. An example user name format is student01.databricks.<training-date-yyyyMMdd>@<your Azure domain>, as shown in the sketch below.
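
A sketch of how such a user list could be generated in Terraform; the training date, domain, and student count are hypothetical values.

locals {
  training_date = "20231101"                # hypothetical yyyyMMdd training date
  domain        = "contoso.onmicrosoft.com" # hypothetical Azure domain
  student_count = 10                        # hypothetical number of students

  # Produces student01.databricks.20231101@contoso.onmicrosoft.com, student02..., etc.
  student_user_principal_names = [
    for i in range(1, local.student_count + 1) :
    format("student%02d.databricks.%s@%s", i, local.training_date, local.domain)
  ]
}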

Reference

Documents for the pre-requisite steps are listed in the links below.

Links

Terraform Providers

  • hashicorp/azuread
  • hashicorp/azurerm
  • databricks/databricks
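
A minimal required_providers block matching this list might look as follows; version constraints are omitted here, and the repository pins its own versions.

terraform {
  required_providers {
    azuread    = { source = "hashicorp/azuread" }
    azurerm    = { source = "hashicorp/azurerm" }
    databricks = { source = "databricks/databricks" }
  }
}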
