-
Create a new resource group
-
In the resource group, create a regular blob storage account. In the storage account, create two private containers named
staging
andmodels
. -
In the resource group, create an empty Azure Synapse Analytics workspace.
-
Create the following file systems in the primary storage account of the workspace:
dev
,staging
, andwwi
. -
Create a linked service to the first pool of the workspace. Configure it to connect with username and password and use the workspace's SQL admin account credentials. It is recommended to name the linked service
sqlpool01
to simplify importing datasets, data flows, and pipelines later. -
Create a linked service to the primary storage account. Configure it to connect with the account key. It is recommended to name the linked service
asadatalake01
to simplify importing datasets, data flows, and pipelines later. -
Create a linked service to the blob storage account. Configure it to connect with the account key. It is recommended to name the linked service
asastore01
to simplify importing datasets, data flows, and pipelines later. -
Create an Apache Spark pool. Ensure Apache Spark version is set to 3.0.
-
For the remainder of this guide, the following terms will be used for various ASA-related resources (make sure you replace them with actual names and values):
ASA resource To be referred to as Workspace resource group WorkspaceResourceGroup
Workspace Workspace
Identity used to create Workspace
MasterUser
Primary storage account PrimaryStorage
Blob storage account BlobStorage
First Spark pool SparkPool01
First SQL pool SQLPool01
SQL admin account asa.sql.admin
Linked service to first SQL pool sqlpool01
Linked service to primary storage account asadatalake01
Linked service to blob storage account asastore01
-
Ensure the
Workspace
security principal (which has the same name as theWorkspace
) and theMasterUser
(the one used to create theWorkspace
) are added with theStorage Blob Data Owner
role to thePrimaryStorage
.
-
Create a folder named
bronze
in thedev
file system ofPrimaryStorage
. -
Create a folder named
sentiment
in thebronze
folder in thedev
file system ofPrimaryStorage
. -
Upload the following data files to the
bronze
folder created above:
-
Navigate to Create Cognitive Services page and fill in the appropriate values exemplified below to create an All-In-One Cognitive Services account.
-
Go to Cognitive Service > Keys and Endpoint (1). Copy Key 1 (2) and Endpoint (3) to a text editor of your choice to be used in the following steps.
-
Navigate to Create Key Vault page and fill in the appropriate values exemplified below to create a Key Vault account.
-
Go to Key Vault > Access policies (1), and grant (2) the Azure Synapse workspace Managed Service Identity permissions (3) to read secrets from Azure Key Vault.
-
Go to Key Vault > Secret to create a new secret. Select +Generate/Import to continue.
-
Set the secret's name to CognitiveKey, and then paste the key from the previous step into the Value field. Finally, select Create.
-
In the Synapse Workspace, create an Azure Key Vault linked service by pointing to the key vault you just created. It is recommended to name the linked service
KeyVault
. Verify the connection by selecting the Test connection button. If the connection is green, select Create and then select Publish all to save your change. -
In the Synapse Workspace, create an Azure Cognitive Service linked service by pointing to the cognitive service and key vault you just created. Make sure Secret name is set to CognitiveKey. It is recommended to name the linked service
CognitiveService
. Select Create (2) and then select Publish all to save your change. -
In the Synapse Workspace, create a REST linked service. Set Base URL to the previously copied Cognitive Service Endpoint value. Set Authentication Type to Anonymous (3). It is recommended to name the linked service
CognitiveRESTEndpoint
. Select Create (2) and then select Publish all to save your change. -
Create a new, empty dataset with the
cognitive_rest_dataset
name. -
Switch to code view and replace the code with the content of the cognitive_rest_dataset.json JSON file.
-
If the name used for the linked service to the REST Linked Service is not
CognitiveRESTEndpoint
, replace theproperties.linkedServiceName.referenceName
value in JSON with the actual name of the linked service. -
Save and publish the dataset.
-
Open the Exercise 2 - Enrich Data.json file in a text editor. Find the
REPLACE-WITH-YOUR-COGNITIVE-KEY
text and replace it with the previously copied Cognitive Service Key 1. Save the file for future use. This is the new pipeline definition file you will when creating pipeline during the following steps.
Perform the following steps for each dataset to be imported:
-
Create a new, empty dataset with the same name as the one to be imported.
-
Switch to code view and replace the code with the content of the associated JSON file.
-
If the name used for the linked service to
PrimaryStorage
is notasadatalake01
, replace theproperties.linkedServiceName.referenceName
value in JSON with the actual name of the linked service. -
Save and publish the dataset. Optionally, you can publish all datasets at once at the end of the import procedure.
The following datasets pointing to PrimaryStorage
must be imported:
Perform the following steps for each dataset to be imported:
-
Create a new, empty dataset with the same name as the one to be imported.
-
Switch to code view and replace the code with the content of the associated JSON file.
-
If the name used for the linked service to
SQLPool01
is notsqlpool01
, replace theproperties.linkedServiceName.referenceName
value in JSON with the actual name of the linked service. -
Save and publish the dataset. Optionally, you can publish all datasets at once at the end of the import procedure.
The following datasets pointing to SQLPool01
must be imported:
Dataset | Source code |
---|---|
wwi_dimcity_asa |
wwi_dimcity_asa.json |
wwi_dimcustomer_asa |
wwi_dimcustomer_asa.json |
wwi_dimdate_asa |
wwi_dimdate_asa.json |
wwi_dimemployee_asa |
wwi_dimemployee_asa.json |
wwi_dimpaymentmethod_asa |
wwi_dimpaymentmethod_asa.json |
wwi_dimstockitem_asa |
wwi_dimstockitem_asa.json |
wwi_dimsupplier_asa |
wwi_dimsupplier_asa.json |
wwi_dimtransactiontype_asa |
wwi_dimtransactiontype_asa.json |
wwi_factmovement_asa |
wwi_factmovement_asa.json |
wwi_factorder_asa |
wwi_factorder_asa.json |
wwi_factpurchase_asa |
wwi_factpurchase_asa.json |
wwi_factsale_asa |
wwi_factsale_asa.json |
wwi_factstockholding_asa |
wwi_factstockholding_asa.json |
wwi_facttransaction_asa |
wwi_facttransaction_asa.json |
wwi_perf_factsale_fast_asa |
wwi_perf_factsale_fast_asa.json |
wwi_perf_factsale_slow_asa |
wwi_perf_factsale_slow_asa.json |
wwi_staging_dimcustomer_asa |
wwi_staging_dimcustomer_asa.json |
wwi_staging_enrichedcustomer_asa |
wwi_staging_enrichedcustomer_asa.json |
Perform the following steps for each data flow to be imported:
-
Create a new, empty data flow with the same name as the one to be imported.
-
Switch to code view and replace the code with the content of the associated JSON file.
-
Save and publish the data flow. Optionally, you can publish all data flows at once at the end of the import procedure.
The following data-flows must be imported:
Data flow | Source code |
---|---|
EnrichCustomerData |
EnrichCustomerData.json |
Perform the following steps for each pipeline to be imported:
-
Create a new, empty pipeline with the same name as the one to be imported.
-
Switch to code view and replace the code with the content of the associated JSON file.
-
Save and publish the pipeline. Optionally, you can publish all pipelines at once at the end of the import procedure.
The following pipelines must be imported:
Pipeline | Source code |
---|---|
Exercise 2 - Enrich Data * |
Exercise 2 - Enrich Data.json |
Import WWI Data |
Import WWI Data.json |
Import WWI Data - Fact Sale Full |
Import WWI Data - Fact Sale Full.json |
Import WWI Perf Data - Fact Sale Fast |
Import WWI Perf Data - Fact Sale Fast.json |
Import WWI Perf Data - Fact Sale Slow |
Import WWI Perf Data - Fact Sale Slow.json |
* Make sure you are using the updated Pipeline file where you replaced the REPLACE-WITH-YOUR-COGNITIVE-KEY
value with the actual Cognitive Services Key 1 value.
-
Import the Setup - Export Sales to Data Lake notebook.
-
Replace
<primary_storage>
with the actual data lake account name ofPrimaryStorage
in cells 1, 3, 4 and 5. -
Run the notebook to populate
PrimaryStorage
with data.
-
Create a SQL on-demand database running the following script on the
master
database of the SQL on-demand pool:CREATE DATABASE SQLOnDemand01
-
Ensure the SQL on-demand pool can query the storage account using the following script:
CREATE CREDENTIAL [https://<primary_storage>.dfs.core.windows.net] WITH IDENTITY='User Identity';
In the script above, replace
<primary_storage>
with the name ofPrimaryStorage
.
-
Connect with either the SQL Active Directory admin or the
asa.sql.admin
account toSQLPool01
using the tool of your choice. -
Run the WWI-Reset-SQLPoolSchema SQL script to initialize the schema of the SQL pool.
-
Create the
asa.sql.staging
login in themaster
database ofWorkspace
using the following script:CREATE LOGIN [asa.sql.staging] WITH PASSWORD = '<password>' GO
In the script above, replace
<password>
with the actual password of the login. -
Create the
asa.sql.staging
user inSQLPool01
using the following script:CREATE USER [asa.sql.staging] FOR LOGIN [asa.sql.staging] WITH DEFAULT_SCHEMA = dbo GO -- Add user to the required roles EXEC sp_addrolemember N'db_datareader', N'asa.sql.staging' GO EXEC sp_addrolemember N'db_datawriter', N'asa.sql.staging' GO EXEC sp_addrolemember N'db_ddladmin', N'asa.sql.staging' GO
-
Configure access control to workspace pipeline runs in
SQLPool01
using the following script:--Create user in DB CREATE USER [<workspace>] FROM EXTERNAL PROVIDER; --Granting permission to the identity GRANT CONTROL ON DATABASE::<sqlpool> TO [<workspace>];
In the script above, replace
<workspace>
with the actual name ofWorkspace
and<sqlpool>
with the actual name ofSQLPool01
. -
Run the
Import WWI Data
pipeline to import all data except the sale facts intoSQLPool01
. -
Run the
Import WWI Data - Fact Sale Full
pipeline to import the sale facts intoSQLPool01
. -
Run the
Import WWI Perf Data - Fact Sale Fast
andImport WWI Perf Data - Fact Sale Slow
pipelines to import the large-sized sale facts intoSQLPool01
.
-
Ensure the
MasterUser
has a Power BI Pro subscription assigned. -
Sign in to the Power BI portal using the credentials of
MasterUser
and create a new Power BI workspace. In the remainder of this guide, this workspace will be referred to asPowerBIWorkspace
. -
Perform all the steps described in Exercise 3 - Task 1. In step 11, instead of using the suggested naming convention, name your dataset
wwifactsales
. -
In the Power BI portal, edit the security settings of the
wwifactsales
dataset and configure it to authenticate toSQLPool01
using the credentials of theasa.sql.admin
account. This allows theDirect Query
option to work correctly for all participants in the lab.
Import the following SQL scripts into Workspace
:
SQL script name | Source code | Replacements |
---|---|---|
Exercise 1 - Read with serverless SQL Pool |
Exercise 1 - Read with serverless SQL Pool.sql | <primary_storage> with the actual name of PrimaryStorage |
Exercise 4 - Analyze Transactions |
Exercise 4 - Analyze Transactions.sql | None |
Exercise 4 - Investigate query performance |
Exercise 4 - Investigate query performance.sql | None |
Exercise 5 - Create Sample Data for Predict |
Exercise 5 - Create Sample Data for Predict.sql | None |
Exercise 5 - Predict with model |
Exercise 5 - Predict with model.sql | None |
Exercise 5 - Register model |
Exercise 5 - Register model.sql | <blob_storage_account_key> with the storage account key of BlobStorage ; <blob_storage> with the storage account name of BlobStorage |
Import the following Spark notebooks into Workspace
:
Spark notebook name | Source code | Replacements |
---|---|---|
Exercise 1 - Read with Spark |
Exercise 1 - Read with Spark.ipynb | <primary_storage> with the actual name of PrimaryStorage |
Exercise 2 - Ingest Sales Data |
Exercise 2 - Ingest Sales Data.ipynb | In cell 1 - <primary_storage> with the actual name of PrimaryStorage |
Exercise 2 - Bonus Notebook with CSharp |
Exercise 2 - Bonus Notebook with CSharp.ipynb | In cell 1 - <primary_storage> with the actual name of PrimaryStorage ; In cell 3 - <sql_staging_password> with the password of asa.sql.staging created above in Task 4, step 3; In cell 3 - <workspace> with the name of the Workspace ; In cell 3 - <sql_pool> with the name of SQLPool1 |
Prepare the models
container in BlobStorage
by creating two folders: onnx
and hex
.
To prepare the machine learning model for Exercise 5, you have two options:
- Use the already trained and converted machine learning model (available as a starter artifact)
- Train and convert a new machine learning model
-
Upload the model.onnx.hex file to the
hex
folder in themodels
container ofBlobStorage
. -
Run the
Exercise 5 - Create Sample Data for Predict
SQL script to create sample data for machine learning predictions. -
Run the
Exercise 5 - Register model
SQL script to register the model with theSQLPool01
SQL pool.
-
Run the
Exercise 5 - Model training
Spark notebook to train the machine learning model and save it in ONNX format. The model will be saved asmodel.onnx
in theonnx
folder in themodels
container ofBlobStorage
. -
Use the convertion PowerShell script to transform
model.onnx
intomodel.onnx.hex
. -
Perform steps 1, 2, and 3 described in the previous section.
For each additional user that needs to have access to Workspace
and run exercises 1 through 5, the following steps must be performed:
-
Assign the
Reader
role on theWorkspaceResourceGroup
to the user. -
Assign the
Reader
andBlob Data Contributor
roles on thePrimaryStorage
to the user. -
Assign the
Workspace admin
role in theWorkspace
to the user. -
Grant access to the
SQLPool01
to the user using the script below. You must be signed in with theMasterUser
credentials (use SQL Server Management Studio if the script fails in Synapse Studio).CREATE USER [<user_principal_name>] FROM EXTERNAL PROVIDER; EXEC sp_addrolemember 'db_owner', '<user_principal_name>';
In the script above, replace
<user_principal_name>
with Azure Active Directory user principal name of the user. -
Assign the
Contributor
role on the Power BI workspace of theMasterUser
created in Task 5, step 2.
-
In the resource group, create a new Azure Databricks workspace.
-
In the Azure Databricks workspace, create a new cluster with the following configuration:
-
On the newly created cluster, install the
onnxmltools
PyPi library: -
Import the following Databricks notebooks into the
Shared
section of the Databricks workspace:Spark notebook name Source code Replacements Exercise 5 - Model Training
Exercise 5 - Model Training.dbc
-
In Synapse Workspace, open Exercise 2 - Enrich Data (2) pipeline. Select the Edit (3) button for Activities in the ForEachComment ForEach activity.
-
Select Sentiment Analysis (1) copy data activity. Switch to the Source (2) tab. Paste the Cognitive Service access key previously copied into a text editor.