# (PART\*) Account Setup {-}
# PIs and Lab Managers
This chapter is for people who are responsible for bringing a team to AnVIL. It is written mainly for principal investigators (PIs), but is also relevant to team leads and lab managers. Here you will find:
- **Account Setup Overview** -- Design philosophy and goals for this guide. Is this a good fit for your team? What should you know before you start?
- **Account Setup Steps** -- Step-by-step instructions to create your first accounts on AnVIL and connect your team members
The Appendices of this book contain additional information that may be of interest, including:
- Templates for including AnVIL in grant applications ([Budget Templates], [IRB Templates])
- Information regarding AnVIL's security features for protecting sensitive research ([Authorization Domains])
::: {.fyi}
Please click on the subsection headers in the left-hand
navigation bar (e.g., 2.1, 4.3) a second time to expand the
table of contents and enable the `scroll_highlight` feature
([see more](introduction.html#scroll-highlight)).
:::
## Account Setup Overview {#overview-pis}
### Goals for This Guide
```{r, echo=FALSE, fig.alt="List of goals for this guide: 1) get your accounts, 2) set up billing, 3) set up your lab members to do research on AnVIL, and 4) monitor and manage spending."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gd5c49c5c55_0_165")
```
### Design Philosophy
This guide provides an opinionated walkthrough of how to set up AnVIL for your lab, based on the experiences of many labs actively using AnVIL. These step-by-step instructions take team leads who are completely new to AnVIL through account setup to the point where team members can start working on AnVIL. Following the recommendations in this guide will help you see more clearly where charges are coming from and give you greater control over which users can spend your money and access your data. In support of these goals, we have made the following design decisions:
1. COST CONTROL
a. Prevent charges to your funding account until you explicitly give authorization by starting with Google’s free $300 credit program
b. Control who can charge to your account by restricting the ability to “share” permission to compute to yourself and any designated "Lab Managers"
2. COST TRANSPARENCY
a. Allow fine-grained accounting of who spent what by creating individual "Billing Projects" for each user
b. Monitor costs by setting up email alerts to warn you when you reach spending thresholds
c. Enable detailed analysis of costs by exporting cost data using BigQuery
3. DATA ACCESS CONTROLS
a. Reduce unwanted access by restricting the ability to "share" your data and analyses to yourself and any designated "Lab Managers"
b. Stricter data access management can be enforced through "Authorization Domains"; however, this can make future sharing and publication difficult. This guide recommends avoiding Authorization Domains for most uses, especially as you are starting out. If you are working with highly sensitive data, see [this documentation](https://support.terra.bio/hc/en-us/articles/360026775691-Managing-data-privacy-and-access-with-Authorization-Domains) for more information.
These design decisions are made to help you get up and running as quickly as possible without overwhelming new users. As your experience and comfort with AnVIL grow, you will likely change your design to better match your unique needs, for example by enabling Authorization Domains when working with protected data.
### Before You Start
- You will need a **credit card or bank account** to activate your free trial and get started. Don't worry! **You won't be billed until you explicitly turn on automatic billing**, but payment information is needed for verification purposes.
- Before setting up billing yourself, you may want to check with your institutional procurement office and see if they have a preferred account set-up method with Google (such as a third party reseller or an existing account).
- To add lab members, you will need to know the Google account they will use to access Terra. You can send lab members to the [Data Analysts] chapter for instructions on how they can sign up and start working on AnVIL. You can complete most setup steps without this information and then add them once you know the correct accounts.
### Starting Setup {#account-setup-pis}
AnVIL uses [Terra](https://anvil.terra.bio/) to run analyses. Terra operates on Google Cloud Platform (GCP), so you’ll pay for all storage and analysis costs through a Google account linked to Terra. The costs are the standard Google Cloud Platform fees for storing and moving data as well as executing an analysis. These costs are passed along through Terra without any markup.
```{r, echo=FALSE, fig.alt="Diagram outlining the roles of Google and Terra for AnVIL. A 'PI' signs in with a Google ID, which lets them create a Google Billing Account. Money flows from the Google Billing Account to a Terra Billing Project, and then to individual Terra Workspaces"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gd84a304855_0_138")
```
1. Create a Google account
1. Set up Google Billing (and claim your free credits!).
+ Add an administrator or viewer (optional)
1. Link Terra to the Google Billing Account
1. Create Terra Billing Projects
1. Set budgets and alerts (optional, but highly recommended)
1. Add users and Workspaces
### Lab Management Roles
While there are many ways to configure your lab, this guide defines the following roles and responsibilities:
- **PI** - The PI sets up the lab’s Google Cloud Account, creates its Google Billing Account(s) and Google Payment Method(s), links Terra with GCP, and invites Lab Managers to be Google Cloud “Billing Account Users.”
- **Lab Manager** (Optional) - A Lab Manager creates or clones Terra Workspaces and manages who can use those Workspaces. The Lab Manager is also responsible for creating one or more Terra Billing Projects and configuring GCP budgets and alerts. Importantly, **Lab Managers control who can spend lab money** and should have an understanding of Google Cloud Billing and Terra Billing Projects. Depending on your lab, the PI may choose to be the only Lab Manager, or may appoint trusted lab members to assist.
- **Data Analyst** - A lab member who is granted write + can-compute access on one or more Terra Workspaces by a Lab Manager and who will run analyses in Terra. Data Analysts cannot share Terra Workspaces (this prevents them from enabling others to spend lab money).
## Step 1: Create a Google Account {#pis-step-1}
```{r, echo=FALSE, fig.alt="Diagram showing an overview of the six steps. Step 1 is highlighted."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gd5c49c5c55_0_160")
```
Terra operates on Google Cloud Platform, so you will need a (free) Google account, which will allow you to:
- Access the Terra platform to manage team members, data, and analyses
- Access Google Cloud Platform to manage billing
- Receive alerts when spending reaches specified thresholds
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_google_create_account.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
## Step 2: Set Up Google Billing
```{r, echo=FALSE, fig.alt="Diagram showing an overview of the six steps. Step 2 is highlighted."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gd5c49c5c55_0_170")
```
Terra operates on Google Cloud Platform, and does not charge any markup. Rather than paying Terra or AnVIL, users set up billing directly with Google Cloud Platform.
**Make sure to use the same Google account ID you use to log into Terra for Google Cloud Billing.**
To set up billing, you must first create a **Google “Billing Account”**.
You can create multiple Billing Accounts associated with your Google ID. We recommend creating separate Billing Accounts for different funding sources.
### Create a Google Billing Account
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_google_billing_create_account.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
### Add Users or Viewers (optional)
If you have a project manager or finance administrator who needs access to a Billing Account, you can add them with a few different levels of permissions. Generally the most useful are:
- **Users** have a great deal of power over spending - they can create new "Billing Projects" and control who can spend money on those projects. If you have a lab or accounts manager responsible for expenses, it may make sense to add them as a Billing Account User. If you wish to retain full control over who can spend money on GCP, you should not add any Users.
- **Viewers** can see the activity in the Billing Account but can’t make any changes. This can be useful for finance staff who need access to the reports, or for lab members to be able to see what their analyses are costing.
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_google_billing_add_member.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
## Step 3: Add Terra to Google Billing Account
```{r, echo=FALSE, fig.alt="Diagram showing an overview of the six steps. Step 3 is highlighted."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gd84a304855_0_198")
```
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_google_billing_add_terra.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
## Step 4: Create Terra Billing Projects
```{r, echo=FALSE, fig.alt="Diagram showing an overview of the six steps. Step 4 is highlighted."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gd84a304855_0_217")
```
Creating Terra Billing Projects is how you enable Terra users to charge to the Google Billing Account.
Note that Google will report charges at the level of Billing Projects. **If you create only one Billing Project for your lab, you will not be able to see a breakdown of where charges are coming from**.
It is highly recommended that you create separate Billing Projects for each category of spending you would like to track. For example:
- A Billing Project for each **lab member**, if you would like to track individual spending
- A Billing Project for each **analysis type**, if you would like to track spending on e.g. RNA-seq vs. variant calling.
- A Billing Project for each **cohort**, if you would like to track spending per data set
If you are uncertain, **we recommend starting by setting up a Billing Project per lab member**. This makes it easy to track lab member spending, and also makes it easier to cleanly shut down projects when a member leaves the lab.
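To illustrate why the number of Billing Projects determines how much detail you can recover, here is a minimal Python sketch. The project names, dollar amounts, and the `cost_by_project` helper are hypothetical and exist only for this example; they do not reflect the format of Google's actual billing reports.

```python
from collections import defaultdict

# Hypothetical cost records: (billing_project, cost_in_usd).
# Names and amounts are invented for illustration.
records = [
    ("lab-alice", 12.50),
    ("lab-bob", 3.25),
    ("lab-alice", 7.50),
    ("lab-bob", 1.75),
]


def cost_by_project(rows):
    """Sum costs per Billing Project, the level at which charges are reported."""
    totals = defaultdict(float)
    for project, cost in rows:
        totals[project] += cost
    return dict(totals)


# One Billing Project per lab member: spending can be attributed to a person.
per_member = cost_by_project(records)

# One shared Billing Project: the same spending collapses into a single total.
single_project = cost_by_project([("lab-shared", cost) for _, cost in records])

print(per_member)
print(single_project)
```

With per-member projects, the totals separate cleanly by person; with one shared project, only the lab-wide total survives.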
### Create a Billing Project
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_terra_create_billing_project.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
As mentioned above, we recommend creating separate Terra Billing Projects for each of your team members so you can track their spending. These Billing Projects can all be associated with the same Google Billing Account if they are all funded by the same source.
**Having trouble?**
- Check out the [Troubleshooting] appendix
- Visit our community support forum at [`help.anvilproject.org`](https://help.anvilproject.org) with any questions.
## Step 5: Set Budgets and Alerts
```{r, echo=FALSE, fig.alt="Diagram showing an overview of the six steps. Step 5 is highlighted."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gda79c11827_0_0")
```
Cloud computing can save a great deal of money, time and effort by providing compute on an as-needed basis. However, care must be taken that users do not accidentally request excessive resources, or leave resources running when not needed.
Unfortunately, there are two issues that make direct cost control difficult:
- **The Google Cloud billing interface does not provide a way to automatically cancel computations when a spending threshold is reached**
- **Compute costs are reported with a delay (~1 day)**
As a PI or lab manager, there are some steps you can take to help monitor and limit spending:
- Be careful with members and permissions in your Billing Projects and Workspaces on Terra (see [Adding Users and Workspaces](#step-6-add-users-and-workspaces) for recommended setup)
- Most importantly, **monitor your spending** so you can shut down unnecessary expensive activities before they have time to accumulate.
- Terra provides [extensive documentation and examples](https://support.terra.bio/hc/en-us/sections/360006459511-Controlling-Cloud-costs) regarding cost management while working in the cloud
**We highly recommend that you set budgets and alerts to notify you if spending starts to exceed expectations**. This will make it easier to notice and shut down any accidental overspending. A good starting point is to set a monthly budget, and then set alerts at **50 percent** and **90 percent** of expected spend. You can add additional alerts if you desire.
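As a quick worked example of the 50 percent and 90 percent starting point, the Python sketch below turns a monthly budget into the dollar amounts at which alerts would fire. The function names and the $200 budget are assumptions made for illustration; this is not how Google Cloud implements alerts, only the arithmetic behind choosing thresholds.

```python
def alert_thresholds(monthly_budget, percents=(50, 90)):
    """Return the dollar amount corresponding to each alert percentage."""
    return [monthly_budget * p / 100 for p in percents]


def alerts_reached(spend_to_date, monthly_budget, percents=(50, 90)):
    """Return the alert percentages that the reported spend has reached."""
    return [p for p in percents if spend_to_date >= monthly_budget * p / 100]


# Hypothetical lab with a $200 monthly budget and $120 of reported spend.
budget = 200
thresholds = alert_thresholds(budget)
reached = alerts_reached(120, budget)

print(thresholds)
print(reached)
```

Keep in mind that alerts only send notifications. They do not stop running computations, and because costs are reported with a delay, actual spending may already be higher than the reported figure when an alert arrives.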
You can set a single budget for your entire lab, set up individual budgets for each Billing Project, or even set budgets for certain subsets of your Billing Projects. The right choice depends on the size of your lab and how closely you want to monitor spending. More granular budgets make it quicker to notice and track down overspending from a particular project, but mean you will get more emails every month. When setting budgets with broader scope, you can always find out which particular Billing Project is spending the money by checking the GCP Billing interface. **Note that there may be some restrictions on the budgets and alerts you can set while you’re using GCP’s free credits.** At the time of writing (Feb 2021), you cannot set budgets for individual projects while you are using the GCP free credits, but you can still set an overall budget. Any restrictions should be lifted when you upgrade to a paid account.
### Set Alerts
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_google_billing_set_alerts.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
### View Spend
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_google_billing_view_spend.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
### Export Cost Data to BigQuery
Coming soon -- instructions on how to export your cost data so you can better analyze and control your expenses.
## Step 6: Add Users and Workspaces
```{r, echo=FALSE, fig.alt="Diagram showing an overview of the six steps. Step 6 is highlighted."}
ottrpal::include_slide("https://docs.google.com/presentation/d/1c272-o1y4OdLu0hzr-5xDyTyrJVEmp8Jg55TPDgGZik/edit#slide=id.gda79c11827_0_64")
```
Finally, back on Terra, you can add lab members and give them permission to run analyses funded through your Billing Projects.
There are two primary ways to permit users to charge to your Billing Projects:
- **Add them directly to the Billing Project**. This gives them flexibility to create and manage their own Workspaces, but reduces your control over spending. Anyone they add to their Workspaces with sufficient permissions (i.e. permission to compute) can charge to your Billing Project.
- **Create a Workspace yourself, and add them to the Workspace** (or have a designated Lab Manager responsible for managing Workspaces). This gives you much more control over who can charge to your Billing Project.
Billing permissions on Terra can be confusing. For this reason, **we recommend starting by having a single person responsible for managing all Workspaces (either yourself or a trusted "Lab Manager"). This person should create all Workspaces and add lab members as Writers (not Owners) to the Workspaces**. This provides the greatest control over spending. Once you are familiar with the permissions system and are certain your lab members understand the implications of different permission settings, you may decide to give them greater control over Workspace access.
### Create a New Workspace
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_workspace_create.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
**To start, we recommend creating one Workspace for each lab member**, associated with that lab member’s own Billing Project. This will enable you and your lab members to familiarize yourselves with Workspaces and decide how best to organize your work. You can then create additional Workspaces as needed.
### Add Members to Workspaces
Lab members must have logged in to Terra at least once before they can be added to your Billing Projects and Workspaces (they do not need to log in to Google Cloud Console). You can send lab members to the [Data Analysts] guide for instructions on how they can sign up and start working on AnVIL.
Lab members can be added to a Workspace with a few different permission levels:
- **Readers** can view the Workspace but not make edits or run analyses (i.e. they **cannot spend your money**)
- **Writers** can make edits and run analyses (i.e. they **can spend your money**)
- **Owners** can make edits and run analyses and can also manage the permissions of other users (i.e. they **can enable others to spend your money**)
```{r, echo=FALSE, fig.alt='Table summarizing Workspace permission levels.'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1hhdPNfuAhbwkl5LlNVlJiCIx_rbzVp3jSJJeksqiR5I/edit#slide=id.g117dd5f15db_0_584")
```
More details about the permissions associated with each Access Level can be found in the [Terra documentation](https://support.terra.bio/hc/en-us/articles/360025851892-Reader-writer-or-owner-Workspace-access-controls-explained).
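The three access levels can be summarized as a small lookup table. The Python sketch below is a simplified model of this chapter's summary, not Terra's actual permission system; the dictionary layout and function names are hypothetical. In particular, it assumes Writers have been granted can-compute access, as described for Data Analysts earlier in this chapter.

```python
# Simplified, hypothetical model of the access levels summarized above.
# Assumes Writers were also granted can-compute access.
PERMISSIONS = {
    "reader": {"view": True, "compute": False, "share": False},
    "writer": {"view": True, "compute": True, "share": False},
    "owner": {"view": True, "compute": True, "share": True},
}


def can_spend(role):
    """True if the role can run analyses, and therefore incur charges."""
    return PERMISSIONS[role]["compute"]


def can_enable_others_to_spend(role):
    """True if the role can manage other users' permissions."""
    return PERMISSIONS[role]["share"]


for role in PERMISSIONS:
    print(role, can_spend(role), can_enable_others_to_spend(role))
```

Only the "share" column separates Writers from Owners, and that column determines who can extend spending ability to other people.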
Managing permissions for a Workspace has important implications:
- **Billing**: Terra charges are associated with Workspaces rather than users. Any billable activity that takes place in a given Workspace will be charged to the associated Billing Project, regardless of who conducted the activity. If there are multiple users with permission to compute, it is impossible to tell who conducted the activity.
- **Data access**: Especially when working with protected data, it’s important to ensure that users have proper authorization to view the data before giving them access to a Workspace containing the data. Terra provides **Authorization Domains** to assist with this.
In general we recommend:
- **Writers: Lab members who need permission to compute** (and charge to your Billing Project). This gives them permission to freely use the Workspace (adding and removing data, conducting analyses, etc.) but prevents them from adding additional members who could charge to your Billing Project. This ensures you have control over *who* is doing the spending.
- **Readers: All other users** (i.e. users who need to see the Workspace but should not charge to your Billing Project). Readers can always “clone” the Workspace (creating a copy of it associated with their own Billing Project) if they want to run computations themselves.
- If working with protected data, take advantage of Authorization Domains to increase security.
To add a member to a Workspace:
```{r, echo = FALSE, results='asis'}
cow::borrow_chapter(
doc_path = "child/_child_workspace_share.Rmd",
repo_name = "jhudsl/AnVIL_Template"
)
```
### Request Quota Increase
To prevent abuse, new users of GCP are only permitted to create a few Google Cloud "Projects". When working on Terra, each Terra Workspace is associated with its own Google Cloud Project, so if your team has multiple members you can bump up against this limit fairly quickly and won't be able to create more Workspaces.
Since this limit is imposed by Google, you will need to contact them directly to request a quota increase, using [this form](https://support.google.com/code/contact/billing_quota_increase).
At the time of writing (April 2022) Terra is working to expedite this process for Terra users; we recommend checking the [relevant Terra documentation](https://support.terra.bio/hc/en-us/articles/360029071251#h_01FFNCK82NB0YMAH5BTP41GYSY) for the latest information as well as recommendations about how to fill out the form.
## Wrap-Up {#pis-wrap-up}
**Congratulations! You have successfully set up AnVIL for your lab!**
Your lab members should be free to carry out analyses in the Workspaces you created. You should not need to do any further configuration through Terra until you decide to add or change user permissions for your Billing Projects and Workspaces.
You can view costs at any time through [Google Cloud Billing](https://console.cloud.google.com/billing). Note that costs are reported with a delay (~1 day).
To learn more about billing and setup, we recommend checking out this [Leanpub course](https://leanpub.com/universities/courses/terra/billing-and-collaboration).