---
title: "Deep Learning"
author: "Your Name"
date: "2022-12-21"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,message=FALSE,fig.align="center",fig.width=7,fig.height=2.5)
```
```{css,echo=FALSE}
.btn {
border-width: 0 0px 0px 0px;
font-weight: normal;
text-transform: ;
}
.btn-default {
color: #2ecc71;
background-color: #ffffff;
border-color: #ffffff;
}
```
```{r,echo=FALSE}
# Global parameter
show_code <- FALSE
```
# Class Workbook {.tabset .tabset-fade .tabset-pills}
## In class activity
```{python}
import numpy as np
import pandas as pd
import math
from matplotlib.pyplot import subplots
#import statsmodels.api as sm
from plotnine import *
import plotly.express as px
import statsmodels.formula.api as sm
#import ISLP as islp
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale
from sklearn.linear_model import LinearRegression
```
### Ames Housing data
Please take a look at the Ames Housing data.
```{python,echo=show_code}
ames_raw=pd.read_csv("ames_raw.csv")
```
Use the `ames_raw` data up to 2008 to predict housing prices for the later years.
```{python,echo=show_code}
# Train on sales up to and including 2008; hold out 2009 onward as the test set.
ames_raw_2008 = ames_raw.query('`Yr Sold` <= 2008').copy()
ames_raw_2009 = ames_raw.query('`Yr Sold` >= 2009').copy()
```
Use the following loss function calculator.
```{python,echo=show_code}
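def calc_loss(prediction,actual):
    # Positive difpred means the model under-predicted the price.
    difpred = actual-prediction
    RMSE = pow(difpred.pow(2).mean(),1/2)
    # Over-predictions cost the gap; under-predictions cost 10% of the actual price.
    operation_loss = abs(sum(difpred[difpred<0]))+sum(0.1*actual[difpred>0])
    return RMSE,operation_loss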
```
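To see what the two returned numbers mean, the sketch below restates the calculator above (so that it runs on its own) and applies it to three made-up sales. The toy prices are illustrative assumptions, not values from the Ames data.

```python
import pandas as pd

def calc_loss(prediction, actual):
    # Same logic as the calculator above, restated so this sketch is self-contained.
    difpred = actual - prediction
    RMSE = pow(difpred.pow(2).mean(), 1/2)
    operation_loss = abs(sum(difpred[difpred < 0])) + sum(0.1 * actual[difpred > 0])
    return RMSE, operation_loss

# Made-up prices (illustrative only).
actual = pd.Series([100.0, 200.0, 300.0])
prediction = pd.Series([110.0, 180.0, 300.0])

rmse, op_loss = calc_loss(prediction, actual)
# difpred = [-10, 20, 0]
# Over-prediction (difpred < 0) costs the full gap: |-10| = 10.
# Under-prediction (difpred > 0) costs 10% of the actual price: 0.1 * 200 = 20.
print(rmse, op_loss)
```

The operation loss is asymmetric: an over-prediction costs only the size of the gap, while any under-prediction costs 10% of the actual price regardless of how small the gap is.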
Use a simple neural network model.
```{python,eval=FALSE,echo=show_code}
nnfit_2008= # ["your model here"] # use ames_raw_2008
```
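As one possible starting point, here is a minimal sketch of a one-hidden-layer network using scikit-learn's `MLPRegressor`. It is only one of many ways to fit a "simple neural network". The data are synthetic, and the columns `area` and `quality` are hypothetical stand-ins for whichever predictors you choose from `ames_raw_2008`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data; the columns are hypothetical.
rng = np.random.default_rng(0)
n = 300
train = pd.DataFrame({
    "area": rng.uniform(500, 3000, n),
    "quality": rng.integers(1, 11, n),
})
train["SalePrice"] = (50_000 + 60 * train["area"]
                      + 8_000 * train["quality"] + rng.normal(0, 10_000, n))

features = ["area", "quality"]

# Standardize inputs and target: networks train poorly on raw dollar scales.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
nn_sketch = TransformedTargetRegressor(regressor=net, transformer=StandardScaler())
nn_sketch.fit(train[features], train["SalePrice"])

# Keep the original index so predictions align row by row with the actual prices.
pred = pd.Series(nn_sketch.predict(train[features]), index=train.index)
print(pred.head())
```

Returning the predictions as a Series with the data's own index matters when they are subtracted from `SalePrice`, because pandas aligns on index labels and a subset produced by `query` keeps its original, non-contiguous labels.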
When you decide on your model use the following to come up with your test loss.
```{python,eval=FALSE,echo=show_code}
pred_2009=# Predict using ames_raw_2009
calc_loss(pred_2009,ames_raw_2009.SalePrice)
```
Try to answer the following additional questions.
- Does your model indicate a good fit?
- How does your model result compare to the previous models you fit?
- Can you explain which features were important determinants of the price?
### COVID 19 Survival in Mexico
Let's revisit the COVID-19 in Mexico dataset from the [Mexican government](https://datos.gob.mx/busca/dataset/informacion-referente-a-casos-covid-19-en-mexico). This data is a version downloaded from [Kaggle](https://www.kaggle.com/datasets/meirnizri/covid19-dataset?resource=download). The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". Values of 97 and 99 denote missing data.
- sex: 1 for female and 2 for male.
- age: of the patient.
- classification: COVID test findings. Values 1-3 mean that the patient was diagnosed with COVID in different degrees. 4 or higher means that the patient is not a carrier of COVID or that the test is inconclusive.
- patient type: type of care the patient received in the unit. 1 for returned home and 2 for hospitalization.
- pneumonia: whether the patient already has air sac inflammation or not.
- pregnancy: whether the patient is pregnant or not.
- diabetes: whether the patient has diabetes or not.
- copd: Indicates whether the patient has Chronic obstructive pulmonary disease or not.
- asthma: whether the patient has asthma or not.
- inmsupr: whether the patient is immunosuppressed or not.
- hypertension: whether the patient has hypertension or not.
- cardiovascular: whether the patient has a disease related to the heart or blood vessels.
- renal chronic: whether the patient has chronic renal disease or not.
- other disease: whether the patient has other disease or not.
- obesity: whether the patient is obese or not.
- tobacco: whether the patient is a tobacco user.
- usmr: Indicates whether the patient was treated in a medical unit of the first, second, or third level.
- medical unit: type of institution of the National Health System that provided the care.
- intubed: whether the patient was connected to the ventilator.
- icu: Indicates whether the patient had been admitted to an Intensive Care Unit.
- date died: the date of death if the patient died, and 9999-99-99 otherwise.
```{python}
import zipfile
Train_COVID= pd.read_csv('Train_COVID.zip',compression='zip')
Test_COVID= pd.read_csv('Test_COVID.zip',compression='zip')
```
- Fit a sequence model that predicts the number of cases a week ahead.
- Modify your model to make separate predictions by sex.
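Before any sequence model can be fit, patient-level rows have to be turned into a weekly series with a one-week-ahead target. The sketch below shows one way to do that with pandas. It uses synthetic records and a hypothetical `date` column, so the column name and the number of lags are assumptions to adapt to the actual training file.

```python
import numpy as np
import pandas as pd

# Synthetic patient-level records with a hypothetical `date` column.
rng = np.random.default_rng(1)
days = pd.date_range("2020-03-02", periods=20 * 7, freq="D")
records = pd.DataFrame({"date": days[rng.integers(0, len(days), size=5000)]})

# One row per patient -> number of cases per week (weeks end on Sunday).
ones = pd.Series(1, index=pd.DatetimeIndex(records["date"])).sort_index()
weekly = ones.resample("W").sum().rename("cases")

# Supervised frame: the current week and the two before it predict next week.
n_lags = 3
frame = pd.DataFrame({f"lag_{k}": weekly.shift(k) for k in range(n_lags)})
frame["target"] = weekly.shift(-1)  # the count one week ahead
frame = frame.dropna()              # drop weeks without full history or target
print(frame.head())
```

For predictions by sex, the same aggregation could be repeated within each group so that every group gets its own weekly series.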
Your code:
```{python,echo=TRUE}
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
## Problem set
### Writing your own gradient descent
Consider the simple function $R(\beta) = \sin(\beta) + \beta/10$.
(a) Draw a graph of this function over the range $\beta \in [-6, 6]$.
Your code:
```{python,echo=TRUE}
#
#
```
(b) What is the derivative of this function?
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
(c) Given $\beta_0 = 2.3$, run gradient descent to find a local minimum of $R(\beta)$ using a learning rate of $\rho = 0.1$. Show each of $\beta_0,\beta_1,\dots$ in your plot, as well as the final answer.
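As a reminder of the mechanics, gradient descent repeats the update $\beta_{t+1} = \beta_t - \rho\, R'(\beta_t)$. The sketch below implements that loop for a generic derivative and, to leave the exercise open, demonstrates it on a different function, $f(\beta) = (\beta - 3)^2$. The helper name `gradient_descent` and the step count are illustrative choices.

```python
def gradient_descent(grad, beta0, rho=0.1, n_steps=100):
    """Return the full path beta_0, beta_1, ..., beta_n of gradient descent."""
    path = [beta0]
    for _ in range(n_steps):
        # Step against the gradient, scaled by the learning rate rho.
        path.append(path[-1] - rho * grad(path[-1]))
    return path

# Illustration on f(beta) = (beta - 3)^2, whose derivative is 2 * (beta - 3).
path = gradient_descent(lambda b: 2 * (b - 3), beta0=0.0, rho=0.1, n_steps=50)
print(path[:3], path[-1])
```

Keeping the whole path, instead of only the last value, makes it straightforward to overlay the iterates on a plot of the function.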
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
(d) Repeat with $\beta_0 = 1.4$.
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
### Default
Fit a neural network to the Default data. Use a single hidden layer with 10 units, and dropout regularization. Have a look at Labs 10.9.1–10.9.2 for guidance. Compare the classification performance of your model with that of linear logistic regression.
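To make the dropout idea concrete, here is a small NumPy sketch of the forward pass of a network with one hidden layer of 10 units. It is not a training routine, and in practice a deep learning library would handle this step. The weights are random, and the 40% dropout rate is an arbitrary choice for illustration.

```python
import numpy as np

def forward(X, W1, b1, w2, b2, drop_rate=0.4, training=False, rng=None):
    """One ReLU hidden layer and a sigmoid output, with inverted dropout."""
    hidden = np.maximum(X @ W1 + b1, 0.0)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # Keep each hidden unit with probability 1 - drop_rate.
        keep = rng.random(hidden.shape) >= drop_rate
        # Rescale survivors so the expected activation is unchanged.
        hidden = hidden * keep / (1.0 - drop_rate)
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
n, p, units = 5, 3, 10
X = rng.normal(size=(n, p))
W1, b1 = rng.normal(size=(p, units)), np.zeros(units)
w2, b2 = rng.normal(size=units), 0.0

probs_train = forward(X, W1, b1, w2, b2, training=True, rng=rng)  # random mask
probs_eval = forward(X, W1, b1, w2, b2, training=False)           # no mask
print(probs_train, probs_eval)
```

Dropout is applied only during training; at prediction time every unit is kept, so the output is deterministic.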
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
### IMDb
Repeat the analysis of Lab 10.9.5 on the IMDb data using a similarly structured neural network. We used 16 hidden units at each of two hidden layers. Explore the effect of increasing this to 32 and 64 units per layer, with and without 30% dropout regularization.
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
### NYSE
Fit a lag-5 autoregressive model to the NYSE data, as described in the text and Lab 10.9.6. Refit the model with a 12-level factor representing the month. Does this factor improve the performance of the model?
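The two design matrices this exercise compares can be sketched as follows. The example uses a single synthetic daily series in place of the NYSE data, so the variable names and the series itself are assumptions. The month factor enters as 11 dummy columns, with one month dropped as the baseline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic business-day series standing in for the response (illustrative only).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2000-01-03", periods=600)
y = pd.Series(rng.normal(size=len(dates)), index=dates, name="y")

# Lag-5 design matrix: columns y_{t-1}, ..., y_{t-5}.
L = 5
X_lag = pd.DataFrame({f"lag_{k}": y.shift(k) for k in range(1, L + 1)})

# 12-level month factor encoded as 11 dummy columns.
month = pd.get_dummies(pd.Series(y.index.month, index=y.index),
                       prefix="month", drop_first=True, dtype=float)

# The first L rows have incomplete lags and are dropped from both fits.
keep = X_lag.notna().all(axis=1)
X_ar = X_lag[keep]
X_ar_month = pd.concat([X_lag, month], axis=1)[keep]

ar = LinearRegression().fit(X_ar, y[keep])
ar_month = LinearRegression().fit(X_ar_month, y[keep])
print(X_ar.shape, X_ar_month.shape)
```

Comparing the two fits on a held-out test period, not on the training rows, is what answers the question of whether the factor helps.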
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
### NYSE 2
In Section 10.9.6, we showed how to fit a linear AR model to the
NYSE data using the `LinearRegression()` function. However, we also
mentioned that we can “flatten” the short sequences produced for
the RNN model in order to fit a linear AR model. Use this latter
approach to fit a linear AR model to the NYSE data. Compare the test
$R^2$ of this linear AR model to that of the linear AR model that we fit
in the lab. What are the advantages/disadvantages of each approach?
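"Flattening" here means reshaping each short sequence of $L$ time steps and $p$ variables into a single row of $L \times p$ features. A minimal NumPy sketch, with hypothetical dimensions and a placeholder array in place of the actual RNN input:

```python
import numpy as np

# Hypothetical RNN input: n sequences, each with L time steps of p variables.
n, L, p = 100, 5, 3
X_seq = np.arange(n * L * p, dtype=float).reshape(n, L, p)

# Flatten each (L, p) sequence into one row of length L * p.
X_flat = X_seq.reshape(n, L * p)
print(X_seq.shape, X_flat.shape)
```

Each row of `X_flat` lists the `p` variables of the first time step, then those of the second, and so on, so it can be passed directly to a linear model that expects a two-dimensional feature matrix.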
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
Repeat the previous exercise, but now fit a nonlinear AR model by
“flattening” the short sequences produced for the RNN model.
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
### NYSE 3
Consider the RNN fit to the NYSE data in Section 10.9.6. Modify the code to allow inclusion of the variable `day_of_week`, and fit the RNN. Compute the test $R^2$.
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~
### CNN on photo
From your collection of personal photographs, pick 10 images of animals
(such as dogs, cats, birds, farm animals, etc.). If the subject
does not occupy a reasonable part of the image, then crop the image.
Now use a pretrained image classification CNN as in Lab 10.9.4 to
predict the class of each of your images, and report the probabilities
for the top five predicted classes for each image.
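Whatever pretrained network is used, reporting the top five classes comes down to turning the model's raw scores into probabilities and sorting them. The sketch below shows that step alone, on made-up scores and a made-up label set. The helper name `top_k_probs` is hypothetical and no actual model is involved.

```python
import numpy as np

def top_k_probs(logits, labels, k=5):
    """Softmax the logits and return the k most probable (label, probability) pairs."""
    z = logits - np.max(logits)          # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    order = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    return [(labels[i], float(probs[i])) for i in order]

# Made-up scores for a made-up label set (illustrative only).
labels = ["cat", "dog", "bird", "horse", "cow", "sheep", "pig"]
logits = np.array([2.0, 4.0, 0.5, 1.0, -1.0, 0.0, 3.0])

top5 = top_k_probs(logits, labels)
for label, prob in top5:
    print(f"{label}: {prob:.3f}")
```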
Your code:
```{python,echo=TRUE}
#
#
```
Your answer:
~~~
Please write your answer in full sentences.
~~~