Write a function `generate_data(N)` that produces `N` samples from the following noisy model, built on a "true" underlying polynomial:
with
Hint: you can use `np.polyval` to evaluate a polynomial with a fixed set of coefficients (but watch out for the order: `np.polyval` expects the coefficient of the highest power first).

The function should return an array of `x` values and an array of `y` values.
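A minimal sketch of `generate_data`, assuming Gaussian noise and $x$ drawn uniformly from $[-2, 2]$. The coefficients `P_TRUE` (a cubic, consistent with the 4-coefficient random polynomial in the plotting example), `NOISE_SIGMA` and `X_RANGE` are hypothetical placeholders, not the values of the model above:

```python
import numpy as np

# Hypothetical stand-ins: the true coefficients, x-range and noise level
# below are placeholders, not the values of the original model.
P_TRUE = np.array([0.5, -1.0, 0.2, 1.0])  # highest power first, as np.polyval expects
NOISE_SIGMA = 0.3
X_RANGE = (-2.0, 2.0)


def generate_data(N, p_true=P_TRUE, sigma=NOISE_SIGMA, rng=None):
    """Draw N noisy samples y = p_true(x) + eps with eps ~ Normal(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(*X_RANGE, size=N)                        # inputs, uniform on X_RANGE
    y = np.polyval(p_true, x) + rng.normal(0.0, sigma, size=N)  # true curve plus noise
    return x, y
```

Passing a seeded `rng` (e.g. `np.random.default_rng(0)`) makes runs reproducible, which helps when comparing fits later.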
Write a function `plot(ax, train_x, train_y, p_trained, p_true)` that takes a matplotlib axis object and
- plots the true function
- plots a second (trained or random) function
- plots the samples
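In the end you should be able to call it like this:

```python
import matplotlib.pyplot as plt
import numpy as np
f = plt.figure()
x, y = generate_data(10)
# a random 4-coefficient polynomial stands in for a trained fit; p_true holds the true coefficients
plot(f.gca(), x, y, np.random.normal(size=(4,)), p_true)
```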
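One possible `plot`, sketched with the standard matplotlib `Axes` methods `plot`, `scatter` and `legend`. Evaluating both polynomials on a dense grid spanning the samples is a design choice of this sketch; a fixed range would work just as well:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot(ax, train_x, train_y, p_trained, p_true):
    """Draw the true polynomial, a second (trained or random) polynomial and the samples on ax."""
    # Dense grid over the range of the samples so both curves look smooth
    grid = np.linspace(np.min(train_x), np.max(train_x), 200)
    ax.plot(grid, np.polyval(p_true, grid), label="true function")
    ax.plot(grid, np.polyval(p_trained, grid), linestyle="--", label="trained / random function")
    # Samples on top of the curves
    ax.scatter(train_x, train_y, color="k", zorder=3, label="samples")
    ax.legend()
    return ax
```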
One can show that, given a hypothesis set of polynomial functions
and a risk function of the following form
there is a closed-form solution for the empirical risk minimizer, where the best-fit coefficients are
where
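For the common case where the risk is the mean squared error (an assumption here), the closed form follows from setting the gradient of the empirical risk to zero. A sketch, with $h_{\mathbf w}$, $X$ and $\hat{\mathbf w}$ as illustrative notation:

$$
h_{\mathbf w}(x) = \sum_{k=0}^{d} w_k\, x^k,
\qquad
\hat R(\mathbf w) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - h_{\mathbf w}(x_i)\bigr)^2 = \frac{1}{N}\,\lVert \mathbf y - X\mathbf w\rVert^2,
$$

$$
X = \begin{pmatrix} 1 & x_1 & \cdots & x_1^{d} \\ \vdots & \vdots & & \vdots \\ 1 & x_N & \cdots & x_N^{d} \end{pmatrix},
\qquad
\nabla_{\mathbf w}\hat R = -\frac{2}{N}\, X^\top(\mathbf y - X\mathbf w) = 0
\;\Longrightarrow\;
\hat{\mathbf w} = (X^\top X)^{-1} X^\top \mathbf y .
$$

This requires $X^\top X$ to be invertible, i.e. at least $d+1$ distinct $x_i$. With the columns ordered lowest power first as written here, $\hat{\mathbf w}$ lists $w_0$ first, which is the reverse of the order `np.polyval` expects.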
- Write a function `learn(train_x, train_y, degree)` that returns the $(d+1)$ optimal coefficients for a polynomial fit of degree $d$.
- Fit a sample of 5 data points with degree 4.
- Plot the trained function together with the true function, using the plotting function from the previous step.
- Repeat this several times to get a feel for how much the data affect the fit.
- Try degree 1 and observe that the trained function is much less sensitive to the data.
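A sketch of `learn` that solves the normal equations directly. It builds the design matrix with `np.vander`, whose default column order is highest power first, so the returned coefficients can be passed straight to `np.polyval`. `np.polyfit(train_x, train_y, degree)` solves the same least-squares problem in a numerically more robust way:

```python
import numpy as np


def learn(train_x, train_y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns degree + 1 coefficients, highest power first, so the result
    can be passed straight to np.polyval.
    """
    x = np.asarray(train_x, dtype=float)
    y = np.asarray(train_y, dtype=float)
    # Design matrix with columns x**degree, ..., x**1, x**0 (np.vander's default order)
    X = np.vander(x, degree + 1)
    # Solve (X^T X) w = X^T y rather than forming the inverse explicitly
    return np.linalg.solve(X.T @ X, X.T @ y)
```

With 5 distinct points and degree 4 the system is exactly determined, so the fit passes through every training point; this is why the degree-4 fit swings so strongly from sample to sample.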
Write a function to evaluate the risk (loss) of a sample, using the loss function for which we derived the training procedure above: write a function `risk(x, y_true, trained_coeffs)` that computes the empirical risk of the sample `(x, y_true)` under the given coefficients.
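A sketch of `risk`, assuming the loss is the squared error averaged over the sample (the loss for which the closed-form least-squares fit is optimal). Coefficients are taken highest power first, as `np.polyval` expects:

```python
import numpy as np


def risk(x, y_true, trained_coeffs):
    """Empirical risk as the mean squared error (assumed loss) of the polynomial on (x, y_true)."""
    y_pred = np.polyval(trained_coeffs, np.asarray(x, dtype=float))
    return np.mean((np.asarray(y_true, dtype=float) - y_pred) ** 2)
```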
- Draw a data sample of size 100 and fit it to obtain trained coefficients.
- Draw 10,000 samples of size 10 and compute their empirical risk under the trained coefficients.
- Repeat the same, but use the true coefficients of the underlying data-generating process.
- Histogram the two sets of 10,000 risk evaluations. Which one has the lower average risk?
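The four steps can be sketched as follows. To run on its own, the block re-declares compact versions of the helpers; the true coefficients, noise level, $x$-range and fit degree (3) are hypothetical choices. Both coefficient sets are scored on the same 10,000 samples, which makes the comparison paired and less noisy:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
P_TRUE = np.array([0.5, -1.0, 0.2, 1.0])  # hypothetical true cubic, highest power first
SIGMA = 0.3                               # hypothetical noise level


def generate_data(N):
    x = rng.uniform(-2.0, 2.0, size=N)  # hypothetical x-range
    return x, np.polyval(P_TRUE, x) + rng.normal(0.0, SIGMA, size=N)


def learn(x, y, degree):
    X = np.vander(x, degree + 1)  # highest power first
    return np.linalg.solve(X.T @ X, X.T @ y)


def risk(x, y_true, coeffs):
    return np.mean((y_true - np.polyval(coeffs, x)) ** 2)  # mean squared error


# 1) Fit on one sample of size 100 (degree 3 assumed here)
p_trained = learn(*generate_data(100), degree=3)

# 2) + 3) Empirical risk of 10,000 fresh samples of size 10 under both coefficient sets
samples = [generate_data(10) for _ in range(10_000)]
risk_trained = np.array([risk(x, y, p_trained) for x, y in samples])
risk_true = np.array([risk(x, y, P_TRUE) for x, y in samples])

# 4) Histogram both distributions on shared bins
fig, ax = plt.subplots()
bins = np.linspace(0.0, max(risk_trained.max(), risk_true.max()), 50)
ax.hist(risk_trained, bins=bins, alpha=0.5, label=f"trained (mean {risk_trained.mean():.3f})")
ax.hist(risk_true, bins=bins, alpha=0.5, label=f"true (mean {risk_true.mean():.3f})")
ax.set_xlabel("empirical risk on a sample of size 10")
ax.legend()
```

Since the true coefficients minimize the expected squared error, their average risk should come out lower; with 100 training points the gap may be small.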
Explore how the fit improves as more data are added. Plot the best-fit model for data set sizes of
Explore how the fit changes with increasingly complex models. Plot the best-fit model for degrees
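Both explorations follow the same pattern: hold everything fixed except one setting, refit, and plot. A sketch under hypothetical choices (true cubic, noise level, $x$-range, fixed degree 3 for the size scan and fixed $N=20$ for the degree scan); the sizes and degrees in the last two lines are placeholders to replace with the exercise's values. `np.polyfit` stands in for `learn`, since it solves the same least-squares problem:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
P_TRUE = np.array([0.5, -1.0, 0.2, 1.0])  # hypothetical true cubic, highest power first


def generate_data(N, sigma=0.3):  # hypothetical noise level and x-range
    x = rng.uniform(-2.0, 2.0, size=N)
    return x, np.polyval(P_TRUE, x) + rng.normal(0.0, sigma, size=N)


def fit_grid(settings, make_fit, title):
    """One panel per setting: true curve, best-fit curve and the training sample."""
    fig, axes = plt.subplots(1, len(settings), figsize=(4 * len(settings), 3), sharey=True)
    grid = np.linspace(-2.0, 2.0, 200)
    for ax, s in zip(np.atleast_1d(axes), settings):
        x, y, coeffs = make_fit(s)
        ax.plot(grid, np.polyval(P_TRUE, grid), label="true")
        ax.plot(grid, np.polyval(coeffs, grid), "--", label="best fit")
        ax.scatter(x, y, s=10, color="k")
        ax.set_title(f"{title} = {s}")
        ax.set_ylim(-10, 10)  # high-degree fits can blow up between and beyond the samples
    return fig


def fit_for_size(N, degree=3):
    x, y = generate_data(N)
    return x, y, np.polyfit(x, y, degree)


def fit_for_degree(degree, N=20):
    x, y = generate_data(N)
    return x, y, np.polyfit(x, y, degree)


fig_sizes = fit_grid([5, 10, 100, 1000], fit_for_size, "N")      # placeholder sizes
fig_degrees = fit_grid([1, 3, 5, 9], fit_for_degree, "degree")  # placeholder degrees
```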
Draw two datasets:
- A train dataset with $N=10$
- A test dataset with $N=1000$

Train on the train dataset for degrees

- Evaluate the risk of each trained model on both the train and the test dataset
- Plot the train and test risk as a function of the polynomial degree
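A sketch of the train/test comparison, again with hypothetical true coefficients, noise level, $x$-range and degrees 0 through 9; `np.polyfit` stands in for `learn`, and the risk is assumed to be the mean squared error. The logarithmic y-axis is a design choice, because the test risk of high-degree fits can be orders of magnitude larger than the train risk:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
P_TRUE = np.array([0.5, -1.0, 0.2, 1.0])  # hypothetical true cubic, highest power first


def generate_data(N, sigma=0.3):  # hypothetical noise level and x-range
    x = rng.uniform(-2.0, 2.0, size=N)
    return x, np.polyval(P_TRUE, x) + rng.normal(0.0, sigma, size=N)


def risk(x, y, coeffs):  # mean squared error, assumed loss
    return np.mean((y - np.polyval(coeffs, x)) ** 2)


train_x, train_y = generate_data(10)
test_x, test_y = generate_data(1000)

degrees = list(range(10))  # hypothetical; degree 9 already interpolates all 10 training points
train_risk, test_risk = [], []
for d in degrees:
    coeffs = np.polyfit(train_x, train_y, d)  # least-squares fit on the train set only
    train_risk.append(risk(train_x, train_y, coeffs))
    test_risk.append(risk(test_x, test_y, coeffs))

fig, ax = plt.subplots()
ax.plot(degrees, train_risk, "o-", label="train risk")
ax.plot(degrees, test_risk, "o-", label="test risk")
ax.set_xlabel("polynomial degree")
ax.set_ylabel("empirical risk (MSE)")
ax.set_yscale("log")
ax.legend()
```

Because the polynomial models are nested, the train risk can only decrease (or stay equal) as the degree grows, while the test risk typically falls at first and then rises again once the model starts fitting the noise.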