Introductory, model family, benchmarks, and technical specification vignettes #16
Conversation
This vignette features an example of fitting, plotting, and assessing a simple model using sgdnet, as well as a brief background.
This vignette provides a description of all the model families, as well as an example of a model fit for each of them.
…o algorithm-vignette
Merge branch 'algorithm-vignette' of https://github.com/jolars/sgdnet into algorithm-vignette # Conflicts: # data-raw/benchmarks.R
We are actually using vehicle, not segment, so change this
tests/testthat/test-families.R
@@ -13,7 +13,7 @@ test_that("test that all combinations run without errors", {
   family = c("gaussian", "binomial", "multinomial"),
   intercept = TRUE,
   sparse = FALSE,
-  alpha = c(0, 0.5, 1),
+  alpha = c(0, 0.75, 1),
Why this change?
I test with alpha at 0.5 elsewhere, so I just figured it might not hurt to change it up.
Fair enough.
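For context, a sketch of the combination grid this test appears to loop over (the `expand.grid()` framing is my guess at the test's structure, not a quote of it):

```r
# Hypothetical reconstruction of the test's settings grid: expand.grid()
# crosses every setting so each family/penalty combination is run once.
grid <- expand.grid(
  family = c("gaussian", "binomial", "multinomial"),
  intercept = TRUE,
  sparse = FALSE,
  alpha = c(0, 0.75, 1),
  stringsAsFactors = FALSE
)
nrow(grid)  # 3 families x 3 alpha values = 9 combinations
```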
vignettes/algorithm.Rmd
@schmidt2017. SAGA handles both strongly and non-strongly convex objectives -- even in the composite case -- making it applicable to a wide range of problems such as generalized linear regression with elastic net regularization, which is the problem
"generalized linear models" is a bit more common terminology. Cite Zou and Hastie (2005) for the elastic net.
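For context, the composite problem SAGA is being applied to here is the average of smooth per-sample losses plus the elastic net penalty of Zou and Hastie (2005), with the nonsmooth part handled through its proximal operator:

$$
\min_{\beta} \; \frac{1}{n} \sum_{i=1}^n f_i(\beta)
+ \lambda \left( \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2
+ \alpha \lVert \beta \rVert_1 \right)
$$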
vignettes/algorithm.Rmd
Before the algorithm is set loose on data, its parameters have to be set up properly and its data (possibly) preprocessed. For illustrative purposes, we will proceed by example and use Gaussian univariate multiple regression
I'd drop `multiple` here - it's easily confused with `family = "mgaussian"`, and the fact that you are using sparsity / regularization should make it implicitly clear that we have more than one predictor.
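To ground the example, here is a minimal sketch of the kind of Gaussian fit the vignette walks through, assuming `sgdnet()` mirrors the glmnet-style interface this PR describes (family, alpha, and an automatically computed lambda path):

```r
library(sgdnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)     # 100 observations, 5 predictors
y <- drop(x %*% rnorm(5)) + rnorm(100)  # Gaussian response

# Elastic net fit; alpha = 0.5 mixes the ridge and lasso penalties equally.
fit <- sgdnet(x, y, family = "gaussian", alpha = 0.5)
```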
vignettes/algorithm.Rmd
```{r}
sgdnet_sd <- function(x) sqrt(sum((x - mean(x))^2)/length(x))
n <- nrow(x)
x_bar <- colSums(x)/n
```
colMeans(x)
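Applied, the preprocessing step might look like the following vectorized sketch (the function name is mine; only the `colMeans()` substitution is the reviewer's suggestion):

```r
# Center each column and scale by the population (divide-by-n) standard
# deviation, as the vignette describes.
standardize <- function(x) {
  n <- nrow(x)
  x_bar <- colMeans(x)                              # column means
  x_sd <- sqrt(colSums(sweep(x, 2, x_bar)^2) / n)   # biased SD per column
  sweep(sweep(x, 2, x_bar), 2, x_sd, "/")           # center, then scale
}
```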
vignettes/algorithm.Rmd
We then construct the $\lambda$ path to be a $\log_e$-spaced sequence starting at $\lambda_{\text{max}}$ and finishing at $\frac{\lambda_{\text{max}}}{\lambda_{\text{min ratio}}}$. Thus we have
`lambda_max` times `lambda_min_ratio`, not divided by.
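In code, the corrected path might be computed as follows (names are illustrative, not the package's internals):

```r
# Log-spaced sequence from lambda_max down to lambda_max * lambda_min_ratio.
lambda_path <- function(lambda_max, lambda_min_ratio = 1e-4, nlambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}
```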
vignettes/introduction.Rmd
## Background and motivation

**sgdnet** was developed as a Google Summer of Code 2018 project. Its goal was to be
I think you can leave GSoC mentions until the end as a funding acknowledgement. I expect you'll keep maintaining and improving this past the end of GSoC-2018
Yup!
vignettes/models.Rmd
where $\mathcal{L}(\beta_0,\beta; \mathbf{y}, \mathbf{X})$ is the log-likelihood of the model, $\lambda$ is the regularization strength, and $\alpha$ is the elastic net mixing parameter, such that $\alpha = 1$ results in the lasso and $\alpha = 0$ in the ridge penalty.
Citations
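For reference, one standard way of writing the objective these symbols describe, following the glmnet formulation [@friedman2010]:

$$
\min_{\beta_0, \beta} \;
-\mathcal{L}(\beta_0, \beta; \mathbf{y}, \mathbf{X})
+ \lambda \left( \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2
+ \alpha \lVert \beta \rVert_1 \right)
$$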
vignettes/models.Rmd
$$
\text{Pr}(Y_i = c) =
\frac{e^{\beta_{0_c}+\beta_c^\intercal \mathbf{x}_i}}{\sum_{k = 1}^m{e^{\beta_{0_k}+\beta_k^\intercal \mathbf{x}_i}}},
$$
Maybe have `K` for the number of classes instead of `m`.
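With that suggestion applied, the softmax would read:

$$
\text{Pr}(Y_i = c) =
\frac{e^{\beta_{0_c} + \beta_c^\intercal \mathbf{x}_i}}{\sum_{k = 1}^{K} e^{\beta_{0_k} + \beta_k^\intercal \mathbf{x}_i}},
\qquad c = 1, \dots, K.
$$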
vignettes/models.Rmd
which is the overspecified version of this model. As in **glmnet** [@friedman2010], which much of this packages functinality is modeled after, however,
functionality
vignettes/models.Rmd
[@friedman2010], which much of this packages functinality is modeled after, however, we rely on the regularization of model to take care of this [@hastie2015, pp. 36-7].
Clarify this a bit more. (I believe you are saying that regularization implicitly induces a sum-to-zero constraint)
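A minimal statement of that point: the softmax probabilities are unchanged when every class's coefficients are shifted by a common vector,

$$
\frac{e^{(\beta_{0_c} + u_0) + (\beta_c + \mathbf{u})^\intercal \mathbf{x}_i}}{\sum_{k=1}^{K} e^{(\beta_{0_k} + u_0) + (\beta_k + \mathbf{u})^\intercal \mathbf{x}_i}}
= \frac{e^{\beta_{0_c} + \beta_c^\intercal \mathbf{x}_i}}{\sum_{k=1}^{K} e^{\beta_{0_k} + \beta_k^\intercal \mathbf{x}_i}}
\quad \text{for any } (u_0, \mathbf{u}),
$$

and among these equivalent solutions the quadratic part of the penalty is smallest for the one whose class coefficients average to zero, which is how the regularization resolves the overspecification.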
This is a good start - I only reviewed the vignettes since I think the other changes are already under review as #15. I particularly appreciated the technical write up - I finally (sort of) get the point of the lagged update structure.
Did you want another round of comments on this?
Hm, thanks but let's wait a little. I noticed that I had missed seeing to a couple of your comments too.
Merge branch 'master' of github.com:jolars/sgdnet into algorithm-vignette # Conflicts: # tests/testthat/test-families.R
README.Rmd
## Versioning

**eulerr** uses [semantic versioning](https://semver.org/).
Wrong package name
README.Rmd
[ridge](https://en.wikipedia.org/wiki/Tikhonov_regularization) and [lasso](https://en.wikipedia.org/wiki/Lasso_(statistics)) penalties.

**sgdnet** automatically fits the model across an automatically computed
Don't use "automatically" twice in one sentence.
inst/COPYRIGHTS
@@ -19,7 +19,8 @@ along with this program. If not, see <https://www.gnu.org/licenses/>.
 # scikit-learn
 
 Parts of this package are modified translations of source code from the
-Python package scikit-learn, which is licensed under the New BSD License
+Python package scikit-learn (including the contributed module lightning),
+which is licensed under the New BSD License
Possibly add URLs here
vignettes/sgdnet.bib
volume = {2},
shorttitle = {{{SAGA}}},
abstract = {In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA, SAGA supports non-strongly convex problems directly, and is adaptive to any inherent strong convexity of the problem. We give experimental results showing the effectiveness of our method.},
journal = {Advances in Neural Information Processing Systems},
NIPS isn't a journal. Does Zotero support the `InProceedings` type?
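For reference, an `@inproceedings` entry along those lines might look like this (only the standard bibliographic facts are taken as given):

```bibtex
@inproceedings{defazio2014,
  title     = {{SAGA}: A Fast Incremental Gradient Method with Support
               for Non-Strongly Convex Composite Objectives},
  author    = {Defazio, Aaron and Bach, Francis and Lacoste-Julien, Simon},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2014}
}
```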
vignettes/sgdnet.bib
abstract = {We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from $O(1/\sqrt{k})$ to $O(1/k)$ in general, and when the sum is strongly-convex the convergence rate is improved from the sub-linear $O(1/k)$ to a linear convergence rate of the form $O(\rho^k)$ for $\rho < 1$. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. This extends our earlier work Le Roux et al. (Adv Neural Inf Process Syst, 2012), which only lead to a faster rate for well-conditioned strongly-convex problems. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.},
language = {en},
number = {1-2},
journal = {Math. Program.},
Full journal name
vignettes/models.Rmd
penalty.

When the lasso penalty is in place, the regularization imposed on the coefficients takes the shape of an octahedron.
Not generally (only in three dimensions; in general the $\ell_1$ ball is a cross-polytope).
vignettes/introduction.Rmd
The goal of **sgdnet** is to be a one-stop solution for fitting elastic net-penalized generalized linear models. This is of course not novel in and of itself. Several packages,
I think this is too broad - I think it's unlikely we'll outperform glmnet in the $p \gg n$ setting or the $p \approx n \gg 1$ setting. I'd focus our attention on the $n \gg p$ setting.
vignettes/algorithm.Rmd
By default, the feature matrix in **sgdnet** is centered and scaled to unit variance. Here, however, we used the *biased* population standard deviation and divide by $n$ rather than $n-1$.
I'd move this to a footnote and rephrase. We're using the sample standard deviation here since we are only concerned with fitting the model to the data at hand. Since we aren't concerned with a population, biasedness (or not) is irrelevant
vignettes/algorithm.Rmd
The step size, $\gamma$, in SAGA is constant throughout the algorithm. For non-strongly convex objectives it is set to $1/(3L)$, where $L$ is the *Lipschitz* constant. Since we are picking a single sample at a time,
Be precise here -
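The comment is cut off here, but one way to make the claim precise (my reconstruction, not the reviewer's words) is to define $L$ as the maximum of the per-sample gradient Lipschitz constants,

$$
\lVert \nabla f_i(\beta) - \nabla f_i(\beta') \rVert_2 \le L_i \lVert \beta - \beta' \rVert_2,
\qquad L = \max_i L_i,
$$

so that for the squared-error loss $f_i(\beta) = \frac{1}{2}(\mathbf{x}_i^\intercal \beta - y_i)^2$ one gets $L_i = \lVert \mathbf{x}_i \rVert_2^2$, and $\gamma = 1/(3L)$.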
vignettes/algorithm.Rmd
nz <- Nonzeros(X[j]) # nonzero indices of X[j]

# Perform just-in-time updates of coefficients if X is sparse
X[j] <- LaggedUpdate(j,
For simplicity, I might focus on the dense (no lagged update) case first and then describe the lagged updates near the end.
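Following that suggestion, a dense (no lagged updates) SAGA step for the squared-error loss could be sketched like this in R; this is illustrative pseudocode in spirit, not the package's actual internals:

```r
# One dense SAGA step: beta are the coefficients, g_mem the n-by-p table of
# stored per-sample gradients, g_bar their running mean, gamma the step size.
saga_step <- function(beta, x, y, g_mem, g_bar, gamma) {
  n <- nrow(x)
  i <- sample(n, 1)                              # pick one sample at random
  g_new <- (sum(x[i, ] * beta) - y[i]) * x[i, ]  # gradient of 0.5*(x_i'b - y_i)^2
  # SAGA direction: new gradient, minus its stored value, plus the table mean
  beta <- beta - gamma * (g_new - g_mem[i, ] + g_bar)
  g_bar <- g_bar + (g_new - g_mem[i, ]) / n      # keep the mean in sync
  g_mem[i, ] <- g_new                            # refresh the gradient table
  list(beta = beta, g_mem = g_mem, g_bar = g_bar)
}
```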
…es, remove file field, add doi to friedman2010
I'll merge this one in now and upload the pkgdown site.
This pull request adds three vignettes to the package and updates one.
These are still a work in progress so I don't expect to merge this one right now, but I'd love for you to pitch in if you have any feedback.
Edit (18/13/8): I have now also generated a pkgdown site for the project, which I intend to link to as the final work product of GSoC.