-
Notifications
You must be signed in to change notification settings - Fork 2
Poisson family and line-search procedure #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…or adaptive step size [skip ci]
Hi @wzzlcss Do you have any references for the step-size optimization / line-search procedure scheme being used? I haven't seen any and would be interested in learning more about the approach being used here. |
Hi mentor, I am using the method from lightning: https://github.com/scikit-learn-contrib/lightning/blob/master/lightning/impl/sag_fast.pyx, which use theory from the section 4.6 of Schmidt, M., Roux, N., & Bach, F. (2013). "Minimizing finite sums with the stochastic average gradient". arXiv Preprint arXiv:1309.2388(https://arxiv.org/abs/1309.2388) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check all the step size stuff yet (haven't had a chance to read the paper you referenced) but the tests look solid so I'm hopeful.
R/score.R
Outdated
type.measure <- match.arg(type.measure) | ||
y <- as.vector(y) | ||
|
||
y_hat <- stats::predict(fit, x, s = s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe import the generic from stats
instead? I imagine we're using it pretty frequently already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi mentor, may I ask how to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add roxygen2
comment @importFrom stats predict
and update the NAMESPACE
file. It should "just work" after that.
R/sgdnet.R
Outdated
n_classes <- 1L | ||
|
||
if (length(unique(y)) == 1) | ||
stop("only one class in response.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Class" isn't the right name here - maybe "response is constant." would be a better phrasing.
src/families.h
Outdated
auto y_std = StandardDeviation(y, y_bar); | ||
|
||
for (decltype(n) i = 0; i < n; ++i) | ||
y_map(i) = (y(i) - y_bar(0))/y_std(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
braces around loop body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably my doing. I have a slight preference for braceless one-line loops (sticking loosely to the google c++ code style guide), but either is fine by me.
@@ -35,6 +35,6 @@ test_that("all combinations run without errors", { | |||
sfit <- do.call(sgdnet, pars) | |||
gfit <- do.call(glmnet, pars) | |||
|
|||
compare_predictions(sfit, gfit, x, tol = 1e-3) | |||
compare_predictions(sfit, gfit, x, tol = 1e-2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? If only needed for the Poisson, maybe special case it? I don't like loosening tolerances unless we really have to.
Dumb question: how well does the Adaptive stuff work for the Gaussian and Binomial cases? I know we have general upper bounds, but I wonder if these might be too conservative in practice. |
Hi mentor, thanks for the comments! I will try this with Gaussian and Binomial. And lightning sets the check frequency for lipschitz constant to 10 and it sometimes goes wrong for small size dataset, so I set to 1 to make sure to have a solution. Should I make the check frequency an user input option? I think poisson is ok with small tolerance since I have manually run these tests, maybe just because of some seed I will check all tests. |
Can't hurt to make it an option, but I think if we are also suggesting decently large batches, the Lipschitz check frequency of 1 shouldn't be too high a cost to pay. |
I think I tried this out last summer without noticing any real benefit, but a more rigorous benchmark is probably warranted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, this is looking good. However, would you please like to
- Add examples to
sgdnet()
for poisson regression - Add a data set (to use in examples) for poisson. Or you could repurpose the abalone data set for this and find another data set for gaussian.
plus attend to the other comments in the review.
n_classes <- 1L | ||
|
||
if (length(unique(y)) == 1) | ||
stop("response is constant.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stop("response is constant.") | |
stop("response is constant") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi mentor, I see other stop()
has a .
, should I remove them?
src/families.h
Outdated
|
||
template <typename T> | ||
double | ||
LambdaMax(const T& x, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire code block should not be indented.
tests/testthat/test-poisson.R
Outdated
@@ -0,0 +1,26 @@ | |||
context("poisson regression") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The authors of testthat now recommend not using context()
at all. Perhaps remove it everywhere?
tests/testthat/test-poisson.R
Outdated
test_that("solutions along regularization path are the same as glmnet", { | ||
set.seed(1) | ||
|
||
N=2000; p=10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to keep good coding style here as well. Avoid ;
, use <-
instead of =
, add spacing between commas and so on.
This PR implements poisson regression using adaptive step size. Here is a comparison between glmnet using caddisfly dataset. This is the output from a version that previously under development and test: poisson_check.pdf.