I am using SCS 3.2.3 via CVX on a Windows 10 machine, testing SCS on simple ML problems while gradually increasing the problem size to see how it performs.
Beyond a certain size (500 features, 80,000 samples), SCS starts to fail on l2-regularized logistic regression; beyond 500 features and 40,000 samples, the objective value SCS reports starts to differ from the objective computed from the parameters CVX returns.
This probably just requires tuning eps_abs and eps_rel for more/less accuracy, and reducing eps_infeas to prevent SCS from erroneously returning infeasible, which appears to be what has happened here. SCS has found a certificate of infeasibility for the conic formulation of the problem to the requested accuracy, so tighten the infeasibility tolerance if that is not what you want.
I'm not sure CVX is really being updated much these days, but if you switch to CVXPY (or just call SCS directly) you can use the quadratic version of the solver, which can handle the l2 regularization without a second-order cone and will likely be faster and more reliable.
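A minimal sketch of how those settings could be passed from CVX (the parameter names are the SCS 3.x settings shown in the log below; this assumes cvx_solver_settings forwards them to SCS unchanged):
cvx_begin
cvx_solver scs
% tighter convergence tolerances; smaller eps_infeas makes SCS less eager to declare infeasibility
cvx_solver_settings('eps_abs', 1e-5, 'eps_rel', 1e-5, 'eps_infeas', 1e-9)
variable w(p)
minimize((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'*full(Xcvx)'])))/q + lam*sum_square(w))
cvx_end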
Hi,
I am using SCS 3.2.3 via CVX on a Windows 10 machine. I was testing SCS on simple ML problems, gradually increasing the problem size to see how it performs.
I observed that beyond a certain size (500 features, 80,000 samples), SCS starts to fail to solve l2-regularized logistic regression problems. In addition, beyond a certain size (500 features, 40,000 samples), the final objective value SCS reports and the objective value computed from the optimal parameters returned by CVX start to differ.
Do you know what the issue is?
Thanks,
Regards,
Caglar Ari
SCS Output:
Calling SCS 3.2.3: 480502 variables, 320001 equality constraints
problem: variables n: 320001, constraints m: 480502
cones: q: soc vars: 502, qsize: 1
e: exp vars: 0, dual exp vars: 480000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
max_iters: 10000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 10, acceleration_interval: 10
lin-sys: sparse-direct-amd-qdldl
nnz(A): 40480002, nnz(P): 0
iter | pri res | dua res | gap | obj | scale | time (s)
250| 5.05e-03 5.70e-03 1.20e-01 -1.74e+00 2.26e+02 1.12e+02
500| 1.20e-03 2.55e-03 2.26e-02 -6.75e-01 2.26e+02 1.55e+02
750| 1.54e-02 5.85e-01 1.58e-02 1.34e-03 2.26e+02 2.01e+02
1000| 1.68e-03 1.35e-03 2.80e-02 -1.21e-01 2.26e+02 2.56e+02
1250| 1.56e-03 1.36e-03 2.83e-02 -2.85e-02 2.26e+02 3.11e+02
1500| 1.52e-03 1.41e-03 2.94e-02 6.00e-02 2.26e+02 3.65e+02
1750| 1.55e-03 1.51e-03 3.14e-02 1.52e-01 2.26e+02 4.18e+02
2000| 1.64e-03 1.68e-03 3.48e-02 2.56e-01 2.26e+02 4.74e+02
2250| 1.83e-03 1.95e-03 4.03e-02 3.86e-01 2.26e+02 5.22e+02
2500| 2.17e-03 2.39e-03 4.96e-02 5.66e-01 2.26e+02 5.68e+02
2750| 2.82e-03 3.22e-03 6.67e-02 8.56e-01 2.26e+02 6.16e+02
3000| 4.36e-03 5.16e-03 1.07e-01 1.48e+00 2.26e+02 6.61e+02
3250| 1.14e-02 1.40e-02 2.89e-01 4.16e+00 2.26e+02 7.05e+02
3425| 2.62e+09 1.78e+08 1.23e+17 3.23e+17 2.26e+02 7.34e+02
status: infeasible
timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s
lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s
objective = inf
Here is the MATLAB code I used:
rand('state',0)
randn('state',0)
sizes = [500 80000];
p = sizes(1,1); % number of features
q = sizes(1,2); % number of samples
%Generate some data
w_true = randn(p,1);
X_tmp = 3*randn(p, q);
X_tmp(p,:)=1; %set the last feature to 1
%
ips = -w_true'*X_tmp;
ps = (exp(ips)./(1 + exp(ips)))';
labels = 2*(rand(q,1) < ps) - 1;
X_pos = X_tmp(:,labels==1);
X_neg = X_tmp(:,labels==-1);
X = [X_pos -X_neg]; % include labels with data
clear X_tmp ips ps labels;
Xcvx=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
bcvx=[ones(kN1,1);zeros(kN2,1)];
w_init=w_true;
lam=1.0;
fobj_ref=(-bcvx'*full(Xcvx)*w_init+sum(log_sum_exp([zeros(1,q);w_init'*full(Xcvx)'])))/q+ lam*sum_square(w_init);
cvx_use_solver = 'scs';
lam=1.0;
%Use SCS via cvx for l2 reg logistic
cvx_begin
cvx_precision high
cvx_solver(cvx_use_solver)
variable w(p)
minimize((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'*full(Xcvx)'])))/q+ lam*sum_square(w))
cvx_end
%
fopt_cvx=((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'*full(Xcvx)'])))/q)+ lam*sum_square(w);
fprintf(1,"CVX objective value: %f\n", fopt_cvx)
%Use truncated Newton for comparison
Xh=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
Xh=full(Xh);
bh=[ones(kN1,1);-ones(kN2,1)];
lambda=1.0;
method='pcg';
ptol=1e-4;
pmaxi=1000;
tic
[w1,status1,history1] = l2_logreg(Xh,bh,lambda,method,ptol,pmaxi);
toc
%Compute the objective used in the truncated Newton code
[mx,nx] = size(Xh);
A = sparse(1:mx,1:mx,bh)*Xh;
Aw = A*w1; expAw = exp(Aw); expmAw = exp(-Aw);
fobj_tnt=sum(log(1+expmAw))/mx+sum(lam.*w1.*w1);
%Compare the objective values
fprintf(1,"fobj_cvx: %f\n",fopt_cvx)
fprintf(1,"fobj_tnt: %f\n",fobj_tnt)
function [w,status,history] = l2_logreg(X,b,lambda,method,ptol,pmaxi)
%
% l2-regularized Logistic Regression Problem Solver
%
% L2_LOGREG solves problems of the following form:
%
% minimize (1/m) sum_i log(1+exp(-b_i*(x_i*w))) + sum_j lambda_j*w_j^2,
%
% where the variable is w and the problem data are x_i, b_i, and lambda_j.
%
% INPUT
%
% X : m-by-n matrix; input data; each row corresponds to one example (each column to one feature)
% b : m vector; class label
% lambda : positive n-vector; regularization parameter
%
% method : string; search direction method type
% 'cg' : conjugate gradients method, 'pcg'
% 'pcg' : preconditioned conjugate gradients method
% 'exact': exact method (default value)
% ptol : scalar; pcg relative tolerance. if empty, use adaptive rule.
% pmaxi : scalar: pcg maximum iteration. if empty, use default value (500).
%
% OUTPUT
%
% w : n vector; classifier
% status : scalar; +1: success, -1: maxiter exceeded
% history :
% row 1) phi
% row 2) norm(gradient of phi)
% row 3) cumulative cg iterations
%
% USAGE EXAMPLE
%
% [w,status] = l2_logreg(X,b,lambda,'pcg');
%
% Written by Kwangmoo Koh deneb1@stanford.edu
%------------------------------------------------------------
% INITIALIZE
%------------------------------------------------------------
% NEWTON PARAMETERS
MAX_TNT_ITER = 200; % maximum (truncated) Newton iteration
ABSTOL = 1e-8; % terminates when the norm of gradient < ABSTOL
% LINE SEARCH PARAMETERS
ALPHA = 0.01; % minimum fraction of decrease in norm(gradient)
BETA = 0.5; % stepsize decrease factor
MAX_LS_ITER = 100; % maximum backtracking line search iteration
[m,n] = size(X); % problem size: m examples, n features
if(isempty(pmaxi)) pcgmaxi = 500; else pcgmaxi = pmaxi; end
if(isempty(ptol )) pcgtol = 1e-4; else pcgtol = ptol; end
% INITIALIZE
pobj = Inf; s = inf; pitr = 0 ; pflg = 0 ; prelres = 0; pcgiter = 0;
history = [];
w = zeros(n,1); dw = zeros(n,1);
A = sparse(1:m,1:m,b)*X;
%if (strcmp(method,'cg')||strcmp(method,'pcg')) A2 = A.^2; end
A2 = A.^2;
disp(sprintf('%s %15s %10s %10s %6s %10s %6s',...
'iter','primal obj','stepsize','norm(g)','p_flg','p_res','p_itr'));
%------------------------------------------------------------
% MAIN LOOP
%------------------------------------------------------------
for ntiter = 0:MAX_TNT_ITER
Aw = A*w; expAw = exp(Aw); expmAw = exp(-Aw);
g = -1/m./(1+expAw); h = 1/m./(2+expAw+expmAw);
gradphi = A'*g+2*lambda.*w;
normg = norm(gradphi);
phi = sum(log(1+expmAw))/m+sum(lambda.*w.*w);
disp(sprintf('%4d %15.6e %10.2e %10.2e %6d %10.2e %6d',...
ntiter,phi,s,normg,pflg,prelres,pitr));
history = [history [phi; normg; pcgiter]];
%------------------------------------------------------------
% STOPPING CRITERION
%------------------------------------------------------------
if (norm(gradphi) < ABSTOL)
status = 1;
disp('Absolute tolerance reached.');
disp(sprintf('%d/%d',sum(abs((A2'*h)./(2*lambda))<0.5),n));
return;
end
%------------------------------------------------------------
% CALCULATE NEWTON STEP
%------------------------------------------------------------
switch lower(method)
case 'pcg'
if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
[dw, pflg, prelres, pitr, presvec] = ...
pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,@Mfunc,[],[],...
A,h,2*lambda,1./(A2'*h+2*lambda));
if (pitr == 0) pitr = pcgmaxi; end
case 'cg'
if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
[dw, pflg, prelres, pitr, presvec] = ...
pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,[],[],[],...
A,h,2*lambda,[]);
if (pitr == 0) pitr = pcgmaxi; end
otherwise % exact method
hessphi = A'*sparse(1:m,1:m,h)*A+2*sparse(1:n,1:n,lambda);
dw = -hessphi\gradphi;
end
pcgiter = pcgiter+pitr;
%------------------------------------------------------------
% BACKTRACKING LINE SEARCH
%------------------------------------------------------------
s = 1;
for lsiter = 1:MAX_LS_ITER
new_w = w+s*dw;
newgradphi = A'*(-1/m./(1+exp(A*new_w)))+2*lambda.*new_w;
if (norm(newgradphi)<=(1-ALPHA*s)*normg) break; end
s = BETA*s;
end
if (lsiter == MAX_LS_ITER) break; end
w = new_w;
end
status = -1;
end
%------------------------------------------------------------
% CALL BACK FUNCTIONS FOR PCG
%------------------------------------------------------------
function y = AXfunc(x,A,h,d,p)
y = A'*(h.*(A*x))+d.*x;
end
function y = Mfunc(x,A,h,d,p)
y = x.*p;
end