[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SCS fails for l2 regularized logistic regression after certain problem size #252

Open
caglarari opened this issue May 9, 2023 · 1 comment

Comments

@caglarari
Copy link

Hi,

I am using SCS 3.2.3 via cvx on a windows 10 machine. I was testing SCS on simple ML problems gradually increasing the problem size to see how it performs.

I observed that after certain size (500 features, 80 000 samples), SCS starts to fail to solve l2 regularized logistic regression problems. In addition, after certain size (500 features, 40 000 samples), the final objective value SCS reports & the objective value computed using the optimum parameters returned by cvx starts to become different.

Do you know what's the issue?

Thanks,
Regards,
Caglar Ari

SCS Output:
Calling SCS 3.2.3: 480502 variables, 320001 equality constraints


       SCS v3.2.3 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012

problem: variables n: 320001, constraints m: 480502
cones: q: soc vars: 502, qsize: 1
e: exp vars: 0, dual exp vars: 480000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
max_iters: 10000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 10, acceleration_interval: 10
lin-sys: sparse-direct-amd-qdldl
nnz(A): 40480002, nnz(P): 0

iter | pri res | dua res | gap | obj | scale | time (s)

 0| 7.66e+03  7.01e+00  3.25e+06 -1.63e+06  1.00e-01  3.48e+01 

250| 5.05e-03 5.70e-03 1.20e-01 -1.74e+00 2.26e+02 1.12e+02
500| 1.20e-03 2.55e-03 2.26e-02 -6.75e-01 2.26e+02 1.55e+02
750| 1.54e-02 5.85e-01 1.58e-02 1.34e-03 2.26e+02 2.01e+02
1000| 1.68e-03 1.35e-03 2.80e-02 -1.21e-01 2.26e+02 2.56e+02
1250| 1.56e-03 1.36e-03 2.83e-02 -2.85e-02 2.26e+02 3.11e+02
1500| 1.52e-03 1.41e-03 2.94e-02 6.00e-02 2.26e+02 3.65e+02
1750| 1.55e-03 1.51e-03 3.14e-02 1.52e-01 2.26e+02 4.18e+02
2000| 1.64e-03 1.68e-03 3.48e-02 2.56e-01 2.26e+02 4.74e+02
2250| 1.83e-03 1.95e-03 4.03e-02 3.86e-01 2.26e+02 5.22e+02
2500| 2.17e-03 2.39e-03 4.96e-02 5.66e-01 2.26e+02 5.68e+02
2750| 2.82e-03 3.22e-03 6.67e-02 8.56e-01 2.26e+02 6.16e+02
3000| 4.36e-03 5.16e-03 1.07e-01 1.48e+00 2.26e+02 6.61e+02
3250| 1.14e-02 1.40e-02 2.89e-01 4.16e+00 2.26e+02 7.05e+02
3425| 2.62e+09 1.78e+08 1.23e+17 3.23e+17 2.26e+02 7.34e+02

status: infeasible
timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s
lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s

objective = inf

Here is the matlab code I used
rand('state',0)
randn('state',0)

sizes = [500 80000];
p = sizes(1,1); % number of features
q = sizes(1,2); % number of samples

%Generate some data
w_true = randn(p,1);
X_tmp = 3*randn(p, q);
X_tmp(p,:)=1; %set the last feature to 1
%
ips = -w_true'X_tmp;
ps = (exp(ips)./(1 + exp(ips)))';
labels = 2
(rand(q,1) < ps) - 1;

X_pos = X_tmp(:,labels==1);
X_neg = X_tmp(:,labels==-1);

X = [X_pos -X_neg]; % include labels with data
clear X_tmp ips ps labels;

Xcvx=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
bcvx=[ones(kN1,1);zeros(kN2,1)];
w_init=w_true;
lam=1.0;
fobj_ref=(-bcvx'*full(Xcvx)*w_init+sum(log_sum_exp([zeros(1,q);w_init'full(Xcvx)'])))/q+ lamsum_square(w_init);

cvx_use_solver = 'scs';
lam=1.0;
%Use SCS via cvx for l2 reg logistic
cvx_begin
cvx_precision high
cvx_solver(cvx_use_solver)
variable w(p)
minimize((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'full(Xcvx)'])))/q+ lamsum_square(w))
cvx_end
%
fopt_cvx=((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'full(Xcvx)'])))/q)+ lamsum_square(w);
fprintf(1,"Cvx objective value: %f\n", fopt_cvx)

%Use truncated Newton for comparison
Xh=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
Xh=full(Xh);
bh=[ones(kN1,1);-ones(kN2,1)];
lambda=1.0;
method='pcg';
ptol=1e-4;
pmaxi=1000;
tic
[w1,status1,history1] = l2_logreg(Xh,bh,lambda,method,ptol,pmaxi);
toc

%Compute the objective used in the truncated Newton code
[mx,nx] = size(Xh);
A = sparse(1:mx,1:mx,bh)Xh;
Aw = A
w1; expAw = exp(Aw); expmAw = exp(-Aw);
fobj_tnt=sum(log(1+expmAw))/mx+sum(lam.*w1.*w1);

%Compare the objective values
fprintf(1,"fobj_cvx: %f\n",fopt_cvx)
fprintf(1,"fobj_tnt: %f\n",fobj_tnt)

function [w,status,history] = l2_logreg(X,b,lambda,method,ptol,pmaxi)
%
% l2-regularized Logistic Regression Problem Solver
%
% L2_LOGREG solves problems of the following form:
%
% minimize (1/m) sum_i log(1+exp(-b_i(x_iw))) + sum_j lambda_iw_j^2,
%
% where variable is w and problem data are x_i, b_i and lambda_i.
%
% INPUT
%
% X : mxn matrix; input data. each row corresponds to each feature
% b : m vector; class label
% lambda : positive n-vector; regularization parameter
%
% method : string; search direction method type
% 'cg' : conjugate gradients method, 'pcg'
% 'pcg' : preconditioned conjugate gradients method
% 'exact': exact method (default value)
% ptol : scalar; pcg relative tolerance. if empty, use adaptive rule.
% pmaxi : scalar: pcg maximum iteration. if empty, use default value (500).
%
% OUTPUT
%
% w : n vector; classifier
% status : scalar; +1: success, -1: maxiter exceeded
% history :
% row 1) phi
% row 2) norm(gradient of phi)
% row 3) cumulative cg iterations
%
% USAGE EXAMPLE
%
% [w,status] = l2_logreg(X,b,lambda,'pcg');
%

% Written by Kwangmoo Koh deneb1@stanford.edu

%------------------------------------------------------------
% INITIALIZE
%------------------------------------------------------------

% NEWTON PARAMETERS
MAX_TNT_ITER = 200; % maximum (truncated) Newton iteration
ABSTOL = 1e-8; % terminates when the norm of gradient < ABSTOL

% LINE SEARCH PARAMETERS
ALPHA = 0.01; % minimum fraction of decrease in norm(gradient)
BETA = 0.5; % stepsize decrease factor
MAX_LS_ITER = 100; % maximum backtracking line search iteration

[m,n] = size(X); % problem size: m examples, n features

if(isempty(pmaxi)) pcgmaxi = 500; else pcgmaxi = pmaxi; end
if(isempty(ptol )) pcgtol = 1e-4; else pcgtol = ptol; end

% INITIALIZE
pobj = Inf; s = inf; pitr = 0 ; pflg = 0 ; prelres = 0; pcgiter = 0;
history = [];

w = zeros(n,1); dw = zeros(n,1);

A = sparse(1:m,1:m,b)*X;
%if (strcmp(method,'cg')||strcmp(method,'pcg')) A2 = A.^2; end
A2 = A.^2;

disp(sprintf('%s %15s %10s %10s %6s %10s %6s',...
'iter','primal obj','stepsize','norg(g)','p_flg','p_res','p_itr'));

%------------------------------------------------------------
% MAIN LOOP
%------------------------------------------------------------

for ntiter = 0:MAX_TNT_ITER
Aw = A*w; expAw = exp(Aw); expmAw = exp(-Aw);
g = -1/m./(1+expAw); h = 1/m./(2+expAw+expmAw);
gradphi = A'g+2lambda.*w;
normg = norm(gradphi);

phi = sum(log(1+expmAw))/m+sum(lambda.*w.*w);
disp(sprintf('%4d %15.6e %10.2e %10.2e %6d %10.2e %6d',...
            ntiter,phi,s,normg,pflg,prelres,pitr));
history = [history [phi; normg; pcgiter]];

%------------------------------------------------------------
%   STOPPING CRITERION
%------------------------------------------------------------
if (norm(gradphi) < ABSTOL) 
    status = 1;
    disp('Absolute tolerance reached.');
    disp(sprintf('%d/%d',sum(abs((A2'*h)./(2*lambda))<0.5),n));
    return;
end
%------------------------------------------------------------
%       CALCULATE NEWTON STEP
%------------------------------------------------------------
switch lower(method)
    case 'pcg'
    if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
    [dw, pflg, prelres, pitr, presvec] = ...
        pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,@Mfunc,[],[],...
            A,h,2*lambda,1./(A2'*h+2*lambda));
    if (pitr == 0) pitr = pcgmaxi; end

    case 'cg'
    if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
    [dw, pflg, prelres, pitr, presvec] = ...
        pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,[],[],[],...
            A,h,2*lambda,[]);
    if (pitr == 0) pitr = pcgmaxi; end

    otherwise % exact method
    hessphi = A'*sparse(1:m,1:m,h)*A+2*sparse(1:n,1:n,lambda);
    dw = -hessphi\gradphi;
end
pcgiter = pcgiter+pitr;
%------------------------------------------------------------
%   BACKTRACKING LINE SEARCH
%------------------------------------------------------------
s = 1;
for lsiter = 1:MAX_LS_ITER
    new_w = w+s*dw;
    newgradphi = A'*(-1/m./(1+exp(A*new_w)))+2*lambda.*new_w;
    if (norm(newgradphi)<=(1-ALPHA*s)*normg) break; end
    s = BETA*s;
end
if (lsiter == MAX_LS_ITER) break; end
w = new_w;

end
status = -1;
end
%------------------------------------------------------------
% CALL BACK FUNCTIONS FOR PCG
%------------------------------------------------------------
function y = AXfunc(x,A,h,d,p)
y = A'(h.(A*x))+d.*x;
end

function y = Mfunc(x,A,h,d,p)
y = x.*p;
end

@bodono
Copy link
Member
bodono commented May 10, 2023

This is probably just requires tuning eps_abs and eps_rel for more/less accuracy, and reducing eps_infeas to prevent erroneously returning infeasible, which appears to be what has happened. SCS has found a certificate of infeasibility for the conic formulation of the problem to the desired accuracy, so tighten the infeasibility tolerance to prevent that if it's not what you want.

I'm not sure if CVX is really being updated much these days, but if you switch to CVXPY (or just call SCS directly) then you can use the quadratic version of the solver which will be able to handle the l2 regularization without using a second-order cone, which will likely be faster and more reliable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants