CPU version · Issue #2 · szagoruyko/cifar.torch · GitHub
CPU version #2


Open
srp1970 opened this issue Aug 11, 2015 · 7 comments

@srp1970
srp1970 commented Aug 11, 2015

This looks good. How do you run this on a CPU? Is it hardcoded to run only on (NVIDIA) GPUs? When I tried removing the references to CUDA and running on the CPU, I got this error:

/home/sanoob/torch/install/bin/luajit: ...ob/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:100: bad argument #1 (field finput is not a torch.FloatTensor)
stack traceback:
[C]: in function 'SpatialConvolutionMM_updateOutput'
...ob/torch/install/share/lua/5.1/nn/SpatialConvolution.lua:100: in function 'updateOutput'
/home/sanoob/torch/install/share/lua/5.1/nn/Sequential.lua:39: in function 'updateOutput'
/home/sanoob/torch/install/share/lua/5.1/nn/Sequential.lua:39: in function 'updateOutput'
/home/sanoob/torch/install/share/lua/5.1/nn/Sequential.lua:39: in function 'forward'
train-cpu.lua:106: in function 'opfunc'
/home/sanoob/torch/install/share/lua/5.1/optim/sgd.lua:43: in function 'sgd'
train-cpu.lua:115: in function 'train'
train-cpu.lua:190: in main chunk

at line

  local outputs = model:forward(inputs)

Should we declare inputs as FloatTensor too?

@szagoruyko
Owner

The code is hardcoded to CUDA because the CPU would be ridiculously slow with this network. However, it is very easy to fix: just substitute all CudaTensors with FloatTensors.
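Concretely, the substitution could look something like this — a hedged sketch, since the exact tensor allocations in train.lua may differ:

```lua
-- CPU variant: every CudaTensor and :cuda() call becomes the float equivalent.
require 'nn'  -- instead of require 'cunn'

model = model:float()          -- casts all parameters and internal buffers
criterion = criterion:float()

-- the minibatch buffers must match the model's tensor type as well
local inputs  = torch.FloatTensor(opt.batchSize, 3, 32, 32)
local targets = torch.FloatTensor(opt.batchSize)
```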

@mentzelos07

I have the same problem. I deleted :cuda() from train.lua and :ceil() from provider.lua, and I also changed the CudaTensors to FloatTensors in train.lua. I still get the same error. Is there anything else I could do?

@ZENGXH
ZENGXH commented Nov 26, 2015

I had the same problem and finally solved it by casting the model and the criterion to float:

model = model:float()
criterion = criterion:float()

I did not delete the ceil() in provider.lua when running on the CPU, and it still works.
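For completeness: casting the containers with :float() recursively converts every internal buffer (including the finput field that the original error complains about), but the minibatch tensors fed into the model have to be FloatTensors as well. A sketch, assuming inputs and targets are the training buffers from train.lua:

```lua
model = model:float()
criterion = criterion:float()

-- the data passed to forward/backward must have the same type as the model
inputs = inputs:float()
targets = targets:float()

local outputs = model:forward(inputs)
local f = criterion:forward(outputs, targets)
```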

@ljk628
ljk628 commented Jan 3, 2016

The issue was finally solved by this commit: torch/nn@738057e

@fredowski
Contributor

You mention that this is slow on a CPU, and the blog reports step times of 70 ms to 500 ms for the CUDA GPU hardware:

http://torch.ch/blog/2015/07/30/cifar.html

I did a CPU test on a MacBook Pro with an i5 at 2.9 GHz and an Intel Iris 6400 graphics card, using lua instead of luajit and the default Apple clang compiler, so no OpenMP. The CPU load reported on macOS is approx. 150% to 180%. I started the model with

th train.lua --type=float

The log looks like this:

==> loading data    
Will save at logs   
==> setting criterion   
==> configuring optimizer   
==> online epoch # 1 [batchSize = 128]  
 [==================== 390/390 ================>]  Tot: 54m22s | Step: 8s264ms  
Train accuracy: 13.05 %  time: 3270.04 s    
==> testing 
Test accuracy:  15.01   
==> online epoch # 2 [batchSize = 128]  
 [==================== 390/390 ================>]  Tot: 53m1s | Step: 8s385ms   
Train accuracy: 23.26 %  time: 3189.90 s    
==> testing 
Test accuracy:  17.66   
==> online epoch # 3 [batchSize = 128]  
 [===========>........ 107/390 .................]  ETA: 40m44s | Step: 8s636ms 

So I have a step time of approx. 8600 ms...

I played around with torch-cl, which reduced the CPU load, but the fan kept blowing out a lot of heat, so I guess the Iris card was really doing something. The step time, however, was only slightly reduced, to 7500 ms.

I also tried a gcc5 build with OpenMP, but I installed the libraries via MacPorts, so I think the OpenBLAS library is not compiled with OpenMP. The resulting step time was also around 7500 ms, although the OpenMP gcc5 build did increase the CPU load.

So maybe I'd better rent some time on Amazon EC2...
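For reference, a --type flag like the one used above is typically wired up through a small cast helper. A hypothetical sketch — the actual dispatch in train.lua may differ:

```lua
-- Hypothetical helper: cast a module or tensor according to opt.type.
local function cast(t)
  if opt.type == 'cuda' then
    require 'cunn'
    return t:cuda()
  elseif opt.type == 'float' then
    return t:float()
  else
    return t:double()
  end
end

model = cast(model)
criterion = cast(criterion)
```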

@fredowski
Contributor

This is a test on an Amazon EC2 g2.2xlarge, but using only the CPU, i.e. not the GPU:

ubuntu@ip-172-31-25-86:~/ai/cifar.torch$ th train.lua --type=float
{
  type : "float"
  max_epoch : 300
  weightDecay : 0.0005
  save : "logs"
  momentum : 0.9
  epoch_step : 25
  model : "vgg_bn_drop"
  learningRate : 1
  batchSize : 128
  backend : "nn"
  learningRateDecay : 1e-07
}
==> configuring model   
(SNIP)
==> loading data    
Will save at logs   
==> setting criterion   
==> configuring optimizer   
==> online epoch # 1 [batchSize = 128]  
 [.................... 8/390 ...................]  ETA: 3h22m | Step: 31s858ms  

So from approx. 8 s per step on the MacBook Pro i5, the step time increased to 31 s...

This is 4 times slower.

@fredowski
Contributor

And now on the Amazon EC2 g2.2xlarge with the GPU:

ubuntu@ip-172-31-25-86:~/ai/cifar.torch$ th train.lua --type=cuda
{
  type : "cuda"
  max_epoch : 300
  weightDecay : 0.0005
  save : "logs"
  momentum : 0.9
  epoch_step : 25
  model : "vgg_bn_drop"
  learningRate : 1
  batchSize : 128
  backend : "nn"
  learningRateDecay : 1e-07
}
==> configuring model   
(SNIP)
==> loading data    
Will save at logs   
==> setting criterion   
==> configuring optimizer   
==> online epoch # 1 [batchSize = 128]  
 [========>........... 79/390 ..................]  ETA: 8m41s | Step: 1s676ms   

The step time is 1676 ms! It is getting better!
