linspace and logspace for CUDA Tensors #729
Conversation
lib/THC/THCApply.cuh (Outdated)

@@ -39,6 +39,71 @@ kernelPointwiseApply1(TensorInfo<Ta, IndexType> a,
}

template <typename Op,
          typename Ta,
linspace and logspace are 1d and should only really operate on contiguous 1-d tensors, so there's no need to add a version that passes in an index to the operator. This is just duplicating a bunch of code. I'd remove these and just go the simple route.
Instead, you can just add the base pointer to the operator instance itself and your operator can calculate the offset.
A multi-dimensional linear index is not that useful in general to operators, since often you would need the dimension information to make sense of it if you have more than one dimension.
#if defined(THC_REAL_IS_FLOAT) || defined(THC_REAL_IS_DOUBLE) || defined(THC_REAL_IS_HALF)

void THCTensor_(linspace)(THCState *state, THCTensor *r_, real a, real b, long n) {
  THCAssertSameGPU(THCTensor_(checkGPU)(state, 1, r_));
and then in these functions, you pass the base address of r_ to the operator, which keeps it as a member, and you either assert that r_ is contiguous or call newContiguous to enforce it.
Sure! In the main Torch7 repo we explicitly require, in both code and tests, that linspace and logspace also work for non-contiguous tensors. So I just want to make sure: do we want the CUDA backend to behave differently?
Just make it contiguous and do the freeCopyTo back to the original form (or whatever the corresponding functions are called these days, @soumith can help you, it's been a while since I've done work in cutorch). It will still operate on non-contiguous tensors this way.
I just don't think the overhead of adding a new version of the pointwise operators is justified in this case.
Thanks! Adding another version of the pointwise operator was also my concern when making the change. I'll go ahead and change these functions. @soumith, may I ask you to confirm the treatment of non-contiguous tensors? Thanks!
Do what Jeff suggests, i.e. write the kernel for contiguous tensors, and then do a freeCopyTo to copy the result back to the original tensor. You can search the repo for freeCopyTo to see this pattern.
Thanks!
(force-pushed from dfbcc96 to c884204)
Hi @wickedfoo @soumith, I have updated my PR so that linspace and logspace work directly on contiguous tensors and use freeCopyTo to behave the same as the CPU layer. Any chance I can have you review it? Thanks!
    ScalarConvert<long,real>::to(n - 1));
  LinspaceOp<real> linspace_method(a, step);
  thrust::device_ptr<real> data_(THCTensor_(data)(state, r));
  thrust::tabulate(data_, data_ + n, linspace_method);
Hi @soumith and @wickedfoo, just so you're aware: since we only deal with contiguous tensors in the kernel, using THC_pointwiseApply1 seems a bit of an overkill, and a thrust::tabulate call is more intuitive (at least from my point of view). Both commits pass all the tests; let me know which version you would prefer.
thrust::tabulate seems simpler. Let's go with that.
That's my feeling, too. Before we actually merge, is there anything that needs to be edited? And should I update the PR to squash the two commits together?
(force-pushed from 9cda4d1 to 56a4ebb)
@soumith I realized that range is still not implemented in cutorch, and since it's pretty similar to linspace/logspace, I have added it as well. To stay consistent with the Torch7 API, I made the operators LinspaceOp and LogspaceOp (although LogspaceOp didn't need the change) accept another template parameter as an intermediate type for higher accuracy. I would expect the compiler to optimize the case T == accT and avoid an unnecessary type conversion, although I haven't checked yet; if not, we can specialize it. A question about logspace: an alternative implementation is to call linspace inside the function and then call tpow on the result. That would avoid LogspaceOp, which doesn't sound very useful in most cases. However, I prefer the current implementation, because the alternative launches two kernel calls, and the CUDA programming guide explicitly says exp10 is preferable to pow. Let me know what you think. To keep this PR clean, I won't add other functionality unless further changes are needed.
This is great stuff @shenxiul, thank you!
Any updates on this? It would be very useful, thanks for all the work!
Adding linspace and logspace functionality for CUDA tensors. It wraps THC_pointwiseApply1 to make it work for operators acting on both the element and its index. The tests for linspace and logspace all pass and do not affect other tests.