linspace and logspace for CUDA Tensors by shenxiul · Pull Request #729 · torch/cutorch · GitHub

linspace and logspace for CUDA Tensors #729


Merged: 1 commit merged into torch:master on Mar 16, 2017

Conversation

shenxiul

Adding linspace and logspace functionality for CUDA Tensors. It wraps THC_pointwiseApply1 so that it works for operators acting on both the element and its index. The tests for linspace and logspace all pass, and the change doesn't affect other tests.
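
For illustration, a rough sketch of an index-aware pointwise operator of this kind (names and signatures here are hypothetical, not the PR's actual code): the wrapper calls the functor with both the element's address and its linear index, so the written value can depend on the position.

#include <cuda_runtime.h>
#include <stddef.h>

// Hypothetical functor: it receives the element pointer and its linear index,
// so the written value can depend on the position, as linspace needs.
template <typename T>
struct IndexAwareLinspaceOp {
  T start, step;
  __device__ void operator()(T* v, ptrdiff_t index) const {
    *v = start + step * static_cast<T>(index);
  }
};

// Minimal launcher for a contiguous 1-d tensor of length n.
template <typename T, typename Op>
__global__ void applyWithIndex(T* data, ptrdiff_t n, Op op) {
  ptrdiff_t i = blockIdx.x * static_cast<ptrdiff_t>(blockDim.x) + threadIdx.x;
  if (i < n) op(&data[i], i);
}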

@@ -39,6 +39,71 @@ kernelPointwiseApply1(TensorInfo<Ta, IndexType> a,
}

template <typename Op,
typename Ta,

Contributor

linspace and logspace are 1d and should only really operate on contiguous 1-d tensors, so there's no need to add a version that passes in an index to the operator. This is just duplicating a bunch of code. I'd remove these and just go the simple route.

Instead, you can just add the base pointer to the operator instance itself and your operator can calculate the offset.

A multi-dimensional linear index is not that useful in general to operators, since often you would need the dimension information to make sense of it if you have more than one dimension.
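
A minimal sketch of this suggestion (hypothetical names, not the merged code): the functor stores the base pointer of the contiguous output as a member and recovers the index by pointer arithmetic, so the existing single-argument apply can be reused unchanged.

#include <stddef.h>

// Hypothetical functor for the existing single-argument apply: the index is
// recovered as the element's offset from the stored base pointer of a
// contiguous 1-d output tensor.
template <typename T>
struct BasePointerLinspaceOp {
  T* base;        // base address of the contiguous output
  T start, step;
  __device__ void operator()(T* v) const {
    ptrdiff_t index = v - base;   // offset of this element in the tensor
    *v = start + step * static_cast<T>(index);
  }
};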

#if defined(THC_REAL_IS_FLOAT) || defined(THC_REAL_IS_DOUBLE) || defined(THC_REAL_IS_HALF)

void THCTensor_(linspace)(THCState *state, THCTensor *r_, real a, real b, long n) {
THCAssertSameGPU(THCTensor_(checkGPU)(state, 1, r_));

Contributor

And then in these functions, you pass the base address of r_ to the operator (kept as a member), and you either assert that r_ is contiguous or call newContiguous to enforce it.

Author

Sure! In the main Torch7 repo we explicitly require that linspace and logspace also work on non-contiguous Tensors, in both the code and the tests. So just to make sure: should we make it behave differently in the CUDA backend?

Contributor

Just make it contiguous and do the freeCopyTo back to the original form (or whatever the corresponding functions are called these days, @soumith can help you, it's been a while since I've done work in cutorch). It will still operate on non-contiguous tensors this way.

I just don't think the overhead of adding a new version of the pointwise operators is justified in this case.

Author

Thanks! Adding another version of the pointwise operator was also my concern when making the change. I'd like to go ahead and make that change to these functions. @soumith, may I ask for confirmation on the treatment of non-contiguous Tensors? Thanks!

Member

Do what Jeff suggests, i.e. make the kernel for contiguous tensors, and then do a freeCopyTo to copy it back to the result tensor.
You can search the repo for freeCopyTo to see this pattern.
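
Roughly, that pattern looks like the following (a sketch in the generic THC style of the diff above; newContiguous, resize1d, and freeCopyTo are the usual TH/THC helpers, but check the repo for the exact current signatures):

#if defined(THC_REAL_IS_FLOAT) || defined(THC_REAL_IS_DOUBLE) || defined(THC_REAL_IS_HALF)

void THCTensor_(linspace)(THCState *state, THCTensor *r_, real a, real b, long n) {
  THCAssertSameGPU(THCTensor_(checkGPU)(state, 1, r_));
  THCTensor_(resize1d)(state, r_, n);
  /* A contiguous tensor to fill: r_ itself when it is already contiguous,
     otherwise a fresh contiguous copy. */
  THCTensor *r = THCTensor_(newContiguous)(state, r_);
  /* ... launch the contiguous kernel / thrust call that fills r ... */
  /* Copy the result back into r_ if a temporary was made, then free it. */
  THCTensor_(freeCopyTo)(state, r, r_);
}

#endif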

Author

Thanks!

@shenxiul force-pushed the cuda_linspace branch 4 times, most recently from dfbcc96 to c884204 on March 14, 2017 19:02
@shenxiul (Author)

Hi @wickedfoo @soumith, I have updated my PR to make linspace and logspace work directly on contiguous tensors and use freeCopyTo so that it behaves the same as the CPU layer. Any chance I can have you guys review it? Thanks!

ScalarConvert<long,real>::to(n - 1));
LinspaceOp<real> linspace_method(a, step);
thrust::device_ptr<real> data_(THCTensor_(data)(state, r));
thrust::tabulate(data_, data_ + n, linspace_method);

Author

Hi @soumith and @wickedfoo, just so you're aware: since we are only dealing with contiguous tensors in the kernel, it seems a bit overkill to use THC_pointwiseApply1; a thrust::tabulate call can be more intuitive (at least from my point of view). Both commits pass all the tests, so let me know which version you would prefer.
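
For comparison, the tabulate route looks like this outside the THC wrappers (plain Thrust with a simplified stand-in for LinspaceOp; not the PR's actual functor):

#include <thrust/device_vector.h>
#include <thrust/tabulate.h>
#include <stddef.h>

// Simplified stand-in for the PR's LinspaceOp: value = start + step * index.
template <typename T>
struct SimpleLinspaceOp {
  T start, step;
  SimpleLinspaceOp(T start_, T step_) : start(start_), step(step_) {}
  __host__ __device__ T operator()(ptrdiff_t index) const {
    return start + step * static_cast<T>(index);
  }
};

int main() {
  const long n = 5;
  const float a = 0.f, b = 1.f;
  const float step = (b - a) / static_cast<float>(n - 1);
  thrust::device_vector<float> out(n);
  // tabulate calls the functor with each element's position: 0, 1, ..., n-1.
  thrust::tabulate(out.begin(), out.end(), SimpleLinspaceOp<float>(a, step));
  return 0;
}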

Member

thrust::tabulate seems simpler. Let's go with that.

Author

That's my feeling, too. Before we actually merge, is there anything that needs to be edited? And should I update the PR to squash the two commits together?

@shenxiul force-pushed the cuda_linspace branch 2 times, most recently from 9cda4d1 to 56a4ebb on March 16, 2017 03:38
@shenxiul (Author)

@soumith I realized that range is still not implemented in cutorch, and it's pretty similar to linspace/logspace, so I have added it as well. To keep it consistent with the torch7 API, I made the LinspaceOp and LogspaceOp operators (although LogspaceOp doesn't strictly need the change) accept another template parameter as an intermediate type, for higher accuracy. I would expect the compiler to optimize the T == accT case and avoid an unnecessary type conversion, although I haven't checked yet; if not, we can specialize it.

A question about logspace: an alternative implementation is to call linspace inside the function and then call tpow on the result to get logspace. That would avoid LogspaceOp, which doesn't seem very useful in most other cases. However, I prefer the current implementation, because the alternative launches two kernel calls and the CUDA programming guide explicitly says exp10 is preferable to pow. Let me know what you guys think.

I won't add other functionality in this PR, to keep it clean, unless other changes are needed.
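
A hedged sketch of the two points above, i.e. a separate accumulation type for the intermediate arithmetic and exp10 instead of a second pow kernel (names and details are illustrative, not necessarily the merged code):

#include <cuda_runtime.h>
#include <stddef.h>

// accT is the type used for the intermediate arithmetic (e.g. float when T is
// half); T is the stored element type. With accT == T the extra conversion
// should be something the compiler can drop.
template <typename T, typename accT = T>
struct SketchLogspaceOp {
  accT start, step;   // exponent at position index is start + step * index
  __host__ __device__ SketchLogspaceOp(accT start_, accT step_)
    : start(start_), step(step_) {}
  __device__ T operator()(ptrdiff_t index) const {
    accT exponent = start + step * static_cast<accT>(index);
    // exp10 is a CUDA device math function; as noted above it is preferred
    // over pow(10, x), and computing it here keeps logspace to a single
    // kernel launch.
    return static_cast<T>(exp10(static_cast<double>(exponent)));
  }
};

// Usage mirrors the linspace call in the diff above, e.g.
//   thrust::tabulate(data_, data_ + n, SketchLogspaceOp<float, float>(a, step));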

@soumith merged commit acb15e2 into torch:master on Mar 16, 2017
@soumith (Member) commented on Mar 16, 2017

This is great stuff @shenxiul, thank you!

@ragulpr commented on Mar 21, 2018

Any updates on this? It would be very useful, thanks for all the work!

I get RuntimeError: _linspace is not implemented for type torch.cuda.FloatTensor and I'm on 0.4.0a0+58af449.
