Can't run on v3-8 or v3-32 TPU nodes. #53
Comments
Same issue here. I tried TF 1.15 and TF 1.15.3 and got the same error message with both.
@hytseng0509, found a workaround for now:
For the repo maintainers: Some of the README could use some updated language:
I imagine newcomers (especially students!) would appreciate spending their TPU $$$'s actually training instead of installing 10 different TF/TPU setups.
@mbbrodie Thanks for sharing! How did you configure the VM and TPU? I still get the same error message using a VM with GPUs.
No problem, here's my basic setup:
Obviously, the boot image is meant for PyTorch; you'll probably want to find something that comes with tensorflow-gpu==1.15 installed. However, if you go this route, make the following changes:
Because Tensorflow is...well, Tensorflow...your tensorflow-gpu 1.15 install will not actually use the GPU out of the box.
cp /usr/local/cuda-10.1/lib64/libcudart.so libcudart.so.10.0
TPU Config
Anyway, you'll likely have more fun bugs to sort out for your particular use case. But this can at least get you started.
@mbbrodie Thanks for the information! Did you encounter the error message that requests the installation of
You're welcome. And yes, you'll need to pip install Otherwise, make sure you put the renamed libs on your LD_LIBRARY_PATH.
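A quick way to check whether the renamed libs are actually reachable is a small script like the one below. It is only a sketch. `find_missing_libs` and `ld_library_dirs` are hypothetical helper names. The one library name used, `libcudart.so.10.0`, is taken from the `cp` command earlier in the thread; any other names you add are your own assumption. The script looks only at `LD_LIBRARY_PATH`, not at the other places the dynamic loader may search.

```python
import os


def ld_library_dirs(env=None):
    """Return the non-empty directories listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]


def find_missing_libs(required, search_dirs):
    """Return the names in `required` that exist in none of `search_dirs`."""
    missing = []
    for name in required:
        found = any(os.path.exists(os.path.join(d, name)) for d in search_dirs)
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: only the library renamed in the thread is checked here.
    required = ["libcudart.so.10.0"]
    missing = find_missing_libs(required, ld_library_dirs())
    if missing:
        print("Not found on LD_LIBRARY_PATH:", ", ".join(missing))
    else:
        print("All required libraries found on LD_LIBRARY_PATH.")
```

Run it in the same shell you launch training from, so that it sees the same `LD_LIBRARY_PATH`.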
I have the same error here related to
Hi, I trained TPU-accelerated GANs from https://github.com/tensorflow/gan without any issues, but can't seem to get the compare_gan examples to run on GCP TPUs. Here is the general error, which appears whether I use ctpu, gcloud, or the online GUI to set up compute resources.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation input_pipeline_task0/TensorSliceDataset: node input_pipeline_task0/TensorSliceDataset (defined at /usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/ops.py:1748) was explicitly assigned to /job:worker/task:0/device:CPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
Any thoughts here?
Is there a specific python/tensorflow version I should use for running compare_gan?
Thanks!
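One way to read the error above: the op is pinned to a device under `/job:worker`, while every device the session lists is under `/job:localhost`. That usually suggests the session is running locally rather than connected to the TPU worker, though this is an interpretation and nobody in the thread confirms it. The sketch below pulls the two job names out of such a message so the mismatch is easy to spot. `jobs_in_error` is a hypothetical helper, not part of TensorFlow or compare_gan.

```python
import re

# Matches device strings such as /job:worker/task:0/device:CPU:0 or
# /job:localhost/replica:0/task:0/device:XLA_CPU:0.
DEVICE_RE = re.compile(
    r"/job:(?P<job>\w+)(?:/replica:\d+)?/task:\d+/device:\w+:\d+"
)


def jobs_in_error(message):
    """Return (requested_job, sorted available jobs) from a device error."""
    head, _, tail = message.partition("but available devices are")
    # Only look after "explicitly assigned to" so file paths are ignored.
    requested = DEVICE_RE.search(head.split("explicitly assigned to")[-1])
    available = sorted({m.group("job") for m in DEVICE_RE.finditer(tail)})
    return (requested.group("job") if requested else None, available)


if __name__ == "__main__":
    msg = (
        "Cannot assign a device for operation "
        "input_pipeline_task0/TensorSliceDataset: node was explicitly "
        "assigned to /job:worker/task:0/device:CPU:0 but available devices "
        "are [ /job:localhost/replica:0/task:0/device:CPU:0, "
        "/job:localhost/replica:0/task:0/device:XLA_CPU:0 ]."
    )
    requested, available = jobs_in_error(msg)
    if requested not in available:
        print("Requested job %r is not among available jobs %r"
              % (requested, available))
```

If the requested job never shows up among the available ones, the first thing to check is how the training script is pointed at the TPU.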